Set Up API Gateway Robots.txt With AWS CDK

This guide demonstrates how to set up the robots.txt API Gateway resource with AWS CDK.

Wednesday, August 25, 2021

You have probably found yourself in a situation where you needed to control the bot crawling activity of your API, possibly disable it entirely. In such a case you need to set up a robots.txt resource on your API Gateway. The setup of the resource is fairly straightforward; however, there are a few quirks that one needs to remember. This guide will help you set up the resource quickly and easily.

Note: The robots.txt resource needs to be defined on the root path of your API. If, for example, your API has the URL https://api.example.com, the robots.txt resource needs to be defined at https://api.example.com/robots.txt.

Note 2: If you have your API Gateway defined with a base path mapping on a custom domain, e.g. https://api.example.com/petstore (petstore is the base path mapping), you need to create the robots.txt resource on the API that has the base path mapping pointing to the root of the custom domain, i.e. https://api.example.com/robots.txt.

Mock Integration

An easy way to return the robots.txt content from API Gateway is to use the API Gateway mock integration. The definition of the integration can look like the sketch below.
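This is a minimal sketch in TypeScript with CDK v2; the variable names robotsTxtContent and robotsTxtIntegration are illustrative:

```typescript
import * as apigateway from 'aws-cdk-lib/aws-apigateway';

// The payload disallows crawling of the API for all user agents,
// with an explicit rule for bingbot (see the note below).
const robotsTxtContent = [
  'User-agent: bingbot',
  'Disallow: /',
  '',
  'User-agent: *',
  'Disallow: /',
].join('\n');

const robotsTxtIntegration = new apigateway.MockIntegration({
  // The request template propagates the 200 status code to the mock endpoint.
  requestTemplates: {
    'application/json': '{ "statusCode": 200 }',
  },
  integrationResponses: [
    {
      statusCode: '200',
      // The response template returns the robots.txt payload as text/plain.
      responseTemplates: {
        'text/plain': robotsTxtContent,
      },
    },
  ],
});
```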

The request mapping template is required to propagate the 200 status code to the mock endpoint. The response template for the 200 status code needs to be defined for the text/plain content type. The response payload itself needs to be a valid robots.txt payload; the example demonstrates denying crawling of the API for all user agents (and explicitly for bingbot - see the note below).

Note: In the example above, there is a specific rule for bingbot to disable the crawling of the API. After discussions with the Bing support team, I learned that bingbot ignores the * user-agent rule and requires a specific rule for the bingbot user-agent. Maybe this information will come in handy!

Method Options

The method options definition can look like the sketch below.
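A sketch that matches the mock integration above; robotsTxtMethodOptions is an illustrative name:

```typescript
const robotsTxtMethodOptions: apigateway.MethodOptions = {
  methodResponses: [
    {
      statusCode: '200',
      // No response model is needed, so the empty model is used
      // for the text/plain content type.
      responseModels: {
        'text/plain': apigateway.Model.EMPTY_MODEL,
      },
    },
  ],
};
```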

Notice that the content type of the 200 response model is also text/plain. We can use the Model.EMPTY_MODEL constant, as we do not need to define any response model.

Putting It Together

Having the mock integration and the method response, we can create the actual robots.txt resource and its methods. It can look like the sketch below.
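A sketch that wires everything together, assuming api is an existing apigateway.RestApi instance in your stack:

```typescript
// `api` is assumed to be an existing RestApi, e.g.:
// const api = new apigateway.RestApi(this, 'Api');

// Define robots.txt on the root path of the API.
const robotsTxt = api.root.addResource('robots.txt');

// Both methods share the same mock integration and method options.
robotsTxt.addMethod('GET', robotsTxtIntegration, robotsTxtMethodOptions);
robotsTxt.addMethod('HEAD', robotsTxtIntegration, robotsTxtMethodOptions);
```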

You need to define both the GET and HEAD methods on the robots.txt resource, as some crawlers might first execute the HEAD method to check the content length of the response payload.
