An extension to retry failed requests that are potentially caused by temporary problems such as a connection timeout or HTTP 500 error.
You can change the behaviour of this middleware by modifying the scraping settings:
RETRY_TIMES - how many times to retry a failed page
RETRY_HTTP_CODES - which HTTP response codes to retry
Failed pages are collected during the scraping process and rescheduled at the end, once the spider has finished crawling all regular (non-failed) pages.
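As a rough illustration, the two settings named above could be overridden in a project's settings module along these lines. The specific values are arbitrary assumptions chosen for the example, not recommended defaults.

```python
# settings.py (sketch): the values below are illustrative assumptions only.

# Retry each failed page up to 5 times.
RETRY_TIMES = 5

# Retry only responses that carry one of these HTTP status codes.
RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 429]
```

Raising RETRY_TIMES trades longer crawls for fewer permanently failed pages, so it is usually worth tuning per target site.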
Class | RetryMiddleware |
Undocumented |
Function | get_retry_request |
Returns a new ~scrapy.Request object to retry the specified request, or None if retries of the specified request have been exhausted. |
Variable | retry_logger |
Undocumented |
Returns a new ~scrapy.Request object to retry the specified request, or None if retries of the specified request have been exhausted.
For example, in a ~scrapy.Spider callback, you could use it as follows:

    def parse(self, response):
        if not response.text:
            new_request_or_none = get_retry_request(
                response.request,
                spider=self,
                reason='empty',
            )
            return new_request_or_none
spider is the ~scrapy.Spider instance which is asking for the retry request. It is used to access the :ref:`settings <topics-settings>` and :ref:`stats <topics-stats>`, and to provide extra logging context (see logging.debug).
reason is a string or an Exception object that indicates the reason why the request needs to be retried. It is used to name retry stats.
max_retry_times is a number that determines the maximum number of times that request can be retried. If not specified or None, the number is read from the :reqmeta:`max_retry_times` meta key of the request. If the :reqmeta:`max_retry_times` meta key is not defined or None, the number is read from the :setting:`RETRY_TIMES` setting.
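The lookup order described above (explicit argument, then request meta key, then setting) can be sketched in plain Python. The helper name resolve_max_retry_times and the use of ordinary dictionaries in place of the request meta and the settings object are assumptions made for this illustration; this is not the library's own code.

```python
from typing import Any, Mapping, Optional


def resolve_max_retry_times(
    max_retry_times: Optional[int],
    meta: Mapping[str, Any],
    settings: Mapping[str, Any],
) -> int:
    """Hypothetical helper mirroring the documented lookup order."""
    # 1. An explicit argument wins.
    if max_retry_times is not None:
        return max_retry_times
    # 2. Otherwise use the request's max_retry_times meta key, if set.
    meta_value = meta.get("max_retry_times")
    if meta_value is not None:
        return meta_value
    # 3. Finally fall back to the RETRY_TIMES setting.
    return settings["RETRY_TIMES"]


settings = {"RETRY_TIMES": 2}
from_setting = resolve_max_retry_times(None, {}, settings)
from_meta = resolve_max_retry_times(None, {"max_retry_times": 5}, settings)
from_argument = resolve_max_retry_times(7, {"max_retry_times": 5}, settings)
```

Note that a meta key explicitly set to None is treated the same as a missing key, matching the description above.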
priority_adjust is a number that determines how the priority of the new request changes in relation to request. If not specified, the number is read from the :setting:`RETRY_PRIORITY_ADJUST` setting.
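A minimal sketch of how such an adjustment might be applied follows. It assumes the adjustment is simply added to the original request's priority, which the text above does not state explicitly. The helper name retry_priority and the dictionary standing in for the settings object are hypothetical.

```python
from typing import Any, Mapping, Optional


def retry_priority(
    current_priority: int,
    priority_adjust: Optional[int],
    settings: Mapping[str, Any],
) -> int:
    """Hypothetical helper: priority of the retry request (additive assumption)."""
    # Fall back to the RETRY_PRIORITY_ADJUST setting when no value is given.
    if priority_adjust is None:
        priority_adjust = settings["RETRY_PRIORITY_ADJUST"]
    # Assumption: the adjustment is added to the original priority.
    return current_priority + priority_adjust


settings = {"RETRY_PRIORITY_ADJUST": -1}
lowered = retry_priority(0, None, settings)   # uses the setting
raised = retry_priority(0, 10, settings)      # explicit argument wins
```

Under this assumption, a negative adjustment schedules retries behind regular requests and a positive one schedules them ahead.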
logger is the logging.Logger object to be used when logging messages.
stats_base_key is a string to be used as the base key for the retry-related job stats.
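To make the role of stats_base_key and reason more concrete, here is a sketch of how retry stats could be named. The key layout used here (a "count" key, a "reason_count/<reason>" key, and a "max_reached" key under the base key) and the default base key "retry" are assumptions about the implementation, not something this page states. The helper record_retry and the Counter standing in for the stats collector are hypothetical.

```python
from collections import Counter


def record_retry(
    stats: Counter,
    reason: str,
    retried: bool,
    stats_base_key: str = "retry",  # assumed default base key
) -> None:
    """Hypothetical helper showing one possible stats key layout."""
    if retried:
        # One overall counter plus one counter per reason.
        stats[f"{stats_base_key}/count"] += 1
        stats[f"{stats_base_key}/reason_count/{reason}"] += 1
    else:
        # Retries exhausted: count the request that was given up on.
        stats[f"{stats_base_key}/max_reached"] += 1


stats = Counter()
record_retry(stats, "empty", retried=True)
record_retry(stats, "empty", retried=True)
record_retry(stats, "empty", retried=False)
custom = Counter()
record_retry(custom, "empty", retried=True, stats_base_key="custom_retry")
```

Passing a different stats_base_key, as in the last call, keeps stats for manually requested retries separate from those of the middleware.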
Parameters | |
request:Request | Undocumented |
spider:Spider | Undocumented |
reason:Union[ | Undocumented |
max_retry_times:Optional[ | Undocumented |
priority_adjust:Optional[ | Undocumented |
logger:Logger | Undocumented |
stats_base_key:str | Undocumented |