
class SitemapSpider(Spider):


Undocumented

Method __init__ Undocumented
Method _get_sitemap_body Return the sitemap body contained in the given response, or None if the response is not a sitemap.
Method _parse_sitemap Undocumented
Method sitemap_filter This method can be used to filter sitemap entries by their attributes; for example, you can keep only the locs whose lastmod is later than a given date (see docs).
Method start_requests Undocumented
Class Variable sitemap_alternate_links Undocumented
Class Variable sitemap_follow Undocumented
Class Variable sitemap_rules Undocumented
Class Variable sitemap_urls Undocumented
Instance Variable _cbs Undocumented
Instance Variable _follow Undocumented

Inherited from Spider:

Class Method from_crawler Undocumented
Class Method handles_request Undocumented
Class Method update_settings Undocumented
Static Method close Undocumented
Method __str__ Undocumented
Method _parse Undocumented
Method _set_crawler Undocumented
Method log Log the given message at the given log level.
Method make_requests_from_url This method is deprecated.
Method parse Undocumented
Class Variable custom_settings Undocumented
Instance Variable crawler Undocumented
Instance Variable name Undocumented
Instance Variable settings Undocumented
Instance Variable start_urls Undocumented
Property logger Undocumented

Inherited from object_ref (via Spider):

Method __new__ Undocumented
Class Variable __slots__ Undocumented

def __init__(self, *a, **kw):

Undocumented
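The constructor is undocumented here. In the Scrapy versions we are aware of, it compiles each `sitemap_rules` pattern into a regex, resolves string callbacks to spider methods, and stores the resulting pairs in `_cbs`; the `sitemap_follow` patterns are compiled into `_follow`. The standalone sketch below imitates that step without importing Scrapy. `as_regex`, `compile_rules` and `DemoSpider` are hypothetical names, and `DemoSpider` is a plain class standing in for a SitemapSpider subclass.

```python
import re


def as_regex(pattern):
    """Compile a string pattern; pass an already compiled pattern through unchanged."""
    return re.compile(pattern) if isinstance(pattern, str) else pattern


def compile_rules(spider, rules):
    """Turn (pattern, callback) rules into (compiled regex, callable) pairs.

    A string callback is looked up as a method of the same name on `spider`.
    """
    compiled = []
    for pattern, callback in rules:
        if isinstance(callback, str):
            callback = getattr(spider, callback)
        compiled.append((as_regex(pattern), callback))
    return compiled


class DemoSpider:
    # Hypothetical stand-in for a SitemapSpider subclass (no Scrapy import).
    sitemap_rules = [(r"/product/", "parse_product"), ("", "parse")]
    sitemap_follow = [r"/sitemap_shop"]

    def __init__(self):
        # Mirrors the roles of the `_cbs` and `_follow` instance variables.
        self._cbs = compile_rules(self, self.sitemap_rules)
        self._follow = [as_regex(p) for p in self.sitemap_follow]

    def parse(self, response):
        return "parse"

    def parse_product(self, response):
        return "parse_product"
```

Resolving callbacks once at construction time means each sitemap URL later only needs a regex test and a direct call.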

def _get_sitemap_body(self, response):

Return the sitemap body contained in the given response, or None if the response is not a sitemap.

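A simplified sketch of the kind of check this method performs, assuming the response is reduced to a URL, a Content-Type value and the raw body bytes. The heuristics used here (XML content type, gzip magic number, `.xml` URL suffix) approximate Scrapy's behavior but are not a copy of it, and `get_sitemap_body` is a hypothetical standalone function.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def get_sitemap_body(url, content_type, body):
    """Return the sitemap XML bytes, or None if the response does not look like a sitemap.

    Assumed, simplified rules:
    - an XML content type means the body itself is the sitemap;
    - a gzip magic number means the body is a compressed sitemap;
    - otherwise fall back to the `.xml` URL suffix.
    """
    if "xml" in content_type.lower():
        return body
    if body[:2] == GZIP_MAGIC:
        try:
            return gzip.decompress(body)
        except (OSError, EOFError):  # corrupt or truncated gzip data
            return None
    if url.endswith(".xml"):
        return body
    return None
```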
def _parse_sitemap(self, response):

Undocumented
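`_parse_sitemap` is undocumented here. As far as we know, it treats a robots.txt response as a source of sitemap URLs and otherwise parses the sitemap body: a `sitemapindex` document lists further sitemaps to follow, while a `urlset` document lists page URLs to dispatch to callbacks. The sketch below covers only that classification step, using the standard library's ElementTree; `parse_sitemap` is a hypothetical helper, not Scrapy's parser. ElementTree is not hardened against maliciously constructed XML, so untrusted input calls for a hardened parser.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a "{namespace}" prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def parse_sitemap(body):
    """Return (sitemap type, list of <loc> URLs) for a sitemap document.

    The type is the root element name, e.g. "urlset" or "sitemapindex".
    """
    root = ET.fromstring(body)
    locs = []
    for entry in root:  # <url> or <sitemap> elements
        for field in entry:
            if local_name(field.tag) == "loc" and field.text:
                locs.append(field.text.strip())
    return local_name(root.tag), locs
```

A caller would then follow the locs of a `sitemapindex` as further sitemaps and route the locs of a `urlset` through the rules.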

def sitemap_filter(self, entries):

This method can be used to filter sitemap entries by their attributes; for example, you can keep only the locs whose lastmod is later than a given date (see docs).

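A sketch of a date-based filter, assuming each entry is a dict with a `loc` key and an optional `lastmod` string in W3C date format. It is written as a plain generator function with an extra `since` argument so it runs without Scrapy; in a spider subclass the same loop would live in an overridden `sitemap_filter(self, entries)` method. `filter_by_lastmod` is a hypothetical name, and dropping entries without a usable `lastmod` is an assumed policy.

```python
from datetime import date, datetime


def filter_by_lastmod(entries, since):
    """Yield the entries whose lastmod date is on or after `since` (a datetime.date)."""
    for entry in entries:
        lastmod = entry.get("lastmod")
        if not lastmod:
            continue  # assumed policy: drop entries without a lastmod
        try:
            # Parse only the leading YYYY-MM-DD part, so full W3C datetimes such as
            # "2024-05-01T10:00:00+00:00" are accepted as well.
            modified = datetime.strptime(lastmod[:10], "%Y-%m-%d").date()
        except ValueError:
            continue  # assumed policy: drop entries with an unparsable lastmod
        if modified >= since:
            yield entry
```

Because it is a generator, entries are filtered lazily as the sitemap is iterated.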
def start_requests(self):

Undocumented

sitemap_alternate_links: bool =

Undocumented
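`sitemap_alternate_links` controls whether alternate links listed inside the same `<url>` block (for example, the same page in another language, marked with `xhtml:link rel="alternate"`) are yielded alongside the main `<loc>`. The sketch below illustrates that behavior with the standard library's ElementTree; `iter_locs` is a hypothetical helper, not Scrapy's implementation.

```python
import xml.etree.ElementTree as ET

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
XHTML = "{http://www.w3.org/1999/xhtml}"


def iter_locs(urlset_xml, alternate_links=False):
    """Yield each <loc>, plus rel="alternate" hrefs when alternate_links is True."""
    root = ET.fromstring(urlset_xml)
    for url in root.iter(SM + "url"):
        loc = url.findtext(SM + "loc")
        if loc:
            yield loc.strip()
        if alternate_links:
            # Alternate-language versions of the same page live in the same <url> block.
            for link in url.iter(XHTML + "link"):
                href = link.get("href")
                if link.get("rel") == "alternate" and href:
                    yield href
```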

sitemap_follow: list[str] =

Undocumented
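`sitemap_follow` applies to sitemap index files: only nested sitemaps whose URL matches one of these patterns are followed. A minimal sketch, assuming the patterns are tested with `re.search` (so they match anywhere in the URL unless anchored); `sitemaps_to_follow` is a hypothetical helper.

```python
import re


def sitemaps_to_follow(index_locs, follow):
    """Keep the sitemap URLs from a sitemap index that match any follow pattern."""
    patterns = [re.compile(p) for p in follow]
    return [loc for loc in index_locs if any(p.search(loc) for p in patterns)]
```

Note that an empty pattern matches every URL, while an empty pattern list follows nothing.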

sitemap_rules: list =

Undocumented
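`sitemap_rules` is a list of `(regex, callback)` tuples: each URL taken from a sitemap is matched against the rules in order, and only the first matching rule's callback is used. A minimal sketch of that first-match dispatch, assuming callbacks are given by method name; `pick_callback` is a hypothetical helper, and returning `None` for an unmatched URL stands for "skip this URL".

```python
import re


def pick_callback(url, rules):
    """Return the callback of the first rule whose regex matches `url`, or None."""
    for pattern, callback in rules:
        if re.search(pattern, url):  # rules are tried in order; first match wins
            return callback
    return None  # no rule matched: the URL would be skipped


rules = [(r"/product/", "parse_product"), (r"/category/", "parse_category"), ("", "parse")]
```

Because order matters, a catch-all rule such as `("", "parse")` belongs at the end of the list.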

sitemap_urls: tuple =

Undocumented
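Entries in `sitemap_urls` may point either to sitemaps or to a site's robots.txt, in which case the sitemap URLs declared there are used. This sketch extracts those `Sitemap:` lines with the standard library's `urllib.robotparser` (its `site_maps()` method exists since Python 3.8) rather than with Scrapy's own code; `sitemaps_from_robots_txt` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_from_robots_txt(robots_txt):
    """Return the Sitemap: URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file declares no sitemaps.
    return parser.site_maps() or []
```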

_cbs: list =

Undocumented

_follow =

Undocumented