class XMLFeedSpider(Spider):
This class intends to be the base class for spiders that scrape from XML feeds.
You can choose whether to parse the file using the 'iternodes' iterator, an 'xml' selector, or an 'html' selector (set with the `iterator` class variable). In most cases it's convenient to use iternodes, since it's faster and cleaner.
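The source does not show how 'iternodes' works internally. As a rough, stdlib-only analogue (not Scrapy's implementation), the sketch below streams a feed with `xml.etree.ElementTree.iterparse` and hands over each node with a given tag as soon as its end tag is parsed, instead of first building a selector over the whole document. The function name `iter_nodes` is hypothetical.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator


def iter_nodes(xml_bytes: bytes, tag: str) -> Iterator[ET.Element]:
    """Yield each element named `tag` as soon as its end tag has been parsed.

    Callers must read what they need from an element before asking for the
    next one, because the element is cleared afterwards to keep memory low.
    """
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == tag:
            yield elem
            elem.clear()  # drop this node's children and text once consumed
```

Usage: `[node.findtext("title") for node in iter_nodes(feed, "item")]` collects one title per `<item>` without holding every parsed item in memory at once.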
Method | _iternodes | Undocumented |
Method | _parse | Undocumented |
Method | _register_namespaces | Undocumented |
Method | adapt_response | You can override this function to make any changes you want to the feed before parsing it. This function must return a response. |
Method | parse_node | This method must be overridden with your custom spider functionality. |
Method | parse_nodes | No summary |
Method | process_results | No summary |
Class Variable | iterator | Undocumented |
Class Variable | itertag | Undocumented |
Class Variable | namespaces | Undocumented |
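To show how the overridable hooks listed above relate to each other, here is a hypothetical miniature written with the standard library only. It is not Scrapy's code: only the hook names (`adapt_response`, `parse_node`, `parse_nodes`, `process_results`) and the `itertag` attribute mirror the table; the signatures are simplified (raw bytes and `ElementTree` elements instead of responses and selectors), and the classes `ToyXMLFeedSpider` and `TitleSpider` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Any, Iterable, Iterator


class ToyXMLFeedSpider:
    """Hypothetical miniature of the XMLFeedSpider hook flow (not Scrapy code)."""

    itertag = "item"  # name of the repeating node to iterate over

    def adapt_response(self, body: bytes) -> bytes:
        # Hook: tweak the raw feed before parsing; must return something parseable.
        return body

    def parse_node(self, node: ET.Element) -> Any:
        # Hook: the base class cannot know what an item looks like.
        raise NotImplementedError("subclasses must override parse_node")

    def process_results(self, results: list) -> list:
        # Hook: last-chance post-processing; in this toy it runs once per node.
        return results

    def parse_nodes(self, nodes: Iterable[ET.Element]) -> Iterator[Any]:
        for node in nodes:
            yield from self.process_results([self.parse_node(node)])

    def parse(self, body: bytes) -> list:
        root = ET.fromstring(self.adapt_response(body))
        return list(self.parse_nodes(root.iter(self.itertag)))


class TitleSpider(ToyXMLFeedSpider):
    def adapt_response(self, body: bytes) -> bytes:
        # e.g. drop stray whitespace that would precede the XML declaration
        return body.strip()

    def parse_node(self, node: ET.Element) -> dict:
        return {"title": node.findtext("title")}

    def process_results(self, results: list) -> list:
        # e.g. discard items that have no title
        return [r for r in results if r["title"]]
```

Without the `adapt_response` override, a feed that starts with whitespace before `<?xml ...?>` is rejected by the parser; with it, `TitleSpider().parse(feed)` returns one dict per titled `<item>`. Calling `parse` on the base class fails, which is the toy's way of enforcing that `parse_node` must be overridden.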
Inherited from Spider:
Class Method | from_crawler | Undocumented |
Class Method | handles_request | Undocumented |
Class Method | update_settings | Undocumented |
Static Method | close | Undocumented |
Method | __init__ | Undocumented |
Method | __str__ | Undocumented |
Method | _set_crawler | Undocumented |
Method | log | Log the given message at the given log level. |
Method | make_requests_from_url | This method is deprecated. |
Method | parse | Undocumented |
Method | start_requests | Undocumented |
Class Variable | custom_settings | Undocumented |
Instance Variable | crawler | Undocumented |
Instance Variable | name | Undocumented |
Instance Variable | settings | Undocumented |
Instance Variable | start_urls | Undocumented |
Property | logger | Undocumented |
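The `log` entry only says that it logs a message at a given level. A minimal stdlib sketch of how a `logger` property and a `log` helper can fit together is below; it assumes (this is an assumption, not something the table states) that each spider's logger is named after its `name` and that the spider object is attached to every record as extra context. `ToySpider` is a hypothetical class, not Scrapy's `Spider`.

```python
import logging


class ToySpider:
    """Hypothetical stand-in showing a `logger` property and a `log` helper."""

    def __init__(self, name: str) -> None:
        self.name = name

    @property
    def logger(self) -> logging.LoggerAdapter:
        # Assumption: one logger per spider name; the adapter's extra dict puts
        # the spider on every record as `record.spider`.
        return logging.LoggerAdapter(logging.getLogger(self.name), {"spider": self})

    def log(self, message: str, level: int = logging.DEBUG, **kw) -> None:
        """Log the given message at the given log level."""
        self.logger.log(level, message, **kw)
```

Usage: `ToySpider("feeds").log("parsed 3 nodes", logging.INFO)` emits an INFO record on the `feeds` logger. Delegating to a `LoggerAdapter` keeps the helper thin, and handlers or filters can read `record.spider` to tell spiders apart.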
Inherited from object_ref (via Spider):
Method | __new__ | Undocumented |
Class Variable | __slots__ | Undocumented |