class CrawlerRunner:

Known subclasses: scrapy.crawler.CrawlerProcess

This is a convenient helper class that keeps track of, manages and runs crawlers inside an already set up ~twisted.internet.reactor.

The CrawlerRunner object must be instantiated with a ~scrapy.settings.Settings object.

This class shouldn't be needed (since Scrapy is responsible for using it accordingly) unless writing scripts that manually handle the crawling process. See :ref:`run-from-script` for an example.
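
A minimal usage sketch, following the pattern :ref:`run-from-script` describes; MySpider, its name and its URL are hypothetical, for illustration only:

import scrapy
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

class MySpider(scrapy.Spider):
    # Hypothetical spider used only for illustration.
    name = "myspider"
    start_urls = ["https://example.com"]

    def parse(self, response):
        yield {"title": response.css("title::text").get()}

configure_logging()         # set up Scrapy-style logging once
runner = CrawlerRunner()    # the reactor is managed by us, not by Scrapy
d = runner.crawl(MySpider)  # Deferred fired when the crawl ends
d.addBoth(lambda _: reactor.stop())  # stop the reactor when done
reactor.run()               # blocks until reactor.stop() is called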

Method crawl Run a crawler with the provided arguments.
Method create_crawler Return a ~scrapy.crawler.Crawler object.
Method join Returns a deferred that is fired when all managed crawlers have completed their executions.
Method stop Simultaneously stops all the crawling jobs that are taking place.
Static Method _get_spider_loader Get a SpiderLoader instance from settings.
Method __init__ Undocumented
Method _crawl Undocumented
Method _create_crawler Undocumented
Method _handle_twisted_reactor Undocumented
Class Variable crawlers Undocumented
Instance Variable _active Undocumented
Instance Variable _crawlers Undocumented
Instance Variable bootstrap_failed Undocumented
Instance Variable settings Undocumented
Instance Variable spider_loader Undocumented
Property spiders Undocumented
def crawl(self, crawler_or_spidercls, *args, **kwargs):

Run a crawler with the provided arguments.

It will call the given Crawler's ~Crawler.crawl method, while keeping track of it so it can be stopped later.

If crawler_or_spidercls isn't a ~scrapy.crawler.Crawler instance, this method will try to create one using this parameter as the spider class given to it.

Returns a deferred that is fired when the crawling is finished.

Parameters
crawler_or_spidercls (~scrapy.crawler.Crawler instance, ~scrapy.spiders.Spider subclass or string) - already created crawler, or a spider class or spider's name inside the project to create it
*args - arguments to initialize the spider
**kwargs - keyword arguments to initialize the spider
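
For illustration, a sketch of passing spider arguments through crawl, assuming the runner from the sketch above; the spider name "myspider" and the category keyword are hypothetical, and resolving a name assumes a project whose spider loader knows it:

d = runner.crawl("myspider", category="books")  # spider looked up by name, kwargs passed to the spider
d.addErrback(lambda failure: print(failure))    # the returned Deferred also surfaces crawl errors
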
def create_crawler(self, crawler_or_spidercls):

Return a ~scrapy.crawler.Crawler object.

  • If crawler_or_spidercls is a Crawler, it is returned as-is.
  • If crawler_or_spidercls is a Spider subclass, a new Crawler is constructed for it.
  • If crawler_or_spidercls is a string, this function finds a spider with this name in a Scrapy project (using the spider loader), then creates a Crawler instance for it.
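
A short sketch of the three accepted inputs; MySpider and the spider name are hypothetical:

crawler = runner.create_crawler(MySpider)    # from a Spider subclass
same = runner.create_crawler(crawler)        # a Crawler instance is returned as-is
by_name = runner.create_crawler("myspider")  # resolved through the spider loader
assert same is crawler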
@defer.inlineCallbacks
def join(self):

Returns a deferred that is fired when all managed crawlers have completed their executions.
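
For example, a sketch of running several crawls concurrently and waiting for all of them; the spider classes are hypothetical:

runner.crawl(SpiderOne)
runner.crawl(SpiderTwo)
d = runner.join()  # fires only after every managed crawl has finished
d.addBoth(lambda _: reactor.stop())
reactor.run()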

def stop(self):

Simultaneously stops all the crawling jobs that are taking place.

Returns a deferred that is fired when they all have ended.
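
One possible use, sketched: a hypothetical safeguard that force-stops all running crawls after a timeout:

from twisted.internet import reactor

reactor.callLater(60, runner.stop)  # stop() returns a deferred fired when all jobs have ended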

@staticmethod
def _get_spider_loader(settings):

Get a SpiderLoader instance from settings.
def __init__(self, settings=None):

Undocumented

def _crawl(self, crawler, *args, **kwargs):

Undocumented

def _create_crawler(self, spidercls):

Undocumented

def _handle_twisted_reactor(self):

Undocumented

crawlers =

Undocumented

_active: set =

Undocumented

_crawlers: set =

Undocumented

bootstrap_failed: bool =

Undocumented

settings =

Undocumented

spider_loader =

Undocumented

@property
spiders =

Undocumented