This is a convenient helper class that keeps track of, manages and runs
crawlers inside an already set up ~twisted.internet.reactor.
The CrawlerRunner object must be instantiated with a
~scrapy.settings.Settings object.
This class shouldn't be needed (since Scrapy is responsible for using it accordingly) unless writing scripts that manually handle the crawling process. See :ref:`run-from-script` for an example.
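A minimal sketch of such a script (QuotesSpider and its start URL are placeholders, not part of Scrapy)::

    import scrapy
    from twisted.internet import reactor
    from scrapy.crawler import CrawlerRunner
    from scrapy.settings import Settings
    from scrapy.utils.log import configure_logging

    class QuotesSpider(scrapy.Spider):
        # Hypothetical spider, just to have something to run.
        name = "quotes"
        start_urls = ["https://quotes.toscrape.com"]

        def parse(self, response):
            for text in response.css("div.quote span.text::text").getall():
                yield {"text": text}

    configure_logging()  # CrawlerRunner does not install a log handler for you
    runner = CrawlerRunner(Settings())  # instantiate with a Settings object

    d = runner.crawl(QuotesSpider)
    d.addBoth(lambda _: reactor.stop())  # stop the reactor when crawling ends
    reactor.run()  # the script blocks here until the crawling is finished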
Method | crawl | Run a crawler with the provided arguments.
Method | create_crawler | Return a ~scrapy.crawler.Crawler object.
Method | join | Return a deferred that fires when all managed crawlers have completed their executions.
Method | stop | Stop all the crawling jobs taking place simultaneously.
Static Method | _get_spider_loader | Get a SpiderLoader instance from settings.
Method | __init__ | Undocumented
Method | _crawl | Undocumented
Method | _create_crawler | Undocumented
Method | _handle_twisted_reactor | Undocumented
Class Variable | crawlers | Undocumented
Instance Variable | _active | Undocumented
Instance Variable | _crawlers | Undocumented
Instance Variable | bootstrap_failed | Undocumented
Instance Variable | settings | Undocumented
Instance Variable | spider_loader | Undocumented
Property | spiders | Undocumented
Run a crawler with the provided arguments.
It will call the given Crawler's ~Crawler.crawl method, while keeping
track of it so it can be stopped later.
If crawler_or_spidercls isn't a ~scrapy.crawler.Crawler instance, this
method will try to create one, using this parameter as the spider class
for the crawler it creates.
Returns a deferred that is fired when the crawling is finished.
Parameters |
crawler_or_spidercls: ~scrapy.crawler.Crawler instance, ~scrapy.spiders.Spider subclass or string | an already created crawler, or a spider class or spider's name inside the project, from which to create a crawler
*args | arguments to initialize the spider
**kwargs | keyword arguments to initialize the spider
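To illustrate the three accepted forms of crawler_or_spidercls, a sketch (MySpider and the "myspider" name are hypothetical; the string form assumes project settings whose spider loader can resolve that name)::

    import scrapy
    from twisted.internet import reactor
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.project import get_project_settings

    class MySpider(scrapy.Spider):
        # Hypothetical spider; assumed to be known to the project as "myspider".
        name = "myspider"

        def __init__(self, category=None, **kwargs):
            super().__init__(**kwargs)
            self.category = category  # receives the keyword passed to crawl()

    runner = CrawlerRunner(get_project_settings())

    d1 = runner.crawl(MySpider, category="books")       # Spider subclass + kwargs
    d2 = runner.crawl("myspider")                       # spider's name in the project
    d3 = runner.crawl(runner.create_crawler(MySpider))  # already created Crawler

    runner.join().addBoth(lambda _: reactor.stop())
    reactor.run()  # each deferred above fires as its crawl finishes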
Return a ~scrapy.crawler.Crawler object.
join()
Returns a deferred that is fired when all managed crawlers have completed their executions.
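A sketch of the usual pattern with join(): schedule several crawls, then stop the reactor only after every one of them has finished (MySpider1 and MySpider2 are hypothetical)::

    import scrapy
    from twisted.internet import reactor
    from scrapy.crawler import CrawlerRunner
    from scrapy.settings import Settings

    class MySpider1(scrapy.Spider):
        name = "spider1"  # hypothetical

    class MySpider2(scrapy.Spider):
        name = "spider2"  # hypothetical

    runner = CrawlerRunner(Settings())

    # Both crawls run concurrently inside the same reactor.
    runner.crawl(MySpider1)
    runner.crawl(MySpider2)

    # join() returns a deferred that fires only once every managed crawler is done.
    d = runner.join()
    d.addBoth(lambda _: reactor.stop())
    reactor.run()  # blocks until all crawling jobs are finished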