
class CrawlerProcess(CrawlerRunner):


A class to run multiple Scrapy crawlers in a process simultaneously.

This class extends :class:`~scrapy.crawler.CrawlerRunner` by adding support for starting a :mod:`~twisted.internet.reactor` and handling shutdown signals, like the keyboard interrupt command Ctrl-C. It also configures top-level logging.

This utility should be a better fit than :class:`~scrapy.crawler.CrawlerRunner` if you aren't running another :mod:`~twisted.internet.reactor` within your application.

The CrawlerProcess object must be instantiated with a :class:`~scrapy.settings.Settings` object.

This class shouldn't be needed (since Scrapy is responsible for using it accordingly) unless writing scripts that manually handle the crawling process. See :ref:`run-from-script` for an example.

Parameters
install_root_handler (bool): whether to install the root logging handler (default: True)
Method start This method starts a :mod:`~twisted.internet.reactor`, adjusts its pool size to :setting:`REACTOR_THREADPOOL_MAXSIZE`, and installs a DNS cache based on :setting:`DNSCACHE_ENABLED` and :setting:`DNSCACHE_SIZE`.
Method __init__ Initialize the process: store the settings, configure top-level logging, and install shutdown signal handlers.
Method _graceful_stop_reactor Stop all running crawlers, then stop the reactor.
Method _handle_twisted_reactor Install the reactor configured by the :setting:`TWISTED_REACTOR` setting.
Method _signal_kill Force an unclean shutdown on a second shutdown signal.
Method _signal_shutdown Gracefully stop running crawls on the first shutdown signal.
Method _stop_reactor Stop the Twisted reactor.

Inherited from CrawlerRunner:

Method crawl Run a crawler with the provided arguments.
Method create_crawler Return a :class:`~scrapy.crawler.Crawler` object.
Method join Return a deferred that is fired when all managed crawlers have completed their executions.
Method stop Stop all the crawling jobs taking place simultaneously.
Static Method _get_spider_loader Get SpiderLoader instance from settings.
Method _crawl Undocumented
Method _create_crawler Undocumented
Class Variable crawlers Undocumented
Instance Variable _active Undocumented
Instance Variable _crawlers Undocumented
Instance Variable bootstrap_failed Undocumented
Instance Variable settings Undocumented
Instance Variable spider_loader Undocumented
Property spiders Undocumented
def start(self, stop_after_crawl=True):

This method starts a :mod:`~twisted.internet.reactor`, adjusts its pool size to :setting:`REACTOR_THREADPOOL_MAXSIZE`, and installs a DNS cache based on :setting:`DNSCACHE_ENABLED` and :setting:`DNSCACHE_SIZE`.

If stop_after_crawl is True, the reactor will be stopped after all crawlers have finished, using :meth:`join`.

Parameters
stop_after_crawl (bool): whether to stop the reactor once all crawlers have finished (default: True)
def __init__(self, settings=None, install_root_handler=True):

Initialize the process with the given settings, configure top-level logging, and install the root logging handler if requested.

def _graceful_stop_reactor(self):

Stop all running crawlers, then stop the reactor.

def _handle_twisted_reactor(self):

Install the Twisted reactor configured by the :setting:`TWISTED_REACTOR` setting before crawling starts.

def _signal_kill(self, signum, _):

Handle a second shutdown signal by forcing an immediate, unclean stop of the reactor.

def _signal_shutdown(self, signum, _):

Handle the first shutdown signal (e.g. Ctrl-C) by gracefully stopping all running crawls.

def _stop_reactor(self, _=None):

Stop the Twisted reactor, ignoring any error raised if it is already shutting down.