class CrawlerProcess(CrawlerRunner):
A class to run multiple scrapy crawlers in a process simultaneously.

This class extends :class:`~scrapy.crawler.CrawlerRunner` by adding support for starting a :mod:`~twisted.internet.reactor` and handling shutdown signals, like the keyboard interrupt command Ctrl-C. It also configures top-level logging.

This utility should be a better fit than :class:`~scrapy.crawler.CrawlerRunner` if you aren't running another :mod:`~twisted.internet.reactor` within your application.

The CrawlerProcess object must be instantiated with a :class:`~scrapy.settings.Settings` object.

This class shouldn't be needed (since Scrapy is responsible for using it accordingly) unless writing scripts that manually handle the crawling process. See :ref:`run-from-script` for an example.
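For instance, a minimal run-from-script sketch; the spider class, its name, and the URLs are illustrative assumptions, not part of this API:

```python
import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.settings import Settings

class MySpider(scrapy.Spider):
    # Hypothetical spider, defined inline so the sketch is self-contained.
    name = "my_spider"
    start_urls = ["https://quotes.toscrape.com"]

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {"text": quote.css("span.text::text").get()}

settings = Settings({"LOG_LEVEL": "INFO"})
process = CrawlerProcess(settings)
process.crawl(MySpider)
process.start()  # starts the reactor and blocks until crawling is finished
```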
| Parameter | Description |
| --- | --- |
| `install_root_handler` | whether to install the root logging handler (default: `True`) |
| Kind | Name | Description |
| --- | --- | --- |
| Method | `start` | Starts a :mod:`~twisted.internet.reactor`, adjusts its pool size to :setting:`REACTOR_THREADPOOL_MAXSIZE`, and installs a DNS cache based on :setting:`DNSCACHE_ENABLED` and :setting:`DNSCACHE_SIZE`. |
| Method | `__init__` | Undocumented |
| Method | `_graceful_stop_reactor` | Undocumented |
| Method | `_handle_twisted_reactor` | Undocumented |
| Method | `_signal_kill` | Undocumented |
| Method | `_signal_shutdown` | Undocumented |
| Method | `_stop_reactor` | Undocumented |
Inherited from CrawlerRunner:
| Kind | Name | Description |
| --- | --- | --- |
| Method | `crawl` | Run a crawler with the provided arguments. |
| Method | `create_crawler` | Return a :class:`~scrapy.crawler.Crawler` object. |
| Method | `join` | Returns a deferred that is fired when all managed crawlers have completed their executions (see the sketch after this table). |
| Method | `stop` | Stops all the crawling jobs currently taking place. |
| Static Method | `_get_spider_loader` | Get a SpiderLoader instance from settings. |
| Method | `_crawl` | Undocumented |
| Method | `_create_crawler` | Undocumented |
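The inherited `crawl` and `join` methods are the API you would drive directly on :class:`~scrapy.crawler.CrawlerRunner` when your application already runs its own reactor. A hedged sketch of that pattern, with a hypothetical spider defined inline:

```python
import scrapy
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

class MySpider(scrapy.Spider):
    # Hypothetical spider, for illustration only.
    name = "runner_demo"
    start_urls = ["https://example.com"]

    def parse(self, response):
        yield {"title": response.css("title::text").get()}

configure_logging()                  # CrawlerRunner does not configure logging for you
runner = CrawlerRunner()
d = runner.crawl(MySpider)           # returns a Deferred
d.addBoth(lambda _: reactor.stop())  # stop the reactor when the crawl ends
reactor.run()                        # you start (and stop) the reactor yourself
```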
| Kind | Name | Description |
| --- | --- | --- |
| Class Variable | `crawlers` | Undocumented |
| Instance Variable | `_active` | Undocumented |
| Instance Variable | `_crawlers` | Undocumented |
| Instance Variable | `bootstrap_failed` | Undocumented |
| Instance Variable | `settings` | Undocumented |
| Instance Variable | `spider_loader` | Undocumented |
| Property | `spiders` | Undocumented |
Method `start`:

This method starts a :mod:`~twisted.internet.reactor`, adjusts its pool size to :setting:`REACTOR_THREADPOOL_MAXSIZE`, and installs a DNS cache based on :setting:`DNSCACHE_ENABLED` and :setting:`DNSCACHE_SIZE`.

If `stop_after_crawl` is `True`, the reactor will be stopped after all crawlers have finished, using `join`.

| Parameter | Description |
| --- | --- |
| `stop_after_crawl` (bool) | whether to stop the reactor once all crawlers have finished (default: `True`) |
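Because `start()` runs the reactor until all scheduled crawls are done (with the default `stop_after_crawl=True`), running several spiders simultaneously in one process is just a matter of scheduling them all before calling it. A minimal sketch, again with hypothetical inline spiders:

```python
import scrapy
from scrapy.crawler import CrawlerProcess

class SpiderOne(scrapy.Spider):
    # Hypothetical spiders, defined inline for a self-contained example.
    name = "one"
    start_urls = ["https://example.com"]

    def parse(self, response):
        yield {"spider": self.name, "title": response.css("title::text").get()}

class SpiderTwo(SpiderOne):
    name = "two"
    start_urls = ["https://example.org"]

process = CrawlerProcess()
process.crawl(SpiderOne)
process.crawl(SpiderTwo)
process.start()  # both crawls share one reactor; returns when all have finished
```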
Method `__init__` (inherited from `scrapy.crawler.CrawlerRunner`):

Undocumented.