scrapy - Scrapy - a web crawling and web scraping framework written for Python
  __main__ - Undocumented
  cmdline - Undocumented
  commands - Base class for Scrapy commands
    bench - No module docstring; 1/3 class documented
    check - Undocumented
    crawl - Undocumented
    edit - Undocumented
    fetch - Undocumented
    genspider - No module docstring; 1/1 function, 0/1 class documented
    list - Undocumented
    parse - Undocumented
    runspider - Undocumented
    settings - Undocumented
    shell - Scrapy Shell
    startproject - Undocumented
    version - Undocumented
    view - Undocumented
  contracts - No package docstring; 0/1 function, 1/2 class, 0/1 module documented
    default - No module docstring; 4/4 classes documented
  core - Scrapy core library classes and functions.
    downloader - No package docstring; 0/1 function, 1/2 class, 1/4 module, 1/1 package documented
      contextfactory - No module docstring; 0/1 function, 3/3 classes documented
      handlers - Download handlers for different schemes
        datauri - Undocumented
        file - Undocumented
        ftp - An asynchronous FTP file download handler for scrapy which emulates an HTTP response.
        http - Undocumented
        http10 - Download handlers for http and https schemes
        http11 - Download handlers for http and https schemes
        http2 - Undocumented
        s3 - Undocumented
      middleware - Downloader Middleware manager
      tls - No module docstring; 0/6 variable, 0/2 constant, 1/1 class documented
      webclient - No module docstring; 1/2 function, 0/2 class documented
    engine - This is the Scrapy engine which controls the Scheduler, Downloader and Spiders.
    http2 - Undocumented
    scheduler - No module docstring; 0/1 variable, 1/1 class documented
    scraper - This module implements the Scraper component which parses responses and extracts information from them
    spidermw - Spider Middleware manager
  crawler - No module docstring; 0/2 variable, 2/3 classes documented
  downloadermiddlewares - No package docstring; 7/14 modules documented
    ajaxcrawl - No module docstring; 0/2 variable, 1/1 function, 1/1 class documented
    cookies - No module docstring; 0/1 variable, 1/1 class documented
    decompression - This module implements the DecompressionMiddleware which tries to recognise and extract the potentially compressed responses that may arrive.
    defaultheaders - DefaultHeaders downloader middleware
    downloadtimeout - Download timeout middleware
    httpauth - HTTP basic auth downloader middleware
    httpcache - Undocumented
    httpcompression - No module docstring; 0/1 constant, 1/1 class documented
    httpproxy - Undocumented
    redirect - No module docstring; 0/1 variable, 1/3 class documented
    retry - An extension to retry failed requests that are potentially caused by temporary problems such as a connection timeout or HTTP 500 error.
    robotstxt - This is a middleware to respect robots.txt policies. To activate it you must enable this middleware and enable the ROBOTSTXT_OBEY setting.
    stats - Undocumented
    useragent - Set User-Agent header per spider or use a default value from settings
  dupefilters - No module docstring; 1/2 class documented
  exceptions - Scrapy core exceptions
  exporters - Item Exporters are used to export/serialize items into different formats.
  extension - The Extension Manager
  extensions - No package docstring; 8/12 modules documented
    closespider - CloseSpider is an extension that forces spiders to be closed after certain conditions are met.
    corestats - Extension for collecting core stats like items scraped and start/finish times
    debug - Extensions for debugging Scrapy
    feedexport - Feed Exports extension
    httpcache - No module docstring; 0/1 variable, 1/2 function, 0/4 class documented
    logstats - No module docstring; 0/1 variable, 1/1 class documented
    memdebug - MemoryDebugger extension
    memusage - MemoryUsage extension
    spiderstate - No module docstring; 1/1 class documented
    statsmailer - StatsMailer extension sends an email when a spider finishes scraping.
    telnet - Scrapy Telnet Console extension
    throttle - Undocumented
  http - Module containing all HTTP related classes
    common - Undocumented
    cookies - No module docstring; 0/1 constant, 1/1 function, 1/4 class documented
    headers - No module docstring; 1/1 class documented
    request - This module implements the Request class which is used to represent HTTP requests in Scrapy.
      form - This module implements the FormRequest class which is a more convenient class (than Request) to generate Requests based on form data.
      json_request - This module implements the JsonRequest class which is a more convenient class (than Request) to generate JSON Requests.
      rpc - This module implements the XmlRpcRequest class which is a more convenient class (than Request) to generate XML-RPC requests.
    response - This module implements the Response class which is used to represent HTTP responses in Scrapy.
      html - This module implements the HtmlResponse class which adds encoding discovering through HTML encoding declarations to the TextResponse class.
      text - This module implements the TextResponse class which adds encoding handling and discovering (through HTTP headers) to the base Response class.
      xml - This module implements the XmlResponse class which adds encoding discovering through XML encoding declarations to the TextResponse class.
  interfaces - Undocumented
  item - Scrapy Item
  link - This module defines the Link object used in Link extractors.
  linkextractors - scrapy.linkextractors
    lxmlhtml - Link extractor based on lxml.html
  loader - Item Loader
    common - Common functions used in Item Loaders code
    processors - This module provides some commonly used processors for Item Loaders.
  logformatter - No module docstring; 0/7 constant, 1/1 class documented
  mail - Mail sending helpers
  middleware - No module docstring; 0/1 variable, 1/1 class documented
  pipelines - Item pipeline
  pqueues - No module docstring; 0/1 variable, 1/1 function, 2/3 classes documented
  resolver - No module docstring; 0/1 variable, 2/4 classes documented
  responsetypes - This module implements a class which returns the appropriate Response class based on different criteria.
  robotstxt - Undocumented
  selector - Selectors
    unified - XPath selectors based on lxml
  settings - No package docstring; 0/1 constant, 3/3 functions, 3/4 classes, 1/1 module documented
    default_settings - This module contains the default values for all settings used by Scrapy.
  shell - Scrapy Shell
  signalmanager - Undocumented
  signals - Scrapy signals
  spiderloader - No module docstring; 1/1 class documented
  spidermiddlewares - No package docstring; 5/5 modules documented
  spiders - Base class for Scrapy spiders
    crawl - This module implements the CrawlSpider which is the recommended spider to use for scraping typical websites that require crawling pages.
    feed - This module implements the XMLFeedSpider which is the recommended spider to use for scraping from an XML feed.
    init - No module docstring; 1/1 class documented
    sitemap - Undocumented
  squeues - Scheduler queues
  statscollectors - Scrapy extension for collecting scraping stats
  utils - No package docstring; 18/39 modules documented
    asyncgen - Undocumented
    benchserver - Undocumented
    boto - Boto/botocore helpers
    conf - No module docstring; 6/8 functions documented
    console - No module docstring; 0/1 constant, 6/6 functions documented
    curl - No module docstring; 0/2 variable, 1/2 function, 0/1 class documented
    datatypes - This module contains data types used by Scrapy which are not included in the Python Standard Library.
    decorators - No module docstring; 3/3 functions documented
    defer - Helper functions for dealing with Twisted deferreds
    deprecate - Some helpers for deprecation messages
    display - pprint and pformat wrappers with colorization support
    engine - Some debugging functions for working with the Scrapy engine
    ftp - No module docstring; 2/2 functions documented
    gz - No module docstring; 1/3 function documented
    httpobj - Helper functions for scrapy.http objects (Request, Response)
    iterators - No module docstring; 0/1 variable, 2/4 functions, 0/1 class documented
    job - Undocumented
    log - No module docstring; 0/2 variable, 0/1 constant, 4/7 functions, 3/3 classes documented
    misc - Helper functions which don't fit anywhere else
    ossignal - No module docstring; 0/2 variable, 1/1 function documented
    project - No module docstring; 0/2 constant, 2/4 functions documented
    py36 - Undocumented
    python - This module contains essential stuff that should've come with Python itself ;)
    reactor - No module docstring; 3/4 functions, 1/1 class documented
    reqser - Helper functions for serializing (and deserializing) requests.
    request - This module provides some useful functions for working with scrapy.http.Request objects
    response - This module provides some useful functions for working with scrapy.http.Response objects
    serialize - Undocumented
    signal - Helper functions for working with signals
    sitemap - Module for processing Sitemaps.
    spider - No module docstring; 0/1 variable, 2/3 functions, 0/1 class documented
    ssl - Undocumented
    template - Helper functions for working with templates
    test - This module contains some assorted functions used in tests
    testproc - Undocumented
    testsite - Undocumented
    trackref - This module provides some functions and classes to record and report references to live object instances.
    url - This module contains general purpose URL functions not found in the standard library.
    versions - Undocumented
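The dupefilters module listed above is where Scrapy deduplicates requests by fingerprint. The sketch below illustrates the general idea only; it is not Scrapy's implementation (the real RFPDupeFilter also canonicalizes URLs and can persist fingerprints to disk), and the class and method names here are illustrative.

```python
import hashlib

class SimpleDupeFilter:
    """Hypothetical sketch of fingerprint-based request deduplication."""

    def __init__(self):
        self.fingerprints = set()

    def fingerprint(self, method: str, url: str, body: bytes = b"") -> str:
        # Hash the parts of a request that identify it.
        h = hashlib.sha1()
        for part in (method.encode(), url.encode(), body):
            h.update(part)
        return h.hexdigest()

    def request_seen(self, method: str, url: str, body: bytes = b"") -> bool:
        # Return True if an equivalent request was already recorded.
        fp = self.fingerprint(method, url, body)
        if fp in self.fingerprints:
            return True
        self.fingerprints.add(fp)
        return False
```

Hashing a fixed serialization of (method, URL, body) rather than comparing raw requests keeps the seen-set memory bounded per request, regardless of body size.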
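The downloadermiddlewares.retry entry describes retrying requests that fail for temporary reasons such as a connection timeout or an HTTP 500. A minimal sketch of that decision logic, assuming a status allow-list and a retry cap in the spirit of Scrapy's RETRY_HTTP_CODES and RETRY_TIMES settings (the exact values here are assumptions, not Scrapy's configuration):

```python
# Assumed set of transient HTTP statuses worth retrying.
RETRY_HTTP_CODES = {500, 502, 503, 504, 522, 524, 408, 429}

def should_retry(status: int, retry_times: int, max_retry_times: int = 2) -> bool:
    """Retry only transient failures, and only up to max_retry_times attempts."""
    return status in RETRY_HTTP_CODES and retry_times < max_retry_times
```

Capping retries matters: without a bound, a permanently broken endpoint would keep a crawl busy forever.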
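The responsetypes module is described as returning the appropriate Response class based on different criteria. One such criterion is the Content-Type header; the sketch below shows the dispatch idea with an illustrative mapping (the class hierarchy mirrors scrapy.http, but the table and function are simplified stand-ins, not Scrapy's actual logic, which also inspects headers and body content):

```python
# Minimal stand-ins for the scrapy.http response hierarchy.
class Response: pass
class TextResponse(Response): pass
class HtmlResponse(TextResponse): pass
class XmlResponse(TextResponse): pass

# Illustrative MIME-type table; the most specific class wins.
CLASSES_BY_MIME = {
    "text/html": HtmlResponse,
    "application/xhtml+xml": HtmlResponse,
    "text/xml": XmlResponse,
    "application/xml": XmlResponse,
    "text/plain": TextResponse,
}

def response_class_for(content_type: str) -> type:
    # Strip parameters like "; charset=utf-8" before looking up the MIME type.
    mime = content_type.split(";")[0].strip().lower()
    return CLASSES_BY_MIME.get(mime, Response)
```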
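The utils.trackref module records and reports references to live object instances, which helps spot leaks (e.g. responses kept alive after a spider finishes). A simplified sketch of the same bookkeeping, built on weak references so that tracking never keeps objects alive; the names echo trackref's style but this is not its implementation:

```python
import weakref
from collections import defaultdict

# One WeakSet of live instances per tracked class; entries vanish
# automatically when the instances are garbage-collected.
live_refs = defaultdict(weakref.WeakSet)

class object_ref:
    """Inherit from this class to have instances recorded while alive."""

    def __new__(cls, *args, **kwargs):
        obj = super().__new__(cls)
        live_refs[cls].add(obj)
        return obj

def live_count(cls) -> int:
    """How many instances of cls are currently alive."""
    return len(live_refs[cls])
```

Using a WeakSet is the key design choice: a plain set would itself hold references and turn the leak detector into a leak.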
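The utils.sitemap module processes Sitemaps. The core extraction step can be sketched with the standard library alone: pull the `<loc>` URLs out of a urlset document, using the namespace from the sitemaps.org protocol. This is an illustrative function, not Scrapy's implementation (which also handles sitemap index files and alternate links):

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text: str) -> list:
    """Return the <loc> URLs found in a sitemap urlset document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(NS + "loc")]
```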