scrapy - Scrapy - a web crawling and web scraping framework written for Python
  __main__ - Undocumented
  cmdline - Undocumented
  commands - Base class for Scrapy commands
    bench - No module docstring; 1/3 class documented
    check - Undocumented
    crawl - Undocumented
    edit - Undocumented
    fetch - Undocumented
    genspider - No module docstring; 1/1 function, 0/1 class documented
    list - Undocumented
    parse - Undocumented
    runspider - Undocumented
    settings - Undocumented
    shell - Scrapy Shell
    startproject - Undocumented
    version - Undocumented
    view - Undocumented
  contracts - No package docstring; 0/1 function, 1/2 class, 0/1 module documented
    default - No module docstring; 4/4 classes documented
  core - Scrapy core library classes and functions.
    downloader - No package docstring; 0/1 function, 1/2 class, 1/4 module, 1/1 package documented
      contextfactory - No module docstring; 0/1 function, 3/3 classes documented
      handlers - Download handlers for different schemes
        datauri - Undocumented
        file - Undocumented
        ftp - An asynchronous FTP file download handler for scrapy which emulates an HTTP response.
        http - Undocumented
        http10 - Download handlers for http and https schemes
        http11 - Download handlers for http and https schemes
        http2 - Undocumented
        s3 - Undocumented
      middleware - Downloader Middleware manager
      tls - No module docstring; 0/6 variable, 0/2 constant, 1/1 class documented
      webclient - No module docstring; 1/2 function, 0/2 class documented
    engine - This is the Scrapy engine which controls the Scheduler, Downloader and Spiders.
    http2 - Undocumented
    scheduler - No module docstring; 0/1 variable, 1/1 class documented
    scraper - This module implements the Scraper component which parses responses and extracts information from them
    spidermw - Spider Middleware manager
  crawler - No module docstring; 0/2 variable, 2/3 classes documented
  downloadermiddlewares - No package docstring; 7/14 modules documented
    ajaxcrawl - No module docstring; 0/2 variable, 1/1 function, 1/1 class documented
    cookies - No module docstring; 0/1 variable, 1/1 class documented
    decompression - This module implements the DecompressionMiddleware which tries to recognise and extract the potentially compressed responses that may arrive.
    defaultheaders - DefaultHeaders downloader middleware
    downloadtimeout - Download timeout middleware
    httpauth - HTTP basic auth downloader middleware
    httpcache - Undocumented
    httpcompression - No module docstring; 0/1 constant, 1/1 class documented
    httpproxy - Undocumented
    redirect - No module docstring; 0/1 variable, 1/3 class documented
    retry - An extension to retry failed requests that are potentially caused by temporary problems such as a connection timeout or HTTP 500 error.
    robotstxt - This is a middleware to respect robots.txt policies. To activate it you must enable this middleware and enable the ROBOTSTXT_OBEY setting.
    stats - Undocumented
    useragent - Set User-Agent header per spider or use a default value from settings
  dupefilters - No module docstring; 1/2 class documented
  exceptions - Scrapy core exceptions
  exporters - Item Exporters are used to export/serialize items into different formats.
  extension - The Extension Manager
  extensions - No package docstring; 8/12 modules documented
    closespider - CloseSpider is an extension that forces spiders to be closed after certain conditions are met.
    corestats - Extension for collecting core stats like items scraped and start/finish times
    debug - Extensions for debugging Scrapy
    feedexport - Feed Exports extension
    httpcache - No module docstring; 0/1 variable, 1/2 function, 0/4 class documented
    logstats - No module docstring; 0/1 variable, 1/1 class documented
    memdebug - MemoryDebugger extension
    memusage - MemoryUsage extension
    spiderstate - No module docstring; 1/1 class documented
    statsmailer - StatsMailer extension sends an email when a spider finishes scraping.
    telnet - Scrapy Telnet Console extension
    throttle - Undocumented
  http - Module containing all HTTP related classes
    common - Undocumented
    cookies - No module docstring; 0/1 constant, 1/1 function, 1/4 class documented
    headers - No module docstring; 1/1 class documented
    request - This module implements the Request class which is used to represent HTTP requests in Scrapy.
      form - This module implements the FormRequest class which is a more convenient class (than Request) to generate Requests based on form data.
      json_request - This module implements the JsonRequest class which is a more convenient class (than Request) to generate JSON Requests.
      rpc - This module implements the XmlRpcRequest class which is a more convenient class (than Request) to generate XML-RPC requests.
    response - This module implements the Response class which is used to represent HTTP responses in Scrapy.
      html - This module implements the HtmlResponse class which adds encoding discovering through HTML encoding declarations to the TextResponse class.
      text - This module implements the TextResponse class which adds encoding handling and discovering (through HTTP headers) to the base Response class.
      xml - This module implements the XmlResponse class which adds encoding discovering through XML encoding declarations to the TextResponse class.
  interfaces - Undocumented
  item - Scrapy Item
  link - This module defines the Link object used in Link extractors.
  linkextractors - scrapy.linkextractors
    lxmlhtml - Link extractor based on lxml.html
  loader - Item Loader
    common - Common functions used in Item Loaders code
    processors - This module provides some commonly used processors for Item Loaders.
  logformatter - No module docstring; 0/7 constant, 1/1 class documented
  mail - Mail sending helpers
  middleware - No module docstring; 0/1 variable, 1/1 class documented
  pipelines - Item pipeline
  pqueues - No module docstring; 0/1 variable, 1/1 function, 2/3 classes documented
  resolver - No module docstring; 0/1 variable, 2/4 classes documented
  responsetypes - This module implements a class which returns the appropriate Response class based on different criteria.
  robotstxt - Undocumented
  selector - Selectors
    unified - XPath selectors based on lxml
  settings - No package docstring; 0/1 constant, 3/3 functions, 3/4 classes, 1/1 module documented
    default_settings - This module contains the default values for all settings used by Scrapy.
  shell - Scrapy Shell
  signalmanager - Undocumented
  signals - Scrapy signals
  spiderloader - No module docstring; 1/1 class documented
  spidermiddlewares - No package docstring; 5/5 modules documented
  spiders - Base class for Scrapy spiders
    crawl - This module implements the CrawlSpider which is the recommended spider to use for scraping typical websites that require crawling pages.
    feed - This module implements the XMLFeedSpider which is the recommended spider to use for scraping from an XML feed.
    init - No module docstring; 1/1 class documented
    sitemap - Undocumented
  squeues - Scheduler queues
  statscollectors - Scrapy extension for collecting scraping stats
  utils - No package docstring; 18/39 modules documented
    asyncgen - Undocumented
    benchserver - Undocumented
    boto - Boto/botocore helpers
    conf - No module docstring; 6/8 functions documented
    console - No module docstring; 0/1 constant, 6/6 functions documented
    curl - No module docstring; 0/2 variable, 1/2 function, 0/1 class documented
    datatypes - This module contains data types used by Scrapy which are not included in the Python Standard Library.
    decorators - No module docstring; 3/3 functions documented
    defer - Helper functions for dealing with Twisted deferreds
    deprecate - Some helpers for deprecation messages
    display - pprint and pformat wrappers with colorization support
    engine - Some debugging functions for working with the Scrapy engine
    ftp - No module docstring; 2/2 functions documented
    gz - No module docstring; 1/3 function documented
    httpobj - Helper functions for scrapy.http objects (Request, Response)
    iterators - No module docstring; 0/1 variable, 2/4 functions, 0/1 class documented
    job - Undocumented
    log - No module docstring; 0/2 variable, 0/1 constant, 4/7 functions, 3/3 classes documented
    misc - Helper functions which don't fit anywhere else
    ossignal - No module docstring; 0/2 variable, 1/1 function documented
    project - No module docstring; 0/2 constant, 2/4 functions documented
    py36 - Undocumented
    python - This module contains essential stuff that should've come with Python itself ;)
    reactor - No module docstring; 3/4 functions, 1/1 class documented
    reqser - Helper functions for serializing (and deserializing) requests.
    request - This module provides some useful functions for working with scrapy.http.Request objects
    response - This module provides some useful functions for working with scrapy.http.Response objects
    serialize - Undocumented
    signal - Helper functions for working with signals
    sitemap - Module for processing Sitemaps.
    spider - No module docstring; 0/1 variable, 2/3 functions, 0/1 class documented
    ssl - Undocumented
    template - Helper functions for working with templates
    test - This module contains some assorted functions used in tests
    testproc - Undocumented
    testsite - Undocumented
    trackref - This module provides some functions and classes to record and report references to live object instances.
    url - This module contains general purpose URL functions not found in the standard library.
    versions - Undocumented
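The dupefilters module listed above is where Scrapy deduplicates requests by fingerprint. The sketch below illustrates the general idea only; it is not Scrapy's implementation (the real RFPDupeFilter also canonicalizes URLs and can persist fingerprints to disk), and the class and method names here are illustrative.

```python
import hashlib

class SimpleDupeFilter:
    """Hypothetical sketch of fingerprint-based request deduplication."""

    def __init__(self):
        self.fingerprints = set()

    def fingerprint(self, method: str, url: str, body: bytes = b"") -> str:
        # Hash the parts of a request that identify it.
        h = hashlib.sha1()
        for part in (method.encode(), url.encode(), body):
            h.update(part)
        return h.hexdigest()

    def request_seen(self, method: str, url: str, body: bytes = b"") -> bool:
        # Return True if an equivalent request was already recorded.
        fp = self.fingerprint(method, url, body)
        if fp in self.fingerprints:
            return True
        self.fingerprints.add(fp)
        return False
```

Hashing a fixed serialization of (method, URL, body) rather than comparing raw requests keeps the seen-set memory bounded per request, regardless of body size.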
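The downloadermiddlewares.retry entry describes retrying requests that fail for temporary reasons such as a connection timeout or an HTTP 500. A minimal sketch of that decision logic, assuming a status allow-list and a retry cap in the spirit of Scrapy's RETRY_HTTP_CODES and RETRY_TIMES settings (the exact values here are assumptions, not Scrapy's configuration):

```python
# Assumed set of transient HTTP statuses worth retrying.
RETRY_HTTP_CODES = {500, 502, 503, 504, 522, 524, 408, 429}

def should_retry(status: int, retry_times: int, max_retry_times: int = 2) -> bool:
    """Retry only transient failures, and only up to max_retry_times attempts."""
    return status in RETRY_HTTP_CODES and retry_times < max_retry_times
```

Capping retries matters: without a bound, a permanently broken endpoint would keep a crawl busy forever.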
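The responsetypes module is described as returning the appropriate Response class based on different criteria. One such criterion is the Content-Type header; the sketch below shows the dispatch idea with an illustrative mapping (the class hierarchy mirrors scrapy.http, but the table and function are simplified stand-ins, not Scrapy's actual logic, which also inspects headers and body content):

```python
# Minimal stand-ins for the scrapy.http response hierarchy.
class Response: pass
class TextResponse(Response): pass
class HtmlResponse(TextResponse): pass
class XmlResponse(TextResponse): pass

# Illustrative MIME-type table; the most specific class wins.
CLASSES_BY_MIME = {
    "text/html": HtmlResponse,
    "application/xhtml+xml": HtmlResponse,
    "text/xml": XmlResponse,
    "application/xml": XmlResponse,
    "text/plain": TextResponse,
}

def response_class_for(content_type: str) -> type:
    # Strip parameters like "; charset=utf-8" before looking up the MIME type.
    mime = content_type.split(";")[0].strip().lower()
    return CLASSES_BY_MIME.get(mime, Response)
```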
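The utils.trackref module records and reports references to live object instances, which helps spot leaks (e.g. responses kept alive after a spider finishes). A simplified sketch of the same bookkeeping, built on weak references so that tracking never keeps objects alive; the names echo trackref's style but this is not its implementation:

```python
import weakref
from collections import defaultdict

# One WeakSet of live instances per tracked class; entries vanish
# automatically when the instances are garbage-collected.
live_refs = defaultdict(weakref.WeakSet)

class object_ref:
    """Inherit from this class to have instances recorded while alive."""

    def __new__(cls, *args, **kwargs):
        obj = super().__new__(cls)
        live_refs[cls].add(obj)
        return obj

def live_count(cls) -> int:
    """How many instances of cls are currently alive."""
    return len(live_refs[cls])
```

Using a WeakSet is the key design choice: a plain set would itself hold references and turn the leak detector into a leak.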
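The utils.sitemap module processes Sitemaps. The core extraction step can be sketched with the standard library alone: pull the `<loc>` URLs out of a urlset document, using the namespace from the sitemaps.org protocol. This is an illustrative function, not Scrapy's implementation (which also handles sitemap index files and alternate links):

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text: str) -> list:
    """Return the <loc> URLs found in a sitemap urlset document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(NS + "loc")]
```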