class documentation

class Scheduler:

Scrapy Scheduler. It enqueues requests and returns the next request to download. The Scheduler also handles duplicate filtering, via the dupefilter.
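
The enqueue-then-fetch cycle with duplicate filtering can be pictured with a minimal, self-contained sketch. It does not use Scrapy's own classes: `SeenFilter`, `TinyScheduler`, and the use of a plain URL string as the identity of a request are assumptions made for illustration only.

```python
from collections import deque


class SeenFilter:
    """Hypothetical stand-in for a dupefilter: remembers what it has seen."""

    def __init__(self):
        self._seen = set()

    def request_seen(self, url):
        # Return True if the URL was already recorded, otherwise record it.
        if url in self._seen:
            return True
        self._seen.add(url)
        return False


class TinyScheduler:
    """Hypothetical scheduler: filter on enqueue, then hand out requests."""

    def __init__(self, dupefilter):
        self.df = dupefilter
        self._queue = deque()

    def enqueue_request(self, url):
        # Reject duplicates before they reach the queue.
        if self.df.request_seen(url):
            return False
        self._queue.append(url)
        return True

    def next_request(self):
        # Return the next queued URL, or None when nothing is pending.
        return self._queue.popleft() if self._queue else None


scheduler = TinyScheduler(SeenFilter())
accepted = [scheduler.enqueue_request(u) for u in ("a", "b", "a")]
```

In this sketch the second `"a"` is rejected at enqueue time, so only two items are ever queued.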

Prioritization and queueing are not performed by the Scheduler itself. The user sets the priority field of each Request, and a PriorityQueue (defined by :setting:`SCHEDULER_PRIORITY_QUEUE`) uses these priorities to dequeue requests in the desired order.
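
As a rough illustration of priority-driven dequeueing, the sketch below builds a tiny priority queue on top of `heapq`. `TinyPriorityQueue` is a hypothetical class, not Scrapy's PriorityQueue. It follows the Scrapy convention that a higher priority value is dequeued earlier; resolving ties in insertion order is a choice of this sketch.

```python
import heapq
import itertools


class TinyPriorityQueue:
    """Hypothetical priority queue: higher priority values come out first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def push(self, item, priority=0):
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-priority, next(self._counter), item))

    def pop(self):
        # Return the highest-priority item, or None if the queue is empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


pq = TinyPriorityQueue()
pq.push("listing page", priority=0)
pq.push("login page", priority=10)
pq.push("detail page", priority=5)
order = [pq.pop(), pq.pop(), pq.pop()]
# order == ["login page", "detail page", "listing page"]
```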

The Scheduler uses two PriorityQueue instances, one configured to work in memory and one on disk (optional). When an on-disk queue is present, it is used by default, and the in-memory queue serves as a fallback for requests the disk queue can't handle (because it can't serialize them).
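
The fallback rule described above can be sketched without Scrapy. In this illustration `FallbackQueues` is a hypothetical class, a list of pickled bytes stands in for the on-disk queue, and requests are plain dictionaries; none of these names come from the Scheduler itself.

```python
import pickle


class FallbackQueues:
    """Hypothetical pair of queues: serialized ("disk") first, memory second."""

    def __init__(self):
        self.disk = []    # pickled bytes, standing in for on-disk storage
        self.memory = []  # live objects that could not be serialized

    def push(self, request):
        try:
            self.disk.append(pickle.dumps(request))
            return "disk"
        except (pickle.PicklingError, AttributeError, TypeError):
            # Serialization failed, so keep the object in memory instead.
            self.memory.append(request)
            return "memory"


queues = FallbackQueues()
plain = {"url": "https://example.com"}
with_callback = {"url": "https://example.com", "callback": lambda response: None}
placements = [queues.push(plain), queues.push(with_callback)]
# placements == ["disk", "memory"]
```

The second request carries a lambda, which `pickle` cannot serialize, so it lands in the in-memory list.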

:setting:`SCHEDULER_MEMORY_QUEUE` and :setting:`SCHEDULER_DISK_QUEUE` specify the lower-level queue classes that the PriorityQueue instances are instantiated with, to keep requests in memory and on disk respectively.
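
A project's settings module might set the three queue settings as in the fragment below. The class paths shown are the ones listed in Scrapy's settings documentation (the FIFO variants of the lower-level queues); treat them as an example and check them against the installed Scrapy version.

```python
# settings.py (illustrative fragment, not a complete settings module)

# Priority queue class that wraps the lower-level queues.
SCHEDULER_PRIORITY_QUEUE = "scrapy.pqueues.ScrapyPriorityQueue"

# Lower-level queue classes used for on-disk and in-memory storage.
SCHEDULER_DISK_QUEUE = "scrapy.squeues.PickleFifoDiskQueue"
SCHEDULER_MEMORY_QUEUE = "scrapy.squeues.FifoMemoryQueue"
```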

Overall, the Scheduler is an object that holds several PriorityQueue instances (in-memory and on-disk) and implements the fallback logic between them. It also handles duplicate filtering through the dupefilter.

Class Method from_crawler Undocumented
Method __init__ Undocumented
Method __len__ Undocumented
Method _dq Create a new priority queue instance, with disk storage
Method _dqdir Return a folder name to keep disk queue state at
Method _dqpop Undocumented
Method _dqpush Undocumented
Method _mq Create a new priority queue instance, with in-memory storage
Method _mqpush Undocumented
Method _read_dqs_state Undocumented
Method _write_dqs_state Undocumented
Method close Undocumented
Method enqueue_request Undocumented
Method has_pending_requests Undocumented
Method next_request Undocumented
Method open Undocumented
Instance Variable crawler Undocumented
Instance Variable df Undocumented
Instance Variable dqclass Undocumented
Instance Variable dqdir Undocumented
Instance Variable dqs Undocumented
Instance Variable logunser Undocumented
Instance Variable mqclass Undocumented
Instance Variable mqs Undocumented
Instance Variable pqclass Undocumented
Instance Variable spider Undocumented
Instance Variable stats Undocumented
@classmethod
def from_crawler(cls, crawler):

Undocumented

def __init__(self, dupefilter, jobdir=None, dqclass=None, mqclass=None, logunser=False, stats=None, pqclass=None, crawler=None):

Undocumented

def __len__(self):

Undocumented

def _dq(self):

Create a new priority queue instance, with disk storage

def _dqdir(self, jobdir):

Return a folder name to keep disk queue state at

def _dqpop(self):

Undocumented

def _dqpush(self, request):

Undocumented

def _mq(self):

Create a new priority queue instance, with in-memory storage

def _mqpush(self, request):

Undocumented

def _read_dqs_state(self, dqdir):

Undocumented

def _write_dqs_state(self, dqdir, state):

Undocumented

def close(self, reason):

Undocumented

def enqueue_request(self, request):

Undocumented

def has_pending_requests(self):

Undocumented

def next_request(self):

Undocumented

def open(self, spider):

Undocumented

crawler =

Undocumented

df =

Undocumented

dqclass =

Undocumented

dqdir =

Undocumented

dqs =

Undocumented

logunser: bool =

Undocumented

mqclass =

Undocumented

mqs =

Undocumented

pqclass =

Undocumented

spider =

Undocumented

stats =

Undocumented