class DataSource:

Known subclasses: numpy.lib._datasource.Repository

DataSource(destpath='.')

A generic data source file (file, http, ftp, ...).

DataSources can be local files or remote files/URLs. The files may also be compressed or uncompressed. DataSource hides some of the low-level details of downloading the file, allowing you to simply pass in a valid file path (or URL) and obtain a file object.
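
A minimal local-only sketch of this, assuming NumPy is installed. It imports `DataSource` from `numpy.lib.npyio`, falling back to the private `numpy.lib._datasource` module this page documents. It then writes the same text once plain and once gzip-compressed, and reads both through the same `open` call. The file names are illustrative. Text mode `'rt'` is requested explicitly because a compressed file is handed to its decompressor's opener, which may default to binary.

```python
import gzip
import os
import tempfile

try:
    from numpy.lib.npyio import DataSource  # public location in recent NumPy
except ImportError:
    from numpy.lib._datasource import DataSource  # module documented on this page

with tempfile.TemporaryDirectory() as workdir:
    plain_path = os.path.join(workdir, "plain.txt")
    packed_path = os.path.join(workdir, "packed.txt.gz")

    # Write identical content, once uncompressed and once gzip-compressed.
    with open(plain_path, "w") as f:
        f.write("1 2 3\n")
    with gzip.open(packed_path, "wt") as f:
        f.write("1 2 3\n")

    ds = DataSource(workdir)
    # One call for both files; the ".gz" suffix selects a gzip-aware opener.
    with ds.open(plain_path, "rt") as f:
        plain_text = f.read()
    with ds.open(packed_path, "rt") as f:
        packed_text = f.read()

print(plain_text == packed_text)  # True
```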

Parameters

destpath : str or None, optional
Path to the directory where the source file is downloaded and stored for use. If destpath is None, a temporary directory is created. The default is the current directory.
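
The two destpath modes can be compared without touching the network, since abspath only computes where a file would live. This sketch assumes NumPy is installed. The names "downloads" and "data.txt" are placeholders, and the sketch does not create the "downloads" directory.

```python
import os
import tempfile

try:
    from numpy.lib.npyio import DataSource
except ImportError:
    from numpy.lib._datasource import DataSource

# An explicit directory: relative names resolve inside it.
explicit = DataSource("downloads")
explicit_target = explicit.abspath("data.txt")

# destpath=None: a fresh temporary directory is created for this instance.
temporary = DataSource(None)
temporary_dir = os.path.dirname(temporary.abspath("data.txt"))

print(explicit_target == os.path.join(os.path.abspath("downloads"), "data.txt"))  # True
print(temporary_dir.startswith(tempfile.gettempdir()))  # True
```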

Notes

URLs require a scheme string (such as http://); without one, the lookup fails:

>>> repos = np.DataSource()
>>> repos.exists('www.google.com/index.html')
False
>>> repos.exists('http://www.google.com/index.html')
True

Temporary directories are deleted when the DataSource is deleted.
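
A quick check of this cleanup, assuming NumPy is installed and a CPython-style interpreter where deleting the last reference runs __del__ promptly. The temporary directory is located through the public abspath method rather than the undocumented _destpath attribute.

```python
import gc
import os

try:
    from numpy.lib.npyio import DataSource
except ImportError:
    from numpy.lib._datasource import DataSource

ds = DataSource(None)
tmpdir = os.path.dirname(ds.abspath("probe.txt"))
existed_before = os.path.isdir(tmpdir)

# Dropping the last reference runs __del__, which removes the temporary directory.
del ds
gc.collect()  # harmless on CPython; helps interpreters without reference counting
removed_after = not os.path.isdir(tmpdir)

print(existed_before, removed_after)  # True True
```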

Examples

>>> ds = np.DataSource('/home/guido')
>>> urlname = 'http://www.google.com/'
>>> gfile = ds.open('http://www.google.com/')
>>> ds.abspath(urlname)
'/home/guido/www.google.com/index.html'

>>> ds = np.DataSource(None)  # use with temporary file
>>> ds.open('/home/guido/foobar.txt')
<open file '/home/guido/foobar.txt', mode 'r' at 0x91d4430>
>>> ds.abspath('/home/guido/foobar.txt')
'/tmp/.../home/guido/foobar.txt'

Method __del__ Undocumented
Method __init__ Create a DataSource with a local path at destpath.
Method _cache Cache the file specified by path.
Method _findfile Searches for path and returns full path if found.
Method _isurl Test if path is a net location. Tests the scheme and netloc.
Method _iswritemode Test if the given mode will open a file for writing.
Method _iszip Test if the filename is a zip file by looking at the file extension.
Method _possible_names Return a tuple containing compressed filename variations.
Method _sanitize_relative_path Return a sanitised relative path for which os.path.abspath(os.path.join(base, path)).startswith(base)
Method _splitzipext Split zip extension from filename and return filename.
Method abspath Return absolute path of file in the DataSource directory.
Method exists Test if path exists.
Method open Open and return file-like object.
Instance Variable _destpath Undocumented
Instance Variable _istmpdest Undocumented

def __del__(self):

Undocumented

def __init__(self, destpath=os.curdir):

Create a DataSource with a local path at destpath.

def _cache(self, path):

Cache the file specified by path.

Creates a copy of the file in the datasource cache.
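
The idea of caching can be illustrated with a hypothetical, standard-library-only helper. cache_copy is not part of NumPy; it only mirrors the layout suggested by the Examples above (destination directory, then host, then URL path). A file:// URL stands in for a remote one so the sketch runs offline, and POSIX-style paths are assumed.

```python
import os
import pathlib
import shutil
import tempfile
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def cache_copy(url: str, destpath: str) -> str:
    """Hypothetical helper: copy url to destpath/<host>/<path> and return the copy's path."""
    parts = urlparse(url)
    relative = os.path.join(parts.netloc, unquote(parts.path).lstrip("/"))
    target = os.path.join(destpath, relative)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    # Stream the resource into the cache instead of reading it all into memory.
    with urlopen(url) as remote, open(target, "wb") as local:
        shutil.copyfileobj(remote, local)
    return target


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as cache:
    source = pathlib.Path(src, "remote.txt")
    source.write_text("payload\n")
    cached = cache_copy(source.as_uri(), cache)
    inside_cache = cached.startswith(cache)
    cached_text = pathlib.Path(cached).read_text()

print(inside_cache, cached_text == "payload\n")  # True True
```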

def _findfile(self, path):

Searches for path and returns full path if found.

If path is a URL, _findfile caches a local copy and returns the path to the cached file. If path is a local file, _findfile returns a path to that local file.

The search will include possible compressed versions of the file and return the first occurrence found.
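
A local sketch of that search, assuming NumPy is installed with gzip support available. Only table.txt.gz exists on disk, yet asking open for table.txt finds and decompresses the gzip variant. The file names are placeholders.

```python
import gzip
import os
import tempfile

try:
    from numpy.lib.npyio import DataSource
except ImportError:
    from numpy.lib._datasource import DataSource

with tempfile.TemporaryDirectory() as workdir:
    # Only the compressed file exists on disk.
    with gzip.open(os.path.join(workdir, "table.txt.gz"), "wt") as f:
        f.write("a b\n")

    ds = DataSource(workdir)
    # Ask for the uncompressed name; the ".gz" variant is found instead.
    with ds.open(os.path.join(workdir, "table.txt"), "rt") as f:
        text = f.read()

print(text.strip())  # a b
```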

def _isurl(self, path):

Test if path is a net location. Tests the scheme and netloc.
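
A hypothetical re-implementation of that rule, assuming "tests the scheme and netloc" means both must be non-empty. is_net_location is an illustrative name, not NumPy's function. The rule also accounts for the Notes above: a string without http:// parses as a bare path, so it is never treated as a URL.

```python
from urllib.parse import urlparse


def is_net_location(path: str) -> bool:
    """Hypothetical check: True only when both a scheme and a network location are present."""
    parts = urlparse(path)
    return bool(parts.scheme and parts.netloc)


print(is_net_location("www.google.com/index.html"))         # False: no scheme
print(is_net_location("http://www.google.com/index.html"))  # True
print(is_net_location("file:///tmp/data.txt"))              # False: empty netloc
```

In this sketch, requiring a netloc as well as a scheme also keeps a file:/// URL or a Windows path such as C:\data.txt from being mistaken for a network location.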

def _iswritemode(self, mode):

Test if the given mode will open a file for writing.

def _iszip(self, filename):

Test if the filename is a zip file by looking at the file extension.

def _possible_names(self, filename):

Return a tuple containing compressed filename variations.
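
A hypothetical sketch of how such variations can be generated. It assumes the recognised compression suffixes are .bz2, .gz, .xz and .lzma; the set NumPy actually uses may depend on which decompression modules are available. A name that already carries one of these suffixes is returned unchanged, which is the check _iszip describes.

```python
import os

# Assumed list of compression suffixes; illustrative, not read from NumPy.
COMPRESSED_SUFFIXES = (".bz2", ".gz", ".xz", ".lzma")


def possible_names(filename: str) -> tuple:
    """Hypothetical helper: the name itself, then one candidate per compression suffix."""
    _, ext = os.path.splitext(filename)
    if ext in COMPRESSED_SUFFIXES:  # already compressed: no extra candidates
        return (filename,)
    return (filename,) + tuple(filename + suffix for suffix in COMPRESSED_SUFFIXES)


print(possible_names("data.txt"))
# ('data.txt', 'data.txt.bz2', 'data.txt.gz', 'data.txt.xz', 'data.txt.lzma')
print(possible_names("data.txt.gz"))
# ('data.txt.gz',)
```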

def _sanitize_relative_path(self, path):

Return a sanitised relative path for which os.path.abspath(os.path.join(base, path)).startswith(base)
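
One way to meet that guarantee, sketched under the assumption of POSIX-style paths. sanitize_relative_path is a hypothetical stand-alone function, not NumPy's method. After posixpath.normpath, any .. components can only remain at the front of the path, so dropping leading .., . and empty components leaves a relative path that cannot climb above base.

```python
import posixpath


def sanitize_relative_path(path: str) -> str:
    """Hypothetical: return a relative path that stays under any base it is joined to."""
    parts = posixpath.normpath(path).split("/")
    # normpath leaves ".." only at the front (and drops it after a leading "/"),
    # so stripping leading "..", "." and empty parts removes every way to escape.
    while parts and parts[0] in ("", ".", ".."):
        parts.pop(0)
    return "/".join(parts)


base = "/srv/cache"
for raw in ["../../etc/passwd", "/etc/passwd", "a/../../b", "data/file.txt"]:
    safe = sanitize_relative_path(raw)
    joined = posixpath.normpath(posixpath.join(base, safe))
    print(raw, "->", safe, joined.startswith(base))
```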

def _splitzipext(self, filename):

Split the zip extension (such as .gz or .bz2) from filename and return the base name together with that extension.

Returns

base, zip_ext : {tuple}
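
A hypothetical sketch of that split, assuming the compression suffixes are .bz2, .gz, .xz and .lzma and that zip_ext is None when the name is not compressed. split_zip_ext is an illustrative name, not NumPy's method.

```python
import os

# Assumed compression suffixes; illustrative, not read from NumPy.
COMPRESSED_SUFFIXES = (".bz2", ".gz", ".xz", ".lzma")


def split_zip_ext(filename: str):
    """Hypothetical helper: return (base, zip_ext), with zip_ext None if uncompressed."""
    base, ext = os.path.splitext(filename)
    if ext in COMPRESSED_SUFFIXES:
        return base, ext
    return filename, None


print(split_zip_ext("data.txt.gz"))  # ('data.txt', '.gz')
print(split_zip_ext("data.txt"))     # ('data.txt', None)
```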
def abspath(self, path):

Return absolute path of file in the DataSource directory.

If path is a URL, abspath returns either the location where the file exists locally or the location where it would be stored when opened with the open method.

Parameters

path : str
Can be a local file or a remote URL.

Returns

out : str
Complete path, including the DataSource destination directory.

Notes

The functionality is based on os.path.abspath.
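
A local check of the URL case, assuming NumPy is installed; http://example.com/data/values.csv is a placeholder URL. abspath only computes the path, so nothing is downloaded and the returned file does not exist yet.

```python
import os
import tempfile

try:
    from numpy.lib.npyio import DataSource
except ImportError:
    from numpy.lib._datasource import DataSource

with tempfile.TemporaryDirectory() as dest:
    ds = DataSource(dest)
    # Host and URL path are mirrored below the destination directory.
    local = ds.abspath("http://example.com/data/values.csv")
    expected = os.path.join(os.path.abspath(dest), "example.com", "data", "values.csv")
    exists_now = os.path.exists(local)

print(local == expected)  # True
print(exists_now)         # False: computed, not downloaded
```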

def exists(self, path):

Test if path exists.

Test if path exists as (and in this order):

  • a local file.
  • a remote URL that has been downloaded and stored locally in the DataSource directory.
  • a remote URL that has not been downloaded, but is valid and accessible.

Parameters

path : str
Can be a local file or a remote URL.

Returns

out : bool
True if path exists.

Notes

When path is a URL, exists returns True if the file is either stored locally in the DataSource directory or available at a valid remote URL. DataSource does not discriminate between the two; the file is accessible if it exists in either location.
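
The first two checks in that order can be exercised offline, assuming NumPy is installed. A local file is found directly, and a file placed where abspath says a URL's download would land counts as an already downloaded copy, so the remote check is never reached. http://example.com/data.txt is a placeholder.

```python
import os
import tempfile

try:
    from numpy.lib.npyio import DataSource
except ImportError:
    from numpy.lib._datasource import DataSource

with tempfile.TemporaryDirectory() as dest:
    ds = DataSource(dest)

    # 1. A plain local file.
    local_file = os.path.join(dest, "present.txt")
    with open(local_file, "w") as f:
        f.write("x\n")
    local_found = ds.exists(local_file)

    # 2. A "downloaded" copy of a URL, created by hand at its cache location.
    url = "http://example.com/data.txt"
    cached = ds.abspath(url)
    os.makedirs(os.path.dirname(cached), exist_ok=True)
    with open(cached, "w") as f:
        f.write("x\n")
    cached_found = ds.exists(url)

print(local_found, cached_found)  # True True
```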

def open(self, path, mode='r', encoding=None, newline=None):

Open and return file-like object.

If path is a URL, it is downloaded, stored in the DataSource directory, and opened from there.

Parameters

path : str
Local file path or URL to open.
mode : {'r', 'w', 'a'}, optional
Mode in which to open path: 'r' for reading, 'w' for writing, 'a' for appending. Available modes depend on the type of object specified by path. Default is 'r'.
encoding : {None, str}, optional
Encoding used to open a text file. The default is whatever io.open uses.
newline : {None, str}, optional
Newline convention to use when reading a text file.

Returns

out : file object
File object.
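
A short sketch of the encoding parameter and of one mode restriction, assuming NumPy is installed. It also assumes, consistent with the _iswritemode helper listed above, that a write mode on a URL is rejected with a ValueError before any download is attempted. The URL is a placeholder.

```python
import os
import tempfile

try:
    from numpy.lib.npyio import DataSource
except ImportError:
    from numpy.lib._datasource import DataSource

with tempfile.TemporaryDirectory() as dest:
    ds = DataSource(dest)
    path = os.path.join(dest, "latin1.txt")
    with open(path, "w", encoding="latin-1") as f:
        f.write("café\n")

    # encoding is passed through to the opener for text modes.
    with ds.open(path, "r", encoding="latin-1") as f:
        text = f.read()

    # A write mode on a URL is refused instead of being sent over the network.
    refused = None
    try:
        ds.open("http://example.com/out.txt", "w")
    except ValueError as exc:
        refused = exc

print(text == "café\n", refused is not None)  # True True
```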

_destpath

Undocumented

_istmpdest: bool

Undocumented