This module contains a set of functions for vectorized string operations and methods.
Note
The chararray class exists for backwards compatibility with Numarray; it is not recommended for new development. Starting from numpy 1.4, if one needs arrays of strings, it is recommended to use arrays of dtype object_, string_ or unicode_, and use the free functions in the numpy.char module for fast vectorized string operations.
Some methods will only be available if the corresponding string method is available in your version of Python.
The preferred alias for defchararray is numpy.char.
Variable | array_function_dispatch |
Undocumented |
Class | chararray |
chararray(shape, itemsize=1, unicode=False, buffer=None, offset=0, strides=None, order=None) |
Function | _binary_op_dispatcher |
Undocumented |
Function | _center_dispatcher |
Undocumented |
Function | _clean_args |
Helper function for delegating arguments to Python string functions. |
Function | _code_dispatcher |
Undocumented |
Function | _count_dispatcher |
Undocumented |
Function | _endswith_dispatcher |
Undocumented |
Function | _expandtabs_dispatcher |
Undocumented |
Function | _get_num_chars |
Helper function that returns the number of characters per field in a string or unicode array. This is to abstract out the fact that for a unicode array this is itemsize / 4. |
Function | _join_dispatcher |
Undocumented |
Function | _just_dispatcher |
Undocumented |
Function | _mod_dispatcher |
Undocumented |
Function | _multiply_dispatcher |
Undocumented |
Function | _partition_dispatcher |
Undocumented |
Function | _replace_dispatcher |
Undocumented |
Function | _split_dispatcher |
Undocumented |
Function | _splitlines_dispatcher |
Undocumented |
Function | _startswith_dispatcher |
Undocumented |
Function | _strip_dispatcher |
Undocumented |
Function | _to_string_or_unicode_array |
Helper function to cast a result back into a string or unicode array if an object array must be used as an intermediary. |
Function | _translate_dispatcher |
Undocumented |
Function | _unary_op_dispatcher |
Undocumented |
Function | _use_unicode |
Helper function for determining the output type of some string operations. |
Function | _zfill_dispatcher |
Undocumented |
Function | add |
Return element-wise string concatenation for two arrays of str or unicode. |
Function | array |
Create a chararray. |
Function | asarray |
Convert the input to a chararray, copying the data only if necessary. |
Function | capitalize |
Return a copy of a with only the first character of each element capitalized. |
Function | center |
Return a copy of a with its elements centered in a string of length width. |
Function | count |
Returns an array with the number of non-overlapping occurrences of substring sub in the range [start, end]. |
Function | decode |
Calls str.decode element-wise. |
Function | encode |
Calls str.encode element-wise. |
Function | endswith |
Returns a boolean array which is True where the string element in a ends with suffix, otherwise False. |
Function | equal |
Return (x1 == x2) element-wise. |
Function | expandtabs |
Return a copy of each string element where all tab characters are replaced by one or more spaces. |
Function | find |
For each element, return the lowest index in the string where substring sub is found. |
Function | greater |
Return (x1 > x2) element-wise. |
Function | greater_equal |
Return (x1 >= x2) element-wise. |
Function | index |
Like find, but raises ValueError when the substring is not found. |
Function | isalnum |
Returns true for each element if all characters in the string are alphanumeric and there is at least one character, false otherwise. |
Function | isalpha |
Returns true for each element if all characters in the string are alphabetic and there is at least one character, false otherwise. |
Function | isdecimal |
For each element, return True if there are only decimal characters in the element. |
Function | isdigit |
Returns true for each element if all characters in the string are digits and there is at least one character, false otherwise. |
Function | islower |
Returns true for each element if all cased characters in the string are lowercase and there is at least one cased character, false otherwise. |
Function | isnumeric |
For each element, return True if there are only numeric characters in the element. |
Function | isspace |
Returns true for each element if there are only whitespace characters in the string and there is at least one character, false otherwise. |
Function | istitle |
Returns true for each element if the element is a titlecased string and there is at least one character, false otherwise. |
Function | isupper |
Returns true for each element if all cased characters in the string are uppercase and there is at least one character, false otherwise. |
Function | join |
Return a string which is the concatenation of the strings in the sequence seq. |
Function | less |
Return (x1 < x2) element-wise. |
Function | less_equal |
Return (x1 <= x2) element-wise. |
Function | ljust |
Return an array with the elements of a left-justified in a string of length width. |
Function | lower |
Return an array with the elements converted to lowercase. |
Function | lstrip |
For each element in a, return a copy with the leading characters removed. |
Function | mod |
Return (a % i), that is pre-Python 2.6 string formatting (interpolation), element-wise for a pair of array_likes of str or unicode. |
Function | multiply |
Return (a * i), that is string multiple concatenation, element-wise. |
Function | not_equal |
Return (x1 != x2) element-wise. |
Function | partition |
Partition each element in a around sep. |
Function | replace |
For each element in a, return a copy of the string with all occurrences of substring old replaced by new. |
Function | rfind |
For each element in a, return the highest index in the string where substring sub is found, such that sub is contained within [start, end]. |
Function | rindex |
Like rfind, but raises ValueError when the substring sub is not found. |
Function | rjust |
Return an array with the elements of a right-justified in a string of length width. |
Function | rpartition |
Partition (split) each element around the right-most separator. |
Function | rsplit |
For each element in a, return a list of the words in the string, using sep as the delimiter string. |
Function | rstrip |
For each element in a, return a copy with the trailing characters removed. |
Function | split |
For each element in a, return a list of the words in the string, using sep as the delimiter string. |
Function | splitlines |
For each element in a, return a list of the lines in the element, breaking at line boundaries. |
Function | startswith |
Returns a boolean array which is True where the string element in a starts with prefix, otherwise False. |
Function | str_len |
Return len(a) element-wise. |
Function | strip |
For each element in a, return a copy with the leading and trailing characters removed. |
Function | swapcase |
Return element-wise a copy of the string with uppercase characters converted to lowercase and vice versa. |
Function | title |
Return element-wise title cased version of string or unicode. |
Function | translate |
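For each element in a, return a copy of the string where all characters occurring in the optional argument deletechars are removed, and the remaining characters have been mapped through the given translation table. |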
Function | upper |
Return an array with the elements converted to uppercase. |
Function | zfill |
Return the numeric string left-filled with zeros |
Variable | _globalvar |
Undocumented |
Helper function for delegating arguments to Python string functions.
Many of the Python string operations that have optional arguments do not use 'None' to indicate a default value. In these cases, we need to remove all None arguments, and those following them.
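The trimming rule described above can be sketched in plain Python. This is an illustrative stand-in under the hypothetical name clean_args_sketch, not the module's actual code; the sample string and arguments are made up for the demonstration.

```python
def clean_args_sketch(*args):
    """Hypothetical stand-in: keep arguments up to (not including) the first None."""
    kept = []
    for arg in args:
        if arg is None:
            break  # drop this None and everything after it
        kept.append(arg)
    return kept


# 'end' was left as None, so only (sub, start) are forwarded to str.count.
forwarded = clean_args_sketch("A", 1, None)
result = "aAaAaA".count(*forwarded)
print(forwarded, result)
```

Passing None straight through would fail, because str.count and similar methods reject None for some optional positions in older Python versions; trimming avoids that.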
Helper function for determining the output type of some string operations.
For an operation on two ndarrays, if at least one is unicode, the result should be unicode.
Return element-wise string concatenation for two arrays of str or unicode.
Arrays x1 and x2 must have the same shape.
Output array of string_ or unicode_, depending on input types, of the same shape as x1 and x2.
Create a chararray.
Note
This class is provided for numarray backward-compatibility. New code (not concerned with numarray compatibility) should use arrays of type string_ or unicode_ and use the free functions in numpy.char for fast vectorized string operations instead.
Versus a regular NumPy array of type str or unicode, this class adds the following functionality:
- values automatically have whitespace removed from the end when indexed
- comparison operators automatically remove whitespace from the end when comparing values
- vectorized string operations are provided as methods (e.g. str.endswith) and infix operators (e.g. +, *, %)
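The differences listed above can be seen side by side in a short sketch. It assumes NumPy is importable as np and that np.char.array is available in the installed version; the sample strings are invented for illustration.

```python
import numpy as np

# A plain str array keeps trailing whitespace as part of the data.
plain = np.array(["abc  ", "de"])
# A chararray built from the same data strips it on indexing and comparison.
chars = np.char.array(["abc  ", "de"])

plain_item = plain[0]      # trailing spaces kept
chars_item = chars[0]      # trailing spaces removed on indexing
plain_eq = plain == "abc"  # ordinary comparison sees the spaces
chars_eq = chars == "abc"  # chararray comparison ignores them
upper_item = chars.upper()[0]  # string operation available as a method

print(repr(plain_item), repr(chars_item))
print(plain_eq, chars_eq, upper_item)
```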
obj : array of str or unicode-like
itemsize is the number of characters per scalar in the resulting array. If itemsize is None, and obj is an object array or a Python list, the itemsize will be automatically determined. If itemsize is provided and obj is of type str or unicode, then the obj string will be chunked into itemsize pieces.
itemsize, unicode, order, etc.).
When true, the resulting chararray can contain Unicode characters, when false only 8-bit characters. If unicode is None and obj is one of the following:
- a chararray,
- an ndarray of type str or unicode
- a Python str or unicode object,
then the unicode setting of the output array will be automatically determined.
Convert the input to a chararray, copying the data only if necessary.
Versus a regular NumPy array of type str or unicode, this class adds the following functionality:
- values automatically have whitespace removed from the end when indexed
- comparison operators automatically remove whitespace from the end when comparing values
- vectorized string operations are provided as methods (e.g. str.endswith) and infix operators (e.g. +, *, %)
obj : array of str or unicode-like
itemsize is the number of characters per scalar in the resulting array. If itemsize is None, and obj is an object array or a Python list, the itemsize will be automatically determined. If itemsize is provided and obj is of type str or unicode, then the obj string will be chunked into itemsize pieces.
When true, the resulting chararray can contain Unicode characters, when false only 8-bit characters. If unicode is None and obj is one of the following:
- a chararray,
- an ndarray of type str or unicode
- a Python str or unicode object,
then the unicode setting of the output array will be automatically determined.
Return a copy of a with only the first character of each element capitalized.
Calls str.capitalize element-wise.
For 8-bit strings, this method is locale-dependent.
str.capitalize
>>> c = np.array(['a1b2','1b2a','b2a1','2a1b'],'S4'); c
array(['a1b2', '1b2a', 'b2a1', '2a1b'], dtype='|S4')
>>> np.char.capitalize(c)
array(['A1b2', '1b2a', 'B2a1', '2a1b'], dtype='|S4')
Return a copy of a with its elements centered in a string of length width.
Calls str.center element-wise.
a : array_like of str or unicode
str.center
Returns an array with the number of non-overlapping occurrences of substring sub in the range [start, end].
Calls str.count element-wise.
a : array_like of str or unicode
start and end are interpreted as slice notation to specify the range in which to count.
str.count
>>> c = np.array(['aAaAaA', '  aA  ', 'abBABba'])
>>> c
array(['aAaAaA', '  aA  ', 'abBABba'], dtype='<U7')
>>> np.char.count(c, 'A')
array([3, 1, 1])
>>> np.char.count(c, 'aA')
array([3, 1, 0])
>>> np.char.count(c, 'A', start=1, end=4)
array([2, 1, 1])
>>> np.char.count(c, 'A', start=1, end=3)
array([1, 0, 0])
Calls str.decode element-wise.
The set of available codecs comes from the Python standard library, and may be extended at runtime. For more information, see the codecs module.
a : array_like of str or unicode
out : ndarray
str.decode
The type of the result will depend on the encoding specified.
>>> c = np.array(['aAaAaA', '  aA  ', 'abBABba'])
>>> c
array(['aAaAaA', '  aA  ', 'abBABba'], dtype='<U7')
>>> np.char.encode(c, encoding='cp037')
array(['\x81\xc1\x81\xc1\x81\xc1', '@@\x81\xc1@@', '\x81\x82\xc2\xc1\xc2\x82\x81'], dtype='|S7')
Calls str.encode element-wise.
The set of available codecs comes from the Python standard library, and may be extended at runtime. For more information, see the codecs module.
a : array_like of str or unicode
out : ndarray
str.encode
The type of the result will depend on the encoding specified.
Returns a boolean array which is True where the string element in a ends with suffix, otherwise False.
Calls str.endswith element-wise.
a : array_like of str or unicode
suffix : str
With optional start, test beginning at that position. With optional end, stop comparing at that position.
str.endswith
>>> s = np.array(['foo', 'bar'])
>>> s[0] = 'foo'
>>> s[1] = 'bar'
>>> s
array(['foo', 'bar'], dtype='<U3')
>>> np.char.endswith(s, 'ar')
array([False, True])
>>> np.char.endswith(s, 'a', start=1, end=2)
array([False, True])
Return (x1 == x2) element-wise.
Unlike numpy.equal, this comparison is performed by first stripping whitespace characters from the end of the string. This behavior is provided for backward-compatibility with numarray.
not_equal, greater_equal, less_equal, greater, less
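A small sketch of the whitespace-stripping comparison, contrasted with the ordinary == operator on str arrays. It assumes NumPy is importable as np; the input values are invented for illustration.

```python
import numpy as np

x1 = np.array(["abc  ", "abd"])
x2 = np.array(["abc", "abc"])

# np.char.equal ignores trailing whitespace before comparing.
stripped_eq = np.char.equal(x1, x2)
# The ordinary element-wise comparison treats the trailing spaces as data.
plain_eq = x1 == x2

print(stripped_eq, plain_eq)
```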
Return a copy of each string element where all tab characters are replaced by one or more spaces.
Calls str.expandtabs element-wise.
Return a copy of each string element where all tab characters are replaced by one or more spaces, depending on the current column and the given tabsize. The column number is reset to zero after each newline occurring in the string. This doesn't understand other non-printing characters or escape sequences.
Replace tabs with tabsize number of spaces. If not given, defaults to 8 spaces.
str.expandtabs
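The column-dependent expansion and the reset after a newline can be illustrated as follows. This is a sketch assuming NumPy is importable as np; the sample strings and the tab size of 4 are arbitrary choices.

```python
import numpy as np

a = np.array(["a\tb", "ab\tc\n\td"])

# Each tab advances to the next multiple of tabsize; the column restarts
# at zero after the newline in the second element.
expanded = np.char.expandtabs(a, 4)

for original, result in zip(a.tolist(), expanded.tolist()):
    print(repr(original), "->", repr(result))
```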
For each element, return the lowest index in the string where substring sub is found.
Calls str.find element-wise.
For each element, return the lowest index in the string where substring sub is found, such that sub is contained in the range [start, end].
a : array_like of str or unicode
sub : str or unicode
start and end are interpreted as in slice notation.
Returns -1 if sub is not found.
str.find
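A brief sketch of find, including the not-found case and a search restricted by start. It assumes NumPy is importable as np; the sample words are invented.

```python
import numpy as np

a = np.array(["hello", "world"])

first_o = np.char.find(a, "o")            # lowest index of "o" per element
missing = np.char.find(a, "z")            # -1 where the substring is absent
windowed = np.char.find(a, "l", start=3)  # search only from index 3 onward

print(first_o, missing, windowed)
```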
Return (x1 > x2) element-wise.
Unlike numpy.greater, this comparison is performed by first stripping whitespace characters from the end of the string. This behavior is provided for backward-compatibility with numarray.
equal, not_equal, greater_equal, less_equal, less
Return (x1 >= x2) element-wise.
Unlike numpy.greater_equal, this comparison is performed by first stripping whitespace characters from the end of the string. This behavior is provided for backward-compatibility with numarray.
equal, not_equal, less_equal, greater, less
Like find, but raises ValueError when the substring is not found.
Calls str.index element-wise.
a : array_like of str or unicode
sub : str or unicode
start, end : int, optional
Raises ValueError if sub is not found.
find, str.find
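The difference from find shows up only when some element lacks the substring. A minimal sketch, assuming NumPy is importable as np and using invented sample words:

```python
import numpy as np

a = np.array(["hello", "world"])

# Every element contains "o", so index behaves like find here.
positions = np.char.index(a, "o")

# "world" has no "h", so the whole call raises instead of returning -1.
try:
    np.char.index(a, "h")
    raised = False
except ValueError:
    raised = True

print(positions, raised)
```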
Returns true for each element if all characters in the string are alphanumeric and there is at least one character, false otherwise.
Calls str.isalnum element-wise.
For 8-bit strings, this method is locale-dependent.
a : array_like of str or unicode
str.isalnum
Returns true for each element if all characters in the string are alphabetic and there is at least one character, false otherwise.
Calls str.isalpha element-wise.
For 8-bit strings, this method is locale-dependent.
a : array_like of str or unicode
str.isalpha
For each element, return True if there are only decimal characters in the element.
Calls unicode.isdecimal element-wise.
Decimal characters include digit characters, and all characters that can be used to form decimal-radix numbers, e.g. U+0660, ARABIC-INDIC DIGIT ZERO.
a.
unicode.isdecimal
Returns true for each element if all characters in the string are digits and there is at least one character, false otherwise.
Calls str.isdigit element-wise.
For 8-bit strings, this method is locale-dependent.
a : array_like of str or unicode
str.isdigit
Returns true for each element if all cased characters in the string are lowercase and there is at least one cased character, false otherwise.
Calls str.islower element-wise.
For 8-bit strings, this method is locale-dependent.
a : array_like of str or unicode
str.islower
For each element, return True if there are only numeric characters in the element.
Calls unicode.isnumeric element-wise.
Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH.
a.
unicode.isnumeric
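The two code points named in the isdecimal and isnumeric descriptions make a convenient test case for how the two predicates differ. This sketch assumes NumPy is importable as np; the surrounding sample strings are invented.

```python
import numpy as np

# "\u0660" is ARABIC-INDIC DIGIT ZERO, "\u2155" is VULGAR FRACTION ONE FIFTH.
a = np.array(["123", "\u0660", "\u2155", "abc"])

decimal = np.char.isdecimal(a)  # the fraction is not a decimal character
numeric = np.char.isnumeric(a)  # the fraction does have a numeric value

print(decimal, numeric)
```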
Returns true for each element if there are only whitespace characters in the string and there is at least one character, false otherwise.
Calls str.isspace element-wise.
For 8-bit strings, this method is locale-dependent.
a : array_like of str or unicode
str.isspace
Returns true for each element if the element is a titlecased string and there is at least one character, false otherwise.
Calls str.istitle element-wise.
For 8-bit strings, this method is locale-dependent.
a : array_like of str or unicode
str.istitle
Returns true for each element if all cased characters in the string are uppercase and there is at least one character, false otherwise.
Calls str.isupper element-wise.
For 8-bit strings, this method is locale-dependent.
a : array_like of str or unicode
str.isupper
Return a string which is the concatenation of the strings in the sequence seq.
Calls str.join element-wise.
sep : array_like of str or unicode
seq : array_like of str or unicode
str.join
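Because str.join is applied to each element, the separator is inserted between the characters of that element rather than between array elements. A short sketch, assuming NumPy is importable as np and using invented inputs:

```python
import numpy as np

sep = np.array(["-", "."])
seq = np.array(["abc", "de"])

# Equivalent to ["-".join("abc"), ".".join("de")].
joined = np.char.join(sep, seq)

print(joined)
```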
Return (x1 < x2) element-wise.
Unlike numpy.less, this comparison is performed by first stripping whitespace characters from the end of the string. This behavior is provided for backward-compatibility with numarray.
equal, not_equal, greater_equal, less_equal, greater
Return (x1 <= x2) element-wise.
Unlike numpy.less_equal, this comparison is performed by first stripping whitespace characters from the end of the string. This behavior is provided for backward-compatibility with numarray.
equal, not_equal, greater_equal, greater, less
Return an array with the elements of a left-justified in a string of length width.
Calls str.ljust element-wise.
a : array_like of str or unicode
str.ljust
Return an array with the elements converted to lowercase.
Calls str.lower element-wise.
For 8-bit strings, this method is locale-dependent.
str.lower
>>> c = np.array(['A1B C', '1BCA', 'BCA1']); c
array(['A1B C', '1BCA', 'BCA1'], dtype='<U5')
>>> np.char.lower(c)
array(['a1b c', '1bca', 'bca1'], dtype='<U5')
For each element in a, return a copy with the leading characters removed.
Calls str.lstrip element-wise.
The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix; rather, all combinations of its values are stripped.
str.lstrip
>>> c = np.array(['aAaAaA', '  aA  ', 'abBABba'])
>>> c
array(['aAaAaA', '  aA  ', 'abBABba'], dtype='<U7')
The 'a' character is not stripped from c[1] because of its leading whitespace.
>>> np.char.lstrip(c, 'a')
array(['AaAaA', '  aA  ', 'bBABba'], dtype='<U7')
>>> np.char.lstrip(c, 'A') # leaves c unchanged
array(['aAaAaA', '  aA  ', 'abBABba'], dtype='<U7')
>>> (np.char.lstrip(c, ' ') == np.char.lstrip(c, '')).all()
... # XXX: is this a regression? This used to return True
... # np.char.lstrip(c,'') does not modify c at all.
False
>>> (np.char.lstrip(c, ' ') == np.char.lstrip(c, None)).all()
True
Return (a % i), that is pre-Python 2.6 string formatting (interpolation), element-wise for a pair of array_likes of str or unicode.
a : array_like of str or unicode
str.__mod__
Return (a * i), that is string multiple concatenation, element-wise.
Values in i of less than 0 are treated as 0 (which yields an empty string).
a : array_like of str or unicode
i : array_like of ints
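A minimal sketch of element-wise repetition, including the zero and negative cases described above. It assumes NumPy is importable as np; the inputs are invented.

```python
import numpy as np

a = np.array(["ab", "c", "de"])
i = np.array([2, 0, -1])

# Negative counts behave like 0 and produce an empty string.
repeated = np.char.multiply(a, i)

print(repeated)
```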
Return (x1 != x2) element-wise.
Unlike numpy.not_equal, this comparison is performed by first stripping whitespace characters from the end of the string. This behavior is provided for backward-compatibility with numarray.
equal, greater_equal, less_equal, greater, less
Partition each element in a around sep.
Calls str.partition element-wise.
For each element in a, split the element at the first occurrence of sep, and return 3 strings containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return 3 strings containing the string itself, followed by two empty strings.
a.
str.partition
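The three-part result adds a trailing axis of length 3 to the output. A short sketch, assuming NumPy is importable as np and using invented key/value strings:

```python
import numpy as np

a = np.array(["key=value", "novalue"])

# Each element becomes (before, separator, after); when the separator is
# missing, the element itself is followed by two empty strings.
parts = np.char.partition(a, "=")

print(parts.shape)
print(parts)
```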
For each element in a, return a copy of the string with all occurrences of substring old replaced by new.
Calls str.replace element-wise.
a : array-like of str or unicode
old, new : str or unicode
str.replace
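A small sketch of replace, with and without a limit on the number of replacements. It assumes NumPy is importable as np and relies on the optional count argument of np.char.replace; the sample words are invented.

```python
import numpy as np

a = np.array(["aaa", "banana"])

all_replaced = np.char.replace(a, "a", "o")         # every occurrence
first_only = np.char.replace(a, "a", "o", count=1)  # only the first one

print(all_replaced, first_only)
```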
For each element in a, return the highest index in the string where substring sub is found, such that sub is contained within [start, end].
Calls str.rfind element-wise.
a : array-like of str or unicode
sub : str or unicode
start and end are interpreted as in slice notation.
str.rfind
Like rfind, but raises ValueError when the substring sub is not found.
Calls str.rindex element-wise.
a : array-like of str or unicode
sub : str or unicode
start, end : int, optional
rfind, str.rindex
Return an array with the elements of a right-justified in a string of length width.
Calls str.rjust element-wise.
a : array_like of str or unicode
str.rjust
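The three padding functions (ljust, rjust, center) can be compared on the same input. This sketch assumes NumPy is importable as np and passes "." as the optional fill character so the padding is visible; the inputs are invented.

```python
import numpy as np

a = np.array(["a", "bcd"])

left = np.char.ljust(a, 5, ".")    # pad on the right
right = np.char.rjust(a, 5, ".")   # pad on the left
middle = np.char.center(a, 5, ".") # pad on both sides

print(left, right, middle)
```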
Partition (split) each element around the right-most separator.
Calls str.rpartition element-wise.
For each element in a, split the element at the last occurrence of sep, and return 3 strings containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return 3 strings containing two empty strings, followed by the string itself.
str.rpartition
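A short sketch of splitting at the right-most separator, including the case where the separator is absent. It assumes NumPy is importable as np; the inputs are invented.

```python
import numpy as np

a = np.array(["a.b.c", "abc"])

# The split happens at the last "."; with no separator, str.rpartition
# puts the whole string in the third slot.
parts = np.char.rpartition(a, ".")

print(parts)
```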
For each element in a, return a list of the words in the string, using sep as the delimiter string.
Calls str.rsplit element-wise.
Except for splitting from the right, rsplit behaves like split.
a : array_like of str or unicode
If sep is not specified or None, any whitespace string is a separator.
If maxsplit is given, at most maxsplit splits are done, the rightmost ones.
str.rsplit, split
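The difference between split and rsplit only appears when maxsplit limits the number of splits. A sketch assuming NumPy is importable as np, with invented comma-separated inputs; the result is an object array whose elements are Python lists.

```python
import numpy as np

a = np.array(["a,b,c", "d,e"])

from_left = np.char.split(a, ",", maxsplit=1)    # split at the first comma
from_right = np.char.rsplit(a, ",", maxsplit=1)  # split at the last comma

print(from_left[0], from_right[0])
```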
For each element in a, return a copy with the trailing characters removed.
Calls str.rstrip element-wise.
a : array-like of str or unicode
str.rstrip
>>> c = np.array(['aAaAaA', 'abBABba'], dtype='S7'); c
array(['aAaAaA', 'abBABba'], dtype='|S7')
>>> np.char.rstrip(c, b'a')
array(['aAaAaA', 'abBABb'], dtype='|S7')
>>> np.char.rstrip(c, b'A')
array(['aAaAa', 'abBABba'], dtype='|S7')
For each element in a, return a list of the words in the string, using sep as the delimiter string.
Calls str.split element-wise.
a : array_like of str or unicode
If sep is not specified or None, any whitespace string is a separator.
If maxsplit is given, at most maxsplit splits are done.
str.split, rsplit
For each element in a, return a list of the lines in the element, breaking at line boundaries.
Calls str.splitlines element-wise.
a : array_like of str or unicode
str.splitlines
Returns a boolean array which is True where the string element in a starts with prefix, otherwise False.
Calls str.startswith element-wise.
a : array_like of str or unicode
prefix : str
With optional start, test beginning at that position. With optional end, stop comparing at that position.
str.startswith
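A brief sketch mirroring the endswith example, assuming NumPy is importable as np; the sample strings are the same invented pair used there.

```python
import numpy as np

s = np.array(["foo", "bar"])

plain = np.char.startswith(s, "fo")          # test from the beginning
shifted = np.char.startswith(s, "a", start=1)  # test from index 1

print(plain, shifted)
```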
Return len(a) element-wise.
a : array_like of str or unicode
builtins.len
For each element in a, return a copy with the leading and trailing characters removed.
Calls str.strip element-wise.
a : array-like of str or unicode
str.strip
>>> c = np.array(['aAaAaA', '  aA  ', 'abBABba'])
>>> c
array(['aAaAaA', '  aA  ', 'abBABba'], dtype='<U7')
>>> np.char.strip(c)
array(['aAaAaA', 'aA', 'abBABba'], dtype='<U7')
>>> np.char.strip(c, 'a') # 'a' unstripped from c[1] because whitespace leads
array(['AaAaA', '  aA  ', 'bBABb'], dtype='<U7')
>>> np.char.strip(c, 'A') # 'A' unstripped from c[1] because (unprinted) ws trails
array(['aAaAa', '  aA  ', 'abBABba'], dtype='<U7')
Return element-wise a copy of the string with uppercase characters converted to lowercase and vice versa.
Calls str.swapcase element-wise.
For 8-bit strings, this method is locale-dependent.
str.swapcase
>>> c=np.array(['a1B c','1b Ca','b Ca1','cA1b'],'S5'); c
array(['a1B c', '1b Ca', 'b Ca1', 'cA1b'], dtype='|S5')
>>> np.char.swapcase(c)
array(['A1b C', '1B cA', 'B cA1', 'Ca1B'], dtype='|S5')
Return element-wise title cased version of string or unicode.
Title case words start with uppercase characters, all remaining cased characters are lowercase.
Calls str.title element-wise.
For 8-bit strings, this method is locale-dependent.
str.title
>>> c=np.array(['a1b c','1b ca','b ca1','ca1b'],'S5'); c
array(['a1b c', '1b ca', 'b ca1', 'ca1b'], dtype='|S5')
>>> np.char.title(c)
array(['A1B C', '1B Ca', 'B Ca1', 'Ca1B'], dtype='|S5')
For each element in a, return a copy of the string where all characters occurring in the optional argument deletechars are removed, and the remaining characters have been mapped through the given translation table.
Calls str.translate element-wise.
a : array-like of str or unicode
table : str of length 256
deletechars : str
str.translate
Return an array with the elements converted to uppercase.
Calls str.upper element-wise.
For 8-bit strings, this method is locale-dependent.
str.upper
>>> c = np.array(['a1b c', '1bca', 'bca1']); c
array(['a1b c', '1bca', 'bca1'], dtype='<U5')
>>> np.char.upper(c)
array(['A1B C', '1BCA', 'BCA1'], dtype='<U5')
Return the numeric string left-filled with zeros
Calls str.zfill element-wise.
a.
str.zfill
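A minimal sketch of zero-filling, including a signed value. It assumes NumPy is importable as np; the inputs and the width of 5 are arbitrary choices.

```python
import numpy as np

a = np.array(["42", "-7"])

# As with str.zfill, zeros are inserted after a leading sign.
padded = np.char.zfill(a, 5)

print(padded)
```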