arkouda.pdarraycreation
Functions
arange([start,] stop[, stride,] dtype=int64) | Create a pdarray of consecutive integers within the interval [start, stop).
array | Convert a Python or Numpy Iterable to a pdarray or Strings object, sending the corresponding data to the arkouda server.
bigint_from_uint_arrays | Create a bigint pdarray from an iterable of uint pdarrays.
from_series | Converts a Pandas Series to an Arkouda pdarray or Strings object.
full | Create a pdarray filled with fill_value.
full_like | Create a pdarray filled with fill_value of the same size and dtype as an existing pdarray.
linspace | Create a pdarray of linearly-spaced floats in a closed interval.
ones | Create a pdarray filled with ones.
ones_like | Create a one-filled pdarray of the same size and dtype as an existing pdarray.
promote_to_common_dtype | Promote a list of pdarrays to a common dtype.
randint | Generate a pdarray of randomized int, float, or bool values in a specified range bounded by the low and high parameters.
random_strings_lognormal | Generate random strings with log-normally distributed lengths and with characters drawn from a specified set.
random_strings_uniform | Generate random strings with lengths uniformly distributed between minlen and maxlen, and with characters drawn from a specified set.
scalar_array | Create a pdarray from a single scalar value.
standard_normal | Draw real numbers from the standard normal distribution.
uniform | Generate a pdarray with uniformly distributed random float values in a specified range.
zeros | Create a pdarray filled with zeros.
zeros_like | Create a zero-filled pdarray of the same size and dtype as an existing pdarray.
Module Contents
- arkouda.pdarraycreation.arange(*args, **kwargs) -> arkouda.pdarrayclass.pdarray
arange([start,] stop[, stride,] dtype=int64)
Create a pdarray of consecutive integers within the interval [start, stop). If only one arg is given then arg is the stop parameter. If two args are given, then the first arg is start and second is stop. If three args are given, then the first arg is start, second is stop, third is stride.
The return value is cast to type dtype.
- Parameters:
start (int_scalars, optional) – Starting value (inclusive)
stop (int_scalars) – Stopping value (exclusive)
stride (int_scalars, optional) – The difference between consecutive elements. The default stride is 1. If stride is specified, start must also be specified.
dtype (np.dtype, type, or str) – The target dtype to cast values to
max_bits (int) – Specifies the maximum number of bits; only used for bigint pdarrays
- Returns:
Integers from start (inclusive) to stop (exclusive) by stride
- Return type:
pdarray, dtype
- Raises:
TypeError – Raised if start, stop, or stride is not an int object
ZeroDivisionError – Raised if stride == 0
Notes
Negative strides result in decreasing values. Currently, only int64 pdarrays can be created with this method. For float64 arrays, use the linspace method.
Examples
>>> ak.arange(0, 5, 1)
array([0, 1, 2, 3, 4])
>>> ak.arange(5, 0, -1)
array([5, 4, 3, 2, 1])
>>> ak.arange(0, 10, 2)
array([0, 2, 4, 6, 8])
>>> ak.arange(-5, -10, -1)
array([-5, -6, -7, -8, -9])
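The positional-argument rules above can be sketched without a running arkouda server. The helpers below, `resolve_arange_args` and `arange_values`, are hypothetical names used only for illustration and are not part of arkouda. The sketch assumes that a missing start defaults to 0, as in Python's built-in range, which the text above does not state explicitly.

```python
def resolve_arange_args(*args):
    """Hypothetical helper (not an arkouda function): map one, two, or three
    positional arguments to (start, stop, stride) as described above."""
    if len(args) == 1:
        # Assumption: start defaults to 0 when only stop is given.
        start, stop, stride = 0, args[0], 1
    elif len(args) == 2:
        start, stop, stride = args[0], args[1], 1
    elif len(args) == 3:
        start, stop, stride = args
    else:
        raise TypeError("expected 1 to 3 positional arguments")
    if not all(isinstance(v, int) for v in (start, stop, stride)):
        raise TypeError("start, stop, and stride must be ints")
    if stride == 0:
        raise ZeroDivisionError("stride must be nonzero")
    return start, stop, stride


def arange_values(*args):
    """Return the values of the half-open interval [start, stop) as a list."""
    return list(range(*resolve_arange_args(*args)))


print(arange_values(5))            # [0, 1, 2, 3, 4]
print(arange_values(5, 0, -1))     # [5, 4, 3, 2, 1]
print(arange_values(-5, -10, -1))  # [-5, -6, -7, -8, -9]
```

The sketch returns a plain Python list; the real function returns a pdarray stored on the server.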
- arkouda.pdarraycreation.array(a: arkouda.pdarrayclass.pdarray | numpy.ndarray | Iterable, dtype: numpy.dtype | type | str | None = None, max_bits: int = -1) -> arkouda.pdarrayclass.pdarray | arkouda.strings.Strings
Convert a Python or Numpy Iterable to a pdarray or Strings object, sending the corresponding data to the arkouda server.
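- Parameters:
a (Union[pdarray, np.ndarray, Iterable]) – The data to convert to a pdarray or Strings object
dtype (np.dtype, type, or str, optional) – The target dtype to cast values to
max_bits (int) – Specifies the maximum number of bits; only used for bigint pdarrays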
- Returns:
A pdarray instance stored on arkouda server or Strings instance, which is composed of two pdarrays stored on arkouda server
- Return type:
pdarray or Strings
- Raises:
TypeError – Raised if a is not a pdarray, np.ndarray, or Python Iterable such as a list, array, tuple, or deque
RuntimeError – Raised if a is not one-dimensional, nbytes > maxTransferBytes, a.dtype is not supported (not in DTypes), or if the product of a.size and a.itemsize > maxTransferBytes
ValueError – Raised if the returned message is malformed or does not contain the fields required to generate the array.
See also
pdarray.to_ndarray
Notes
The number of bytes in the input array cannot exceed ak.client.maxTransferBytes, otherwise a RuntimeError will be raised. This is to protect the user from overwhelming the connection between the Python client and the arkouda server, under the assumption that it is a low-bandwidth connection. The user may override this limit by setting ak.client.maxTransferBytes to a larger value, but should proceed with caution.
If the pdarray or ndarray is of type U, this method is called twice recursively to create the Strings object and the two corresponding pdarrays for string bytes and offsets, respectively.
Examples
>>> ak.array(np.arange(1,10))
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> ak.array(range(1,10))
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> strings = ak.array([f'string {i}' for i in range(0,5)])
>>> type(strings)
<class 'arkouda.strings.Strings'>
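The transfer-size guard described in the Notes can be illustrated with a small stand-alone check. This is a sketch under stated assumptions, not arkouda's implementation: `check_transfer_size` is a hypothetical helper, and the limit is passed in as a plain argument rather than read from ak.client.maxTransferBytes.

```python
def check_transfer_size(num_elements, itemsize, max_transfer_bytes):
    """Hypothetical guard (not an arkouda function): return the payload size
    in bytes, or raise RuntimeError when it exceeds the configured limit."""
    nbytes = num_elements * itemsize
    if nbytes > max_transfer_bytes:
        raise RuntimeError(
            f"array of {nbytes} bytes exceeds the limit of {max_transfer_bytes} bytes"
        )
    return nbytes


# Illustrative limit only; the actual value is whatever the client is configured with.
limit = 2**30

# One million 8-byte integers fit comfortably under the illustrative limit.
print(check_transfer_size(1_000_000, 8, limit))  # 8000000

# One billion 8-byte integers do not.
try:
    check_transfer_size(1_000_000_000, 8, limit)
except RuntimeError as err:
    print("rejected:", err)
```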
- arkouda.pdarraycreation.bigint_from_uint_arrays(arrays, max_bits=-1)
Create a bigint pdarray from an iterable of uint pdarrays. The first item in arrays will be the highest 64 bits and the last item will be the lowest 64 bits.
- Parameters:
arrays (Sequence[pdarray]) – An iterable of uint pdarrays used to construct the bigint pdarray. The first item in arrays will be the highest 64 bits and the last item will be the lowest 64 bits.
max_bits (int) – Specifies the maximum number of bits; only used for bigint pdarrays
- Returns:
bigint pdarray constructed from uint arrays
- Return type:
pdarray
- Raises:
TypeError – Raised if any pdarray in arrays has a dtype other than uint or if the pdarrays are not the same size.
RuntimeError – Raised if there is a server-side error thrown
See also
pdarray.bigint_to_uint_arrays
Examples
>>> a = ak.bigint_from_uint_arrays([ak.ones(5, dtype=ak.uint64), ak.arange(5, dtype=ak.uint64)])
>>> a
array(["18446744073709551616" "18446744073709551617" "18446744073709551618" "18446744073709551619" "18446744073709551620"])
>>> a.dtype
dtype(bigint)
>>> all(a[i] == 2**64 + i for i in range(5))
True
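The limb ordering (first array holds the highest 64 bits, last array the lowest) can be checked with plain Python integers. The function `combine_uint64_limbs` below is a hypothetical illustration, not an arkouda API; it reproduces the values from the example above element by element.

```python
def combine_uint64_limbs(limbs):
    """Hypothetical helper: combine unsigned 64-bit limbs, most significant
    first, into a single arbitrary-precision Python int."""
    value = 0
    for limb in limbs:
        if not 0 <= limb < 2**64:
            raise ValueError("each limb must fit in an unsigned 64-bit integer")
        # Shift the accumulated value up by one limb and append the next one.
        value = (value << 64) | limb
    return value


# Mirror ak.ones(5, dtype=ak.uint64) and ak.arange(5, dtype=ak.uint64) with lists.
high = [1] * 5
low = list(range(5))
combined = [combine_uint64_limbs(pair) for pair in zip(high, low)]

print(combined[0])  # 18446744073709551616
print(all(combined[i] == 2**64 + i for i in range(5)))  # True
```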
- arkouda.pdarraycreation.from_series(series: pandas.Series, dtype: type | str | None = None) -> arkouda.pdarrayclass.pdarray | arkouda.strings.Strings
Converts a Pandas Series to an Arkouda pdarray or Strings object. If dtype is None, the dtype is inferred from the Pandas Series. Otherwise, the dtype parameter is set if the dtype of the Pandas Series is to be overridden or is unknown (for example, in situations where the Series dtype is object).
- Parameters:
series (Pandas Series) – The Pandas Series with a dtype of bool, float64, int64, or string
dtype (Optional[type]) – The valid dtypes are bool, np.float64, np.int64, and str (np.bool and np.str in NumPy releases before 1.24, where those names were aliases for the builtins)
- Return type:
pdarray or Strings
- Raises:
TypeError – Raised if series is not a Pandas Series object
ValueError – Raised if the Series dtype is not bool, float64, int64, string, datetime, or timedelta
Examples
>>> ak.from_series(pd.Series(np.random.randint(0,10,5)))
array([9, 0, 4, 7, 9])
>>> ak.from_series(pd.Series(['1', '2', '3', '4', '5']),dtype=np.int64)
array([1, 2, 3, 4, 5])
>>> ak.from_series(pd.Series(np.random.uniform(low=0.0,high=1.0,size=3)))
array([0.57600036956445599, 0.41619265571741659, 0.6615356693784662])
>>> ak.from_series(pd.Series(['0.57600036956445599', '0.41619265571741659', '0.6615356693784662']), dtype=np.float64)
array([0.57600036956445599, 0.41619265571741659, 0.6615356693784662])
>>> ak.from_series(pd.Series(np.random.choice([True, False],size=5)))
array([True, False, True, True, True])
>>> ak.from_series(pd.Series(['True', 'False', 'False', 'True', 'True']), dtype=bool)
array([True, True, True, True, True])
>>> ak.from_series(pd.Series(['a', 'b', 'c', 'd', 'e'], dtype="string"))
array(['a', 'b', 'c', 'd', 'e'])
>>> ak.from_series(pd.Series(['a', 'b', 'c', 'd', 'e']),dtype=str)
array(['a', 'b', 'c', 'd', 'e'])
>>> ak.from_series(pd.Series(pd.to_datetime(['1/1/2018', np.datetime64('2018-01-01')])))
array([1514764800000000000, 1514764800000000000])
Notes
The supported datatypes are bool, float64, int64, string, and datetime64[ns]. The data type is either inferred from the Series or is set via the dtype parameter.
Series of datetime or timedelta are converted to Arkouda arrays of dtype int64 (nanoseconds).
A Pandas Series containing strings has a dtype of object. Arkouda assumes the Series contains strings and sets the dtype to str.
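The int64 nanosecond values in the datetime example can be reproduced with the standard library alone. The helper `to_epoch_nanoseconds` is a hypothetical name for illustration; the sketch assumes the timestamp is interpreted as UTC, which is how a timezone-naive datetime64[ns] value maps to its integer representation.

```python
from datetime import datetime, timezone


def to_epoch_nanoseconds(moment):
    """Hypothetical helper: nanoseconds elapsed between the Unix epoch and a
    timezone-aware datetime, computed with integer arithmetic only."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = moment - epoch
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * 1_000_000_000 + delta.microseconds * 1_000


# 2018-01-01T00:00:00 UTC, the date used in the from_series example above.
print(to_epoch_nanoseconds(datetime(2018, 1, 1, tzinfo=timezone.utc)))
# 1514764800000000000
```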
- arkouda.pdarraycreation.full(size: arkouda.numpy.dtypes.int_scalars | Tuple[arkouda.numpy.dtypes.int_scalars, Ellipsis] | str, fill_value: arkouda.numpy.dtypes.numeric_scalars | str, dtype: numpy.dtype | type | str | arkouda.numpy.dtypes.bigint = float64, max_bits: int | None = None) -> arkouda.pdarrayclass.pdarray | arkouda.strings.Strings
Create a pdarray filled with fill_value.
- Parameters:
size (int_scalars) – Size of the array (only rank-1 arrays supported)
fill_value (int_scalars) – Value with which the array will be filled
dtype (all_scalars) – Resulting array type, default float64
max_bits (int) – Specifies the maximum number of bits; only used for bigint pdarrays
- Returns:
array of the requested size and dtype filled with fill_value
- Return type:
pdarray or Strings
- Raises:
TypeError – Raised if the supplied dtype is not supported or if the size parameter is neither an int nor a str that is parseable to an int.
Examples
>>> ak.full(5, 7, dtype=ak.int64)
array([7, 7, 7, 7, 7])
>>> ak.full(5, 9, dtype=ak.float64)
array([9, 9, 9, 9, 9])
>>> ak.full(5, 5, dtype=ak.bool_)
array([True, True, True, True, True])
- arkouda.pdarraycreation.full_like(pda: arkouda.pdarrayclass.pdarray, fill_value: arkouda.numpy.dtypes.numeric_scalars) -> arkouda.pdarrayclass.pdarray | arkouda.strings.Strings
Create a pdarray filled with fill_value of the same size and dtype as an existing pdarray.
- Parameters:
pda (pdarray) – Array to use for size and dtype
fill_value (int_scalars) – Value with which the array will be filled
- Returns:
Equivalent to ak.full(pda.size, fill_value, pda.dtype)
- Return type:
pdarray or Strings
- Raises:
TypeError – Raised if the pda parameter is not a pdarray.
See also
Notes
Logic for generating the pdarray is delegated to the ak.full method. Accordingly, the supported dtypes are those defined by the ak.full method.
Examples
>>> full = ak.full(5, 7, dtype=ak.int64)
>>> ak.full_like(full, 7)
array([7, 7, 7, 7, 7])
>>> full = ak.full(5, 9, dtype=ak.float64)
>>> ak.full_like(full, 9)
array([9, 9, 9, 9, 9])
>>> full = ak.full(5, 5, dtype=ak.bool_)
>>> ak.full_like(full, 5)
array([True, True, True, True, True])
- arkouda.pdarraycreation.linspace(start: arkouda.numpy.dtypes.numeric_scalars, stop: arkouda.numpy.dtypes.numeric_scalars, length: arkouda.numpy.dtypes.int_scalars) -> arkouda.pdarrayclass.pdarray
Create a pdarray of linearly-spaced floats in a closed interval.
- Parameters:
start (numeric_scalars) – Start of interval (inclusive)
stop (numeric_scalars) – End of interval (inclusive)
length (int_scalars) – Number of points
- Returns:
Array of evenly spaced float values along the interval
- Return type:
pdarray
- Raises:
TypeError – Raised if start or stop is not a float or int or if length is not an int
See also
Notes
If start is greater than stop, the pdarray values are generated in descending order.
Examples
>>> ak.linspace(0, 1, 5)
array([0, 0.25, 0.5, 0.75, 1])
>>> ak.linspace(start=1, stop=0, length=5)
array([1, 0.75, 0.5, 0.25, 0])
>>> ak.linspace(start=-5, stop=0, length=5)
array([-5, -3.75, -2.5, -1.25, 0])
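The closed-interval spacing can be written out as start + i * (stop - start) / (length - 1) for i from 0 to length - 1. The function `linspace_values` below is a hypothetical pure-Python sketch of that formula, not arkouda's implementation; returning only start when length is 1 is an assumption made here to avoid dividing by zero.

```python
def linspace_values(start, stop, length):
    """Hypothetical helper: evenly spaced floats over the closed interval
    [start, stop], including both endpoints."""
    if length < 1:
        raise ValueError("length must be at least 1")
    if length == 1:
        # Assumption: a single point is placed at start.
        return [float(start)]
    step = (stop - start) / (length - 1)
    return [start + i * step for i in range(length)]


print(linspace_values(0, 1, 5))   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(linspace_values(1, 0, 5))   # [1.0, 0.75, 0.5, 0.25, 0.0]
print(linspace_values(-5, 0, 5))  # [-5.0, -3.75, -2.5, -1.25, 0.0]
```

A negative step falls out naturally when start is greater than stop, which matches the descending order described in the Notes.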
- arkouda.pdarraycreation.ones(size: arkouda.numpy.dtypes.int_scalars | Tuple[arkouda.numpy.dtypes.int_scalars, Ellipsis] | str, dtype: numpy.dtype | type | str | arkouda.numpy.dtypes.bigint = float64, max_bits: int | None = None) -> arkouda.pdarrayclass.pdarray
Create a pdarray filled with ones.
- Parameters:
size (int_scalars) – Size of the array (only rank-1 arrays supported)
dtype (Union[float64, int64, bool]) – Resulting array type, default float64
max_bits (int) – Specifies the maximum number of bits; only used for bigint pdarrays
- Returns:
Ones of the requested size and dtype
- Return type:
pdarray
- Raises:
TypeError – Raised if the supplied dtype is not supported or if the size parameter is neither an int nor a str that is parseable to an int.
Examples
>>> ak.ones(5, dtype=ak.int64)
array([1, 1, 1, 1, 1])
>>> ak.ones(5, dtype=ak.float64)
array([1, 1, 1, 1, 1])
>>> ak.ones(5, dtype=ak.bool_)
array([True, True, True, True, True])
- arkouda.pdarraycreation.ones_like(pda: arkouda.pdarrayclass.pdarray) -> arkouda.pdarrayclass.pdarray
Create a one-filled pdarray of the same size and dtype as an existing pdarray.
- Parameters:
pda (pdarray) – Array to use for size and dtype
- Returns:
Equivalent to ak.ones(pda.size, pda.dtype)
- Return type:
pdarray
- Raises:
TypeError – Raised if the pda parameter is not a pdarray.
See also
Notes
Logic for generating the pdarray is delegated to the ak.ones method. Accordingly, the supported dtypes are those defined by the ak.ones method.
Examples
>>> ones = ak.ones(5, dtype=ak.int64)
>>> ak.ones_like(ones)
array([1, 1, 1, 1, 1])
>>> ones = ak.ones(5, dtype=ak.float64)
>>> ak.ones_like(ones)
array([1, 1, 1, 1, 1])
>>> ones = ak.ones(5, dtype=ak.bool_)
>>> ak.ones_like(ones)
array([True, True, True, True, True])
- arkouda.pdarraycreation.promote_to_common_dtype(arrays: List[arkouda.pdarrayclass.pdarray]) -> Tuple[Any, List[arkouda.pdarrayclass.pdarray]]
Promote a list of pdarrays to a common dtype.
- Parameters:
arrays (List[pdarray]) – List of pdarrays to promote
- Returns:
The common dtype of the pdarrays and the list of pdarrays promoted to that dtype
- Return type:
dtype, List[pdarray]
- Raises:
TypeError – Raised if the pdarrays are not all of the same dtype
See also
pdarray.promote_dtype
Examples
>>> a = ak.arange(5)
>>> b = ak.ones(5, dtype=ak.float64)
>>> dtype, promoted = promote_to_common_dtype([a, b])
>>> dtype
dtype(float64)
>>> all(isinstance(p, pdarray) and p.dtype == dtype for p in promoted)
True
- arkouda.pdarraycreation.randint(low: arkouda.numpy.dtypes.numeric_scalars, high: arkouda.numpy.dtypes.numeric_scalars, size: arkouda.numpy.dtypes.int_scalars | Tuple[arkouda.numpy.dtypes.int_scalars, Ellipsis] = 1, dtype=akint64, seed: arkouda.numpy.dtypes.int_scalars | None = None) -> arkouda.pdarrayclass.pdarray
Generate a pdarray of randomized int, float, or bool values in a specified range bounded by the low and high parameters.
- Parameters:
low (numeric_scalars) – The low value (inclusive) of the range
high (numeric_scalars) – The high value (exclusive for int, inclusive for float) of the range
size (int_scalars) – The length of the returned array
dtype (Union[int64, float64, bool]) – The dtype of the array
seed (int_scalars, optional) – Index for where to pull the first returned value
- Returns:
Values drawn uniformly from the specified range having the desired dtype
- Return type:
pdarray
- Raises:
TypeError – Raised if dtype.name not in DTypes, size is not an int, low or high is not an int or float, or seed is not an int
ValueError – Raised if size < 0 or if high < low
Notes
Calling randint with dtype=float64 will result in uniform non-integral floating point values.
Ranges of size >= 2**64 result in undefined behavior because the size exceeds the maximum value that can be stored on the server (uint64).
Examples
>>> ak.randint(0, 10, 5)
array([5, 7, 4, 8, 3])
>>> ak.randint(0, 1, 3, dtype=ak.float64)
array([0.92176432277231968, 0.083130710959903542, 0.68894208386667544])
>>> ak.randint(0, 1, 5, dtype=ak.bool_)
array([True, False, True, True, True])
>>> ak.randint(1, 5, 10, seed=2)
array([4, 3, 1, 3, 4, 4, 2, 4, 3, 2])
>>> ak.randint(1, 5, 3, dtype=ak.float64, seed=2)
array([2.9160772326374946, 4.353429832157099, 4.5392023718621486])
>>> ak.randint(1, 5, 10, dtype=ak.bool_, seed=2)
array([False, True, True, True, True, False, True, True, True, True])
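The difference between the integer and float bounds can be illustrated with Python's random module. The function `randint_sketch` and its `as_float` flag are hypothetical and stand in for the dtype argument. The sketch uses a different generator from the arkouda server, so the values will not match the examples above; only the range semantics and seed reproducibility are shown.

```python
import random


def randint_sketch(low, high, size, as_float=False, seed=None):
    """Hypothetical helper: draw `size` values from [low, high) for ints,
    or from the interval between low and high for floats."""
    if size < 0 or high < low:
        raise ValueError("size must be non-negative and high must not be below low")
    rng = random.Random(seed)
    if as_float:
        # Floats: non-integral values between low and high.
        return [rng.uniform(low, high) for _ in range(size)]
    # Ints: high is exclusive, as with range(low, high).
    return [rng.randrange(low, high) for _ in range(size)]


ints = randint_sketch(1, 5, 10, seed=2)
print(all(1 <= v < 5 for v in ints))  # True

floats = randint_sketch(1, 5, 3, as_float=True, seed=2)
print(all(1 <= v <= 5 for v in floats))  # True

# The same seed reproduces the same sequence.
print(ints == randint_sketch(1, 5, 10, seed=2))  # True
```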
- arkouda.pdarraycreation.random_strings_lognormal(logmean: arkouda.numpy.dtypes.numeric_scalars, logstd: arkouda.numpy.dtypes.numeric_scalars, size: arkouda.numpy.dtypes.int_scalars, characters: str = 'uppercase', seed: arkouda.numpy.dtypes.int_scalars | None = None) -> arkouda.strings.Strings
Generate random strings with log-normally distributed lengths and with characters drawn from a specified set.
- Parameters:
logmean (numeric_scalars) – The log-mean of the length distribution
logstd (numeric_scalars) – The log-standard-deviation of the length distribution
size (int_scalars) – The number of strings to generate
characters ((uppercase, lowercase, numeric, printable, binary)) – The set of characters to draw from
seed (int_scalars, optional) – Value used to initialize the random number generator
- Returns:
The Strings object encapsulating a pdarray of random strings
- Return type:
Strings
- Raises:
TypeError – Raised if logmean is neither a float nor an int, logstd is not a float, size is not an int, or if characters is not a str
ValueError – Raised if logstd <= 0 or size < 0
See also
Notes
The lengths of the generated strings are distributed \(\mathrm{Lognormal}(\mu, \sigma^2)\), with \(\mu = \text{logmean}\) and \(\sigma = \text{logstd}\). Thus, the strings will have an average length of \(\exp(\mu + 0.5\sigma^2)\), a minimum length of zero, and a heavy tail towards longer strings.
Examples
>>> ak.random_strings_lognormal(2, 0.25, 5, seed=1)
array(['TVKJTE', 'ABOCORHFM', 'LUDMMGTB', 'KWOQNPHZ', 'VSXRRL'])
>>> ak.random_strings_lognormal(2, 0.25, 5, seed=1, characters='printable')
array(['+5"fp-', ']3Q4kC~HF', '=F=`,IE!', 'DjkBa'9(', '5oZ1)='])
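The average-length formula from the Notes can be checked numerically. The sketch below uses random.lognormvariate from the Python standard library rather than the arkouda server, and `expected_lognormal_length` is a hypothetical helper name. It samples continuous values, so any rounding that the server applies to obtain integer string lengths is not modeled.

```python
import math
import random


def expected_lognormal_length(logmean, logstd):
    """Hypothetical helper: mean of a Lognormal(mu, sigma^2) distribution,
    exp(mu + 0.5 * sigma^2)."""
    return math.exp(logmean + 0.5 * logstd**2)


# Parameters from the example above: logmean=2, logstd=0.25.
expected = expected_lognormal_length(2, 0.25)

# Empirical mean of continuous log-normal samples for comparison.
rng = random.Random(1)
samples = [rng.lognormvariate(2, 0.25) for _ in range(100_000)]
empirical = sum(samples) / len(samples)

print(round(expected, 2))                # 7.62
print(abs(empirical - expected) < 0.1)   # True
```

With these parameters the expected length is about 7.6 characters, which is consistent with the 6 to 9 character strings in the example output.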
- arkouda.pdarraycreation.random_strings_uniform(minlen: arkouda.numpy.dtypes.int_scalars, maxlen: arkouda.numpy.dtypes.int_scalars, size: arkouda.numpy.dtypes.int_scalars, characters: str = 'uppercase', seed: None | arkouda.numpy.dtypes.int_scalars = None) -> arkouda.strings.Strings
Generate random strings with lengths uniformly distributed between minlen and maxlen, and with characters drawn from a specified set.
- Parameters:
minlen (int_scalars) – The minimum allowed length of string
maxlen (int_scalars) – The maximum allowed length of string
size (int_scalars) – The number of strings to generate
characters ((uppercase, lowercase, numeric, printable, binary)) – The set of characters to draw from
seed (Union[None, int_scalars], optional) – Value used to initialize the random number generator
- Returns:
The array of random strings
- Return type:
Strings
- Raises:
ValueError – Raised if minlen < 0, maxlen < minlen, or size < 0
See also
Examples
>>> ak.random_strings_uniform(minlen=1, maxlen=5, seed=1, size=5)
array(['TVKJ', 'EWAB', 'CO', 'HFMD', 'U'])
>>> ak.random_strings_uniform(minlen=1, maxlen=5, seed=1, size=5,
... characters='printable')
array(['+5"f', '-P]3', '4k', '~HFF', 'F'])
- arkouda.pdarraycreation.scalar_array(value: arkouda.numpy.dtypes.numeric_scalars, dtype: numpy.dtype | type | str | arkouda.numpy.dtypes.bigint | None = None) -> arkouda.pdarrayclass.pdarray
Create a pdarray from a single scalar value.
- Parameters:
value (numeric_scalars) – Value to create pdarray from
- Returns:
pdarray with a single element
- Return type:
pdarray
- arkouda.pdarraycreation.standard_normal(size: arkouda.numpy.dtypes.int_scalars, seed: None | arkouda.numpy.dtypes.int_scalars = None) -> arkouda.pdarrayclass.pdarray
Draw real numbers from the standard normal distribution.
- Parameters:
size (int_scalars) – The number of samples to draw (size of the returned array)
seed (int_scalars) – Value used to initialize the random number generator
- Returns:
The array of random numbers
- Return type:
pdarray
- Raises:
TypeError – Raised if size is not an int
ValueError – Raised if size < 0
See also
Notes
For random samples from \(N(\mu, \sigma^2)\), use:
(sigma * standard_normal(size)) + mu
Examples
>>> ak.standard_normal(3,1)
array([-0.68586185091150265, 1.1723810583573375, 0.567584107142031])
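The rescaling rule in the Notes, (sigma * standard_normal(size)) + mu, can be tried out with the standard library. The function `normal_samples` is a hypothetical stand-in that uses random.gauss instead of the arkouda server, so it illustrates the transformation only and does not reproduce arkouda's random stream.

```python
import random
import statistics


def normal_samples(mu, sigma, size, seed=None):
    """Hypothetical helper: draw standard normal values, then shift and scale
    them to obtain samples from N(mu, sigma^2)."""
    rng = random.Random(seed)
    standard = [rng.gauss(0.0, 1.0) for _ in range(size)]
    return [sigma * z + mu for z in standard]


samples = normal_samples(mu=10.0, sigma=2.0, size=50_000, seed=1)

# The sample mean and standard deviation should be close to mu and sigma.
print(abs(statistics.fmean(samples) - 10.0) < 0.1)  # True
print(abs(statistics.stdev(samples) - 2.0) < 0.1)   # True
```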
- arkouda.pdarraycreation.uniform(size: arkouda.numpy.dtypes.int_scalars, low: arkouda.numpy.dtypes.numeric_scalars = float(0.0), high: arkouda.numpy.dtypes.numeric_scalars = 1.0, seed: None | arkouda.numpy.dtypes.int_scalars = None) -> arkouda.pdarrayclass.pdarray
Generate a pdarray with uniformly distributed random float values in a specified range.
- Parameters:
low (float_scalars) – The low value (inclusive) of the range, defaults to 0.0
high (float_scalars) – The high value (inclusive) of the range, defaults to 1.0
size (int_scalars) – The length of the returned array
seed (int_scalars, optional) – Value used to initialize the random number generator
- Returns:
Values drawn uniformly from the specified range
- Return type:
pdarray
- Raises:
TypeError – Raised if dtype.name not in DTypes, size is not an int, or if either low or high is not an int or float
ValueError – Raised if size < 0 or if high < low
Notes
The logic for uniform is delegated to the ak.randint method, which is invoked with a dtype of float64.
Examples
>>> ak.uniform(3)
array([0.92176432277231968, 0.083130710959903542, 0.68894208386667544])
>>> ak.uniform(size=3,low=0,high=5,seed=0)
array([0.30013431967121934, 0.47383036230759112, 1.0441791878997098])
- arkouda.pdarraycreation.zeros(size: arkouda.numpy.dtypes.int_scalars | Tuple[arkouda.numpy.dtypes.int_scalars, Ellipsis] | str, dtype: numpy.dtype | type | str | arkouda.numpy.dtypes.bigint = float64, max_bits: int | None = None) -> arkouda.pdarrayclass.pdarray
Create a pdarray filled with zeros.
- Parameters:
size (int_scalars) – Size of the array (only rank-1 arrays supported)
dtype (all_scalars) – Type of resulting array, default float64
max_bits (int) – Specifies the maximum number of bits; only used for bigint pdarrays
- Returns:
Zeros of the requested size and dtype
- Return type:
pdarray
- Raises:
TypeError – Raised if the supplied dtype is not supported or if the size parameter is neither an int nor a str that is parseable to an int.
See also
Examples
>>> ak.zeros(5, dtype=ak.int64)
array([0, 0, 0, 0, 0])
>>> ak.zeros(5, dtype=ak.float64)
array([0, 0, 0, 0, 0])
>>> ak.zeros(5, dtype=ak.bool_)
array([False, False, False, False, False])
- arkouda.pdarraycreation.zeros_like(pda: arkouda.pdarrayclass.pdarray) -> arkouda.pdarrayclass.pdarray
Create a zero-filled pdarray of the same size and dtype as an existing pdarray.
- Parameters:
pda (pdarray) – Array to use for size and dtype
- Returns:
Equivalent to ak.zeros(pda.size, pda.dtype)
- Return type:
pdarray
- Raises:
TypeError – Raised if the pda parameter is not a pdarray.
Examples
>>> zeros = ak.zeros(5, dtype=ak.int64)
>>> ak.zeros_like(zeros)
array([0, 0, 0, 0, 0])
>>> zeros = ak.zeros(5, dtype=ak.float64)
>>> ak.zeros_like(zeros)
array([0, 0, 0, 0, 0])
>>> zeros = ak.zeros(5, dtype=ak.bool_)
>>> ak.zeros_like(zeros)
array([False, False, False, False, False])