s

Task2



Task 2. Explain the use of the “Simple random data” and “Permutations” functions.

The Numpy random module contains some simple random data functions for random sampling of both discrete and continuous data from the uniform, normal or standard normal probability distributions and for randomly sampling from a sequence or set of elements. It also has two permutations function for shuffling arrays. These functions all return either arrays of random values where the number of dimensions and the number of elements can be specified or scalar values. For this assignment, I have separated the ten simple random data functions into groups according to the type of distribution they draw from and the type of number returned. Some of these functions appear to do the same thing and some are variants of the distribution functions for the Uniform and Normal distributions which are covered in more detail in Task 3 below.

Simple random data functions

Permutations functions

Random samples from the Standard Normal Distributions (floats)

The numpy.random randn() function returns a sample (or samples) of random values from the standard normal distribution, also known as the univariate normal Gaussian distribution, that has a mean of 0 and variance of 1. This
The size of the sample returned and the shape of the can be specified or else a single float randomly sampled from the distribution is returned. You can also get random samples from other Normal distributions with a specified mean and variance $N(\mu, \sigma^2)$ by scaling the values by multiplying the output of the rand function by $\sigma$ and adding $\mu$ where the standard deviation $\sigma$ is the square root of the variance $\sigma^2$. numpy.random.randn() is a convenience function for the numpy.random.standard_normal function which takes a tuple for the size and shape dimensions and is covered in more detail in Task 3.

Using numpy.random.randn function.

rand can be used to create various sized arrays. Used without a size argument (a positive integer) a single float value will be returned. If two integers are provided to rand then the returned ndarray has two dimensions and the number of samples is the product of these two values.

I have written a small function below to print the array attributes rather than repeating myself over and over! I will use my function arrayinfo to print the resulting arrays with their dimensions.

def arrayinfo(myarray):  # pass the array to the function to print the attributes
    """print the attributes of the numpy array passed to this function"""
    
    print(f"A {myarray.ndim} dimensional array, with a total number of {myarray.size} {myarray.dtype} elements in  a{myarray.shape} shape ")
    print('The values sampled range from the smallest value of %.3f, to the largest value of %.3f' % (np.min(myarray),np.max(myarray)))
    print('The mean of the samples is %.3f, Standard Deviation is %.3f, and Variance is %.3f' % (np.mean(myarray), np.std(myarray), np.var(myarray)))
    print('\n',myarray, '\n')
    #return myarray  # don't want to print it out
np.random.rand #<function RandomState.rand>
<function RandomState.rand>
arrayinfo(np.random.randn(10))  # creates a one dimensional array of 10 values
arrayinfo(np.random.randn(4,3))# a 4 by 3 array with 12 values samples from standard normal distribution
arrayinfo(np.random.randn(2,2,2)) # a 3D array containing 2*2*2 = 8 values
#arrayinfo(np.random.randn(2,2,2,2)) # a 4D array containing 2*2*2*2 = 16 values
A 1 dimensional array, with a total number of 10 float64 elements in  a(10,) shape 
The values sampled range from the smallest value of -1.496, to the largest value of 0.809
The mean of the samples is -0.113, Standard Deviation is 0.617, and Variance is 0.381

 [-0.1255  0.8092 -0.2683 ... -1.4961  0.2836 -0.1401] 

A 2 dimensional array, with a total number of 12 float64 elements in  a(4, 3) shape 
The values sampled range from the smallest value of -1.414, to the largest value of 1.485
The mean of the samples is 0.440, Standard Deviation is 0.774, and Variance is 0.599

 [[ 0.2002 -0.455   0.8126]
 [ 1.0789  0.5534 -1.4143]
 [ 0.5053 -0.1848  1.2723]
 [ 0.7508  1.4852  0.6738]] 

A 3 dimensional array, with a total number of 8 float64 elements in  a(2, 2, 2) shape 
The values sampled range from the smallest value of -1.233, to the largest value of 1.092
The mean of the samples is -0.276, Standard Deviation is 0.810, and Variance is 0.657

 [[[-0.3624 -1.2329]
  [ 1.092   0.5775]]

 [[-1.056  -0.6674]
  [-0.9753  0.4165]]] 

Random samples from the $N(\mu, \sigma^2)$ Distribution

You can get random samples from a normal distribution with a mean $\mu$ and variance $\sigma^2$ for a $N(\mu, \sigma^2)$ distribution by scaling the values from the numpy.random.randn function. Multiply the values by $\sigma$ and add $\mu$.

x = 2.5 * np.random.randn(2, 4) + 3 # # A 2 by 4 array of samples from N(3, 6.25) distribution:  
arrayinfo(x) # use function to see the array
A 2 dimensional array, with a total number of 8 float64 elements in  a(2, 4) shape 
The values sampled range from the smallest value of -1.413, to the largest value of 5.920
The mean of the samples is 3.431, Standard Deviation is 2.168, and Variance is 4.699

 [[ 3.3249  4.7656  3.728   5.9201]
 [ 5.6765  2.6773  2.7666 -1.4127]] 

The Python stdlib random module uses gauss() function to sample random floats from a Gaussian distribution. Unlike the numpy.random.randn() or numpy.numpy.random.standard_normal you do need to supply the parameters here for the mean and the standard deviation of the distribution from which the random values will be drawn. Here I use a loop to generate the sequence of random numbers, unlike numpy.random function which is a simple one liner.

import random # import random module from python standard library
myseq =[] # create an empty list
for i in range(10):  ## use a loop to generate a sequence of random numbers
    random_gauss = round(random.gauss(0,1),3) # mean is 0, standard deviation is 10
    myseq.append(random_gauss)
print(*myseq, sep = ", ") # print elements of the list without the brackets
-0.749, -0.871, 0.298, -0.17, -0.577, -0.433, -1.265, -0.437, 0.436, -0.748

Random samples from the Continuous Standard Uniform Distribution (floats)

The numpy.random.rand function generates floating point value(s) in an interval between a start and finish point where any single number is as likely to occur as any other number as there is uniform probability across the interval. In a uniform distribution, you would expect to see a rectangular shaped distribution as the sample size gets bigger compared to the bell shaped curve of the normal distribution above. The distribution is constant in the interval.

To use this function you can simply pass it some positive integer(s) to determine the dimensions of the array. Without passing any arguments a single float is returned (similarly to the random function in the stdlib random). numpy.random.rand returns an array of floating point values (or a single random value) from a uniform distribution in the half-open interval $[0,1.)$ where the half-open interval means that the values returned can be any number from 0.0 up to 1.0 but not including 1.0.

There are four other functions that also return random floats from the continuous uniform distribution in the half-open interval between 0.0 and 1.0. The inputs and outputs to these four numpy.random functions(random_sample(), random(),ranf() and sample()) appear to be the same for this interval and the examples in the numpy documentation actually use the numpy.random.random_sample function in all cases! The only difference I noticed between the rand function and these four other functions is that rand takes integers directly as the arguments while the other 4 functions take a tuple when using multiple arguments.

The following are some of the properties of a continuous uniform distribution which are covered in more detail in Task 3. The continuous uniform distribution takes values in the specified range (a,b) and the standard uniform distribution is where a is 0 and b is 1. The expected value of the uniform distribution is the midpoint of the interval $\frac{b-a}{2}$. The variance $\sigma^2$ is $\frac{1}{12}(b-a)^2$ which is $\frac{1}{12}$ and the standard deviation (square root of the variance) is $\sigma = \frac{(b-a)^2}{12}$

The expected value of the continuous uniform distribution in the interval between 0 and 1 is 0.5, the variance of the continuous uniform distribution in the interval between 0 and 1 is $\frac{1}{12}$ and the standard deviation is $\frac{1}{sqrt(12)}$

The 4 numpy.random functions, random_sample(), random(),ranf() and sample(), functions sample from the continuous uniform distribution over the stated interval of $[0.0,1.0)$ but they can all also be used to sample from a continuous uniform distribution $Unif[a,b), b>a$ by multiplying the output by $(b-a)$ and adding $a$ to get $(b - a)$ * random_sample() + $a$.

Using the numpy.random rand, random_sample,ranf, random and sample functions.

Using these 5 functions to return samples from the continuous uniform distribution between 0.0 and 1.0 can show the similarities in the outputs. When the functions are given the same size and shape arguments, and using the random seed function (as explained in Task 4) the exact same output is produced. They actually appear to be the exact same function which I show below.

np.random.rand # <function RandomState.rand>
<function RandomState.rand>
np.random.rand()  ## without any arguments, a scalar is returned
np.random.rand(100)  # a 1-d array with 100 elements from uniform distribution
np.random.rand(3,2) ##a 2-d array with 6 elements from uniform distribution
np.random.rand(4,1,2)  # a 3 dimensional array, with a total number of 8 float64 elements in  a(4, 1, 2) shape
np.random.rand(1,2,3,4) # a 4 dimensional array with 24 elements from the uniform distribution
array([[[[0.3808, 0.4352, 0.7458, 0.1992],
         [0.4684, 0.7517, 0.2548, 0.4227],
         [0.1476, 0.2386, 0.1859, 0.8024]],

        [[0.4614, 0.9924, 0.6736, 0.597 ],
         [0.366 , 0.9123, 0.7549, 0.5501],
         [0.8801, 0.8776, 0.4383, 0.7547]]]])
np.random.ranf  # <function RandomState.random_sample>
np.random.random_sample # <function RandomState.random_sample>
np.random.sample # <function RandomState.random_sample>
np.random.random # <function RandomState.random_sample>
<function RandomState.random_sample>

Randomly sampling 10 floats from the continuous uniform distribution in the $[0.0,1.0)$ interval. In each case a one-dimensional array of 10 floats is returned. (Using numpy.random.seed will generate the same random samples each time.)

np.random.seed(1)
np.random.random_sample(10) # numpy.random_sample to return 10 random floats in the half-open interval `[0.0, 1.0)`.s
array([0.417 , 0.7203, 0.0001, ..., 0.3456, 0.3968, 0.5388])
np.random.seed(1)
np.random.random(10) # numpy.random.random to return 10 random floats in the half-open interval `[0.0, 1.0)`.
array([0.417 , 0.7203, 0.0001, ..., 0.3456, 0.3968, 0.5388])
np.random.seed(1)
np.random.ranf(10) # numpy.random.ranf to return 10 random floats in the half-open interval `[0.0, 1.0)`.
array([0.417 , 0.7203, 0.0001, ..., 0.3456, 0.3968, 0.5388])
np.random.seed(1)
np.random.sample(10) # numpy.random.sample to return 10 random floats in the half-open interval `[0.0, 1.0)`.
array([0.417 , 0.7203, 0.0001, ..., 0.3456, 0.3968, 0.5388])

The number of samples returned is the product of the integer arguments and the number of arguments determine the dimensions of the array.

Random samples from the continuous uniform distribution in $[0.0,1.0)$ interval and some statistics.

I can use my arrayinfo function to print the attributes of the arrays of samples rather than printing all the samples returned. The samples and the statistics for each set of samples returned from each function are the same using the random seed function. Without setting a random seed, the actual random samples generated would be different as a different sample is selected each time, but their means, variances and standard deviations should be very similar as will be outlined in the Distributions section below for task 3.

The expected value of the uniform distribution is $\frac{b-a}{2}$, the midpoint of the interval so the expected value of the continuous uniform distribution in the interval between 0 and 1 is $0.5$ while the variance is $\frac{1}{12}$ and the standard deviation is $\frac{1}{sqrt(12)}$. For smaller samples the statistics may not be the exact 0.5, 1/12 and 1/sqrt(12) but as the sample size gets bigger then this is what you would expect from a continuous uniform distribution.

a,b =0,1
 
print("The expected value is",(a+b)/2) # this is using the formula for the midpoint of a uniform interval
print('The variance is ',((b-a)**2)/12) # the formula for variance of a uniform interval
print("The standard deviation is",(b-a)/(np.sqrt(12))) #formuls for standard deviation of a uniform interval
The expected value is 0.5
The variance is  0.08333333333333333
The standard deviation is 0.2886751345948129
#np.random.seed(1)
np.random.rand
np.random.rand(1000,5)
np.random.seed(1)
np.random.sample((1000,5))
np.random.seed(1)
np.random.random_sample((1000,5))
np.random.seed(1)
arrayinfo(np.random.ranf((1000,5)))
np.random.seed(1)
arrayinfo(np.random.sample((1000,5)))

A 2 dimensional array, with a total number of 5000 float64 elements in  a(1000, 5) shape 
The values sampled range from the smallest value of 0.000, to the largest value of 1.000
The mean of the samples is 0.500, Standard Deviation is 0.288, and Variance is 0.083

 [[0.417  0.7203 0.0001 0.3023 0.1468]
 [0.0923 0.1863 0.3456 0.3968 0.5388]
 [0.4192 0.6852 0.2045 0.8781 0.0274]
 ...
 [0.8602 0.773  0.776  0.5287 0.4502]
 [0.2266 0.8087 0.9659 0.5435 0.9022]
 [0.2824 0.5178 0.3282 0.9918 0.2503]] 

A 2 dimensional array, with a total number of 5000 float64 elements in  a(1000, 5) shape 
The values sampled range from the smallest value of 0.000, to the largest value of 1.000
The mean of the samples is 0.500, Standard Deviation is 0.288, and Variance is 0.083

 [[0.417  0.7203 0.0001 0.3023 0.1468]
 [0.0923 0.1863 0.3456 0.3968 0.5388]
 [0.4192 0.6852 0.2045 0.8781 0.0274]
 ...
 [0.8602 0.773  0.776  0.5287 0.4502]
 [0.2266 0.8087 0.9659 0.5435 0.9022]
 [0.2824 0.5178 0.3282 0.9918 0.2503]] 

Random samples from the Continuous Uniform Distribution (floats)

The 4 functions just mentioned above can also be used to sample from a continuous uniform distribution over a different interval other than the standard uniform $[0.0,1.0)$ interval. To sample from $Unif[a, b), b > a$ you simply multiply the output of random_sample function by $(b-a)$ and add a:
(b - a) * random_sample() + a For example, to get a 3 by 2 array of random numbers from the uniform distribution over the interval $[-5, 0)$ where a is -5 and b is 0, use 5 * np.random.random_sample((3, 2)) - 5

x = 4 * np.random.random_sample((100, 4)) - 4 # returns a 100-by-4 array of random numbers from [-4, 0). 
arrayinfo(x) # using function defined earlier
a,b =-4,0 # setting the lower and upper bounds of the interval
print("b-a =",b-a,", a = ",a, "so the expected value is the midpoint of a and b which is",(a+b)/2)
print('The expected value is the midpoint of a and b which is',(a+b)/2, 'while the mean of the sample is:',np.mean(x))
print('The variance using  the formula is:', ((b-a)**2)/12, 'while the variance of the actual sample is %.3f' %np.var(x))
print('The standard deviation is %.3f' %((b-a)/(np.sqrt(12))))
A 2 dimensional array, with a total number of 400 float64 elements in  a(100, 4) shape 
The values sampled range from the smallest value of -3.998, to the largest value of -0.007
The mean of the samples is -2.000, Standard Deviation is 1.150, and Variance is 1.323

 [[-1.285  -0.9432 -3.8711 -0.0709]
 [-3.6347 -1.5437 -3.4026 -1.9168]
 [-3.6266 -0.8752 -0.9083 -1.6074]
 ...
 [-0.8655 -3.6473 -2.5772 -2.8012]
 [-1.724  -1.9594 -0.8715 -3.2518]
 [-1.694  -1.4225 -2.1605 -1.7681]] 

b-a = 4 , a =  -4 so the expected value is the midpoint of a and b which is -2.0
The expected value is the midpoint of a and b which is -2.0 while the mean of the sample is: -1.9998929565292054
The variance using  the formula is: 1.3333333333333333 while the variance of the actual sample is 1.323
The standard deviation is 1.155

The four functions np.random.random_sample, np.random.random, np.random.ranf and np.random.sample all appear to again return the exact same results as each other when using the same size arguments and the same intervals. This is shown by using the numpy.random.seed function to set the seed to get the same pseudo random numbers generated. See Section 4.

Using numpy.random random_sample and random functions to sample from $Unif[-4,0)$

a,b = -4,0 # setting the lower and upper bounds of the interval
print("b-a =",b-a, ", a =",a)
np.random.seed(1)
# can also use `arrayinfo` function to print out
(b-a) * np.random.random_sample((10, 3)) +a # returns a 10-by-3 array of random numbers from [-4, 0).
np.random.seed(1)
arrayinfo((b-a) * np.random.random((10, 3)) +a )# returns a 10-by-3 array of random numbers from [-4, 0). 
b-a = 4 , a = -4
A 2 dimensional array, with a total number of 30 float64 elements in  a(10, 3) shape 
The values sampled range from the smallest value of -4.000, to the largest value of -0.127
The mean of the samples is -2.258, Standard Deviation is 1.203, and Variance is 1.448

 [[-2.3319 -1.1187 -3.9995]
 [-2.7907 -3.413  -3.6306]
 [-3.255  -2.6178 -2.4129]
 ...
 [-0.127  -2.7463 -1.2307]
 [-0.4944 -0.4216 -3.6598]
 [-3.8438 -3.3207 -0.4874]] 

Using numpy.random random_sample and random functions with same parameters

a,b = 2,8 # setting the bounds of the interval
print("b-a =",b-a, ", a =",a)
np.random.seed(1) # setting the seed
(b-a) * np.random.ranf((10, 3)) +a # returns a 3-by-2 array of random numbers from [-4, 0). 
# can print info using the `arrayinfo` function defined earlier
np.random.seed(1)
arrayinfo((b-a) * np.random.sample((10, 3))+a) # returns a 3-by-2 array of random numbers from [-4, 0).

b-a = 6 , a = 2
A 2 dimensional array, with a total number of 30 float64 elements in  a(10, 3) shape 
The values sampled range from the smallest value of 2.001, to the largest value of 7.810
The mean of the samples is 4.613, Standard Deviation is 1.805, and Variance is 3.258

 [[4.5021 6.3219 2.0007]
 [3.814  2.8805 2.554 ]
 [3.1176 4.0734 4.3806]
 ...
 [7.8096 3.8805 6.1539]
 [7.2583 7.3676 2.5103]
 [2.2343 3.019  7.2689]] 
np.random.ranf
<function RandomState.random_sample>
np.random.ranf == np.random.random_sample ==np.random.sample ==np.random.random

True

The four functions ranf, random_sample,sample and random all appear to be the exact same asnp.random_sample!

Random samples from the Discrete Uniform Distribution (integers)

The numpy.random.randint(low, high=None, size=None, dtype='l') function returns random integers from the discrete uniform distribution in the interval from low (inclusive) to high (exclusive). ($[low, high)$).

The discrete uniform distribution is a very simple probability distribution that can only take on a finite set of possible values. numpy.random.randint()is used to generate an array of random integers from an interval where all the integers in the interval have uniform probability.

There are four possible parameters you can give to this function but at least one is required. Size, for the number of random numbers to sample is the key word argument and can be a single integer or a tuple for higher dimensional arrays.

If only one parameter is entered, this is taken to be the upper end of the range (exclusive) and a single random integer is returned in the range from 0 up to the number supplied so np.random.randint(10) samples a single integer from 0 up to an including 9. When a second value is supplied to the function, the first is treated as the low range parameter and the second is treated as the high range parameter so np.random.randint(5,10) would output a single integer between 5 inclusive and 9. These two arguments together specify the range of values that could possibly be drawn.

You can set the dtype to specify the dtype of integer output but it’s not usually required as the default value is np.int.

The numpy.random.random_integers function is similar to numpy.random.randint but it is now deprecated. It used a closed interval [low, high] with 1 being the lowest value if high is omitted and was used to generate uniformly distributed discrete non-integers. The manual advises to use numpy.random.randint instead but by adding 1 to high to get a closed interval.

np.random.random_integers(5) results in the following message:

DeprecationWarning: This function is deprecated. Please call randint(1, 5 + 1) instead

np.random.randint(5,10) # a single integer value in the interval [5,10)
np.random.randint(20, size=(2,4)) # a 2-dimensional array with 4 times 2 = 8 values from range of 0 to 19
np.random.randint(5,10,3) # a 1-d array with 3 values from the range [5,10)
np.random.randint(5,10,(4,3)) # a 2-d array  with 4 by 3 values in range [5,10)
array([[5, 9, 6],
       [7, 7, 6],
       [5, 6, 8],
       [9, 8, 6]])
np.random.randint(0,27,(2,3,4)) # a 3-d array with 24 values from range [0,27)
np.random.randint(5, size=(2, 3, 4)) # Generate a 3 dimensional array of  24 integers
array([[[0, 4, 2, 4],
        [3, 3, 0, 3],
        [4, 3, 4, 4]],

       [[4, 1, 0, 4],
        [2, 0, 2, 4],
        [1, 1, 0, 2]]])
np.random.random_integers #<function RandomState.random_integers>
np.random.randint #<function RandomState.randint>
<function RandomState.randint>

Randomly sampling from a given array of elements.

The numpy.random.choice(a, size=None, replace=True, p=None) function random samplea from a given 1-D array. It differs to the other numpy.random simple random data functions mentioned above in that instead of randomly sampling from a uniform or normal distribution, it randomly samples from an array that you provide to it. It can also be somewhat similar to the numpy.random.permutation function depending on its use. This function therefore requires an array from which to sample which you can provide or it can be created by the function using numpy’s arange function if you simply provide an integer. The number of elements sampled depends on the size you specify, otherwise just a single value is returned.

There are various methods of sampling random elements from an array in general. With replacement means the elements drawn would be placed back in and could be selected again whereas without replacement means that once a element is chosen, it can no longer be chosen again in the same sample as it is not placed back in the pot. In this way there can be no duplicates as elements cannot be selected again in the same sample.

The numpy.random.choice function defaults to using the with replacement sampling method but you can specify that the random sample is without replacement so that the same element cannot be chosen again.

You can also provide the probabilities for each element of the input array but otherwise a uniform distribution over all the elements is assumed where each elements is equally likely to be chosen. By providing the p probabilites, a non-uniform random sample will be returned. The number of probabilities should always equal the size parameter and the total probabilities must sum to 1.0.

The numpy.random.choice function can also be used on an array of elements other than integers. It is somewhat similar to the choice() function in the python random standard library which randomly select items from a list.

np.random.choice #<function RandomState.choice>
<function RandomState.choice>

Using numpy.random.choice function.

Randomly select from an array of integers created by the function itself

np.random.choice(10) # select a single element from an array created from np.arange(10)
np.random.choice(5, 3)  # select 3 elements from an array of integers from 0 up to but not including 5
np.random.choice(100, 12) #  select 12 elements from an array from 0 to 99 inclusive.
array([94, 60, 24, ..., 54, 96, 82])

With replacement vs without replacement.

By setting replace= False means an element cannot be chosen twice.

Here I generate 4 samples from an array using with replacement being true. I use a for loop here in case the concept is not evident in the first sample. When setting replace = False the same element is never repeated in the output array. Note that using without replacement means you cannot sample more elements than the number of elements in the array you are sampling.

print("With replacement")
for i in range(4):
    x = np.random.choice(10, 6) # sample 6 elements from the array
    print(x)
    
print("\n Without replacement")
# Then set replace = False and observing that the same element is never repeated in the same array. 
for i in range(4):
    x = np.random.choice(10, 6, replace = False) # sample 6 elements from the array 
    print(x)
With replacement
[6 6 2 7 7 0]
[6 5 1 4 6 0]
[6 5 1 2 1 5]
[4 0 7 8 9 5]

 Without replacement
[6 8 4 2 5 9]
[2 9 1 7 8 3]
[2 3 6 1 0 9]
[3 8 2 4 6 0]

Sampling from arrays containing other data types

The numpy.random.choice function can also be used to sample from array of other types of elements. To show this I will create an array with a list of letters from which to randomly sample elements.

import string  # import string module from standard library
letters_lower =np.array(list(string.ascii_lowercase)) # create an array of lowercase letters
np.random.choice(letters_lower, 4) # randomly select 4 elements with equal probability
array(['d', 'u', 'c', 'o'], dtype='<U1')

sampling from an array of letters with defined probabilities with replacement

np.random.choice(['a','b','c','d','e','f'], 5, p=[0.2, 0.15, 0.15, 0.3, 0.1, 0.1]) # selecting from an array of non-integers
array(['d', 'a', 'e', 'b', 'e'], dtype='<U1')

sampling from an array of letters with defined probabilities and without replacement

np.random.choice(['a','b','c','d','e','f'], 5, p=[0.2, 0.15, 0.15, 0.3, 0.1, 0.1], replace = False)
array(['c', 'e', 'd', 'a', 'b'], dtype='<U1')

Random Bytes

The numpy.random.bytes(length) function returns a string of random bytes. The length of the string to be returned must be provided to the numpy.random.bytes function.

This function returns a Bytes objects which according to the python docs are are immutable sequences of single bytes. Since many major binary protocols are based on the ASCII text encoding, bytes objects offer several methods that are only valid when working with ASCII compatible data and are closely related to string objects in a variety of other ways.

A bytes object is displayed as a sequence of bytes between quotes and preceded by ‘b’ or ‘B’.

Bytes literals produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.

One byte is a memory location with a size of 8 bits. A bytes object is an immutable sequence of bytes, conceptually similar to a string.

The bytes object is important because data written to disk is written as a stream of bytes, and because integers and strings are sequences of bytes. How the sequence of bytes is interpreted or displayed makes it an integer or a string. https://en.wikiversity.org/wiki/Python_Concepts/Bytes_objects_and_Bytearrays

np.random.bytes #<function RandomState.bytes>
<function RandomState.bytes>
print(np.random.bytes(3)) # a bytes string of length 3
print(np.random.bytes(2)) # a bytes string of length 2

b'J]\x10'
b'\x88\xb4'

Permutations functions

There are two permutations functions in the numpy.random package.

  • numpy.random.shuffle(x) is used to modify a sequence in-place by shuffling its contents.
  • numpy.random.permutation(x) is used to randomly permute a sequence, or return a permuted range.

These functions can be used to arrange or rearrange series of numbers or to change the order of samples in a sequence. Permutation and shuffling functions are used in machine learning to separate a dataset into training and test sets. Numpy random permutation functions can be used to shuffled multi-dimensional arrays. When shuffling rows of data it is usually important to keep the contents of the row together while the order of the rows will be moved about.

The numpy shuffle function shuffles a sequence in place so that the sequence provided to it is actually shuffled and not just a copy of the sequence.

numpy.random.shuffle

The numpy.random.shuffle(x) function is used to modify a sequence in-place by shuffling its contents. The contents of the numpy array is shuffled so that the contents are still the same but in a different order. The sequence is therefore changed with it’s elements in a random order.

If the array is multidimensional then this function will only shuffle the array along the first axis. The order of sub-arrays is changed but their contents remains the same. This basically swaps the order of the rows in the array, but not the order of the elements in each row.

numpy.random.permutation

This permutation function randomly permutes a sequence, or returns a permuted range. numpy.random.permutation takes either an array or an integer with which to create an array using arange. If it is given an array it makes a copy of this array and then shuffles the elements in the copy randomly. If it is only given an integer, it creates an array and randomly permutes the array.

If x is a multi-dimensional array provided to the function, it will only be shuffled along its first index.

The difference between the numpy.random.shuffle and the numpy.random.permutation seems to be that the original array is shuffled in the first case with the shuffle function but only the copy is shuffled in the second case with permutation. (This is where an array is provided and not where the function also has to create an array from np.arange).

The numpy.random.shuffle function only shuffles the array along the first axis of a multi-dimensional array. The order of sub-arrays is changed but their contents remains the same. Basically the rows have been moved about but not the elements in the rows.

Here I will create some multi-dimensional arrays and use these two functions and show the differences.

np.random.shuffle #<function RandomState.shuffle>
<function RandomState.shuffle>
np.random.permutation #<function RandomState.permutation>
<function RandomState.permutation>
array1 = np.arange(2,101,2).reshape(5,10)   ## create a 5 by 10 two-dimensional array
array2 = np.arange(0,150,5).reshape((6, 5))  ## create a 6 by 5 two-dimensional array

Comparing the numpy.random.shuffle and numpy.random.permutation function


print(f"The original array before being shuffled \n{array1} \n ") # the original array before being permuted
print("Shuffling the original array changes the original array \n")

print(f"The original array after being shuffled \n{array1} \n ")
The original array before being shuffled 
[[  2   4   6 ...  16  18  20]
 [ 22  24  26 ...  36  38  40]
 [ 42  44  46 ...  56  58  60]
 [ 62  64  66 ...  76  78  80]
 [ 82  84  86 ...  96  98 100]] 
 
Shuffling the original array changes the original array 

The original array after being shuffled 
[[  2   4   6 ...  16  18  20]
 [ 22  24  26 ...  36  38  40]
 [ 42  44  46 ...  56  58  60]
 [ 62  64  66 ...  76  78  80]
 [ 82  84  86 ...  96  98 100]] 
print(f"The original array before being permuted \n{array2} \n ")
print("Permuting the original array makes a copy to shuffle\n")
print(np.random.permutation(array2)) # provide an array
print(f"\n The original array after being permuted has not changed\n{array2} \n ")
The original array before being permuted 
[[  0   5  10  15  20]
 [ 25  30  35  40  45]
 [ 50  55  60  65  70]
 [ 75  80  85  90  95]
 [100 105 110 115 120]
 [125 130 135 140 145]] 
 
Permuting the original array makes a copy to shuffle

[[ 25  30  35  40  45]
 [ 50  55  60  65  70]
 [100 105 110 115 120]
 [  0   5  10  15  20]
 [ 75  80  85  90  95]
 [125 130 135 140 145]]

 The original array after being permuted has not changed
[[  0   5  10  15  20]
 [ 25  30  35  40  45]
 [ 50  55  60  65  70]
 [ 75  80  85  90  95]
 [100 105 110 115 120]
 [125 130 135 140 145]] 
Task2 screenshot

Tech used:
  • Python 3
  • Numpy
  • Matplotlib
  • seaborn