Task 4. Explain the use of seeds in generating pseudorandom numbers.

Pseudo-random number generators (PRNGs) and seeds

Computer programs produce outputs based on inputs and according to a set of predetermined rules. Pseudorandom numbers are generated in sequence by a deterministic algorithm from an input called a seed, a number that is used to initialise the pseudorandom number generator. When no seed is supplied explicitly, the seed is typically taken from the computer’s clock (for example the time in milliseconds when the code was run) and is used as the starting point of the process.

According to statisticshowto, a random seed specifies the start point when a computer generates a random number. The random seed can be any number, but it usually comes from the computer system’s clock, which counts in seconds from January 1, 1970 (known as Unix time). This ensures that the same random sequence won’t be repeated unless you actually want it to.
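As a rough illustration of a clock-based seed, the sketch below takes the current Unix time in whole seconds, records it, and uses it to seed NumPy. The variable names are illustrative only, and the modulo is an assumption made here so that the value fits the 32-bit range that `numpy.random.seed` accepts.

```python
import time
import numpy as np

# Current Unix time in whole seconds, reduced to fit a 32-bit seed.
seed = int(time.time()) % 2**32
print("seed used for this run:", seed)

np.random.seed(seed)
first_run = np.random.rand(3)

# Because the seed was recorded, the same numbers can be replayed later.
np.random.seed(seed)
second_run = np.random.rand(3)

print(np.array_equal(first_run, second_run))  # True
```

Recording the seed is what makes a clock-seeded run repeatable: the clock value changes from run to run, but any single run can be reproduced from its logged seed.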

Pseudorandom number generators are deterministic: once seeded, the sequence they produce is completely fixed. To recreate the exact same sequence of random numbers, you can explicitly supply the same seed as an input to the random number generator. This means that while the numbers generated look random, they are not truly random but pseudorandom. These pseudorandom numbers contain no real randomness at all - randomness is just being imitated - but they can take the role of random numbers for certain applications.

If you reinitialise a random number generator with the same seed, the default seed is ignored and the same sequence of pseudorandom numbers will be produced.

This all means that the outputs can be repeated by following the same set of steps with the same inputs. Therefore, if you know the seed, you can predict every number that will be generated in the sequence. General-purpose pseudorandom number generators should therefore not be used for cryptographic purposes, because their predictability could be used to break the encryption, but there are times when the exact same sequence of numbers is required.

Pseudorandom numbers are suitable for purposes such as simulating datasets and testing machine learning algorithms, as well as ensuring reproducibility of code for code sharing, teaching and demonstration purposes. Computer random number generation algorithms are based on patterns which generate numbers that follow particular probability distributions. Setting a seed will produce the same sequence of random numbers each time.

According to pynative.com, the seed value is generally the previous number generated by the generator. However, the first time you use the random generator there is no previous value, so by default the current system time is used as the seed value.

There are many different ways to generate pseudo-random numbers. Python uses the Mersenne Twister as the core generator. According to Wikipedia, the Mersenne Twister is by far the most widely used general-purpose pseudorandom number generator (PRNG). Its name derives from the fact that its period length is chosen to be a Mersenne prime, that is, a prime number that is one less than a power of two: a prime of the form M_n = 2^n − 1 for some integer n.

Khan Academy has a video, "Random vs pseudorandom number generators", that gives a very good overview of how seeds are used in random number generators. The following notes are based on it.

The physical world contains truly random fluctuations everywhere. Truly random numbers could be generated by measuring or sampling this noise, such as the electric current of TV static over time. Such random sequences could be visualised using a random walk, where a path is drawn that changes direction according to each number. Random walks have no pattern at all, as the next point is always unpredictable. Random processes are nondeterministic since they are impossible to determine in advance, whereas machines are deterministic because their operation is predictable and repeatable.

In 1946, while involved in running computations for the military, John von Neumann required quick access to randomly generated numbers that could be repeated if necessary, but the computers of the time had very limited memory and could not store long random sequences. Von Neumann therefore developed an algorithm to mechanically simulate the scrambling aspect of randomness as follows.

First, a truly random number called the seed (which could come from a measurement of noise or the current time in milliseconds) is selected. It is then provided as input to a very simple calculation: the seed is multiplied by itself, the middle digits of this output become the seed for the next step, and the process is repeated as many times as required. This was called the middle-square method and was the first pseudorandom number generator. The randomness of the sequence depends only on the randomness of the initial seed, and the same seed will generate the same sequence.
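The middle-square steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a historical reconstruction: the function name, the four-digit default and the zero-padding of the square are assumptions made here.

```python
def middle_square(seed, count, digits=4):
    """Return `count` values from a middle-square generator with `digits`-digit state."""
    values = []
    state = seed
    for _ in range(count):
        # Square the current state and pad it to 2 * digits characters.
        squared = str(state * state).zfill(2 * digits)
        # Keep the middle `digits` characters as the next state and output.
        start = digits // 2
        state = int(squared[start:start + digits])
        values.append(state)
    return values


print(middle_square(1234, 5))  # [5227, 3215, 3362, 3030, 1809]
print(middle_square(1234, 5))  # the same seed gives the same sequence again
```

For example, 1234 squared is 1522756, which is padded to 01522756, and its middle four digits 5227 become both the first output and the next seed.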

The difference between a randomly generated and a pseudorandomly generated sequence is that the pseudorandom sequence will eventually repeat, when the algorithm reaches a seed it has previously used. There are also many sequences that cannot occur in a pseudorandom sequence. The length before the pseudorandom sequence repeats is called the period, and the period is strictly limited by the length of the initial seed, because a seed of n digits can only take 10^n different values. The longer the initial seed, the longer the period can be, so a 4 digit seed can produce a longer non-repeating sequence than a 3 digit seed, which in turn can produce a longer one than a 2 digit seed, and so on.
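To make the idea of a period concrete, the sketch below counts how many values a middle-square generator produces before it revisits a state. The helper names and the example seeds are made up for illustration. The count can never exceed 10 to the power of the number of digits, which is the sense in which the seed length limits the period. It is an upper bound only, and individual seeds can repeat much sooner.

```python
def middle_square_step(state, digits):
    """One middle-square step: square, zero-pad, keep the middle digits."""
    squared = str(state * state).zfill(2 * digits)
    start = digits // 2
    return int(squared[start:start + digits])


def steps_until_repeat(seed, digits):
    """Count the distinct states visited before any state occurs a second time."""
    seen = set()
    state = seed
    while state not in seen:
        seen.add(state)
        state = middle_square_step(state, digits)
    return len(seen)


# Arbitrary example seeds of two, four and six digits.
for seed, digits in [(42, 2), (1234, 4), (675248, 6)]:
    count = steps_until_repeat(seed, digits)
    print(f"{digits}-digit seed {seed}: {count} values before repeating "
          f"(at most {10**digits} possible)")
```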

For a pseudorandom sequence to be indistinguishable from a randomly generated sequence, it must be impractical for a computer to try all seeds and look for a match. There is an important distinction in computer science between what is possible and what is possible in a reasonable amount of time. With pseudorandom generators, the security increases as the length of the seed increases. If the most powerful computer would take hundreds of years to run through all seeds, then we can safely assume it is practically secure rather than perfectly secure. As computers get faster, the seed size must increase accordingly.
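The following sketch shows why a small seed space is not safe: every candidate seed is tried in turn until one reproduces the observed numbers. The seed space of 10,000, the secret seed and the function name are hypothetical values chosen so the search finishes quickly. With a realistically large seed space the same loop would be impractical.

```python
import numpy as np

SEED_SPACE = 10_000   # hypothetical: the seed is known to be below 10,000
secret_seed = 4321    # hypothetical seed that the "attacker" does not know

# Numbers the attacker has observed from the seeded generator.
observed = np.random.RandomState(secret_seed).randint(0, 100, 10)


def recover_seed(observed, seed_space):
    """Try every seed in range(seed_space) and return the first one that matches."""
    for candidate in range(seed_space):
        guess = np.random.RandomState(candidate).randint(0, 100, len(observed))
        if np.array_equal(guess, observed):
            return candidate
    return None


found = recover_seed(observed, SEED_SPACE)
print("recovered seed:", found)

# Once a matching seed is found, the numbers that follow can be predicted.
rng = np.random.RandomState(found)
rng.randint(0, 100, len(observed))   # skip the values already observed
print("predicted next values:", rng.randint(0, 100, 3))
```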

Instead of having to share the entire random sequence in advance, you can share the relatively short random seed and expand it into the same random looking sequence when needed.
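A minimal sketch of this idea, assuming both sides use the same generator (here NumPy's `RandomState`) and have agreed on a seed in advance. The names `shared_seed` and `expand` are illustrative, and the Mersenne Twister used here is not suitable for real cryptographic use.

```python
import numpy as np

shared_seed = 2024  # hypothetical short value agreed on by both sides


def expand(seed, length):
    """Expand a short seed into `length` pseudorandom byte values (0 to 255)."""
    return np.random.RandomState(seed).randint(0, 256, length)


# Each side expands the seed independently, only when the sequence is needed.
sender_stream = expand(shared_seed, 1000)
receiver_stream = expand(shared_seed, 1000)

print(np.array_equal(sender_stream, receiver_stream))  # True
```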

In summary, pseudorandom number generators (PRNGs) are algorithms for generating random-looking numbers drawn from a probability distribution, where the numbers are generated according to some deterministic algorithm from the input seed. The seed is the starting point for the algorithm. Different numbers used as the seed will produce a different set of pseudorandom numbers from the same algorithm, as they have a different starting point. If the starting point is the same and the steps in the algorithm are the same, then the outputs will be the same.

Using seeds in numpy.random

According to the numpy.random.seed documentation, the numpy.random.seed(seed=None) method is called when RandomState is initialized and can be called again to re-seed the generator.

The Mersenne Twister is the basis for NumPy's pseudo-random number generator. The numpy.random.RandomState class is the container for the Mersenne Twister pseudo-random number generator. RandomState exposes a number of methods for generating random numbers drawn from a variety of probability distributions. These include all the various functions mentioned above and other distribution functions not covered here. Throughout the document I have looked at various functions, and each one shows <function RandomState.function-name> when simply referenced without the brackets.
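The relationship between the module-level functions and `RandomState` can be checked with a short sketch: seeding the global generator and seeding a separate `RandomState` instance with the same value give the same numbers, and drawing from the separate instance does not disturb the global stream. The variable names are illustrative.

```python
import numpy as np

# The module-level functions use a hidden, global RandomState.
np.random.seed(10)
global_draw = np.random.rand(5)

# A separate RandomState seeded with the same value gives the same numbers.
rng = np.random.RandomState(10)
local_draw = rng.rand(5)
print(np.array_equal(global_draw, local_draw))  # True

# Drawing from the local instance does not advance the global generator.
np.random.seed(10)
first = np.random.rand(2)
rng.rand(1000)                      # heavy use of the local generator
second = np.random.rand(2)

np.random.seed(10)
uninterrupted = np.random.rand(4)
print(np.array_equal(np.concatenate([first, second]), uninterrupted))  # True
```

Using a dedicated instance like this is one way to keep a reproducible stream that other code calling `np.random.seed` cannot reset.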

The random seed used to initialize the pseudo-random number generator can be any integer between 0 and 2**32 - 1 inclusive, or a sequence of such integers. The default is None, in which case RandomState will try to read data from /dev/urandom (or the Windows analogue) if available, or seed from the clock otherwise.
/dev/urandom is a special file in Unix-like operating systems that serves as a pseudorandom number generator, allowing access to environmental noise collected from device drivers and other sources.
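The accepted seed values can be tried out directly. The sketch below assumes the legacy `numpy.random.seed` behaviour described above, where integers outside the range 0 to 2**32 - 1 are rejected with a `ValueError`. The list name `rejected` is illustrative.

```python
import numpy as np

np.random.seed(0)            # smallest accepted integer seed
np.random.seed(2**32 - 1)    # largest accepted integer seed

# A sequence of integers is also accepted and is reproducible.
np.random.seed([1, 2, 3])
from_sequence = np.random.rand(3)
np.random.seed([1, 2, 3])
print(np.array_equal(from_sequence, np.random.rand(3)))  # True

# Values outside the 32-bit range are rejected.
rejected = []
for bad_seed in (-1, 2**32):
    try:
        np.random.seed(bad_seed)
    except ValueError as err:
        rejected.append(bad_seed)
        print(bad_seed, "rejected:", err)
```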

A call to the numpy.random.seed method allows the seed to be set in order to create completely repeatable or reproducible results. This seed function works with the numpy.random methods listed above to create particular types of random numbers from various probability distributions. In addition to the distribution-specific arguments, the number of random numbers can be specified by providing a size argument.

If a numpy.random function is used after providing a value to the numpy.random.seed function, then the very same set of random numbers can be generated again by calling numpy.random.seed again with the same seed value. The same code will produce the exact same output if the same seed value is used, which ensures reproducibility.

You can also use the seed function when randomly sampling from an array or other sequence of elements.

numpy.random.get_state() returns a tuple representing the internal state of the generator, while numpy.random.set_state() is used to set the internal state of the generator from a tuple.

Neither of these two functions is needed to work with any of the random distributions in NumPy, and the reference manual almost advises against touching the internal state!

If the internal state is manually altered, the user should know exactly what he/she is doing.
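Although they are rarely needed, the two functions can be used together to take a snapshot of the generator and replay from it, without altering the state by hand. A minimal sketch with illustrative variable names:

```python
import numpy as np

np.random.seed(10)
np.random.rand(3)                     # advance the generator a little

saved_state = np.random.get_state()   # snapshot of the internal state
after_snapshot = np.random.rand(4)

np.random.set_state(saved_state)      # rewind to the snapshot
replayed = np.random.rand(4)

print(np.array_equal(after_snapshot, replayed))  # True

# The first two entries of the tuple are the generator name and its key array.
print(saved_state[0], len(saved_state[1]))  # MT19937 624
```

Unlike re-seeding, restoring a saved state resumes from the middle of a sequence rather than from its start.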

Using `numpy.random.seed`

I have used the random.seed method in this assignment for tasks 2 and 3 in order to generate the same sequence of random numbers when comparing the different functions.

When the random.seed method is used with the same seed then the same sequence of random numbers will be generated when this number is used to seed the generator again. It doesn’t really matter what number is used as the seed as long as the same number is used as a seed again when you want to generate the exact same sequence. If a different seed is used, then a different sequence of random numbers will be generated. The output of a random function will depend on the seed used and the algorithm the function uses. For other purposes such as for security then it might be appropriate to use a longer seed.
Without explicitly specifying a seed, a seed is generated from the /dev/urandom file or the system clock. If you don’t specify a seed, it would be extremely difficult to reproduce the same sequence again. If the default seed is set from the computer system’s internal clock to the millisecond, then the seed will be different each time the generator is initialised.

np.random.RandomState # mtrand.RandomState

np.random.seed # <function RandomState.seed>

np.random.get_state # <function RandomState.get_state>

np.random.set_state # <function RandomState.set_state>

Generate random numbers with and without seeding the generator

import numpy as np
print("\n without using a seed")
for i in range(3):
    x = np.random.rand(5) # generate five random floats in [0, 1) without setting a seed
    print(x)
 
print("\n using seed to give the same random sequences")
for i in range(3):
    np.random.seed(10)
    x = np.random.rand(5) # generate five random floats in [0, 1) after setting the seed
    print(x)
    
print("\n without using a seed")
for i in range(3):
    x = np.random.randint(0,100,10) # generate ten random integers from 0 to 99 without setting a seed
    print(x)
    
print("\n using seed to give the same random sequences")
for i in range(3):
    np.random.seed(10)
    x = np.random.randint(0,100,10) # generate ten random integers from 0 to 99 after setting the seed
    print(x)

 without using a seed
[0.6442 0.4057 0.5327 0.5591 0.1471]
[0.7991 0.0264 0.5199 0.7715 0.8567]
[0.2632 0.8762 0.9744 0.5639 0.7641]

 using seed to give the same random sequences
[0.7713 0.0208 0.6336 0.7488 0.4985]
[0.7713 0.0208 0.6336 0.7488 0.4985]
[0.7713 0.0208 0.6336 0.7488 0.4985]

 without using a seed
[29  8 73 ... 11 54 88]
[62 33 72 ... 77 69 13]
[25 13 92 ... 12 65 31]

 using seed to give the same random sequences
[ 9 15 64 ...  8 73  0]
[ 9 15 64 ...  8 73  0]
[ 9 15 64 ...  8 73  0]

Random sampling from an array with and without seeding the generator.

Using a loop to sample from an array of integers, first without re-setting the seed before each sample and then re-setting the seed each time.


np.random.seed(10)
myarray = np.random.randint(0,100,10)  # create an array of ten random integers between 0 and 100 (exclusive)

print("\n sampling from an array without using a seed") 
for i in range(3):
    x = np.random.choice(myarray, 4) # sample 4 elements from the array 
    print(x)

print("\n Sampling from an array using a seed")
for i in range(3):
    np.random.seed(10)
    x = np.random.choice(myarray, 4) # sample 4 elements from the array 
    print(x)

 sampling from an array without using a seed
[73 29 89 28]
[ 9 89 29 73]
[15 73 89 15]

 Sampling from an array using a seed
[ 0 89  9 15]
[ 0 89  9 15]
[ 0 89  9 15]

import string

print("\n sampling from an array without using a seed") 
for i in range(3):
    x = np.random.choice(np.array(list(string.ascii_lowercase)), 4) # sample 4 elements from the array 
    print(x)

print("\n Sampling from an array using a seed")
for i in range(3):
    np.random.seed(10)
    x = np.random.choice(np.array(list(string.ascii_lowercase)), 4) # sample 4 elements from the array 
    print(x)

 sampling from an array without using a seed
['z' 'q' 'r' 'i']
['j' 'a' 'k' 'i']
['w' 'e' 't' 'q']

 Sampling from an array using a seed
['j' 'e' 'p' 'a']
['j' 'e' 'p' 'a']
['j' 'e' 'p' 'a']

Tech used:

Python 3
Jupyter
Numpy
matplotlib
seaborn
