Arrays and numpy Answers

Part 1

import numpy as np

Creating and inspecting arrays

Convert the following list into a numpy array

chlor_a_list = [0.3, 1.2, 0.8, 0.8, 1.1, 0.2, 0.4]

chlor_a = np.array(chlor_a_list)

Get the following values from the chlor_a array you made in the last problem:

The first value
The last value

chlor_a[0]

0.3

chlor_a[6]
# or
chlor_a[-1]

0.4

What is the data type of the chlor_a array you made?

# float (specifically a 64-bit float)
chlor_a.dtype

dtype('float64')

Use code to figure out how many items are in your array

chlor_a.shape

(7,)
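Reading the count off the shape works here because the array is one-dimensional. As a small sketch beyond the original answer, `.size` and `len()` report the count directly (the variable names `n_from_size` and `n_from_len` are just for illustration):

```python
import numpy as np

chlor_a = np.array([0.3, 1.2, 0.8, 0.8, 1.1, 0.2, 0.4])

n_from_size = chlor_a.size  # total number of elements, for any number of dimensions
n_from_len = len(chlor_a)   # length of the first axis only
print(n_from_size, n_from_len)
```

For a multi-dimensional array the two differ: `len()` only counts along axis 0, while `.size` counts every element.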

Multiple dimensions

What is the shape of the following array? Use the shape to determine how many elements are in the array

population_sparrows = np.array([[43, 24, 53, 24], [21, 32, 42, 32], [76, 23, 14, 12]])

population_sparrows.shape

(3, 4)

# Total elements is the size of each axis multiplied together
3*4
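The multiplication can also be done from the shape itself, so it keeps working if the array changes. A minimal sketch, where `n_elements` is a name introduced only for this example:

```python
import numpy as np

population_sparrows = np.array([[43, 24, 53, 24], [21, 32, 42, 32], [76, 23, 14, 12]])

# Multiply the axis lengths together without typing them by hand
n_elements = int(np.prod(population_sparrows.shape))
print(n_elements, population_sparrows.size)
```

The `.size` property gives the same number and is a quick way to check the arithmetic.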

Use the len() function and the .shape property to calculate the number of dimensions of the population_sparrows array

len(population_sparrows.shape)

Return the same result as problem 6 in a different way, using the .ndim property

population_sparrows.ndim

Get the value for the item in the last row and the last column of the population_sparrows array

population_sparrows[-1, -1]

Get a 4-number array that is a subset of numbers from the population_sparrows array using the slice operator :

population_sparrows[0:2, 1:3]

array([[24, 53],
       [32, 42]])

Math and aggregations

For the next few problems, consider that the population_sparrows array represents the populations of sparrows at 12 different research locations.

Let’s say our sparrow population grew, and the population of every location doubled. Multiply all the values in the array by 2. Make sure the array is updated with the new values.

population_sparrows = population_sparrows*2

Later in the season a group of biologists adds a few sparrows to each population. The number of sparrows they added to each location is represented by the following array:

indiviuals_added = np.array([[4, 4, 5, 4], [1, 2, 2, 3], [6, 3, 4, 2]])

Calculate the updated population values with the new additions. Update the variable value.

population_sparrows = population_sparrows + indiviuals_added

Calculate the sum of the sparrows at all the locations

population_sparrows.sum()

Calculate the sum of sparrows over axis 1

population_sparrows.sum(axis=1)

array([305, 262, 265])

Consider the following array. If you ran an aggregation (Ex. .max()) and specified an axis, over what axis would you get 3 numbers as a result?

example_array = np.array([[4, 4, 5, 4], [1, 2, 2, 3], [6, 3, 4, 2]])

# axis 1
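One way to check this answer is to run the aggregation over each axis and compare the shapes. The names `max_axis0` and `max_axis1` are only illustrative:

```python
import numpy as np

example_array = np.array([[4, 4, 5, 4], [1, 2, 2, 3], [6, 3, 4, 2]])

max_axis0 = example_array.max(axis=0)  # collapses the 3 rows: one value per column
max_axis1 = example_array.max(axis=1)  # collapses the 4 columns: one value per row
print(max_axis0.shape, max_axis1.shape)
```

The axis you name is the one that disappears, so aggregating a (3, 4) array over axis 1 leaves the 3 values along axis 0.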

Part 2

Note that the exact answers to the practice problems may not be the same on my sheet as on yours, because the random function used to generate the values won’t create the same array on your computer as on mine. Check that the code does the same thing, or run my code in your notebook to check the answer.
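If you want the same random array every time you run a cell, one option is a seeded generator. This is a sketch using `np.random.default_rng` instead of the `np.random.randint` call used in these answers, so its values will not match the outputs printed on this page; the seed 42 is an arbitrary choice.

```python
import numpy as np

# The seed value 42 is arbitrary; any fixed integer gives repeatable output
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first = rng_a.integers(0, high=100, size=(5, 30, 40))
second = rng_b.integers(0, high=100, size=(5, 30, 40))
print(np.array_equal(first, second))
```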

Question 1

reflectances = np.random.randint(0, high=100, size=(5, 30, 40))

A) Get a subset of at least 15 values from the middle of the reflectances array.

reflectances[2:5, 10:20, 20]

array([[54, 88, 29, 97, 51,  0, 95, 20,  8, 13],
       [31, 51, 97, 68, 40, 91, 40, 48, 65, 88],
       [ 9,  8, 16, 76, 41, 44, 83, 24, 23, 18]])

B) Write a chunk of code to get the value in the center of the array. Make sure the code works for arrays of different sizes, so calculate the center index values using the properties of the array.

# Find the center index for each axis
axis0_indx = reflectances.shape[0] // 2 # floor division to ensure a whole number
axis1_indx = reflectances.shape[1] // 2 
axis2_indx = reflectances.shape[2] // 2
# Get the value
print(reflectances[axis0_indx, axis1_indx, axis2_indx])
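The same idea can be written once for an array with any number of dimensions. This is a sketch only: `center_value` is a hypothetical helper, not a numpy function, and for an axis with an even length it picks the later of the two middle positions.

```python
import numpy as np

def center_value(arr):
    """Return the element at index (length // 2) along every axis."""
    center_index = tuple(length // 2 for length in arr.shape)
    return arr[center_index]

small = np.arange(24).reshape(2, 3, 4)  # small stand-in for reflectances
print(center_value(small))
```

Indexing with a tuple is equivalent to writing the indices separated by commas, which is what lets this work without knowing the number of axes in advance.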

Question 2

A) Add 10 to every value in the reflectances array which has an index of 2 in the axis=0 position.

reflectances[2] + 10

array([[ 45, 109,  26, ...,  10,  97, 109],
       [ 85,  77,  65, ...,  71,  60,  84],
       [ 98,  48,  75, ..., 100,  77, 108],
       ...,
       [ 64,  81,  80, ..., 109,  41,  95],
       [ 96,  42,  26, ..., 105,  33,  90],
       [ 51,  52, 100, ...,  94,  53,  64]])
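Note that the expression above returns a new array and leaves reflectances as it was. If the goal were to update the array itself, a minimal sketch could look like this (`before`, `shifted`, and `unchanged` are names introduced for the comparison):

```python
import numpy as np

reflectances = np.random.randint(0, high=100, size=(5, 30, 40))
before = reflectances.copy()

shifted = reflectances[2] + 10  # new array; reflectances itself is unchanged
unchanged = np.array_equal(reflectances, before)

reflectances[2] += 10  # in-place update of the slice at index 2 of axis 0
print(unchanged, np.array_equal(reflectances[2], shifted))
```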

Question 3

A) What will be the shapes of the following arrays? First take a guess, then run the code.

array1 = np.array([1, 2, 3])
array2 = np.array([[1], [2], [3]])

# array1 -> (3,)
# array2 -> (3, 1)

B) Starting with an array of all zeros, how will the output differ when adding starting_array + array1 vs. starting_array + array2? Make your guess first, then run the code to compare to your expectation.

starting_array = np.zeros((3,3))

array1 = np.array([1, 2, 3])
array2 = np.array([[1], [2], [3]])

# Array is broadcast along axis 0
starting_array + array1

array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])

# Array is broadcast along axis 1
starting_array + array2

array([[1., 1., 1.],
       [2., 2., 2.],
       [3., 3., 3.]])
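To see what numpy does to each array before adding, `np.broadcast_to` can show the stretched version explicitly. This is an illustration only; the addition above does not require calling it.

```python
import numpy as np

array1 = np.array([1, 2, 3])        # shape (3,)
array2 = np.array([[1], [2], [3]])  # shape (3, 1)

# Show what each array looks like once stretched to shape (3, 3)
stretched1 = np.broadcast_to(array1, (3, 3))
stretched2 = np.broadcast_to(array2, (3, 3))
print(stretched1)
print(stretched2)
```

The shape (3,) array is repeated down the rows, while the shape (3, 1) array is repeated across the columns.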

Question 4

A) Going back to our reflectances array, find the mean and standard deviation of all the values in the array.

reflectances = np.random.randint(0, high=100, size=(5, 30, 40))

reflectances.mean()

49.9645

reflectances.std()

28.77261266812592

B) What is the maximum value of the 2D array at index 1 of axis 0?

example = np.array([[23, 43, 10], [3, 10, 8], [13, 16, 0]])

example[1].max()
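The answer above uses a small 2D array, so example[1] is a single row. Assuming the question is meant for a 3D array such as reflectances, the same pattern selects a 2D array first. Here `cube` is a hypothetical stand-in with known values:

```python
import numpy as np

# Hypothetical small 3D array standing in for reflectances
cube = np.arange(24).reshape(2, 3, 4)

layer = cube[1]          # 2D array at index 1 of axis 0, shape (3, 4)
layer_max = layer.max()  # single largest value in that 2D array
print(layer.shape, layer_max)
```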

Question 5

A) The .astype() method changes the data type of the values in an array. It takes one argument - the new data type (which you type in without quotation marks).

Use the .astype() method on the reflectances array to change the data type to float.

# Built-in type names such as int and float are not strings, so they need no quotation marks
int

int

reflectances.astype(float)

array([[[67., 73., 99., ..., 99., 13., 64.],
        [77., 55., 10., ..., 74., 14.,  7.],
        [12., 84., 55., ..., 92., 62., 93.],
        ...,
        [74., 20., 31., ..., 20., 91., 25.],
        [41., 94., 94., ..., 95., 43., 66.],
        [23.,  5., 13., ...,  8., 27., 41.]],

       [[94., 38., 42., ..., 55., 16., 24.],
        [53., 78., 18., ..., 81., 95., 74.],
        [30., 67., 37., ..., 25.,  9., 50.],
        ...,
        [36.,  1., 97., ..., 82., 11., 52.],
        [86., 61., 26., ..., 35., 67., 51.],
        [68., 23., 69., ..., 55.,  6., 54.]],

       [[29., 97., 37., ..., 98., 94., 97.],
        [27., 71.,  4., ..., 22., 30., 76.],
        [49., 12., 19., ..., 37., 79., 47.],
        ...,
        [96., 34., 51., ..., 64., 73., 16.],
        [69., 27., 13., ..., 37., 91., 76.],
        [62., 17., 93., ...,  1., 55., 40.]],

       [[32., 42., 18., ..., 42., 28., 32.],
        [77., 69., 21., ..., 54., 75., 25.],
        [ 6., 63., 54., ..., 30., 15., 69.],
        ...,
        [79., 45., 97., ...,  3., 26., 81.],
        [20., 62., 90., ..., 38., 48., 69.],
        [51., 70., 33., ..., 78., 54.,  7.]],

       [[77., 62., 15., ..., 27., 56., 41.],
        [42., 26., 31., ..., 31., 72., 76.],
        [51., 78., 77., ..., 71., 73., 39.],
        ...,
        [44., 27.,  6., ..., 80., 99., 23.],
        [85., 37., 60., ..., 21., 68., 15.],
        [63., 41.,  3., ..., 35., 31., 93.]]])
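One detail worth checking is that `.astype()` returns a new array instead of changing the original. A short sketch, with `as_float` as an illustrative name:

```python
import numpy as np

reflectances = np.random.randint(0, high=100, size=(5, 30, 40))

as_float = reflectances.astype(float)  # returns a new array
print(reflectances.dtype, as_float.dtype)
```

To keep the converted values you need to assign the result to a variable, for example back to reflectances.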

B) We have seen the use of axis as a kwarg in the .max() function. If you need to, you can use multiple kwargs, which you separate by commas.

The keepdims kwarg keeps the axis that was aggregated over in the output as an axis of length 1, so the result has the same number of dimensions as the input. For example, if you took the sum of a 2D array over axis 1 with keepdims=True, you would receive the results as a vertically stacked column rather than a flat 1D array. keepdims takes a boolean input - True if you would like to keep the dimensions and False if not.

Take the max of the reflectances array, using the axis kwarg with value 1 and the keepdims kwarg set to True. Try it again with the kwarg set to False. Take the shape of both outputs and notice how they change.

reflectances.max(axis=1, keepdims=True)

array([[[91, 99, 99, 98, 99, 93, 97, 99, 93, 91, 96, 97, 99, 99, 99, 94,
         99, 99, 99, 97, 94, 97, 99, 98, 94, 93, 96, 98, 94, 97, 97, 98,
         92, 99, 95, 95, 93, 99, 93, 97]],

       [[99, 96, 97, 99, 99, 99, 99, 97, 99, 93, 97, 99, 95, 99, 92, 99,
         99, 93, 98, 97, 96, 98, 99, 96, 94, 97, 99, 97, 98, 87, 99, 99,
         91, 94, 97, 99, 98, 99, 97, 97]],

       [[96, 99, 98, 99, 96, 95, 95, 96, 97, 97, 97, 99, 99, 87, 99, 97,
         94, 99, 94, 98, 97, 97, 96, 95, 99, 87, 95, 99, 93, 99, 96, 94,
         98, 98, 99, 99, 96, 99, 99, 97]],

       [[91, 89, 97, 94, 94, 96, 97, 95, 99, 98, 96, 99, 98, 98, 94, 99,
         98, 97, 97, 95, 91, 97, 91, 96, 99, 99, 99, 99, 95, 97, 94, 99,
         98, 97, 98, 95, 98, 96, 99, 96]],

       [[96, 97, 96, 98, 98, 97, 98, 97, 92, 88, 95, 96, 97, 99, 97, 96,
         97, 97, 93, 96, 97, 94, 93, 96, 96, 96, 99, 94, 97, 93, 96, 89,
         83, 98, 96, 97, 96, 94, 99, 97]]])

reflectances.max(axis=1, keepdims=True).shape

(5, 1, 40)

reflectances.max(axis=1, keepdims=False).shape

(5, 40)

Note: The “keepdims” concept is a bit conceptually abstract and I couldn’t find a good visual illustration for it right now. If you leave this problem feeling a little iffy on “keepdims” that is totally fine and normal. The more important concept to be comfortable with is using multiple kwargs in a function/method.
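One practical reason to keep the dimensions is that the result can then be combined with the original array through broadcasting. The sketch below divides every value by the maximum along axis 1. It assumes a lower bound of 1 in the random call (instead of 0) to rule out division by zero, and `max_kept`, `max_dropped`, and `scaled` are illustrative names.

```python
import numpy as np

reflectances = np.random.randint(1, high=100, size=(5, 30, 40))

max_kept = reflectances.max(axis=1, keepdims=True)      # shape (5, 1, 40)
max_dropped = reflectances.max(axis=1, keepdims=False)  # shape (5, 40)

# (5, 30, 40) / (5, 1, 40) broadcasts: each value is divided by the
# maximum of the 30 values that share its axis 0 and axis 2 positions
scaled = reflectances / max_kept
print(scaled.shape, scaled.max())
```

Dividing by `max_dropped` instead would raise an error, because shapes (5, 30, 40) and (5, 40) cannot be broadcast together.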

Question 6

example = np.array([[6, 10, 5, 9], [6, 9, 9, 11], [12, 14, 6, 3]])

A) Get a list of unique items in the example array.

np.unique(example)

array([ 3,  5,  6,  9, 10, 11, 12, 14])

Google help: “numpy unique values in an array”, or this stackoverflow.

B) Pick one of the values in the array and determine how many times that value occurs in the array.

np.unique(example, return_counts=True)
# Then read the output to see that (for example) the value 6 occurred 3 times

(array([ 3,  5,  6,  9, 10, 11, 12, 14]), array([1, 1, 3, 3, 1, 1, 1, 1]))

# OR

np.count_nonzero(example == 6)

Google help: “numpy number of occurrences of a value” or this stackoverflow.
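Instead of reading the two output arrays by eye, the values and counts can be paired up for lookup. This is one possible sketch; `count_by_value` is a name chosen for the example.

```python
import numpy as np

example = np.array([[6, 10, 5, 9], [6, 9, 9, 11], [12, 14, 6, 3]])

values, counts = np.unique(example, return_counts=True)
# Pair each unique value with its count for direct lookup
count_by_value = {int(v): int(c) for v, c in zip(values, counts)}
print(count_by_value[6])
```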

Question 7

NaNs are an important data point when working with real data - rarely do you have a totally complete dataset.

You can make individual NaNs with np.nan (the np.NaN alias was removed in NumPy 2.0):

np.nan

nan

Look at the docs for the function np.full() and create a new array of shape (4, 5, 6) filled with nan values.

np.full((4, 5, 6), np.nan)

array([[[nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan]],

       [[nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan]],

       [[nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan]],

       [[nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan],
        [nan, nan, nan, nan, nan, nan]]])
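A follow-up sketch on checking for NaNs: because NaN never compares equal to anything, including itself, `==` cannot be used to find them, and `np.isnan` is the usual test. The names `nan_array` and `all_missing` are illustrative.

```python
import numpy as np

nan_array = np.full((4, 5, 6), np.nan)

print(np.nan == np.nan)  # NaN never compares equal, even to itself
all_missing = np.isnan(nan_array).all()  # np.isnan tests each element for NaN
print(nan_array.dtype, all_missing)
```

Because NaN is a floating-point value, the array created by `np.full` here has a float data type.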