Datetimes Answers#

Part 1#

Dates and Datetimes#

from datetime import date, datetime, timedelta
  1. Create a date() object representing your birthday. Assign it to a variable and use the variable to print out your birthyear.

my_birthday = date(1994, 4, 21)
print(my_birthday.year)
1994
  1. Explain why the following code returns an error:

date(2011, 2, 29)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 1
----> 1 date(2011, 2, 29)

ValueError: day is out of range for month
# It returns an error because there weren't 29 days in February in 2011
  1. Create a datetime object for the 6:23pm on March 13th 1998

datetime(1998, 3, 13, 6, 23)
datetime.datetime(1998, 3, 13, 6, 23)
  1. Create a list with 3 datetime objects in it

sampling_times = [datetime(2011, 4, 10, 12), datetime(2011, 4, 11, 12), datetime(2011, 4, 12, 12)]
sampling_times
[datetime.datetime(2011, 4, 10, 12, 0),
 datetime.datetime(2011, 4, 11, 12, 0),
 datetime.datetime(2011, 4, 12, 12, 0)]

Creating Dates and Datetimes from Strings#

  1. Create a datetime object from the following string

date_string = '01-22-2009 21:00'
dt = datetime.strptime(date_string, '%m-%d-%Y %H:%M')
dt
datetime.datetime(2009, 1, 22, 21, 0)
  1. Create a datetime object from the following string

date_string = 'Jul 23 1998 8:02:00'

My mistake it turns out the windows machines don’t support the strftime format %-H because of the dash. I wrote below the correct syntax, but if you are using these notebooks on the SARP laptop they won’t run it. Luckily unpadded hours is really uncommon so this hopefully won’t be a problem for your datasets.

# This errors
dt = datetime.strptime(date_string, '%b %d %Y %-H:%M:%S')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-001c4c88cc57> in <module>
      1 # This errors
----> 2 dt = datetime.strptime(date_string, '%b %d %Y %-H:%M:%S')

~\.conda\envs\lessons\lib\_strptime.py in _strptime_datetime(cls, data_string, format)
    566     """Return a class cls instance based on the input string and the
    567     format string."""
--> 568     tt, fraction, gmtoff_fraction = _strptime(data_string, format)
    569     tzname, gmtoff = tt[-2:]
    570     args = tt[:6] + (fraction,)

~\.conda\envs\lessons\lib\_strptime.py in _strptime(data_string, format)
    339                     bad_directive = "%"
    340                 del err
--> 341                 raise ValueError("'%s' is a bad directive in format '%s'" %
    342                                     (bad_directive, format)) from None
    343             # IndexError only occurs when the format string is "%"

ValueError: '-' is a bad directive in format '%b %d %Y %-H:%M:%S'
  1. Create a datetime object from the following string

date_string = '12/1/72 01 52 12'

My mistake it turns out the windows machines don’t support the strftime format %-d because of the dash. I wrote below the correct syntax, but if you are using these notebooks on the SARP laptop they won’t run it. Luckily unpadded hours is really uncommon so this hopefully won’t be a problem for your datasets.

# This errors
dt = datetime.strptime(date_string, '%m/%-d/%y %H %M %S')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-24-8273d2976c5f> in <module>
----> 1 dt = datetime.strptime(date_string, '%m/%-*d/%y %H %M %S')

~\.conda\envs\lessons\lib\_strptime.py in _strptime_datetime(cls, data_string, format)
    566     """Return a class cls instance based on the input string and the
    567     format string."""
--> 568     tt, fraction, gmtoff_fraction = _strptime(data_string, format)
    569     tzname, gmtoff = tt[-2:]
    570     args = tt[:6] + (fraction,)

~\.conda\envs\lessons\lib\_strptime.py in _strptime(data_string, format)
    339                     bad_directive = "%"
    340                 del err
--> 341                 raise ValueError("'%s' is a bad directive in format '%s'" %
    342                                     (bad_directive, format)) from None
    343             # IndexError only occurs when the format string is "%"

ValueError: '-' is a bad directive in format '%m/%-*d/%y %H %M %S'
  1. Convert the date August 29th 2008 to the date in Julian days

starting_date = datetime(2008, 8, 29)
format_string = '%j'
starting_date.strftime(format_string)
'242'

Timedeltas#

  1. Calculate how many days it has been since your last birthday.

last_birthday = date(2021, 4, 21)
date.today() - last_birthday
datetime.timedelta(days=69)
  1. Calculate exactly how old you are, down to the hour, right now.

birth_moment = datetime(1994, 4, 21, 5, 33, 0)
datetime.now() - birth_moment
datetime.timedelta(days=9931, seconds=17267, microseconds=131821)

Part 2#

import pandas as pd
was_2020_filepath = "./data/SARP 2020 final.xlsx"
was_2020 = pd.read_excel(was_2020_filepath, "INPUT", skipfooter=7)
<ipython-input-36-f4ef3a08305f>:2: FutureWarning: Your version of xlrd is 1.2.0. In xlrd >= 2.0, only the xls format is supported. As a result, the openpyxl engine will be used if it is installed and the engine argument is not specified. Install openpyxl instead.
  was_2020 = pd.read_excel(was_2020_filepath, "INPUT", skipfooter=7)
was_2020
Can # Snake can # Location State Box Date Time Altitude (m) Temp. Celsius Wind Spd (mph) ... 3-Ethyltoluene (MS) 4-Ethyltoluene (MS) 2-Ethyltoluene (MS) 1,3,5-Trimethylbenzene (MS) 1,2,4-Trimethylbenzene (MS) alpha-Pinene (B/MS) beta-Pinene (B/MS) Unnamed: 113 Box.1 CH4 (ppmv height).1
0 5044 3701 Mars Hill/Lowell Observatory, Flagstaff, Arizona Arizona 37 2020-04-24 11:23:00 2200 18 5 ... 12 17 17 15 23 12 7 NaN 37 1.914
1 7338 3716 Buffalo Park, Flagstaff, Arizona Arizona 37 2020-04-24 11:44:00 2100 18 5 ... 12 16 15 14 22 31 9 NaN 37 1.916
2 6027 3717 Downtown Sedona, Arizona Arizona 37 2020-04-24 12:47:00 1350 27 5 ... -888 -888 -888 -888 6 11 -888 NaN 37 1.914
3 5012 3702 Buffalo Park, Flagstaff, Arizona Arizona 37 2020-05-01 11:16:00 2100 19 16 ... -888 -888 -888 -888 -888 22 -888 NaN 37 1.895
4 7330 3715 Mars Hill/Lowell Observatory, Flagstaff, Arizona Arizona 37 2020-05-01 11:49:00 2200 19 16 ... 6 3 -888 -888 8 23 -888 NaN 37 1.900
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
924 7267 2610 Oakwood Manor, Oshkosh, Wisconsin Wisconsin 26 2020-06-24 10:28:00 234.696 20 7 ... -888 -888 -888 -888 3 68 45 NaN 26 2.131
925 8257 2623 South Park, Oshkosh, Wisconsin Wisconsin 26 2020-06-24 10:42:00 231.64 21.67 6 ... -888 -888 -888 -888 4 33 18 NaN 26 2.074
926 5194 2608 Carl Traeger middle school, Oshkosh, Wisconsin Wisconsin 26 2020-07-01 13:09:00 243.84 28.89 9 ... 4 -888 -888 -888 5 7 -888 NaN 26 2.017
927 7101 2609 Oakwood Manor, Oshkosh, Wisconsin Wisconsin 26 2020-07-01 13:26:00 234.696 30.5 8 ... 3 -888 -888 -888 5 23 5 NaN 26 2.027
928 8190 2624 South Park, Oshkosh, Wisconsin Wisconsin 26 2020-07-01 13:50:00 231.64 30.5 9 ... 3 -888 -888 -888 5 11 3 NaN 26 1.983

929 rows × 116 columns

Question 1#

Using datetime object calculate how long the data record was_2020 is. In other words, how much time passed between the first and the last measurement in this sample list?

was_2020['Date'].max() - was_2020['Date'].min()
Timedelta('132 days 00:00:00')

Question 2#

Creating a datetime column from our was_2020 Dataframe

A) In the was_2020 dataset Date and Time are in two seperate columns. Combine the two dataframes into one and assign the output to a new variable called combined_datetime.

To do this you will need to:

  1. Convert each column to a string type

  2. Use concatenation to combine them

# Example of string concatenation
'hello ' + 'there'
'hello there'
combined_datetime = was_2020['Date'].astype(str) + was_2020['Time '].astype(str)
combined_datetime
0      2020-04-2411:23:00
1      2020-04-2411:44:00
2      2020-04-2412:47:00
3      2020-05-0111:16:00
4      2020-05-0111:49:00
              ...        
924    2020-06-2410:28:00
925    2020-06-2410:42:00
926    2020-07-0113:09:00
927    2020-07-0113:26:00
928    2020-07-0113:50:00
Length: 929, dtype: object

B) Now that you have a 'combined_datetime' variable, you can use the pandas function pd.to_datetime() to convert it from a string to a Series of datetime objects. Create a new column in your dataframe called 'datetime' for the new datetime objects.

was_2020['datetime'] = pd.to_datetime(combined_datetime, format='%Y-%m-%d%H:%M:%S')

Google suggestion “pandas to_datetime” or the examples in the docs

C) Delete the old 'Date' and 'Time' columns with the DATAFRAME.drop() method.

was_2020 = was_2020.drop(columns=['Date', 'Time '])

Google suggestion “pandas drop column, or the second (not the first) answer on this stackoverflow

Question 3#

Filtering our dataframe to include only the rows within 7 days of our target date

import numpy as np

A) Let’s say that we are interested in a phenomena that occurred on July 5th, 2020 so we want to narrow down our dataframe to inclue only the observations that occured within a week of the 5th.

Start by calculating the difference between each date in the ‘datetime’ column and July 5th, 2020. What type of object is returned in the result?

# Even though we don't care about the specific time we need to use a datetime object, not a date object, because the
# datetime column of the was_2020 dataframe is a datetime
target_date = datetime(2020, 7, 5)
was_2020['datetime'] - target_date
0     -72 days +11:23:00
1     -72 days +11:44:00
2     -72 days +12:47:00
3     -65 days +11:16:00
4     -65 days +11:49:00
             ...        
924   -11 days +10:28:00
925   -11 days +10:42:00
926    -4 days +13:09:00
927    -4 days +13:26:00
928    -4 days +13:50:00
Name: datetime, Length: 929, dtype: timedelta64[ns]

B) Use the calculation from part A and write a conditional statement checking if each of the rows occured within 7 days of the 5th. Don’t forget to include dates of samples both before and after the 5th.

abs(was_2020['datetime'] - target_date) <= timedelta(days=7)
0      False
1      False
2      False
3      False
4      False
       ...  
924    False
925    False
926     True
927     True
928     True
Name: datetime, Length: 929, dtype: bool

C) Use the boolean series from part B as a filter to output the was_2020 dataframe with only the rows within 7 days of July 5th, 2020.

# abs() or absolute value is what caputres days both before and after, since it gets positive and negative numbers
was_2020[abs(was_2020['datetime'] - target_date) <= timedelta(days=7)]
Can # Snake can # Location State Box Altitude (m) Temp. Celsius Wind Spd (mph) Wind direction (blowing from direction) Weather ... 4-Ethyltoluene (MS) 2-Ethyltoluene (MS) 1,3,5-Trimethylbenzene (MS) 1,2,4-Trimethylbenzene (MS) alpha-Pinene (B/MS) beta-Pinene (B/MS) Unnamed: 113 Box.1 CH4 (ppmv height).1 datetime
17 7336 3710 Mars Hill/Lowell Observatory, Flagstaff, Arizona Arizona 37 2200 27 10 SW Slightly Cloudy ... -888 -888 -888 4 126 12 NaN 37 1.882 2020-07-04 12:48:00
18 6035 3723 Downtown Sedona, Arizona Arizona 37 1350 36 7 SW Clear and sunny ... -888 -888 -888 9 44 10 NaN 37 1.882 2020-07-04 14:09:00
209 7993 810 Anaheim AQMD site Loara Elementary school, Ana... California 8 40 22 5 SSW sunny, but a bit hazy ... 3 -888 -888 6 4 -888 NaN 8 1.968 2020-07-02 12:06:00
210 8247 808 Imperial Park, Anaheim Hills, California California 8 137 23 5 WSW sunny, but a bit hazy ... 22 22 20 30 27 9 NaN 8 1.984 2020-07-02 12:35:00
211 9343 4110 Corona, California California 41 NaN 29 0 NaN Clear ... 4 3 -888 5 60 -888 NaN 41 1.979 2020-07-03 11:21:00
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
785 6157 3210 Spring Creek Park (Downwind), Tomball, Texas Texas 32 70 95 F 6 S Partly Cloudy ... 8 6 7 19 147 58 NaN 32 1.881 2020-07-08 15:00:00
786 5105 3223 Spring Creek Park (Downwind), Tomball, Texas Texas 32 70 94 F 5 SSW Partly Cloudy ... 6 6 6 24 61 41 NaN 32 1.878 2020-07-08 16:00:00
926 5194 2608 Carl Traeger middle school, Oshkosh, Wisconsin Wisconsin 26 243.84 28.89 9 ENE Clear skies and sunny ... -888 -888 -888 5 7 -888 NaN 26 2.017 2020-07-01 13:09:00
927 7101 2609 Oakwood Manor, Oshkosh, Wisconsin Wisconsin 26 234.696 30.5 8 ENE Clear skies and sunny ... -888 -888 -888 5 23 5 NaN 26 2.027 2020-07-01 13:26:00
928 8190 2624 South Park, Oshkosh, Wisconsin Wisconsin 26 231.64 30.5 9 ENE Clear skies and sunny ... -888 -888 -888 5 11 3 NaN 26 1.983 2020-07-01 13:50:00

66 rows × 115 columns

Question 4#

# Read in the data
water_vars = pd.read_csv('./data/englewood_3_12_21_usgs_water.tsv', sep='\t', skiprows=30)
# There are a lot of variables here, so let's shorten our dataframe to a few variables
water_vars = water_vars[['datetime', '210920_00060', '210922_00010', '210924_00300', '210925_00400']]
# Get rid of the first row of hard-coded datatype info
water_vars = water_vars.drop(0)
# Rename the columns from their USGS codes to more human-readible names
name_codes = {'210920_00060': 'discharge','210922_00010': 'temperature', '210924_00300': 'dissolved oxygen', '210925_00400': 'pH'}
water_vars = water_vars.rename(columns=name_codes)
# Convert columns with numbers to a numeric type
water_vars['discharge'] = pd.to_numeric(water_vars['discharge'])
water_vars['temperature'] = pd.to_numeric(water_vars['temperature'])
water_vars['dissolved oxygen'] = pd.to_numeric(water_vars['dissolved oxygen'])
water_vars['pH'] = pd.to_numeric(water_vars['pH'])
water_vars
datetime discharge temperature dissolved oxygen pH
1 2021-03-12 00:00 44.5 8.1 8.3 8.1
2 2021-03-12 00:15 44.5 8.1 8.2 8.1
3 2021-03-12 00:30 44.5 8.1 8.2 8.1
4 2021-03-12 00:45 44.5 8.1 8.1 8.1
5 2021-03-12 01:00 44.5 8.1 8.1 8.1
... ... ... ... ... ...
142 2021-03-13 11:15 42.6 6.7 9.8 7.9
143 2021-03-13 11:30 42.6 6.7 9.9 7.9
144 2021-03-13 11:45 42.6 6.7 10.2 7.9
145 2021-03-13 12:00 46.5 6.7 10.3 7.9
146 2021-03-13 12:15 NaN 6.6 10.3 7.9

146 rows × 5 columns

A) Convert the ‘datetime’ string column to a column of datetime objects using pd.to_datetime().

water_vars['datetime'] = pd.to_datetime(water_vars['datetime'], format='%Y-%m-%d %H:%M')

B) Set the new datetime column as the index of the dataframe.

water_vars = water_vars.set_index('datetime')

C) Use the new index to retrieve the value for '2021-03-12 13:30:00'

water_vars.loc['2021-03-12 13:30:00']
discharge           40.8
temperature          7.6
dissolved oxygen    11.9
pH                   8.1
Name: 2021-03-12 13:30:00, dtype: float64

D) One cool thing we can do when we have a datetime index is easily resample the data. Resampling is when we aggregate more finely resolved data to be more coarsely resolved. In this example we will be taking data that is reported every 15 minutes and resampling to an hourly resolution.

Use the DATAFRAME.resample() function to resample to hourly resolution using the mean value of the 15 minute intervals. Check out the docs page or the pandas datetime overview for examples.

water_vars.resample("H").mean()
discharge temperature dissolved oxygen pH
datetime
2021-03-12 00:00:00 44.500 8.100000 8.200 8.100
2021-03-12 01:00:00 44.500 8.050000 8.050 8.100
2021-03-12 02:00:00 44.500 7.950000 7.975 8.100
2021-03-12 03:00:00 44.500 7.750000 8.000 8.100
2021-03-12 04:00:00 46.500 7.533333 8.000 8.100
2021-03-12 05:00:00 47.500 7.350000 8.050 8.000
2021-03-12 06:00:00 48.500 7.100000 8.150 7.950
2021-03-12 07:00:00 48.500 6.900000 8.250 7.900
2021-03-12 08:00:00 47.500 6.725000 8.475 7.900
2021-03-12 09:00:00 46.500 6.700000 8.850 7.900
2021-03-12 10:00:00 44.500 6.725000 9.500 7.900
2021-03-12 11:00:00 43.550 6.900000 10.325 7.950
2021-03-12 12:00:00 42.600 7.150000 11.150 8.000
2021-03-12 13:00:00 40.800 7.550000 11.850 8.100
2021-03-12 14:00:00 39.900 8.075000 12.525 8.150
2021-03-12 15:00:00 39.000 8.450000 13.025 8.200
2021-03-12 16:00:00 40.350 8.475000 12.950 8.200
2021-03-12 17:00:00 41.700 8.400000 12.125 8.125
2021-03-12 18:00:00 44.025 8.300000 11.225 8.025
2021-03-12 19:00:00 45.500 8.250000 10.650 8.000
2021-03-12 20:00:00 46.500 8.100000 10.100 7.925
2021-03-12 21:00:00 46.500 7.950000 9.575 7.900
2021-03-12 22:00:00 46.000 7.800000 9.050 7.900
2021-03-12 23:00:00 44.500 7.700000 8.750 7.925
2021-03-13 00:00:00 44.500 7.700000 8.550 8.000
2021-03-13 01:00:00 44.500 7.700000 8.375 8.000
2021-03-13 02:00:00 44.500 7.675000 8.275 8.000
2021-03-13 03:00:00 44.500 7.600000 8.200 8.000
2021-03-13 04:00:00 46.000 7.475000 8.200 8.000
2021-03-13 05:00:00 46.500 7.350000 8.200 7.950
2021-03-13 06:00:00 46.500 7.150000 8.200 7.900
2021-03-13 07:00:00 46.500 6.966667 8.300 7.850
2021-03-13 08:00:00 46.500 6.875000 8.575 7.800
2021-03-13 09:00:00 46.500 6.800000 9.050 7.875
2021-03-13 10:00:00 45.000 6.775000 9.475 7.900
2021-03-13 11:00:00 42.600 6.700000 9.900 7.900
2021-03-13 12:00:00 46.500 6.650000 10.300 7.900