Datetimes Answers#
Part 1#
Dates and Datetimes#
from datetime import date, datetime, timedelta
Create a date() object representing your birthday. Assign it to a variable and use the variable to print out your birth year.
my_birthday = date(1994, 4, 21)
print(my_birthday.year)
1994
Explain why the following code returns an error:
date(2011, 2, 29)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[3], line 1
----> 1 date(2011, 2, 29)
ValueError: day is out of range for month
# It returns an error because 2011 was not a leap year, so February 2011 only had 28 days
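A quick way to confirm the leap-year logic is the standard library's calendar.isleap():

```python
import calendar
from datetime import date

# 2011 was not a leap year, so February topped out at 28 days
print(calendar.isleap(2011))  # False
print(calendar.isleap(2012))  # True

# The last valid February date in 2011 constructs fine
print(date(2011, 2, 28))  # 2011-02-28
```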
Create a datetime object for 6:23pm on March 13th, 1998
datetime(1998, 3, 13, 18, 23)
datetime.datetime(1998, 3, 13, 18, 23)
Create a list with 3 datetime objects in it
sampling_times = [datetime(2011, 4, 10, 12), datetime(2011, 4, 11, 12), datetime(2011, 4, 12, 12)]
sampling_times
[datetime.datetime(2011, 4, 10, 12, 0),
datetime.datetime(2011, 4, 11, 12, 0),
datetime.datetime(2011, 4, 12, 12, 0)]
Creating Dates and Datetimes from Strings#
Create a datetime object from the following string
date_string = '01-22-2009 21:00'
dt = datetime.strptime(date_string, '%m-%d-%Y %H:%M')
dt
datetime.datetime(2009, 1, 22, 21, 0)
Create a datetime object from the following string
date_string = 'Jul 23 1998 8:02:00'
My mistake: it turns out that Windows machines don’t support the strftime directive %-H, because the dash is a glibc extension rather than part of the C standard. I wrote the intended syntax below, but if you are using these notebooks on the SARP laptop it won’t run. Luckily, unpadded hours are really uncommon, so this hopefully won’t be a problem for your datasets.
# This errors
dt = datetime.strptime(date_string, '%b %d %Y %-H:%M:%S')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-20-001c4c88cc57> in <module>
1 # This errors
----> 2 dt = datetime.strptime(date_string, '%b %d %Y %-H:%M:%S')
~\.conda\envs\lessons\lib\_strptime.py in _strptime_datetime(cls, data_string, format)
566 """Return a class cls instance based on the input string and the
567 format string."""
--> 568 tt, fraction, gmtoff_fraction = _strptime(data_string, format)
569 tzname, gmtoff = tt[-2:]
570 args = tt[:6] + (fraction,)
~\.conda\envs\lessons\lib\_strptime.py in _strptime(data_string, format)
339 bad_directive = "%"
340 del err
--> 341 raise ValueError("'%s' is a bad directive in format '%s'" %
342 (bad_directive, format)) from None
343 # IndexError only occurs when the format string is "%"
ValueError: '-' is a bad directive in format '%b %d %Y %-H:%M:%S'
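A portable workaround worth knowing: when parsing (as opposed to formatting), Python's plain %H directive already accepts one-digit hours, so the non-standard %-H isn't actually needed here:

```python
from datetime import datetime

date_string = 'Jul 23 1998 8:02:00'
# %H matches one or two digits when parsing, so the unpadded hour is fine
dt = datetime.strptime(date_string, '%b %d %Y %H:%M:%S')
print(dt)  # 1998-07-23 08:02:00
```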
Create a datetime object from the following string
date_string = '12/1/72 01 52 12'
My mistake: it turns out that Windows machines don’t support the strftime directive %-d either, for the same dash-related reason. I wrote the intended syntax below, but if you are using these notebooks on the SARP laptop it won’t run. Luckily, unpadded days are really uncommon, so this hopefully won’t be a problem for your datasets.
# This errors
dt = datetime.strptime(date_string, '%m/%-d/%y %H %M %S')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-24-8273d2976c5f> in <module>
----> 1 dt = datetime.strptime(date_string, '%m/%-d/%y %H %M %S')
~\.conda\envs\lessons\lib\_strptime.py in _strptime_datetime(cls, data_string, format)
566 """Return a class cls instance based on the input string and the
567 format string."""
--> 568 tt, fraction, gmtoff_fraction = _strptime(data_string, format)
569 tzname, gmtoff = tt[-2:]
570 args = tt[:6] + (fraction,)
~\.conda\envs\lessons\lib\_strptime.py in _strptime(data_string, format)
339 bad_directive = "%"
340 del err
--> 341 raise ValueError("'%s' is a bad directive in format '%s'" %
342 (bad_directive, format)) from None
343 # IndexError only occurs when the format string is "%"
ValueError: '-' is a bad directive in format '%m/%-d/%y %H %M %S'
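The same workaround applies here: plain %d accepts an unpadded day when parsing, so the format works without the dash:

```python
from datetime import datetime

date_string = '12/1/72 01 52 12'
# %d matches one or two digits when parsing, so no %-d is required
dt = datetime.strptime(date_string, '%m/%d/%y %H %M %S')
print(dt)  # 1972-12-01 01:52:12
```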
Convert the date August 29th, 2008 to its Julian day (day of the year)
starting_date = datetime(2008, 8, 29)
format_string = '%j'
starting_date.strftime(format_string)
'242'
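The %j directive also works in the other direction: strptime can turn a year plus a day-of-year back into a calendar date:

```python
from datetime import datetime

# Parse a year and day-of-year back into a full date
dt = datetime.strptime('2008 242', '%Y %j')
print(dt.month, dt.day)  # 8 29
```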
Timedeltas#
Calculate how many days it has been since your last birthday.
last_birthday = date(2021, 4, 21)
date.today() - last_birthday
datetime.timedelta(days=69)
Calculate exactly how old you are, down to the hour, right now.
birth_moment = datetime(1994, 4, 21, 5, 33, 0)
datetime.now() - birth_moment
datetime.timedelta(days=9931, seconds=17267, microseconds=131821)
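A timedelta stores days, seconds, and microseconds separately, which is awkward to read as an age; total_seconds() flattens everything into one number that converts easily to hours. A sketch using the result above (microseconds dropped for simplicity):

```python
from datetime import timedelta

age = timedelta(days=9931, seconds=17267)
# total_seconds() combines the days and seconds fields into a single float
total_hours = age.total_seconds() / 3600
print(round(total_hours))  # 238349
```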
Part 2#
import pandas as pd
was_2020_filepath = "./data/SARP 2020 final.xlsx"
was_2020 = pd.read_excel(was_2020_filepath, "INPUT", skipfooter=7)
<ipython-input-36-f4ef3a08305f>:2: FutureWarning: Your version of xlrd is 1.2.0. In xlrd >= 2.0, only the xls format is supported. As a result, the openpyxl engine will be used if it is installed and the engine argument is not specified. Install openpyxl instead.
was_2020 = pd.read_excel(was_2020_filepath, "INPUT", skipfooter=7)
was_2020
Can # | Snake can # | Location | State | Box | Date | Time | Altitude (m) | Temp. Celsius | Wind Spd (mph) | ... | 3-Ethyltoluene (MS) | 4-Ethyltoluene (MS) | 2-Ethyltoluene (MS) | 1,3,5-Trimethylbenzene (MS) | 1,2,4-Trimethylbenzene (MS) | alpha-Pinene (B/MS) | beta-Pinene (B/MS) | Unnamed: 113 | Box.1 | CH4 (ppmv height).1 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5044 | 3701 | Mars Hill/Lowell Observatory, Flagstaff, Arizona | Arizona | 37 | 2020-04-24 | 11:23:00 | 2200 | 18 | 5 | ... | 12 | 17 | 17 | 15 | 23 | 12 | 7 | NaN | 37 | 1.914 |
1 | 7338 | 3716 | Buffalo Park, Flagstaff, Arizona | Arizona | 37 | 2020-04-24 | 11:44:00 | 2100 | 18 | 5 | ... | 12 | 16 | 15 | 14 | 22 | 31 | 9 | NaN | 37 | 1.916 |
2 | 6027 | 3717 | Downtown Sedona, Arizona | Arizona | 37 | 2020-04-24 | 12:47:00 | 1350 | 27 | 5 | ... | -888 | -888 | -888 | -888 | 6 | 11 | -888 | NaN | 37 | 1.914 |
3 | 5012 | 3702 | Buffalo Park, Flagstaff, Arizona | Arizona | 37 | 2020-05-01 | 11:16:00 | 2100 | 19 | 16 | ... | -888 | -888 | -888 | -888 | -888 | 22 | -888 | NaN | 37 | 1.895 |
4 | 7330 | 3715 | Mars Hill/Lowell Observatory, Flagstaff, Arizona | Arizona | 37 | 2020-05-01 | 11:49:00 | 2200 | 19 | 16 | ... | 6 | 3 | -888 | -888 | 8 | 23 | -888 | NaN | 37 | 1.900 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
924 | 7267 | 2610 | Oakwood Manor, Oshkosh, Wisconsin | Wisconsin | 26 | 2020-06-24 | 10:28:00 | 234.696 | 20 | 7 | ... | -888 | -888 | -888 | -888 | 3 | 68 | 45 | NaN | 26 | 2.131 |
925 | 8257 | 2623 | South Park, Oshkosh, Wisconsin | Wisconsin | 26 | 2020-06-24 | 10:42:00 | 231.64 | 21.67 | 6 | ... | -888 | -888 | -888 | -888 | 4 | 33 | 18 | NaN | 26 | 2.074 |
926 | 5194 | 2608 | Carl Traeger middle school, Oshkosh, Wisconsin | Wisconsin | 26 | 2020-07-01 | 13:09:00 | 243.84 | 28.89 | 9 | ... | 4 | -888 | -888 | -888 | 5 | 7 | -888 | NaN | 26 | 2.017 |
927 | 7101 | 2609 | Oakwood Manor, Oshkosh, Wisconsin | Wisconsin | 26 | 2020-07-01 | 13:26:00 | 234.696 | 30.5 | 8 | ... | 3 | -888 | -888 | -888 | 5 | 23 | 5 | NaN | 26 | 2.027 |
928 | 8190 | 2624 | South Park, Oshkosh, Wisconsin | Wisconsin | 26 | 2020-07-01 | 13:50:00 | 231.64 | 30.5 | 9 | ... | 3 | -888 | -888 | -888 | 5 | 11 | 3 | NaN | 26 | 1.983 |
929 rows × 116 columns
Question 1#
Using datetime objects, calculate how long the data record in was_2020 is. In other words, how much time passed between the first and the last measurement in this sample list?
was_2020['Date'].max() - was_2020['Date'].min()
Timedelta('132 days 00:00:00')
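The resulting Timedelta exposes its components as attributes, which is handy for follow-up arithmetic:

```python
import pandas as pd

record_length = pd.Timedelta('132 days 00:00:00')
print(record_length.days)             # 132
print(record_length.total_seconds())  # 11404800.0
```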
Question 2#
Creating a datetime column from our was_2020 DataFrame
A) In the was_2020 dataset, Date and Time are in two separate columns. Combine the two columns into one and assign the output to a new variable called combined_datetime.
To do this you will need to:
Convert each column to a string type
Use concatenation to combine them
# Example of string concatenation
'hello ' + 'there'
'hello there'
# Note: the Time column's name contains a trailing space in this dataset
combined_datetime = was_2020['Date'].astype(str) + was_2020['Time '].astype(str)
combined_datetime
0 2020-04-2411:23:00
1 2020-04-2411:44:00
2 2020-04-2412:47:00
3 2020-05-0111:16:00
4 2020-05-0111:49:00
...
924 2020-06-2410:28:00
925 2020-06-2410:42:00
926 2020-07-0113:09:00
927 2020-07-0113:26:00
928 2020-07-0113:50:00
Length: 929, dtype: object
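Note that the combined strings have no separator between the date and the time ('2020-04-2411:23:00'); that is fine as long as the format string used for parsing matches. A minimal standalone sketch with made-up sample values (not the SARP file):

```python
import pandas as pd

dates = pd.Series(['2020-04-24', '2020-05-01'])
times = pd.Series(['11:23:00', '11:49:00'])

# No separator between date and time, so the parsing format has none either
combined = dates + times
parsed = pd.to_datetime(combined, format='%Y-%m-%d%H:%M:%S')
print(parsed[0])  # 2020-04-24 11:23:00
```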
B) Now that you have a combined_datetime variable, you can use the pandas function pd.to_datetime() to convert it from a string to a Series of datetime objects. Create a new column in your dataframe called 'datetime' for the new datetime objects.
was_2020['datetime'] = pd.to_datetime(combined_datetime, format='%Y-%m-%d%H:%M:%S')
Google suggestion: “pandas to_datetime”, or the examples in the docs
C) Delete the old 'Date' and 'Time' columns with the DATAFRAME.drop() method.
was_2020 = was_2020.drop(columns=['Date', 'Time '])
Google suggestion: “pandas drop column”, or the second (not the first) answer on this Stack Overflow post
Question 3#
Filtering our dataframe to include only the rows within 7 days of our target date
import numpy as np
A) Let’s say that we are interested in a phenomenon that occurred on July 5th, 2020, so we want to narrow down our dataframe to include only the observations that occurred within a week of the 5th.
Start by calculating the difference between each date in the ‘datetime’ column and July 5th, 2020. What type of object is returned in the result?
# Even though we don't care about the specific time we need to use a datetime object, not a date object, because the
# datetime column of the was_2020 dataframe is a datetime
target_date = datetime(2020, 7, 5)
was_2020['datetime'] - target_date
0 -72 days +11:23:00
1 -72 days +11:44:00
2 -72 days +12:47:00
3 -65 days +11:16:00
4 -65 days +11:49:00
...
924 -11 days +10:28:00
925 -11 days +10:42:00
926 -4 days +13:09:00
927 -4 days +13:26:00
928 -4 days +13:50:00
Name: datetime, Length: 929, dtype: timedelta64[ns]
B) Use the calculation from part A and write a conditional statement checking whether each of the rows occurred within 7 days of the 5th. Don’t forget to include dates of samples both before and after the 5th.
abs(was_2020['datetime'] - target_date) <= timedelta(days=7)
0 False
1 False
2 False
3 False
4 False
...
924 False
925 False
926 True
927 True
928 True
Name: datetime, Length: 929, dtype: bool
C) Use the boolean series from part B as a filter to output the was_2020 dataframe with only the rows within 7 days of July 5th, 2020.
# abs() or absolute value is what captures days both before and after the target date,
# since it turns the negative differences (dates before the 5th) positive as well
was_2020[abs(was_2020['datetime'] - target_date) <= timedelta(days=7)]
Can # | Snake can # | Location | State | Box | Altitude (m) | Temp. Celsius | Wind Spd (mph) | Wind direction (blowing from direction) | Weather | ... | 4-Ethyltoluene (MS) | 2-Ethyltoluene (MS) | 1,3,5-Trimethylbenzene (MS) | 1,2,4-Trimethylbenzene (MS) | alpha-Pinene (B/MS) | beta-Pinene (B/MS) | Unnamed: 113 | Box.1 | CH4 (ppmv height).1 | datetime | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
17 | 7336 | 3710 | Mars Hill/Lowell Observatory, Flagstaff, Arizona | Arizona | 37 | 2200 | 27 | 10 | SW | Slightly Cloudy | ... | -888 | -888 | -888 | 4 | 126 | 12 | NaN | 37 | 1.882 | 2020-07-04 12:48:00 |
18 | 6035 | 3723 | Downtown Sedona, Arizona | Arizona | 37 | 1350 | 36 | 7 | SW | Clear and sunny | ... | -888 | -888 | -888 | 9 | 44 | 10 | NaN | 37 | 1.882 | 2020-07-04 14:09:00 |
209 | 7993 | 810 | Anaheim AQMD site Loara Elementary school, Ana... | California | 8 | 40 | 22 | 5 | SSW | sunny, but a bit hazy | ... | 3 | -888 | -888 | 6 | 4 | -888 | NaN | 8 | 1.968 | 2020-07-02 12:06:00 |
210 | 8247 | 808 | Imperial Park, Anaheim Hills, California | California | 8 | 137 | 23 | 5 | WSW | sunny, but a bit hazy | ... | 22 | 22 | 20 | 30 | 27 | 9 | NaN | 8 | 1.984 | 2020-07-02 12:35:00 |
211 | 9343 | 4110 | Corona, California | California | 41 | NaN | 29 | 0 | NaN | Clear | ... | 4 | 3 | -888 | 5 | 60 | -888 | NaN | 41 | 1.979 | 2020-07-03 11:21:00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
785 | 6157 | 3210 | Spring Creek Park (Downwind), Tomball, Texas | Texas | 32 | 70 | 95 F | 6 | S | Partly Cloudy | ... | 8 | 6 | 7 | 19 | 147 | 58 | NaN | 32 | 1.881 | 2020-07-08 15:00:00 |
786 | 5105 | 3223 | Spring Creek Park (Downwind), Tomball, Texas | Texas | 32 | 70 | 94 F | 5 | SSW | Partly Cloudy | ... | 6 | 6 | 6 | 24 | 61 | 41 | NaN | 32 | 1.878 | 2020-07-08 16:00:00 |
926 | 5194 | 2608 | Carl Traeger middle school, Oshkosh, Wisconsin | Wisconsin | 26 | 243.84 | 28.89 | 9 | ENE | Clear skies and sunny | ... | -888 | -888 | -888 | 5 | 7 | -888 | NaN | 26 | 2.017 | 2020-07-01 13:09:00 |
927 | 7101 | 2609 | Oakwood Manor, Oshkosh, Wisconsin | Wisconsin | 26 | 234.696 | 30.5 | 8 | ENE | Clear skies and sunny | ... | -888 | -888 | -888 | 5 | 23 | 5 | NaN | 26 | 2.027 | 2020-07-01 13:26:00 |
928 | 8190 | 2624 | South Park, Oshkosh, Wisconsin | Wisconsin | 26 | 231.64 | 30.5 | 9 | ENE | Clear skies and sunny | ... | -888 | -888 | -888 | 5 | 11 | 3 | NaN | 26 | 1.983 | 2020-07-01 13:50:00 |
66 rows × 115 columns
Question 4#
# Read in the data
water_vars = pd.read_csv('./data/englewood_3_12_21_usgs_water.tsv', sep='\t', skiprows=30)
# There are a lot of variables here, so let's shorten our dataframe to a few variables
water_vars = water_vars[['datetime', '210920_00060', '210922_00010', '210924_00300', '210925_00400']]
# Get rid of the first row of hard-coded datatype info
water_vars = water_vars.drop(0)
# Rename the columns from their USGS codes to more human-readable names
name_codes = {'210920_00060': 'discharge','210922_00010': 'temperature', '210924_00300': 'dissolved oxygen', '210925_00400': 'pH'}
water_vars = water_vars.rename(columns=name_codes)
# Convert columns with numbers to a numeric type
water_vars['discharge'] = pd.to_numeric(water_vars['discharge'])
water_vars['temperature'] = pd.to_numeric(water_vars['temperature'])
water_vars['dissolved oxygen'] = pd.to_numeric(water_vars['dissolved oxygen'])
water_vars['pH'] = pd.to_numeric(water_vars['pH'])
water_vars
datetime | discharge | temperature | dissolved oxygen | pH | |
---|---|---|---|---|---|
1 | 2021-03-12 00:00 | 44.5 | 8.1 | 8.3 | 8.1 |
2 | 2021-03-12 00:15 | 44.5 | 8.1 | 8.2 | 8.1 |
3 | 2021-03-12 00:30 | 44.5 | 8.1 | 8.2 | 8.1 |
4 | 2021-03-12 00:45 | 44.5 | 8.1 | 8.1 | 8.1 |
5 | 2021-03-12 01:00 | 44.5 | 8.1 | 8.1 | 8.1 |
... | ... | ... | ... | ... | ... |
142 | 2021-03-13 11:15 | 42.6 | 6.7 | 9.8 | 7.9 |
143 | 2021-03-13 11:30 | 42.6 | 6.7 | 9.9 | 7.9 |
144 | 2021-03-13 11:45 | 42.6 | 6.7 | 10.2 | 7.9 |
145 | 2021-03-13 12:00 | 46.5 | 6.7 | 10.3 | 7.9 |
146 | 2021-03-13 12:15 | NaN | 6.6 | 10.3 | 7.9 |
146 rows × 5 columns
A) Convert the ‘datetime’ string column to a column of datetime objects using pd.to_datetime().
water_vars['datetime'] = pd.to_datetime(water_vars['datetime'], format='%Y-%m-%d %H:%M')
B) Set the new datetime column as the index of the dataframe.
water_vars = water_vars.set_index('datetime')
C) Use the new index to retrieve the value for '2021-03-12 13:30:00'
water_vars.loc['2021-03-12 13:30:00']
discharge 40.8
temperature 7.6
dissolved oxygen 11.9
pH 8.1
Name: 2021-03-12 13:30:00, dtype: float64
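With a DatetimeIndex you also get partial string indexing: a string naming a coarser period selects every row inside it. A sketch with synthetic data (not the real water_vars table):

```python
import pandas as pd

idx = pd.to_datetime(['2021-03-12 13:30', '2021-03-12 13:45', '2021-03-13 00:00'])
df = pd.DataFrame({'discharge': [40.8, 40.8, 44.5]}, index=idx)

# An exact timestamp returns a single row...
print(df.loc['2021-03-12 13:30:00'])
# ...while a date alone returns every row from that day
print(df.loc['2021-03-12'])
```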
D) One cool thing we can do when we have a datetime index is easily resample the data. Resampling is when we aggregate more finely resolved data to be more coarsely resolved. In this example we will be taking data that is reported every 15 minutes and resampling to an hourly resolution.
Use the DATAFRAME.resample() method to resample to hourly resolution, taking the mean of the 15-minute intervals within each hour. Check out the docs page or the pandas datetime overview for examples.
water_vars.resample("H").mean()
discharge | temperature | dissolved oxygen | pH | |
---|---|---|---|---|
datetime | ||||
2021-03-12 00:00:00 | 44.500 | 8.100000 | 8.200 | 8.100 |
2021-03-12 01:00:00 | 44.500 | 8.050000 | 8.050 | 8.100 |
2021-03-12 02:00:00 | 44.500 | 7.950000 | 7.975 | 8.100 |
2021-03-12 03:00:00 | 44.500 | 7.750000 | 8.000 | 8.100 |
2021-03-12 04:00:00 | 46.500 | 7.533333 | 8.000 | 8.100 |
2021-03-12 05:00:00 | 47.500 | 7.350000 | 8.050 | 8.000 |
2021-03-12 06:00:00 | 48.500 | 7.100000 | 8.150 | 7.950 |
2021-03-12 07:00:00 | 48.500 | 6.900000 | 8.250 | 7.900 |
2021-03-12 08:00:00 | 47.500 | 6.725000 | 8.475 | 7.900 |
2021-03-12 09:00:00 | 46.500 | 6.700000 | 8.850 | 7.900 |
2021-03-12 10:00:00 | 44.500 | 6.725000 | 9.500 | 7.900 |
2021-03-12 11:00:00 | 43.550 | 6.900000 | 10.325 | 7.950 |
2021-03-12 12:00:00 | 42.600 | 7.150000 | 11.150 | 8.000 |
2021-03-12 13:00:00 | 40.800 | 7.550000 | 11.850 | 8.100 |
2021-03-12 14:00:00 | 39.900 | 8.075000 | 12.525 | 8.150 |
2021-03-12 15:00:00 | 39.000 | 8.450000 | 13.025 | 8.200 |
2021-03-12 16:00:00 | 40.350 | 8.475000 | 12.950 | 8.200 |
2021-03-12 17:00:00 | 41.700 | 8.400000 | 12.125 | 8.125 |
2021-03-12 18:00:00 | 44.025 | 8.300000 | 11.225 | 8.025 |
2021-03-12 19:00:00 | 45.500 | 8.250000 | 10.650 | 8.000 |
2021-03-12 20:00:00 | 46.500 | 8.100000 | 10.100 | 7.925 |
2021-03-12 21:00:00 | 46.500 | 7.950000 | 9.575 | 7.900 |
2021-03-12 22:00:00 | 46.000 | 7.800000 | 9.050 | 7.900 |
2021-03-12 23:00:00 | 44.500 | 7.700000 | 8.750 | 7.925 |
2021-03-13 00:00:00 | 44.500 | 7.700000 | 8.550 | 8.000 |
2021-03-13 01:00:00 | 44.500 | 7.700000 | 8.375 | 8.000 |
2021-03-13 02:00:00 | 44.500 | 7.675000 | 8.275 | 8.000 |
2021-03-13 03:00:00 | 44.500 | 7.600000 | 8.200 | 8.000 |
2021-03-13 04:00:00 | 46.000 | 7.475000 | 8.200 | 8.000 |
2021-03-13 05:00:00 | 46.500 | 7.350000 | 8.200 | 7.950 |
2021-03-13 06:00:00 | 46.500 | 7.150000 | 8.200 | 7.900 |
2021-03-13 07:00:00 | 46.500 | 6.966667 | 8.300 | 7.850 |
2021-03-13 08:00:00 | 46.500 | 6.875000 | 8.575 | 7.800 |
2021-03-13 09:00:00 | 46.500 | 6.800000 | 9.050 | 7.875 |
2021-03-13 10:00:00 | 45.000 | 6.775000 | 9.475 | 7.900 |
2021-03-13 11:00:00 | 42.600 | 6.700000 | 9.900 | 7.900 |
2021-03-13 12:00:00 | 46.500 | 6.650000 | 10.300 | 7.900 |
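resample() accepts the other usual aggregations too, not just mean(). A small synthetic example (made-up values, not the USGS data):

```python
import pandas as pd

idx = pd.date_range('2021-03-12 00:00', periods=8, freq='15min')
df = pd.DataFrame({'temperature': [8.1, 8.1, 8.1, 8.1, 8.1, 8.0, 8.0, 8.1]},
                  index=idx)

hourly_max = df.resample('h').max()      # warmest 15-minute reading per hour
hourly_count = df.resample('h').count()  # how many samples fell in each hour
print(hourly_max['temperature'].tolist())    # [8.1, 8.1]
print(hourly_count['temperature'].tolist())  # [4, 4]
```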