Datetimes Answers#
Part 1#
Dates and Datetimes#
from datetime import date, datetime, timedelta
Create a date() object representing your birthday. Assign it to a variable and use the variable to print out your birth year.
my_birthday = date(1994, 4, 21)
print(my_birthday.year)
1994
Explain why the following code returns an error:
date(2011, 2, 29)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[3], line 1
----> 1 date(2011, 2, 29)
ValueError: day is out of range for month
# It returns an error because 2011 was not a leap year, so February 2011 only had 28 days
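A quick way to confirm the leap-year logic is the standard library's calendar.isleap():

```python
import calendar
from datetime import date

# 2011 was not a leap year, so February topped out at 28 days
print(calendar.isleap(2011))  # False
print(calendar.isleap(2012))  # True

# The last valid February date in 2011 constructs fine
print(date(2011, 2, 28))  # 2011-02-28
```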
Create a datetime object for 6:23pm on March 13th, 1998
datetime(1998, 3, 13, 18, 23)
datetime.datetime(1998, 3, 13, 18, 23)
Create a list with 3 datetime objects in it
sampling_times = [datetime(2011, 4, 10, 12), datetime(2011, 4, 11, 12), datetime(2011, 4, 12, 12)]
sampling_times
[datetime.datetime(2011, 4, 10, 12, 0),
datetime.datetime(2011, 4, 11, 12, 0),
datetime.datetime(2011, 4, 12, 12, 0)]
Creating Dates and Datetimes from Strings#
Create a datetime object from the following string
date_string = '01-22-2009 21:00'
dt = datetime.strptime(date_string, '%m-%d-%Y %H:%M')
dt
datetime.datetime(2009, 1, 22, 21, 0)
Create a datetime object from the following string
date_string = 'Jul 23 1998 8:02:00'
My mistake: it turns out that Windows machines don’t support the strftime directive %-H, because the dash is a glibc extension rather than part of the C standard. I wrote the intended syntax below, but if you are using these notebooks on the SARP laptop it won’t run. Luckily, unpadded hours are really uncommon, so this hopefully won’t be a problem for your datasets.
# This errors
dt = datetime.strptime(date_string, '%b %d %Y %-H:%M:%S')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-20-001c4c88cc57> in <module>
1 # This errors
----> 2 dt = datetime.strptime(date_string, '%b %d %Y %-H:%M:%S')
~\.conda\envs\lessons\lib\_strptime.py in _strptime_datetime(cls, data_string, format)
566 """Return a class cls instance based on the input string and the
567 format string."""
--> 568 tt, fraction, gmtoff_fraction = _strptime(data_string, format)
569 tzname, gmtoff = tt[-2:]
570 args = tt[:6] + (fraction,)
~\.conda\envs\lessons\lib\_strptime.py in _strptime(data_string, format)
339 bad_directive = "%"
340 del err
--> 341 raise ValueError("'%s' is a bad directive in format '%s'" %
342 (bad_directive, format)) from None
343 # IndexError only occurs when the format string is "%"
ValueError: '-' is a bad directive in format '%b %d %Y %-H:%M:%S'
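A portable workaround worth knowing: when parsing (as opposed to formatting), Python's plain %H directive already accepts one-digit hours, so the non-standard %-H isn't actually needed here:

```python
from datetime import datetime

date_string = 'Jul 23 1998 8:02:00'
# %H matches one or two digits when parsing, so the unpadded hour is fine
dt = datetime.strptime(date_string, '%b %d %Y %H:%M:%S')
print(dt)  # 1998-07-23 08:02:00
```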
Create a datetime object from the following string
date_string = '12/1/72 01 52 12'
My mistake: it turns out that Windows machines don’t support the strftime directive %-d either, for the same dash-related reason. I wrote the intended syntax below, but if you are using these notebooks on the SARP laptop it won’t run. Luckily, unpadded days are really uncommon, so this hopefully won’t be a problem for your datasets.
# This errors
dt = datetime.strptime(date_string, '%m/%-d/%y %H %M %S')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-24-8273d2976c5f> in <module>
----> 1 dt = datetime.strptime(date_string, '%m/%-d/%y %H %M %S')
~\.conda\envs\lessons\lib\_strptime.py in _strptime_datetime(cls, data_string, format)
566 """Return a class cls instance based on the input string and the
567 format string."""
--> 568 tt, fraction, gmtoff_fraction = _strptime(data_string, format)
569 tzname, gmtoff = tt[-2:]
570 args = tt[:6] + (fraction,)
~\.conda\envs\lessons\lib\_strptime.py in _strptime(data_string, format)
339 bad_directive = "%"
340 del err
--> 341 raise ValueError("'%s' is a bad directive in format '%s'" %
342 (bad_directive, format)) from None
343 # IndexError only occurs when the format string is "%"
ValueError: '-' is a bad directive in format '%m/%-d/%y %H %M %S'
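The same workaround applies here: plain %d accepts an unpadded day when parsing, so the format works without the dash:

```python
from datetime import datetime

date_string = '12/1/72 01 52 12'
# %d matches one or two digits when parsing, so no %-d is required
dt = datetime.strptime(date_string, '%m/%d/%y %H %M %S')
print(dt)  # 1972-12-01 01:52:12
```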
Convert the date August 29th, 2008 to its Julian day (day of the year)
starting_date = datetime(2008, 8, 29)
format_string = '%j'
starting_date.strftime(format_string)
'242'
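The %j directive also works in the other direction: strptime can turn a year plus a day-of-year back into a calendar date:

```python
from datetime import datetime

# Parse a year and day-of-year back into a full date
dt = datetime.strptime('2008 242', '%Y %j')
print(dt.month, dt.day)  # 8 29
```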
Timedeltas#
Calculate how many days it has been since your last birthday.
last_birthday = date(2021, 4, 21)
date.today() - last_birthday
datetime.timedelta(days=69)
Calculate exactly how old you are, down to the hour, right now.
birth_moment = datetime(1994, 4, 21, 5, 33, 0)
datetime.now() - birth_moment
datetime.timedelta(days=9931, seconds=17267, microseconds=131821)
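A timedelta stores days, seconds, and microseconds separately, which is awkward to read as an age; total_seconds() flattens everything into one number that converts easily to hours. A sketch using the result above (microseconds dropped for simplicity):

```python
from datetime import timedelta

age = timedelta(days=9931, seconds=17267)
# total_seconds() combines the days and seconds fields into a single float
total_hours = age.total_seconds() / 3600
print(round(total_hours))  # 238349
```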
Part 2#
import pandas as pd
was_2020_filepath = "./data/SARP 2020 final.xlsx"
was_2020 = pd.read_excel(was_2020_filepath, "INPUT", skipfooter=7)
<ipython-input-36-f4ef3a08305f>:2: FutureWarning: Your version of xlrd is 1.2.0. In xlrd >= 2.0, only the xls format is supported. As a result, the openpyxl engine will be used if it is installed and the engine argument is not specified. Install openpyxl instead.
was_2020 = pd.read_excel(was_2020_filepath, "INPUT", skipfooter=7)
was_2020
Can # | Snake can # | Location | State | Box | Date | Time | Altitude (m) | Temp. Celsius | Wind Spd (mph) | ... | 3-Ethyltoluene (MS) | 4-Ethyltoluene (MS) | 2-Ethyltoluene (MS) | 1,3,5-Trimethylbenzene (MS) | 1,2,4-Trimethylbenzene (MS) | alpha-Pinene (B/MS) | beta-Pinene (B/MS) | Unnamed: 113 | Box.1 | CH4 (ppmv height).1 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 5044 | 3701 | Mars Hill/Lowell Observatory, Flagstaff, Arizona | Arizona | 37 | 2020-04-24 | 11:23:00 | 2200 | 18 | 5 | ... | 12 | 17 | 17 | 15 | 23 | 12 | 7 | NaN | 37 | 1.914 |
1 | 7338 | 3716 | Buffalo Park, Flagstaff, Arizona | Arizona | 37 | 2020-04-24 | 11:44:00 | 2100 | 18 | 5 | ... | 12 | 16 | 15 | 14 | 22 | 31 | 9 | NaN | 37 | 1.916 |
2 | 6027 | 3717 | Downtown Sedona, Arizona | Arizona | 37 | 2020-04-24 | 12:47:00 | 1350 | 27 | 5 | ... | -888 | -888 | -888 | -888 | 6 | 11 | -888 | NaN | 37 | 1.914 |
3 | 5012 | 3702 | Buffalo Park, Flagstaff, Arizona | Arizona | 37 | 2020-05-01 | 11:16:00 | 2100 | 19 | 16 | ... | -888 | -888 | -888 | -888 | -888 | 22 | -888 | NaN | 37 | 1.895 |
4 | 7330 | 3715 | Mars Hill/Lowell Observatory, Flagstaff, Arizona | Arizona | 37 | 2020-05-01 | 11:49:00 | 2200 | 19 | 16 | ... | 6 | 3 | -888 | -888 | 8 | 23 | -888 | NaN | 37 | 1.900 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
924 | 7267 | 2610 | Oakwood Manor, Oshkosh, Wisconsin | Wisconsin | 26 | 2020-06-24 | 10:28:00 | 234.696 | 20 | 7 | ... | -888 | -888 | -888 | -888 | 3 | 68 | 45 | NaN | 26 | 2.131 |
925 | 8257 | 2623 | South Park, Oshkosh, Wisconsin | Wisconsin | 26 | 2020-06-24 | 10:42:00 | 231.64 | 21.67 | 6 | ... | -888 | -888 | -888 | -888 | 4 | 33 | 18 | NaN | 26 | 2.074 |
926 | 5194 | 2608 | Carl Traeger middle school, Oshkosh, Wisconsin | Wisconsin | 26 | 2020-07-01 | 13:09:00 | 243.84 | 28.89 | 9 | ... | 4 | -888 | -888 | -888 | 5 | 7 | -888 | NaN | 26 | 2.017 |
927 | 7101 | 2609 | Oakwood Manor, Oshkosh, Wisconsin | Wisconsin | 26 | 2020-07-01 | 13:26:00 | 234.696 | 30.5 | 8 | ... | 3 | -888 | -888 | -888 | 5 | 23 | 5 | NaN | 26 | 2.027 |
928 | 8190 | 2624 | South Park, Oshkosh, Wisconsin | Wisconsin | 26 | 2020-07-01 | 13:50:00 | 231.64 | 30.5 | 9 | ... | 3 | -888 | -888 | -888 | 5 | 11 | 3 | NaN | 26 | 1.983 |
929 rows × 116 columns
Question 1#
Using datetime objects, calculate how long the data record in was_2020 is. In other words, how much time passed between the first and the last measurement in this sample list?
was_2020['Date'].max() - was_2020['Date'].min()
Timedelta('132 days 00:00:00')
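The resulting Timedelta exposes its components as attributes, which is handy for follow-up arithmetic:

```python
import pandas as pd

record_length = pd.Timedelta('132 days 00:00:00')
print(record_length.days)             # 132
print(record_length.total_seconds())  # 11404800.0
```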
Question 2#
Creating a datetime column from our was_2020 DataFrame
A) In the was_2020 dataset, Date and Time are in two separate columns. Combine the two columns into one and assign the output to a new variable called combined_datetime.
To do this you will need to:
Convert each column to a string type
Use concatenation to combine them
# Example of string concatenation
'hello ' + 'there'
'hello there'
# Note: the Time column's name contains a trailing space in this dataset
combined_datetime = was_2020['Date'].astype(str) + was_2020['Time '].astype(str)
combined_datetime
0 2020-04-2411:23:00
1 2020-04-2411:44:00
2 2020-04-2412:47:00
3 2020-05-0111:16:00
4 2020-05-0111:49:00
...
924 2020-06-2410:28:00
925 2020-06-2410:42:00
926 2020-07-0113:09:00
927 2020-07-0113:26:00
928 2020-07-0113:50:00
Length: 929, dtype: object
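Note that the combined strings have no separator between the date and the time ('2020-04-2411:23:00'); that is fine as long as the format string used for parsing matches. A minimal standalone sketch with made-up sample values (not the SARP file):

```python
import pandas as pd

dates = pd.Series(['2020-04-24', '2020-05-01'])
times = pd.Series(['11:23:00', '11:49:00'])

# No separator between date and time, so the parsing format has none either
combined = dates + times
parsed = pd.to_datetime(combined, format='%Y-%m-%d%H:%M:%S')
print(parsed[0])  # 2020-04-24 11:23:00
```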
B) Now that you have a combined_datetime variable, you can use the pandas function pd.to_datetime() to convert it from a string to a Series of datetime objects. Create a new column in your dataframe called 'datetime' for the new datetime objects.
was_2020['datetime'] = pd.to_datetime(combined_datetime, format='%Y-%m-%d%H:%M:%S')
Google suggestion: “pandas to_datetime”, or the examples in the docs
C) Delete the old 'Date' and 'Time' columns with the DATAFRAME.drop() method.
was_2020 = was_2020.drop(columns=['Date', 'Time '])
Google suggestion: “pandas drop column”, or the second (not the first) answer on this Stack Overflow post
Question 3#
Filtering our dataframe to include only the rows within 7 days of our target date
import numpy as np
A) Let’s say that we are interested in a phenomenon that occurred on July 5th, 2020, so we want to narrow down our dataframe to include only the observations that occurred within a week of the 5th.
Start by calculating the difference between each date in the ‘datetime’ column and July 5th, 2020. What type of object is returned in the result?
# Even though we don't care about the specific time we need to use a datetime object, not a date object, because the
# datetime column of the was_2020 dataframe is a datetime
target_date = datetime(2020, 7, 5)
was_2020['datetime'] - target_date
0 -72 days +11:23:00
1 -72 days +11:44:00
2 -72 days +12:47:00
3 -65 days +11:16:00
4 -65 days +11:49:00
...
924 -11 days +10:28:00
925 -11 days +10:42:00
926 -4 days +13:09:00
927 -4 days +13:26:00
928 -4 days +13:50:00
Name: datetime, Length: 929, dtype: timedelta64[ns]
B) Use the calculation from part A and write a conditional statement checking whether each of the rows occurred within 7 days of the 5th. Don’t forget to include dates of samples both before and after the 5th.
abs(was_2020['datetime'] - target_date) <= timedelta(days=7)
0 False
1 False
2 False
3 False
4 False
...
924 False
925 False
926 True
927 True
928 True
Name: datetime, Length: 929, dtype: bool
C) Use the boolean series from part B as a filter to output the was_2020 dataframe with only the rows within 7 days of July 5th, 2020.
# abs() or absolute value is what captures days both before and after the target date,
# since it turns the negative differences (dates before the 5th) positive as well
was_2020[abs(was_2020['datetime'] - target_date) <= timedelta(days=7)]
Can # | Snake can # | Location | State | Box | Altitude (m) | Temp. Celsius | Wind Spd (mph) | Wind direction (blowing from direction) | Weather | ... | 4-Ethyltoluene (MS) | 2-Ethyltoluene (MS) | 1,3,5-Trimethylbenzene (MS) | 1,2,4-Trimethylbenzene (MS) | alpha-Pinene (B/MS) | beta-Pinene (B/MS) | Unnamed: 113 | Box.1 | CH4 (ppmv height).1 | datetime | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
17 | 7336 | 3710 | Mars Hill/Lowell Observatory, Flagstaff, Arizona | Arizona | 37 | 2200 | 27 | 10 | SW | Slightly Cloudy | ... | -888 | -888 | -888 | 4 | 126 | 12 | NaN | 37 | 1.882 | 2020-07-04 12:48:00 |
18 | 6035 | 3723 | Downtown Sedona, Arizona | Arizona | 37 | 1350 | 36 | 7 | SW | Clear and sunny | ... | -888 | -888 | -888 | 9 | 44 | 10 | NaN | 37 | 1.882 | 2020-07-04 14:09:00 |
209 | 7993 | 810 | Anaheim AQMD site Loara Elementary school, Ana... | California | 8 | 40 | 22 | 5 | SSW | sunny, but a bit hazy | ... | 3 | -888 | -888 | 6 | 4 | -888 | NaN | 8 | 1.968 | 2020-07-02 12:06:00 |
210 | 8247 | 808 | Imperial Park, Anaheim Hills, California | California | 8 | 137 | 23 | 5 | WSW | sunny, but a bit hazy | ... | 22 | 22 | 20 | 30 | 27 | 9 | NaN | 8 | 1.984 | 2020-07-02 12:35:00 |
211 | 9343 | 4110 | Corona, California | California | 41 | NaN | 29 | 0 | NaN | Clear | ... | 4 | 3 | -888 | 5 | 60 | -888 | NaN | 41 | 1.979 | 2020-07-03 11:21:00 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
785 | 6157 | 3210 | Spring Creek Park (Downwind), Tomball, Texas | Texas | 32 | 70 | 95 F | 6 | S | Partly Cloudy | ... | 8 | 6 | 7 | 19 | 147 | 58 | NaN | 32 | 1.881 | 2020-07-08 15:00:00 |
786 | 5105 | 3223 | Spring Creek Park (Downwind), Tomball, Texas | Texas | 32 | 70 | 94 F | 5 | SSW | Partly Cloudy | ... | 6 | 6 | 6 | 24 | 61 | 41 | NaN | 32 | 1.878 | 2020-07-08 16:00:00 |
926 | 5194 | 2608 | Carl Traeger middle school, Oshkosh, Wisconsin | Wisconsin | 26 | 243.84 | 28.89 | 9 | ENE | Clear skies and sunny | ... | -888 | -888 | -888 | 5 | 7 | -888 | NaN | 26 | 2.017 | 2020-07-01 13:09:00 |
927 | 7101 | 2609 | Oakwood Manor, Oshkosh, Wisconsin | Wisconsin | 26 | 234.696 | 30.5 | 8 | ENE | Clear skies and sunny | ... | -888 | -888 | -888 | 5 | 23 | 5 | NaN | 26 | 2.027 | 2020-07-01 13:26:00 |
928 | 8190 | 2624 | South Park, Oshkosh, Wisconsin | Wisconsin | 26 | 231.64 | 30.5 | 9 | ENE | Clear skies and sunny | ... | -888 | -888 | -888 | 5 | 11 | 3 | NaN | 26 | 1.983 | 2020-07-01 13:50:00 |
66 rows × 115 columns
Question 4#
# Read in the data
water_vars = pd.read_csv('./data/englewood_3_12_21_usgs_water.tsv', sep='\t', skiprows=30)
# There are a lot of variables here, so let's shorten our dataframe to a few variables
water_vars = water_vars[['datetime', '210920_00060', '210922_00010', '210924_00300', '210925_00400']]
# Get rid of the first row of hard-coded datatype info
water_vars = water_vars.drop(0)
# Rename the columns from their USGS codes to more human-readable names
name_codes = {'210920_00060': 'discharge','210922_00010': 'temperature', '210924_00300': 'dissolved oxygen', '210925_00400': 'pH'}
water_vars = water_vars.rename(columns=name_codes)
# Convert columns with numbers to a numeric type
water_vars['discharge'] = pd.to_numeric(water_vars['discharge'])
water_vars['temperature'] = pd.to_numeric(water_vars['temperature'])
water_vars['dissolved oxygen'] = pd.to_numeric(water_vars['dissolved oxygen'])
water_vars['pH'] = pd.to_numeric(water_vars['pH'])
water_vars
datetime | discharge | temperature | dissolved oxygen | pH | |
---|---|---|---|---|---|
1 | 2021-03-12 00:00 | 44.5 | 8.1 | 8.3 | 8.1 |
2 | 2021-03-12 00:15 | 44.5 | 8.1 | 8.2 | 8.1 |
3 | 2021-03-12 00:30 | 44.5 | 8.1 | 8.2 | 8.1 |
4 | 2021-03-12 00:45 | 44.5 | 8.1 | 8.1 | 8.1 |
5 | 2021-03-12 01:00 | 44.5 | 8.1 | 8.1 | 8.1 |
... | ... | ... | ... | ... | ... |
142 | 2021-03-13 11:15 | 42.6 | 6.7 | 9.8 | 7.9 |
143 | 2021-03-13 11:30 | 42.6 | 6.7 | 9.9 | 7.9 |
144 | 2021-03-13 11:45 | 42.6 | 6.7 | 10.2 | 7.9 |
145 | 2021-03-13 12:00 | 46.5 | 6.7 | 10.3 | 7.9 |
146 | 2021-03-13 12:15 | NaN | 6.6 | 10.3 | 7.9 |
146 rows × 5 columns
A) Convert the ‘datetime’ string column to a column of datetime objects using pd.to_datetime().
water_vars['datetime'] = pd.to_datetime(water_vars['datetime'], format='%Y-%m-%d %H:%M')
B) Set the new datetime column as the index of the dataframe.
water_vars = water_vars.set_index('datetime')
C) Use the new index to retrieve the value for '2021-03-12 13:30:00'
water_vars.loc['2021-03-12 13:30:00']
discharge 40.8
temperature 7.6
dissolved oxygen 11.9
pH 8.1
Name: 2021-03-12 13:30:00, dtype: float64
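With a DatetimeIndex you also get partial string indexing: a string naming a coarser period selects every row inside it. A sketch with synthetic data (not the real water_vars table):

```python
import pandas as pd

idx = pd.to_datetime(['2021-03-12 13:30', '2021-03-12 13:45', '2021-03-13 00:00'])
df = pd.DataFrame({'discharge': [40.8, 40.8, 44.5]}, index=idx)

# An exact timestamp returns a single row...
print(df.loc['2021-03-12 13:30:00'])
# ...while a date alone returns every row from that day
print(df.loc['2021-03-12'])
```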
D) One cool thing we can do when we have a datetime index is easily resample the data. Resampling is when we aggregate more finely resolved data to be more coarsely resolved. In this example we will be taking data that is reported every 15 minutes and resampling to an hourly resolution.
Use the DATAFRAME.resample() method to resample to hourly resolution, taking the mean of the 15-minute intervals within each hour. Check out the docs page or the pandas datetime overview for examples.
water_vars.resample("H").mean()
discharge | temperature | dissolved oxygen | pH | |
---|---|---|---|---|
datetime | ||||
2021-03-12 00:00:00 | 44.500 | 8.100000 | 8.200 | 8.100 |
2021-03-12 01:00:00 | 44.500 | 8.050000 | 8.050 | 8.100 |
2021-03-12 02:00:00 | 44.500 | 7.950000 | 7.975 | 8.100 |
2021-03-12 03:00:00 | 44.500 | 7.750000 | 8.000 | 8.100 |
2021-03-12 04:00:00 | 46.500 | 7.533333 | 8.000 | 8.100 |
2021-03-12 05:00:00 | 47.500 | 7.350000 | 8.050 | 8.000 |
2021-03-12 06:00:00 | 48.500 | 7.100000 | 8.150 | 7.950 |
2021-03-12 07:00:00 | 48.500 | 6.900000 | 8.250 | 7.900 |
2021-03-12 08:00:00 | 47.500 | 6.725000 | 8.475 | 7.900 |
2021-03-12 09:00:00 | 46.500 | 6.700000 | 8.850 | 7.900 |
2021-03-12 10:00:00 | 44.500 | 6.725000 | 9.500 | 7.900 |
2021-03-12 11:00:00 | 43.550 | 6.900000 | 10.325 | 7.950 |
2021-03-12 12:00:00 | 42.600 | 7.150000 | 11.150 | 8.000 |
2021-03-12 13:00:00 | 40.800 | 7.550000 | 11.850 | 8.100 |
2021-03-12 14:00:00 | 39.900 | 8.075000 | 12.525 | 8.150 |
2021-03-12 15:00:00 | 39.000 | 8.450000 | 13.025 | 8.200 |
2021-03-12 16:00:00 | 40.350 | 8.475000 | 12.950 | 8.200 |
2021-03-12 17:00:00 | 41.700 | 8.400000 | 12.125 | 8.125 |
2021-03-12 18:00:00 | 44.025 | 8.300000 | 11.225 | 8.025 |
2021-03-12 19:00:00 | 45.500 | 8.250000 | 10.650 | 8.000 |
2021-03-12 20:00:00 | 46.500 | 8.100000 | 10.100 | 7.925 |
2021-03-12 21:00:00 | 46.500 | 7.950000 | 9.575 | 7.900 |
2021-03-12 22:00:00 | 46.000 | 7.800000 | 9.050 | 7.900 |
2021-03-12 23:00:00 | 44.500 | 7.700000 | 8.750 | 7.925 |
2021-03-13 00:00:00 | 44.500 | 7.700000 | 8.550 | 8.000 |
2021-03-13 01:00:00 | 44.500 | 7.700000 | 8.375 | 8.000 |
2021-03-13 02:00:00 | 44.500 | 7.675000 | 8.275 | 8.000 |
2021-03-13 03:00:00 | 44.500 | 7.600000 | 8.200 | 8.000 |
2021-03-13 04:00:00 | 46.000 | 7.475000 | 8.200 | 8.000 |
2021-03-13 05:00:00 | 46.500 | 7.350000 | 8.200 | 7.950 |
2021-03-13 06:00:00 | 46.500 | 7.150000 | 8.200 | 7.900 |
2021-03-13 07:00:00 | 46.500 | 6.966667 | 8.300 | 7.850 |
2021-03-13 08:00:00 | 46.500 | 6.875000 | 8.575 | 7.800 |
2021-03-13 09:00:00 | 46.500 | 6.800000 | 9.050 | 7.875 |
2021-03-13 10:00:00 | 45.000 | 6.775000 | 9.475 | 7.900 |
2021-03-13 11:00:00 | 42.600 | 6.700000 | 9.900 | 7.900 |
2021-03-13 12:00:00 | 46.500 | 6.650000 | 10.300 | 7.900 |
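resample() accepts the other usual aggregations too, not just mean(). A small synthetic example (made-up values, not the USGS data):

```python
import pandas as pd

idx = pd.date_range('2021-03-12 00:00', periods=8, freq='15min')
df = pd.DataFrame({'temperature': [8.1, 8.1, 8.1, 8.1, 8.1, 8.0, 8.0, 8.1]},
                  index=idx)

hourly_max = df.resample('h').max()      # warmest 15-minute reading per hour
hourly_count = df.resample('h').count()  # how many samples fell in each hour
print(hourly_max['temperature'].tolist())    # [8.1, 8.1]
print(hourly_count['temperature'].tolist())  # [4, 4]
```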