Datetimes Practice
Part 1
Dates and Datetimes
Create a date() object representing your birthday. Assign it to a variable and use the variable to print out your birth year.
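As a minimal sketch of the pattern, here is the same idea with an arbitrary example date rather than a birthday (the variable name moon_landing is only illustrative):

```python
from datetime import date

# An arbitrary example date: year, month, day
moon_landing = date(1969, 7, 20)

# Individual components are available as attributes
print(moon_landing.year)
```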
Explain why the following code returns an error:
date(2011, 2, 29)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[1], line 1
----> 1 date(2011, 2, 29)
NameError: name 'date' is not defined
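Note that the traceback above is a NameError, which only says that date was never imported in that session. The sketch below, under the assumption that the import is in place, suggests what the question is probably after: the constructor validates the calendar, and an impossible date should raise a ValueError. The helper name is_valid_date is hypothetical.

```python
from datetime import date

def is_valid_date(year, month, day):
    """Return True if the calendar date exists (hypothetical helper)."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        # Raised when the day does not exist in that month and year
        return False

print(is_valid_date(2011, 2, 29))  # 2011 is not a leap year
print(is_valid_date(2012, 2, 29))  # 2012 is a leap year
```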
Create a datetime object for 6:23 pm on March 13th, 1998.
Create a list with 3 datetime objects in it.
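A short sketch with made-up times may help here. The one detail worth noting is that datetime() takes hours on a 24-hour clock, so afternoon times need 12 added to the hour. The names launch and events are illustrative only.

```python
from datetime import datetime

# datetime(year, month, day, hour, minute); hours use a 24-hour clock
launch = datetime(2001, 9, 9, 13, 46)  # 1:46 pm

# datetime objects can be stored in a list like any other value
events = [launch, datetime(2001, 9, 10, 8, 0), datetime(2001, 9, 11, 17, 30)]
print(launch.hour, len(events))
```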
Creating Dates and Datetimes from Strings
Create a datetime object from the following string
date_string = '01-22-2009 21:00'
Create a datetime object from the following string
date_string = 'Jul 23 1998 8:02:00'
Create a datetime object from the following string
date_string = '12/1/72 01 52 12'
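One way to approach these is datetime.strptime(), which takes the string plus a format that mirrors it character for character, including separators. The sketch below uses a hypothetical string that is deliberately not one of the exercise strings.

```python
from datetime import datetime

# Hypothetical example string; not one of the exercise strings
example_string = '2015/03/04 07:30'

# %Y = 4-digit year, %m = month, %d = day, %H = 24-hour hour, %M = minute
parsed = datetime.strptime(example_string, '%Y/%m/%d %H:%M')
print(parsed)
```

Other codes that tend to be useful for strings like the ones above include %b (abbreviated month name), %y (2-digit year), and %S (seconds).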
Convert the date August 29th, 2008 to the date in Julian days.
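Assuming "Julian days" here means the day of the year (1 through 366) rather than the astronomical Julian Day Number, a sketch on an arbitrary date could look like this:

```python
from datetime import date

example_date = date(2020, 3, 1)  # arbitrary example date in a leap year

# %j formats the day of the year as a zero-padded 3-character string
day_of_year_str = example_date.strftime('%j')

# The same value as an integer
day_of_year_int = example_date.timetuple().tm_yday
print(day_of_year_str, day_of_year_int)
```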
Timedeltas
Calculate how many days it has been since your last birthday.
Calculate exactly how old you are, down to the hour, right now.
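Subtracting one datetime from another yields a timedelta. The sketch below uses two fixed, made-up datetimes so the result is reproducible; for the exercises, datetime.now() would supply the current moment.

```python
from datetime import datetime

start = datetime(2000, 1, 1, 0, 0)
end = datetime(2000, 1, 3, 12, 0)

elapsed = end - start            # subtraction yields a timedelta
whole_days = elapsed.days        # complete days only
total_hours = elapsed.total_seconds() / 3600
print(whole_days, total_hours)
```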
Part 2
import pandas as pd
was_2020_filepath = "../data/SARP 2020 final.xlsx"
was_2020 = pd.read_excel(was_2020_filepath, sheet_name="INPUT", skipfooter=7)
Question 1
Using datetime objects, calculate how long the was_2020 data record is. In other words, how much time passed between the first and the last measurement in this sample list?
Question 2
Creating a datetime column from our was_2020 dataframe
A) In the was_2020 dataset, Date and Time are in two separate columns. Combine the two columns into one and assign the output to a new variable called combined_datetime.
To do this you will need to:
Convert each column to a string type
Use concatenation to combine them
# Example of string concatenation
'hello ' + 'there'
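The same + operator works element-wise on pandas Series of strings. A minimal sketch on toy data, where the column names 'day' and 'clock' are hypothetical stand-ins rather than the real was_2020 columns:

```python
import pandas as pd

# Toy data with hypothetical column names 'day' and 'clock'
toy = pd.DataFrame({'day': ['2020-07-01', '2020-07-02'],
                    'clock': ['09:15:00', '14:40:00']})

# astype(str) guarantees string values; + joins the Series element-wise
combined = toy['day'].astype(str) + ' ' + toy['clock'].astype(str)
print(combined.tolist())
```

The space in the middle keeps the date and time parts separated, which makes the result easier to parse later.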
B) Now that you have a 'combined_datetime' variable, you can use the pandas function pd.to_datetime() to convert it from a Series of strings to a Series of datetime objects. Create a new column in your dataframe called 'datetime' for the new datetime objects.
Hint
Google suggestion: “pandas to_datetime”
Try following the Examples in the middle of the docs page.
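For orientation, a small sketch of pd.to_datetime() on toy strings. The column names 'stamp_text' and 'stamp' are hypothetical, and the format shown matches only this toy data; the real data may need a different format or none at all.

```python
import pandas as pd

toy = pd.DataFrame({'stamp_text': ['2020-07-01 09:15:00', '2020-07-02 14:40:00']})

# An explicit format avoids ambiguity between day-first and month-first dates
toy['stamp'] = pd.to_datetime(toy['stamp_text'], format='%Y-%m-%d %H:%M:%S')
print(toy['stamp'].dtype)
```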
C) Delete the old 'Date' and 'Time' columns with the DATAFRAME.drop() method.
Hint
Google suggestion: “pandas drop column”
See the second (not the first) answer on this Stack Overflow question.
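A minimal sketch of dropping columns on a toy dataframe (the column names are placeholders). Note that drop() returns a new dataframe by default, so the result needs to be assigned to a variable.

```python
import pandas as pd

toy = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# drop returns a new dataframe by default; columns= names what to remove
trimmed = toy.drop(columns=['a', 'b'])
print(trimmed.columns.tolist())
```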
Question 3
Filtering our dataframe to include only the rows within 7 days of our target date
import numpy as np
A) Let’s say that we are interested in a phenomenon that occurred on July 5th, 2020, so we want to narrow down our dataframe to include only the observations that occurred within a week of the 5th.
Start by calculating the difference between each date in the ‘datetime’ column and July 5th, 2020. What type of object is returned in the result?
B) Use the calculation from part A and write a conditional statement checking if each of the rows occurred within 7 days of the 5th. Don’t forget to include dates of samples both before and after the 5th.
C) Use the boolean series from part B as a filter to output the was_2020 dataframe with only the rows within 7 days of July 5th, 2020.
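The three steps can be sketched on toy data as follows. The column names, the target date, and the 3-day window are all made up for illustration; taking the absolute value of the difference is one possible way to cover both sides of the target date.

```python
import pandas as pd

toy = pd.DataFrame({'when': pd.to_datetime(['2021-01-01', '2021-01-09',
                                            '2021-01-12', '2021-01-30']),
                    'value': [1, 2, 3, 4]})
target = pd.Timestamp('2021-01-10')  # hypothetical target date

# Subtracting a Timestamp from a datetime Series gives a timedelta Series
offsets = toy['when'] - target

# abs() treats dates before and after the target the same way
within_window = offsets.abs() <= pd.Timedelta(days=3)

# A boolean Series inside [] keeps only the rows marked True
nearby = toy[within_window]
print(nearby)
```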
Question 4
# Read in the data
water_vars = pd.read_csv('../data/englewood_3_12_21_usgs_water.tsv', sep='\t', skiprows=30)
# There are a lot of variables here, so let's shorten our dataframe to a few variables
water_vars = water_vars[['datetime', '210920_00060', '210922_00010', '210924_00300', '210925_00400']]
# Get rid of the first row of hard-coded datatype info
water_vars = water_vars.drop(0)
# Rename the columns from their USGS codes to more human-readable names
name_codes = {'210920_00060': 'discharge','210922_00010': 'temperature', '210924_00300': 'dissolved oxygen', '210925_00400': 'pH'}
water_vars = water_vars.rename(columns=name_codes)
# Convert columns with numbers to a numeric type
water_vars['discharge'] = pd.to_numeric(water_vars['discharge'])
water_vars['temperature'] = pd.to_numeric(water_vars['temperature'])
water_vars['dissolved oxygen'] = pd.to_numeric(water_vars['dissolved oxygen'])
water_vars['pH'] = pd.to_numeric(water_vars['pH'])
water_vars
A) Convert the ‘datetime’ string column to a column of datetime objects using pd.to_datetime().
B) Set the new datetime column as the index of the dataframe.
C) Use the new index to retrieve the value for '2021-03-12 13:30:00'.
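A sketch of the index-then-lookup pattern on toy data. The column names 'when' and 'reading' are hypothetical, and wrapping the lookup key in pd.Timestamp is just one way to request an exact match.

```python
import pandas as pd

toy = pd.DataFrame({'when': ['2021-03-12 13:00', '2021-03-12 13:15'],
                    'reading': [4.0, 4.5]})
toy['when'] = pd.to_datetime(toy['when'])

# set_index returns a new dataframe indexed by the datetime column
indexed = toy.set_index('when')

# .loc with a Timestamp looks up the row with exactly that index value
row = indexed.loc[pd.Timestamp('2021-03-12 13:15:00')]
print(row['reading'])
```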
D) One cool thing we can do when we have a datetime index is easily resample the data. Resampling is when we aggregate more finely resolved data to be more coarsely resolved. In this example we will be taking data that is reported every 15 minutes and resampling to an hourly resolution.
Use the DATAFRAME.resample() method to resample to hourly resolution using the mean value of the 15-minute intervals. Check out the docs page or the pandas datetime overview for examples.
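To illustrate the idea on synthetic data (not the USGS file), the sketch below builds eight readings 15 minutes apart and averages them into hourly bins. The frequency string '1h' is an assumption about the installed pandas version; older releases commonly spell the hourly alias 'H'.

```python
import pandas as pd

# Eight synthetic readings spaced 15 minutes apart
index = pd.date_range('2021-03-12 13:00', periods=8, freq='15min')
toy = pd.DataFrame({'reading': [0, 1, 2, 3, 4, 5, 6, 7]}, index=index)

# Group rows into 1-hour bins and average each bin
hourly = toy.resample('1h').mean()
print(hourly)
```

Each hourly row is labeled with the start of its bin, so the 13:00 row summarizes the readings from 13:00 through 13:45.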