Date: Friday, Feb 28, 2025 at 09:30 AM US/Eastern
Dates and datetimes are deceptively tricky in Python and pandas. Whether you’re working with timestamps in financial data, scheduling events, or aligning time series, small mistakes can lead to major errors.
In this seminar, we’ll break down the core datetime implementations in Python and pandas, showing how to parse, manipulate, and analyze date-based data effectively. But more importantly, we’ll explore the hidden pitfalls—handling time zones, ambiguous/nonexistent dates, subtle indexing issues, and more—that can cause silent failures in your analysis.
By the end of this session, you’ll walk away with:
If you’ve ever been burned by a timezone bug or an off-by-one-day error, this seminar is for you!
python -m pip install numpy pandas pyarrow python-dateutil
print("Let's take a look!")
Let’s start from the very beginning. Say we have some measurement that we have captured over time.
We could record this data in a pandas.Series for the purposes of performing
analyses.
from pandas import Series
from numpy import tile, arange, repeat
from numpy.random import default_rng
rng = default_rng(0)
entities = ['abc', 'def', 'xyz']
s = Series(
index=(idx := tile(entities, 3)),
data=rng.random(size=len(idx)),
).rename_axis('entity')
print(
s,
# s.loc['abc'],
# s.loc['abc'].diff(1),
s
.to_frame('value')
.assign(num=repeat(arange(len(entities)), 3))
.set_index('num', append=True)
.unstack('entity')['value']
,
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
If we look at how the evolving values are tagged, we see the first indication of a “time” value. We are presenting the various time samples as numerical values 0…2. If the actual measurement was captured on an even frequency, then this numerical value could represent the number of those units since an “epoch.”
Indeed, that’s precisely how much of our timeseries data may be represented.
from numpy import array
timestamps = [
1_577_836_800_000_000_000,
1_577_836_801_000_000_000,
1_577_836_802_000_000_000,
]
xs = array(timestamps).astype('datetime64[ns]')
print(f'{xs = }')
Typically, the “epoch” that is selected is 1970-01-01. The “frequency” can vary based on the desired fidelity of our measurement.
from numpy import array
timestamps = [
1_577_836_800,
1_577_836_801,
1_577_836_802,
]
xs = array(timestamps)
print(
xs.astype('datetime64[s]'),
(xs * 1_000).astype('datetime64[ms]'),
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
This choice of units will also affect the maximum value we can represent.
from numpy import array
print(
array([2 ** (64-1) - 1]).astype('datetime64[ns]'),
array([2 ** (64-1) - 1]).astype('datetime64[s]'),
sep='\n',
)
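As a rough cross-check of why the nanosecond range ends in the year 2262, the arithmetic can be worked by hand. This is only a sketch: the constant names are illustrative, and the average Gregorian year length of 365.2425 days is used as an approximation.

```python
from numpy import array, iinfo, int64

NS_PER_SECOND = 1_000_000_000
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60  # average Gregorian year (approximation)

largest = iinfo(int64).max  # 2**63 - 1

# how many years past the 1970 epoch fit into a signed 64-bit count of nanoseconds
years_of_ns = largest / NS_PER_SECOND / SECONDS_PER_YEAR
last_year_ns = 1970 + int(years_of_ns)

# the same integer interpreted directly by NumPy
latest_ns = array([largest]).astype('datetime64[ns]')[0]

print(
    f'{years_of_ns = :.1f}',
    f'{last_year_ns = }',
    f'{latest_ns = !s}',
    sep='\n',
)
```

Roughly 292 years of nanoseconds fit on either side of the epoch, which is why coarser units are needed for dates far in the past or future.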
The reason we may want to represent this data in numpy as
dtype=datetime64[ns] is that we want convenient datetime operations on it,
similar to what is afforded to us in pure Python with the datetime module.
from datetime import datetime, timedelta
x = datetime(2020, 1, 1, 9, 30, 0)
print(
f'{x = }',
f'{x.year = }',
f'{x.weekday() = }',
f'{x + timedelta(days=3) = }',
sep='\n',
)
from numpy import array
x = array(1_577_836_800, dtype='datetime64[s]')[()]
y = array( 3, dtype='timedelta64[D]')[()]
print(
f'{x = }',
f'{x + y = }',
sep='\n',
)
There are two general kinds of time that we may capture in our code: “wall clock” time and “monotonic” time.
These are reflected quite well in our choices of functions in the Python standard library.
from time import time, perf_counter, sleep
before = time() # “wall clock”
sleep(1)
after = time()
assert after > before, 'Time went backwards!'
before = perf_counter() # “monotonic”
sleep(1)
after = perf_counter()
assert after > before, 'Time went backwards!'
But it’s important to note that datetimes are a very special type of measurement, given their intimate relationship to the human legal, social, and political world.
As a consequence of people wanting to wake up at 9:00 according to their local “wall clock” no matter where they live on earth, we have time zones.
These time zones are decided by regulatory bodies. Sometimes a country will have time zones roughly aligned with the longitudes; sometimes a country will have a single time zone, despite spanning many longitudes.
Additionally, these time zones change over time, since the regulatory bodies that govern them change over time. Furthermore, there may be offsets applied to these time zones (e.g., “daylight saving” time) to accomplish various economic or social objectives.
And that isn’t even taking into account that the earth’s orbit around the sun does not take an even 365 days × 24 hours/day × 60 minutes/hour × 60 seconds/minute. Sometimes we have to insert leap-days or leap-seconds into our calendar to keep it aligned with the seasons.
Time zones may seem very complicated, because the political mechanisms behind them are complicated. But, in essence, a time zone is a very simple idea.
Every time we collect a measurement, we want to collect up to two additional pieces of data associated with that measurement: how that measurement is calibrated and the reason that that calibration is selected. Anything less than this is a simplification that is throwing away information.
The calibration is the time zone. The reasoning for that calibration can be, for example, the geographic coördinates of a physical entity or the governing body that regulates a social entity.
Here is how we represent a datetime with a timezone in pure Python.
from datetime import datetime
from zoneinfo import ZoneInfo
# ts = datetime(2020, 1, 1, 9, 30) # timezone-naïve
ts = datetime(2020, 1, 1, 9, 30, tzinfo=ZoneInfo('US/Eastern')) # timezone-aware
print(
f'{ts = :%Y-%m-%d %H:%M:%S}',
f'{ts.astimezone(ZoneInfo("US/Eastern")) = :%Y-%m-%d %H:%M:%S}',
f'{ts.astimezone(ZoneInfo("US/Pacific")) = :%Y-%m-%d %H:%M:%S}',
sep='\n',
)
Note, of course, that we are providing a timezone and not a timezone offset. You may be familiar with UTC (Coördinated Universal Time), which is often represented by the letter Z or called “Zulu time,” a reference to the nautical time zone (GMT).
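A consequence of providing a timezone rather than a fixed offset is that the offset from UTC can change over the course of the year.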
from datetime import datetime
from zoneinfo import ZoneInfo
tss = [
datetime(2020, x, 1, tzinfo=ZoneInfo('US/Eastern'))
for x in range(1, 12+1)
]
for ts in tss:
    print(f'{ts:%a %d %b, %Y (%Z)} ({ts.utcoffset()})')
from datetime import datetime, timedelta
from itertools import groupby, pairwise
from zoneinfo import ZoneInfo
tss = [
datetime(2020, 1, 1, tzinfo=ZoneInfo('US/Eastern')) + timedelta(days=x, hours=y)
for x in range(366+1)
for y in range(24+1)
]
for (x, xs), (y, ys) in pairwise(groupby(tss, lambda ts: ts.utcoffset())):
    print(f'-{-x} → -{-y} on {next(ys):%a %b %d, %Y @ %H:%M (%Z)}')
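The transitions above imply that some local wall-clock times happen twice (when clocks go back) and some never happen (when clocks go forward). Here is a minimal sketch of how the standard library tells the two readings of a repeated hour apart with the fold attribute. It assumes America/New_York, the canonical name for the same zone as US/Eastern.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

tz = ZoneInfo('America/New_York')

# 01:30 on 1 Nov 2020 occurs twice: once in EDT, then again in EST
first = datetime(2020, 11, 1, 1, 30, tzinfo=tz)           # fold=0: first occurrence
second = datetime(2020, 11, 1, 1, 30, fold=1, tzinfo=tz)  # fold=1: second occurrence

# converting to UTC shows that the two readings are a real hour apart
gap = second.astimezone(timezone.utc) - first.astimezone(timezone.utc)

print(
    f'{first.utcoffset() = !s}',
    f'{second.utcoffset() = !s}',
    f'{gap = !s}',
    sep='\n',
)
```

A timezone-naïve value of 01:30 on that date cannot say which of the two instants it means, which is one way that data silently goes wrong.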
UTC is a single, coördinated, global reference point that we can “convert” a local measurement into. Given our timezone and some rules, we can derive a timezone offset, which will typically be the offset from this reference point.
Storing UTC-time instead of a timezone-aware timestamp is a very common thing to do; it’s a loss of information, but we might argue that we threw away information that might not have been strictly necessary for our use-case.
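To make the loss of information concrete, here is a small sketch. The two zones are chosen for illustration only: America/New_York and America/Bogota share an offset in January but not in July.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

new_york = ZoneInfo('America/New_York')
bogota = ZoneInfo('America/Bogota')

# in January both zones are five hours behind UTC, so the UTC values coincide
jan_new_york = datetime(2020, 1, 1, 9, 30, tzinfo=new_york).astimezone(timezone.utc)
jan_bogota = datetime(2020, 1, 1, 9, 30, tzinfo=bogota).astimezone(timezone.utc)

# in July only New York observes daylight saving time, so they diverge
jul_new_york = datetime(2020, 7, 1, 9, 30, tzinfo=new_york).astimezone(timezone.utc)
jul_bogota = datetime(2020, 7, 1, 9, 30, tzinfo=bogota).astimezone(timezone.utc)

print(
    f'{jan_new_york == jan_bogota = }',
    f'{jul_new_york == jul_bogota = }',
    f'{jan_new_york.tzinfo = }',
    sep='\n',
)
```

Once only the UTC value is stored, nothing in it says which zone (and therefore which future offset rules) the measurement belonged to.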
In the time module, the most useful functions are perf_counter and monotonic.
Don’t use time.time for measuring timings; it isn’t monotonic. Represent
timestamps using datetime for better human readability.
from time import time, monotonic, perf_counter
print(
f'{time() = }',
f'{monotonic() = }',
f'{perf_counter() = }',
sep='\n',
)
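One possible way to apply this advice is a small timing helper built on perf_counter. The name timed and the dict it yields are hypothetical choices for this sketch, not part of the standard library.

```python
from contextlib import contextmanager
from time import perf_counter, sleep

@contextmanager
def timed():
    # hypothetical helper: yields a dict that receives the elapsed seconds on exit
    result = {}
    start = perf_counter()
    try:
        yield result
    finally:
        result['elapsed'] = perf_counter() - start

with timed() as t:
    sleep(0.05)

print(f'{t["elapsed"] = :.3f}')
```

Because perf_counter never goes backwards, the elapsed value stays non-negative even if the system clock is adjusted while the block is running.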
In datetime, we have date, datetime, timedelta, and time.
datetime represents the “wall-clock” date and time and, by default, is
timezone naïve.
date represents just the date; it can be thought of as a datetime value
with day-level fidelity. There is no way to represent a timezone-aware, date-only
value in Python. Note that the Python date cannot represent dates before 1 AD.
timedelta represents a fixed delta between dates or datetimes.
time represents a time by itself.
from datetime import date, datetime, timedelta, time
print(
f'{date(2020, 1, 1) = }',
f'{datetime(2020, 1, 1, 9, 30) = }',
f'{timedelta(days=3) = }',
f'{time(9, 30) = }',
sep='\n',
)
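One subtlety of a “fixed delta” is worth a sketch: when a timedelta is added to a timezone-aware datetime, the arithmetic is performed on the wall-clock fields. The example assumes America/New_York (the same zone as US/Eastern) and the 2020 spring transition.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

tz = ZoneInfo('America/New_York')

start = datetime(2020, 3, 7, 12, 0, tzinfo=tz)  # the day before clocks go forward
end = start + timedelta(days=1)                  # wall-clock arithmetic: still 12:00 local

# both operands share the same tzinfo, so this is computed on wall-clock values
wall_difference = end - start

# converting to UTC first measures the time that actually elapsed
elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)

print(
    f'{end = :%Y-%m-%d %H:%M (%Z)}',
    f'{wall_difference = !s}',
    f'{elapsed = !s}',
    sep='\n',
)
```

Whether “one day later” should mean 24 elapsed hours or the same local time tomorrow depends on the analysis, so it helps to decide this explicitly.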
We can represent a timezone-aware datetime in Python by using zoneinfo.
from zoneinfo import ZoneInfo
from datetime import datetime
dt = datetime(2020, 1, 1, tzinfo=ZoneInfo('US/Eastern'))
dt = datetime(2020, 1, 1).astimezone()
print(
f'{dt = }',
)
Determining the timezone (the Olson TZ name) in a portable manner may be tricky. Here is how I may do it using my system’s configuration.
from pathlib import Path
print(
Path('/etc/localtime').resolve().relative_to('/usr/share/zoneinfo')
)
In NumPy, we have a parameterised datetime64[…] dtype that can be used to
store datetime values in an int64 with flexible units, using the Unix Epoch.
from numpy import datetime64
w = datetime64('2020-01-01 09:30:00')
x = datetime64('2020-01-01 09:30:00', 's')
y = datetime64('2020-01-01 09:30:00', 'D')
z = datetime64('2020-01-01 09:30:00', 'Y')
print(
f'{w.astype(int) = :<16,} {w = }',
f'{x.astype(int) = :<16,} {x = }',
f'{y.astype(int) = :<16,} {y = }',
f'{z.astype(int) = :<16,} {z = }',
sep='\n',
)
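To confirm that these integers really are counts of units since the Unix Epoch, the values can be cross-checked against the standard library. This is a small sketch; the variable names are illustrative.

```python
from datetime import date
from numpy import datetime64

# days between the epoch and 2020-01-01, computed without NumPy
days_since_epoch = (date(2020, 1, 1) - date(1970, 1, 1)).days

as_days = datetime64('2020-01-01', 'D').astype('int64')
as_seconds = datetime64('2020-01-01', 's').astype('int64')
as_years = datetime64('2020-01-01', 'Y').astype('int64')

print(
    f'{days_since_epoch = :,}',
    f'{as_days = :,}',
    f'{as_seconds = :,}',
    f'{as_years = :,}',
    sep='\n',
)
```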
from numpy import datetime64
for unit in ['Y', 'W', '4Y', '3M']:
    x = datetime64('2020-01-01 09:30:00', unit)
    print(f'{x.astype(int) = :<16,} {x = }')
There is a timedelta64[…] type as well.
from numpy import datetime64, timedelta64
x = datetime64('2020-01-01')
for unit in ['s', 'D', '7D', 'W']:
    y = timedelta64('1', unit)
    print(f'{x + y = }')
We can’t really do that much with a NumPy datetime64[ns].
from numpy import array
xs = array(['2020-01-01', '2020-01-02'], dtype='datetime64[s]')
ys = array(['2020-01-01', '2020-01-03'], dtype='datetime64[s]')
print(
f'{xs = }',
f'{ys = }',
f'{xs - ys = }',
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
NumPy does not handle timezones!
In pandas, we have a Timestamp type to represent a single timestamp. It
extends the Python datetime.datetime type.
from datetime import datetime
from pandas import Timestamp
dt = datetime(2020, 1, 1, 9, 30)
ts = Timestamp(2020, 1, 1, 9, 30)
print(
f'{dt = }',
f'{ts = }',
f'{({*dir(ts)} ^ {*dir(dt)}) = }',
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
We can see that it adds only a little bit on top of a datetime.datetime.
from pandas import Timestamp
ts = Timestamp(2020, 1, 1, 9, 30)
print(
f'{ts = }',
f'{ts.to_numpy() = }',
f'{ts.to_period("Y") = }',
f'{ts.is_month_start = }',
f'{ts.is_quarter_start = }',
f'{ts.tz_localize("US/Eastern") = }',
f'{ts.tz_localize("US/Eastern").tz_convert("US/Pacific") = }',
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
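Localizing is where nonexistent and ambiguous wall-clock times show up. The sketch below uses the nonexistent and ambiguous parameters of Timestamp.tz_localize; the particular policies chosen here ('shift_forward' and explicit booleans) are illustrative, and the right policy depends on the data.

```python
from datetime import timedelta
from pandas import Timestamp

# 02:30 on 8 Mar 2020 never happened in US/Eastern (clocks jumped from 02:00 to 03:00)
skipped = Timestamp('2020-03-08 02:30')
shifted = skipped.tz_localize('US/Eastern', nonexistent='shift_forward')

# 01:30 on 1 Nov 2020 happened twice; a boolean picks the DST (True) or standard (False) reading
repeated = Timestamp('2020-11-01 01:30')
as_dst = repeated.tz_localize('US/Eastern', ambiguous=True)
as_standard = repeated.tz_localize('US/Eastern', ambiguous=False)

print(
    f'{shifted = }',
    f'{as_dst = }',
    f'{as_standard = }',
    sep='\n',
)
```

By default both parameters are 'raise', so pandas reports an error instead of guessing.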
pandas also provides a Timedelta type that extends the datetime.timedelta
type.
from pandas import Timedelta
td = Timedelta('3d')
print(
f'{td = }',
)
When representing single scalar values, where pandas really diverges from
what is available in pure Python is the Period type. A Period represents
an interval of time.
Here’s an interesting question: is a “date” a Timestamp or a Period?
from pandas import Period, Timestamp
p = Period('2020-01-01')
print(
f'{p = }',
f'{p.start_time = }',
f'{p.end_time = }',
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
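One way to think about the question is containment: a Timestamp is a point, and a Period is the interval that may contain it. A minimal sketch, with illustrative variable names:

```python
from pandas import Period, Timestamp

day = Period('2020-01-01', freq='D')
moment = Timestamp('2020-01-01 09:30')

# the point falls between the first and last instants of the interval
contained = day.start_time <= moment <= day.end_time

# converting the point to day-level fidelity yields the same interval
same_day = moment.to_period('D') == day

# the next period starts right after this one ends
next_start = (day + 1).start_time

print(
    f'{contained = }',
    f'{same_day = }',
    f'{next_start = }',
    sep='\n',
)
```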
Of course, in pandas, we are most interested in containers such as pandas.array
subtypes and pandas.Index subtypes.
The popular pandas.date_range gives us a DatetimeIndex.
from pandas import date_range
idx = date_range('2020-01-01', '2020-01-14')
print(
# f'{idx = }',
f'{idx.astype("datetime64[ms]") = }',
f'{idx.astype("datetime64[s]") = }',
# f'{idx.astype("datetime64[D]") = }',
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
We also have a PeriodIndex which can be quite useful!
from pandas import period_range, Series, Timestamp
from numpy.random import default_rng
rng = default_rng(0)
idx = period_range('2020-01-01', '2020-01-14', freq='d')
s = Series(index=idx, data=rng.normal(size=len(idx)))
print(
s,
# s.loc['2020-01-01'],
# s.loc['2020-01-01':'2020-01-03'],
# s.loc['2020-01-01 09:00:00'],
# s.loc['2020-01-01 09:00:00':'2020-01-03 12:00:00'],
s.loc[lambda s: Timestamp('2020-01-01 09:00:00').to_period(s.index.freq)],
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
Finally, we have datetime, timedelta, and period array types.
from pandas import array, date_range, period_range, timedelta_range
xs = array(date_range('2020-01-01', periods=3))
ys = array(period_range('2020-01-01', periods=3))
zs = array(timedelta_range('1d', periods=3))
print(
xs,
ys,
zs,
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
The pandas.Series and pandas.DataFrame have been extended to call methods
on the DatetimeArray type via the .dt registered accessor. Furthermore,
they have been extended to support useful operations involving a DatetimeIndex.
from pandas import Series, date_range
from numpy.random import default_rng
rng = default_rng(0)
s = Series(
index=(idx := date_range('2020-01-01', freq='h', periods=48)),
data=rng.random(size=len(idx)),
)
print(
# s,
# s.between_time('09:00', '17:00'),
# s.index.between_time('09:00', '17:00'),
# s.index.indexer_between_time('09:00', '17:00'),
# s[s.index.indexer_between_time('09:00', '17:00')],
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
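As a concrete sketch of two of the operations listed in the comments above, the label-based between_time on the Series and the positional indexer_between_time on the index should select the same rows. Both endpoints are included by default.

```python
from numpy.random import default_rng
from pandas import Series, date_range

rng = default_rng(0)

s = Series(
    index=(idx := date_range('2020-01-01', freq='h', periods=48)),
    data=rng.random(size=len(idx)),
)

working_hours = s.between_time('09:00', '17:00')            # label-based, on the Series
positions = s.index.indexer_between_time('09:00', '17:00')  # integer positions, on the index

print(
    f'{len(working_hours) = }',
    f'{positions[:5] = }',
    sep='\n',
)
```

With hourly data over two days, this keeps the nine readings from 09:00 through 17:00 on each day.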
from pandas import merge_asof, Series, date_range, to_timedelta
from numpy.random import default_rng
rng = default_rng(0)
s0 = Series(
index=(idx := date_range('2020-01-01', freq='h', periods=8, name='timestamp')),
data=rng.random(size=len(idx)),
name='s0',
)
s1 = Series(
index=(idx := s0.index + to_timedelta(rng.uniform(0, 60*60, size=len(s0)), unit='s')),
data=rng.random(size=len(idx)),
name='s1',
)
print(
s0.head(),
s1.head(),
merge_asof(s0, s1, left_index=True, right_index=True),
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
from pandas import Series, date_range
s = Series(
data=date_range('2020-01-01', periods=4)
)
print(
s,
f'{s.dt = }',
f'{s.dt.year = }',
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
from string import ascii_lowercase
from numpy.random import default_rng
from pandas import Series, Categorical, MultiIndex, to_datetime, to_timedelta
rng = default_rng(0)
s = Series(
index=(idx := MultiIndex.from_product([
rng.choice([*ascii_lowercase], size=(3, 4)).view('<U4').ravel(),
Categorical('available ready active'.split()),
], names=['entity', 'state'])),
data=(
to_datetime('2020-01-01')
+ to_timedelta(
rng.integers(14, size=(
idx.get_level_values('entity').nunique(),
idx.get_level_values('state').nunique(),
)).cumsum(-1).ravel(),
unit='d',
)
),
)
print(
s.head(),
s
.groupby(['entity', 'state'], observed=True).agg(
lambda g: {
'available': lambda g: g.head(1),
'ready': lambda g: g.tail(1),
'active': lambda g: g.tail(1),
}[g.index.get_level_values('state')[0]](g)
)
.groupby('entity', observed=True).agg(
lambda g: g.droplevel('entity').loc['active'] - g.droplevel('entity').loc['available']
)
.mean()
,
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
Unfortunately, there are still some gaps in pandas datetime functionality.
from string import ascii_lowercase
from numpy.random import default_rng
from pandas import Series, MultiIndex, date_range
rng = default_rng(0)
s = Series(
index=(idx := MultiIndex.from_product([
date_range('2020-01-01', periods=90),
rng.choice([*ascii_lowercase], size=(3, 4)).view('<U4').ravel(),
], names=['timestamp', 'entity'])),
data=(
rng.normal(loc=1, scale=0.01, size=(
idx.get_level_values('timestamp').nunique(),
idx.get_level_values('entity').nunique(),
)).cumprod(0).ravel()
)
)
print(
s.head(),
# s.groupby('entity', observed=True).max(),
# s.groupby('entity', observed=True).cummax(),
# s.groupby('entity', observed=True).idxmax(),
# s.groupby('entity', observed=True).agg(lambda g: g.droplevel('entity').idxmax()),
# s.groupby('entity', observed=True).cumidxmax(),
# s.groupby('entity', observed=True).transform(
# lambda g: g.expanding().agg(lambda x: x.max())
# ),
s.groupby('entity', observed=True).transform(
lambda g: g.expanding().agg(lambda x: x.idxmax())
),
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
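One possible workaround for the missing cumulative idxmax is to combine cummax, where, and a grouped forward-fill. This is a sketch on a tiny hand-made Series rather than the data above, and it differs from idxmax on ties: when a later value equals the running maximum, the later timestamp is reported.

```python
from pandas import MultiIndex, Series, Timestamp, date_range

s = Series(
    index=MultiIndex.from_product([
        date_range('2020-01-01', periods=3),
        ['a', 'b'],
    ], names=['timestamp', 'entity']),
    data=[1, 5, 3, 2, 2, 7],
)

running_max = s.groupby('entity').cummax()
timestamps = Series(s.index.get_level_values('timestamp'), index=s.index)

running_idxmax = (
    timestamps
    .where(s == running_max)     # keep the timestamp only where the running max is (re)set
    .groupby('entity').ffill()   # carry the last such timestamp forward within each entity
)

print(running_idxmax)
```

For entity a (values 1, 3, 2) the running maximum is reached on the first and then the second day; for entity b (values 5, 2, 7) on the first and then the third.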
Arrow supports many of the same types as pandas, except it also has
support for separated dates and times. In Arrow, we have a pyarrow.timestamp
and pyarrow.duration corresponding to our pandas.Timestamp and pandas.Timedelta.
from pyarrow import timestamp, duration, date64, time64
print(
f'{timestamp("ms") = }',
f'{timestamp("ns") = }',
f'{duration("s") = }',
f'{duration("ns") = }',
f'{date64() = }',
f'{time64("us") = }',
sep='\n',
)
from pandas import array as pd_array, date_range, period_range, timedelta_range
from pyarrow import array as pa_array
xs = pd_array(
date_range('2020-01-01', periods=4)
)
ys = pd_array(
period_range('2020-01-01', periods=4)
)
zs = pd_array(
timedelta_range('1d', periods=4)
)
print(
f'{pa_array(xs) = }',
f'{pa_array(xs).type = }',
f'{pa_array(ys) = }',
f'{pa_array(ys).type = }',
f'{pa_array(zs) = }',
f'{pa_array(zs).type = }',
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
from pandas import array, date_range, period_range, timedelta_range
xs = array(date_range('2020-01-01', periods=4), dtype='timestamp[s][pyarrow]')
ys = array(timedelta_range('1d', periods=4), dtype='duration[s][pyarrow]')
print(
xs,
ys,
# f'{xs.day = }',
# f'{xs.astype("datetime64[s]").day = }',
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
from datetime import date, time
from pandas import array
xs = array([date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 3)], dtype='date64[pyarrow]')
ys = array([time(9, 15), time(9, 30), time(9, 45)], dtype='time64[us][pyarrow]')
print(
xs,
ys,
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
Let’s look at how datetimes operate in common file formats.
Pickle should only ever be used to move data from your “left hand” to your “right hand.” In those cases, it’s a great choice, and it can perfectly maintain and represent any arbitrarily complex datetime value.
from datetime import datetime
from pickle import dumps, loads
from zoneinfo import ZoneInfo
dt = datetime(2020, 1, 1, 9, 30, tzinfo=ZoneInfo('US/Eastern'))
print(
f'{dt = }',
f'{loads(dumps(dt)) = }',
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
CSV is an extremely common data format, but it is purely textual. Therefore,
common CSV readers and writers need to establish a way to represent
datetime values. pandas.Series.to_csv and pandas.read_csv choose
ISO-8601/RFC-3339 for representing these.
In pure Python, this looks like the code below. Notice that we lose the timezone and keep only the timezone offset.
from datetime import datetime
from zoneinfo import ZoneInfo
dt = datetime(2020, 1, 1, 9, 30, tzinfo=ZoneInfo('US/Eastern'))
print(
f'{dt = }',
f'{dt.isoformat() = }',
f'{datetime.fromisoformat(dt.isoformat()) = }',
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
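If the timezone itself matters, one option is to store the zone name next to the ISO-8601 text. The sketch below is an assumption about how one might do that, not an established convention: the dump and load helpers and the record layout are hypothetical, and it relies on the datetime carrying a ZoneInfo (which exposes its name as .key).

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def dump(dt):
    # hypothetical record layout: ISO-8601 text plus the IANA zone key
    return {'timestamp': dt.isoformat(), 'zone': dt.tzinfo.key}

def load(record):
    parsed = datetime.fromisoformat(record['timestamp'])  # carries only a fixed offset
    return parsed.astimezone(ZoneInfo(record['zone']))     # reattach the named zone

dt = datetime(2020, 1, 1, 9, 30, tzinfo=ZoneInfo('America/New_York'))
record = dump(dt)
restored = load(record)

print(
    f'{record = }',
    f'{restored = }',
    sep='\n',
)
```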
This ends up getting us in trouble when using pandas…
from itertools import islice
from pathlib import Path
from string import ascii_lowercase
from tempfile import TemporaryDirectory
from numpy.random import default_rng
from pandas import MultiIndex, Series, date_range, read_csv
rng = default_rng(0)
s = Series(
index=(idx := MultiIndex.from_product([
date_range('2020-01-01', '2020-12-31', freq='h'),
rng.choice([*ascii_lowercase], size=(3, 4)).view('<U4').ravel(),
], names=['timestamp', 'entity'],
)),
data=rng.normal(size=len(idx)),
).sort_index()
s = s.pipe(lambda s: s
.set_axis(MultiIndex.from_arrays([
# s.index.get_level_values('timestamp').tz_localize('US/Eastern'),
s.index.get_level_values('timestamp').tz_localize('UTC').tz_convert('US/Eastern'),
s.index.get_level_values('entity'),
], names=s.index.names))
)
with TemporaryDirectory() as d:
    d = Path(d)
    s.to_csv(filename := (d / 's.csv'))
    with open(filename) as f:
        for ln in islice(f, 3):
            print(f'{ln = }')
    s = read_csv(
        filename,
        parse_dates=['timestamp'],
        index_col=['timestamp', 'entity'],
    ).squeeze(axis='columns')
    print(
        # s,
        s.index.get_level_values('timestamp'),
        s.pipe(lambda s: s
            .set_axis(
                MultiIndex.from_arrays([
                    [x.tz_convert('US/Eastern') for x in s.index.get_level_values('timestamp')],
                    s.index.get_level_values('entity'),
                ], names=s.index.names)
            )
        ),
        sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
    )
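One way around mixed offsets in a CSV column is to parse everything as UTC and then convert to the intended zone. This sketch assumes the zone name is known out-of-band (the file only records offsets) and uses a tiny inline CSV in place of the file above.

```python
from io import StringIO
from pandas import read_csv, to_datetime

# two rows on either side of the 2020 spring transition, with different offsets
text = (
    'timestamp,value\n'
    '2020-03-08 01:00:00-05:00,1\n'
    '2020-03-08 03:00:00-04:00,2\n'
)

df = read_csv(StringIO(text))

# utc=True gives a single timezone-aware dtype even though the offsets differ
df['timestamp'] = to_datetime(df['timestamp'], utc=True).dt.tz_convert('US/Eastern')

print(df, df.dtypes, sep='\n')
```

The two rows are one elapsed hour apart even though their wall-clock times differ by two hours.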
from itertools import islice
from pathlib import Path
from string import ascii_lowercase
from tempfile import TemporaryDirectory
from numpy.random import default_rng
from pandas import MultiIndex, Series, date_range, read_feather, read_parquet
rng = default_rng(0)
s = Series(
index=(idx := MultiIndex.from_product([
date_range('2020-01-01', '2020-12-31', freq='h'),
rng.choice([*ascii_lowercase], size=(3, 4)).view('<U4').ravel(),
], names=['timestamp', 'entity'],
)),
data=rng.normal(size=len(idx)),
).sort_index()
s = s.pipe(lambda s: s
.set_axis(MultiIndex.from_arrays([
s.index.get_level_values('timestamp').tz_localize('UTC').tz_convert('US/Eastern'),
s.index.get_level_values('entity'),
], names=s.index.names))
)
with TemporaryDirectory() as d:
    d = Path(d)
    s.to_frame().to_feather(filename := (d / 's.feather'))
    s = read_feather(
        filename,
    ).squeeze(axis='columns')
    print(
        s,
        s.index.get_level_values('timestamp'),
        sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
    )
    s.to_frame().to_parquet(filename := (d / 's.parquet'))
    s = read_parquet(
        filename,
    ).squeeze(axis='columns')
    print(
        s,
        s.index.get_level_values('timestamp'),
        sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
    )
Notice that all previous examples avoided use of pytz. Generally, with
zoneinfo in the Python standard library and with timezone functionality
accessed via pandas .tz_convert and .tz_localize, there is not a strong
reason to use pytz.
Especially with raw numerical datetime values, be aware of the distinction between microsecond and nanosecond precision.
from datetime import datetime
from pandas import Timestamp
# microsecond precision
dt = datetime(2020, 1, 1, microsecond=1)
# nanosecond precision
ts = Timestamp(2020, 1, 1, nanosecond=1)
print(
f'{dt = }',
f'{ts = }',
f'{Timestamp(ts.to_pydatetime()) == ts = }',
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
from pathlib import Path
from tempfile import TemporaryDirectory
from pandas import Series, date_range, to_timedelta, read_parquet
s0 = Series(date_range('2020-01-01', periods=4) + to_timedelta('1ns'), name='s').astype('datetime64[ns]')
with TemporaryDirectory() as d:
    d = Path(d)
    s0.to_frame().to_parquet(filename := (d / 's.parquet'))
    s1 = read_parquet(filename).squeeze('columns')
print(
s0,
s1,
s0 == s1,
sep=f'\n{"\N{box drawings light horizontal}"*40}\n',
)
We may see an unusual timezone “GMT+2:00” referenced as a (legacy) global timezone. Unfortunately, “GMT+2:00” does not exist as a database identifier and is not guaranteed to be understood by Python tooling. Confusingly, we may have to use “Etc/GMT-2” to represent this in our code.
from string import ascii_lowercase
from numpy.random import default_rng
from pandas import Series, MultiIndex, date_range
rng = default_rng(0)
s = Series(
index=(idx := MultiIndex.from_product([
# date_range('2020-01-01', periods=3).tz_localize('GMT+2:00'), # “global timezone”
date_range('2020-01-01', periods=3).tz_localize('Etc/GMT-2'), # “global timezone”
rng.choice([*ascii_lowercase], size=(3, 4)).view('<U4').ravel(),
], names='timestamp entity'.split())),
data=rng.normal(size=len(idx)),
)
print(
s,
f'{s.index.get_level_values("timestamp")[0].utcoffset() = !s}',
)
Sometimes we want to represent dates, independent of times.
There are approximately five ways to do this in pandas…
- datetime.date
- Period or PeriodIndex
- datetime64[ns] normalized to midnight
- datetime64[ns] normalized to midnight in UTC
- datetime64[ns] normalized to midnight in a relevant timezone
Which do we choose?
from datetime import date
from numpy.random import default_rng
from pandas import Series, date_range, period_range
rng = default_rng(0)
s = Series(
# index=(idx := [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 3)]),
# index=(idx := period_range('2020-01-01', periods=3)),
# index=(idx := date_range('2020-01-01', periods=3)),
# index=(idx := date_range('2020-01-01', periods=3).tz_localize('UTC')),
index=(idx := date_range('2020-01-01', periods=3).tz_localize('US/Eastern')),
data=rng.normal(size=len(idx)),
)
print(
s,
)
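One reason the choice matters: a “date” stored as midnight in one timezone can land on a different calendar day once it is converted to another. A minimal sketch of that off-by-one-day effect:

```python
from datetime import date
from pandas import Timestamp

utc_midnight = Timestamp('2020-01-01', tz='UTC')
eastern_view = utc_midnight.tz_convert('US/Eastern')

print(
    f'{utc_midnight.date() = }',
    f'{eastern_view.date() = }',
    f'{utc_midnight == eastern_view = }',
    sep='\n',
)
```

The two values are the same instant, yet their calendar dates differ, so a date-only value normalized to midnight is only meaningful together with the timezone it was normalized in.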