ts-python

applied pandas I: pandas is well designed, actually!

Discussion (Wed Nov 18, 2020; 3:30 PM EST)

Keywords: understanding datatypes in pandas including pandas.array, pandas.Series, pandas.DataFrame; nullable ints, pandas.Categorical; resampling, masking

Presenter: James Powell <james@dutc.io>
Date: Wednesday, November 18, 2020
Time: 3:30 PM EST
print('Good afternoon!')
# Given the data below:
from io import StringIO
data = StringIO('''
Python 3.5.9	xs = range(1000000); f = lambda x: x**2	[f(x) for x in xs]	5 loops, best of 3: 507 msec per loop
Python 3.5.9	xs = range(1000000); f = lambda x: x**2	list(map(f, xs))	5 loops, best of 3: 511 msec per loop
Python 3.5.9	xs = range(1000000); f = lambda x: x**2	[*map(f, xs)]	5 loops, best of 3: 519 msec per loop
Python 3.5.9	xs = range(1000000)	[x**2 for x in xs]	5 loops, best of 3: 403 msec per loop
Python 3.6.10	xs = range(1000000); f = lambda x: x**2	[f(x) for x in xs]	5 loops, best of 3: 533 msec per loop
Python 3.6.10	xs = range(1000000); f = lambda x: x**2	list(map(f, xs))	5 loops, best of 3: 475 msec per loop
Python 3.6.10	xs = range(1000000); f = lambda x: x**2	[*map(f, xs)]	5 loops, best of 3: 478 msec per loop
Python 3.6.10	xs = range(1000000)	[x**2 for x in xs]	5 loops, best of 3: 394 msec per loop
Python 3.7.7	xs = range(1000000); f = lambda x: x**2	[f(x) for x in xs]	5 loops, best of 3: 509 msec per loop
Python 3.7.7	xs = range(1000000); f = lambda x: x**2	list(map(f, xs))	5 loops, best of 3: 441 msec per loop
Python 3.7.7	xs = range(1000000); f = lambda x: x**2	[*map(f, xs)]	5 loops, best of 3: 448 msec per loop
Python 3.7.7	xs = range(1000000)	[x**2 for x in xs]	5 loops, best of 3: 392 msec per loop
Python 3.8.3	xs = range(1000000); f = lambda x: x**2	[f(x) for x in xs]	5 loops, best of 3: 463 msec per loop
Python 3.8.3	xs = range(1000000); f = lambda x: x**2	list(map(f, xs))	5 loops, best of 3: 412 msec per loop
Python 3.8.3	xs = range(1000000); f = lambda x: x**2	[*map(f, xs)]	5 loops, best of 3: 402 msec per loop
Python 3.8.3	xs = range(1000000)	[x**2 for x in xs]	5 loops, best of 3: 328 msec per loop
Python 3.9.0b1	xs = range(1000000); f = lambda x: x**2	[f(x) for x in xs]	5 loops, best of 3: 632 msec per loop
Python 3.9.0b1	xs = range(1000000); f = lambda x: x**2	list(map(f, xs))	5 loops, best of 3: 575 msec per loop
Python 3.9.0b1	xs = range(1000000); f = lambda x: x**2	[*map(f, xs)]	5 loops, best of 3: 573 msec per loop
Python 3.9.0b1	xs = range(1000000)	[x**2 for x in xs]	5 loops, best of 3: 503 msec per loop
''')

# TASK: given the above performance measurements,
#       find the worst case in which assuming that the
#       fastest code on one version of Python is fastest
#       everywhere yields the slowest code on some other
#       version of Python
# TASK: repeat the above, excluding the pure-list-comprehension
#       entries (i.e., `[x**2 for x in xs]`)
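
One possible way to approach both tasks is sketched below. It is not necessarily the solution worked through in the session. The helper names `load_timings` and `worst_mismatch`, the column names, and the `exclude` parameter are all hypothetical. The sketch also assumes one reading of "worst case": for every pair of versions, take the code that is fastest on the first version and measure how much slower it runs on the second version than that version's own fastest code, then report the largest such ratio.

```python
import pandas as pd


def load_timings(buffer):
    """Parse tab-separated timeit output into a code-by-version table of msec."""
    raw = pd.read_csv(
        buffer, sep='\t', header=None,
        names=['version', 'setup', 'code', 'result'],  # hypothetical column names
    )
    # Pull the number out of e.g. '5 loops, best of 3: 507 msec per loop'.
    raw['msec'] = (
        raw['result']
        .str.extract(r'best of \d+: ([\d.]+) msec', expand=False)
        .astype(float)
    )
    return raw.pivot(index='code', columns='version', values='msec')


def worst_mismatch(table, exclude=()):
    """Return (assumed version, actual version, code, slowdown ratio)."""
    table = table.drop(index=list(exclude))
    fastest = table.idxmin()        # fastest code on each version
    slowdown = table / table.min()  # 1.0 marks the best code per version
    # Row i: how the code that is fastest on version i fares on every version.
    penalty = slowdown.loc[fastest.to_numpy()]
    penalty.index = fastest.index
    assumed = penalty.max(axis=1).idxmax()
    actual = penalty.loc[assumed].idxmax()
    return assumed, actual, fastest[assumed], penalty.loc[assumed, actual]
```

With the `data` buffer defined above, `table = load_timings(data)` followed by `worst_mismatch(table)` would address the first task, and `worst_mismatch(table, exclude=['[x**2 for x in xs]'])` the second. Pivoting to one row per code snippet and one column per version keeps each step a single aligned operation: `table / table.min()` divides every column by that version's best time without an explicit loop.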