pandas I: pandas is well designed, actually!
Keywords: understanding datatypes in pandas including pandas.array, pandas.Series, pandas.DataFrame; nullable ints, pandas.Categorical; resampling, masking
| Presenter | James Powell james@dutc.io |
| Date | Wednesday, November 18, 2020 |
| Time | 3:30 PM EST |
print('Good afternoon!')
# Given the below data,
from io import StringIO
data = StringIO('''
Python 3.5.9 xs = range(1000000); f = lambda x: x**2 [f(x) for x in xs] 5 loops, best of 3: 507 msec per loop
Python 3.5.9 xs = range(1000000); f = lambda x: x**2 list(map(f, xs)) 5 loops, best of 3: 511 msec per loop
Python 3.5.9 xs = range(1000000); f = lambda x: x**2 [*map(f, xs)] 5 loops, best of 3: 519 msec per loop
Python 3.5.9 xs = range(1000000) [x**2 for x in xs] 5 loops, best of 3: 403 msec per loop
Python 3.6.10 xs = range(1000000); f = lambda x: x**2 [f(x) for x in xs] 5 loops, best of 3: 533 msec per loop
Python 3.6.10 xs = range(1000000); f = lambda x: x**2 list(map(f, xs)) 5 loops, best of 3: 475 msec per loop
Python 3.6.10 xs = range(1000000); f = lambda x: x**2 [*map(f, xs)] 5 loops, best of 3: 478 msec per loop
Python 3.6.10 xs = range(1000000) [x**2 for x in xs] 5 loops, best of 3: 394 msec per loop
Python 3.7.7 xs = range(1000000); f = lambda x: x**2 [f(x) for x in xs] 5 loops, best of 3: 509 msec per loop
Python 3.7.7 xs = range(1000000); f = lambda x: x**2 list(map(f, xs)) 5 loops, best of 3: 441 msec per loop
Python 3.7.7 xs = range(1000000); f = lambda x: x**2 [*map(f, xs)] 5 loops, best of 3: 448 msec per loop
Python 3.7.7 xs = range(1000000) [x**2 for x in xs] 5 loops, best of 3: 392 msec per loop
Python 3.8.3 xs = range(1000000); f = lambda x: x**2 [f(x) for x in xs] 5 loops, best of 3: 463 msec per loop
Python 3.8.3 xs = range(1000000); f = lambda x: x**2 list(map(f, xs)) 5 loops, best of 3: 412 msec per loop
Python 3.8.3 xs = range(1000000); f = lambda x: x**2 [*map(f, xs)] 5 loops, best of 3: 402 msec per loop
Python 3.8.3 xs = range(1000000) [x**2 for x in xs] 5 loops, best of 3: 328 msec per loop
Python 3.9.0b1 xs = range(1000000); f = lambda x: x**2 [f(x) for x in xs] 5 loops, best of 3: 632 msec per loop
Python 3.9.0b1 xs = range(1000000); f = lambda x: x**2 list(map(f, xs)) 5 loops, best of 3: 575 msec per loop
Python 3.9.0b1 xs = range(1000000); f = lambda x: x**2 [*map(f, xs)] 5 loops, best of 3: 573 msec per loop
Python 3.9.0b1 xs = range(1000000) [x**2 for x in xs] 5 loops, best of 3: 503 msec per loop
''')
# TASK: given the above performance measurements,
#       find the worst case in which assuming that
#       the fastest code on one version of Python
#       is fastest everywhere leads to the slowest
#       code on some other version of Python
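A reasonable first step for this task is to get the raw text into a table with one row per Python version and one column per statement. The sketch below is only one way to do that, not the presenter's solution. It assumes each line has the shape "Python <version> <setup> <statement> <timeit summary>" separated by whitespace, and that every statement starts with `[` or `list(`. The names `sample`, `pattern`, and `timings` are made up for illustration, and `sample` holds just the 3.5.9 and 3.7.7 rows copied from the data above.

```python
import re
from io import StringIO

import pandas as pd

# Sample only: the 3.5.9 and 3.7.7 rows copied verbatim from the data above.
sample = StringIO('''
Python 3.5.9 xs = range(1000000); f = lambda x: x**2 [f(x) for x in xs] 5 loops, best of 3: 507 msec per loop
Python 3.5.9 xs = range(1000000); f = lambda x: x**2 list(map(f, xs)) 5 loops, best of 3: 511 msec per loop
Python 3.5.9 xs = range(1000000); f = lambda x: x**2 [*map(f, xs)] 5 loops, best of 3: 519 msec per loop
Python 3.5.9 xs = range(1000000) [x**2 for x in xs] 5 loops, best of 3: 403 msec per loop
Python 3.7.7 xs = range(1000000); f = lambda x: x**2 [f(x) for x in xs] 5 loops, best of 3: 509 msec per loop
Python 3.7.7 xs = range(1000000); f = lambda x: x**2 list(map(f, xs)) 5 loops, best of 3: 441 msec per loop
Python 3.7.7 xs = range(1000000); f = lambda x: x**2 [*map(f, xs)] 5 loops, best of 3: 448 msec per loop
Python 3.7.7 xs = range(1000000) [x**2 for x in xs] 5 loops, best of 3: 392 msec per loop
''')

# Assumed line layout: version, setup, statement, then the timeit summary.
# The setup is matched lazily so that the statement begins at the first
# " [" or " list(" that follows it.
pattern = re.compile(
    r'^Python (?P<version>\S+) '
    r'(?P<setup>xs = .*?) '
    r'(?P<stmt>(?:\[|list\().*) '
    r'\d+ loops, best of \d+: (?P<msec>\d+) msec per loop$'
)

# Blank lines do not match the pattern and are skipped.
matches = (pattern.match(line.strip()) for line in sample)
records = [m.groupdict() for m in matches if m]

# Tidy table (one row per measurement), with msec converted from str to int.
df = pd.DataFrame(records).astype({'msec': int})

# Wide table: one row per version, one column per statement.
timings = df.pivot(index='version', columns='stmt', values='msec')
print(timings)
```

With the measurements in this wide shape, "fastest per version" and "slowest per version" become row-wise reductions such as `timings.idxmin(axis='columns')`.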
# TASK: repeat the above, excluding the pure-list-comprehension
#       entries (i.e., `[x**2 for x in xs]`)
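Both tasks can be sketched with a single helper, under one possible reading of "worst case": for every version, take its fastest statement, run that statement on every other version, and measure how much slower it is than the best choice there. This is an interpretation, not necessarily the presenter's intended answer. The helper `worst_case` and the column names `assumed`, `actual`, `is_slowest`, and `slowdown` are hypothetical, and the numbers are the msec values copied from the data above into a compact table.

```python
import pandas as pd

statements = [
    '[f(x) for x in xs]',
    'list(map(f, xs))',
    '[*map(f, xs)]',
    '[x**2 for x in xs]',
]

# msec per loop, copied from the measurements above (rows: Python versions).
timings = pd.DataFrame.from_dict(
    {
        '3.5.9':   [507, 511, 519, 403],
        '3.6.10':  [533, 475, 478, 394],
        '3.7.7':   [509, 441, 448, 392],
        '3.8.3':   [463, 412, 402, 328],
        '3.9.0b1': [632, 575, 573, 503],
    },
    orient='index',
    columns=statements,
)


def worst_case(timings):
    """Return the (assumed, actual) version pair with the largest slowdown.

    Assumed interpretation: pick the fastest statement on the `assumed`
    version, run it on the `actual` version, and compare it with the
    fastest statement available there.
    """
    best_stmt = timings.idxmin(axis='columns')  # fastest statement per version
    records = []
    for assumed, stmt in best_stmt.items():
        for actual in timings.index.drop(assumed):
            row = timings.loc[actual]
            records.append({
                'assumed': assumed,
                'actual': actual,
                'stmt': stmt,
                'is_slowest': row[stmt] == row.max(),
                'slowdown': row[stmt] / row.min(),
            })
    ranked = pd.DataFrame(records).sort_values('slowdown', ascending=False)
    return ranked.iloc[0]


# First task: all four statements.
worst_all = worst_case(timings)

# Second task: drop the pure list comprehension column first.
worst_no_pure = worst_case(timings.drop(columns='[x**2 for x in xs]'))

print(worst_all, worst_no_pure, sep='\n\n')
```

Under this reading, the first task is degenerate: the pure list comprehension is fastest on every version, so no assumption is ever penalized. Once it is excluded, the 3.5.9 winner `[f(x) for x in xs]` turns out to be the slowest remaining option on 3.7.7 (509 msec against 441 msec, roughly 15% slower).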