ts-python

Seminar IV: “The Caine Mutiny” (effective use of the Python standard library)

Materials:

Lt. Cmdr. Philip Francis Queeg (Humphrey Bogart): Aboard my ship, excellent performance is standard, standard performance is sub-standard, and sub-standard performance is not permitted to exist. That, I warn you.

The Caine Mutiny (1954)

Date           Time         Track                              Meeting Link
July 16, 2021  9:30 AM EST  Improving Use of Common Libraries  Seminar IV: “The Caine Mutiny”

Audience

These sessions are designed for a broad audience of modelers and software programmers of all backgrounds and skill-levels.

Our expected audience should comprise attendees with a…

… or greater!

During this session, we will endeavour to guide our audience to developing…

… and we will share additional tips, tricks, and in-depth guidance on all of these topics!

Abstract

Let’s turn our attention to the Python standard library and how it provides a set of first-approximation tools for helping us accomplish common scripting tasks. These scripting tasks typically “surround” our analytical use-cases—they may not be intimately or directly related to the analysis itself, but they provide functionality that supports some data modeling, data cleaning, or other automation-related capability.

In this episode, we’ll look at the Python standard library, focusing on libraries such as pathlib, collections, tempfile, functools, textwrap, itertools, argparse, and others.

To Be Continued…

Did you enjoy this episode? Did you learn something new that will help you as you continue to add scripting, structuring, and automation surrounding the analytical tasks in your work?

If so, stay tuned for future episodes, which may…

If there are other related topics you’d like to see covered, please reach out to Diego Torres Quintanilla.

Notes

print("Let's get started!")

What is the standard library?

“Batteries included”?

We will be talking about the Python Standard Library: the libraries that are packaged directly with most distributions of Python. Some of these packages are considered inextricable from Python (in that they are used internally by the interpreter), and others are generally assumed by end-user code to always be available.

Separate from this is the unofficial notion of a PyData Standard Toolkit or “ecosystem”: a very informal designation for major, mature tools like numpy, scipy, pandas, and their direct dependencies. The majority of these tools are NumFOCUS-sponsored or NumFOCUS-affiliated.

Separate again is an unofficial notion of “what we usually got”: other major, mature, common tools that are not direct dependencies of the above, but are so useful that they are almost always available in internal distributions of Python (e.g., requests or httpx, xlrd and xlwt, &c.).

The Python Standard Library serves two purposes:

  1. to allow core developers of Python to access dependencies that are guaranteed to be maintained and available.
  2. to allow users of Python to access first-order approximations to many types of problems.

An example of ①:

from graphlib import TopologicalSorter
graph = {
    'b': {'c'},
    'c': {'d'},
    'd': {'e'},
}
ts = TopologicalSorter(graph)
ts.add('a', 'b', 'c')
print(f'{[*ts.static_order()] = }')
from networkx import DiGraph
from networkx.algorithms.dag import topological_sort

g = DiGraph()
g.add_edge('a', 'b')
g.add_edge('b', 'c')
g.add_edge('c', 'd')
g.add_edge('d', 'e')
print(f"{[*topological_sort(g)] = }")

An example of ②:

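from math import sin, exp, pi
from cmath import exp as cexp
from statistics import mean, median, pvariance

print(f'{sin(pi)       = :.2f}')
print(f'{exp(1)        = :.2f}')
# math.exp accepts only real numbers; cmath.exp handles complex arguments
print(f'{cexp(pi * 1j) = }')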

xs = [1, 2, 3, 4]
print(f'{mean(xs)      = }')
print(f'{median(xs)    = }')
print(f'{pvariance(xs) = }')
from numpy import array, sin, exp, pi

print(f'{sin(pi)       = :.2f}')
print(f'{exp(1)        = :.2f}')
print(f'{exp(pi * 1j)  = :.2f}')

xs = array([1, 2, 3, 4], dtype='float64')
print(f'{xs.mean()     = }')
print(f'{xs.var()      = }')
from sympy import exp, pi, I

print(f'{exp(pi * I).simplify() = }')
from array import array
from struct import pack

xs = array('d')
xs.extend([1, 2, 3, 4])

print(f'{xs      = }')
print(f'{xs + xs = }')

ys = pack('4d', 1, 2, 3, 4)
#  print(f'{ys      = }')

Other examples

We will not be able to survey the entire standard library. There are useful parts that are likely to be outside of our immediate use-cases that we will largely skip:

from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 10
    x, y = Decimal('1'), Decimal('3')
    print(f'{x / y = }')
    x, y = Decimal('1'), Decimal('10')
    print(f'{x / y = }')
from ipaddress import IPv4Address, IPv4Network

ip  = IPv4Address('192.168.1.100')
net = IPv4Network('192.168.1.0/24')

print(f'{ip.is_loopback = }')
print(f'{ip in net      = }')

Gratuitous reminder:

from pandas import Series, DataFrame
from enum import Enum
from numpy.random import default_rng
from functools import total_ordering
from dataclasses import dataclass

rng = default_rng(0)

@dataclass
class Name:
    value : str
    __hash__ = lambda s: hash(s.value)

@total_ordering
class OrderedEnum(Enum):
  # compare against the member's own enum type so that Assets works as well as Stars
  __eq__ = lambda s, o: isinstance(o, type(s)) and s.value == o.value
  __lt__ = lambda s, o: isinstance(o, type(s)) and s.value < o.value
  __hash__ = lambda s: hash(s.value)

Stars  = Enum('Stars',  'Sol Sirius Epsilon Wolf', type=OrderedEnum)
Assets = Enum('Assets', 'Medicine Software Uranium StarGems Credits', type=OrderedEnum)
Assets.Tradeable = {*Assets} - {Assets.Credits}

market = DataFrame({
  asset: Series(
      rng.random(size=(sz := rng.integers(2, len(Stars) // 2 + 1))) * 1_000,
      index=rng.choice([*Stars], size=sz, replace=False)
  )
  for asset in Assets.Tradeable
}).round(2)
market[Assets.Credits] = 1
market = market.sort_index()
market.index.name, market.columns.name = Name(Stars), Name(Assets)  # rows are stars, columns are assets

inventory = Series(
  rng.integers(10, 1_000, size=len(Assets.Tradeable)),
  index=[*Assets.Tradeable],
)
inventory[Assets.Credits] = 1_000
inventory = inventory.sort_index()
inventory.index.name = Name(Assets)

print(
    #  market,
    #  inventory,
    #  market * inventory,
    #  (market * inventory).sum(axis='columns'),
    #  (market * inventory).sum(axis='columns').idxmax(),
)

typing.Literal

def f(mode=True):
    pass
def f(mode : str = 'up'):
    ''' mode can be "up", "down", "left", or "right" '''
    pass
from typing import Literal

Mode = Literal['up', 'down', 'left', 'right']
def f(mode : Mode = 'up'):
    pass

print(f'{Mode          = }')
print(f'{Mode.__args__ = }')
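
Literal is only a hint for type checkers; nothing enforces it when the function is called. A minimal sketch of validating by hand follows, assuming we are happy to reuse the alias as the source of truth. It uses typing.get_args, the public counterpart of the __args__ attribute printed above; the function name move is our own invention for illustration.

```python
from typing import Literal, get_args

# Same alias as in the notes above.
Mode = Literal['up', 'down', 'left', 'right']

def move(mode: Mode = 'up') -> str:
    # Literal is not checked at runtime, so validate explicitly.
    if mode not in get_args(Mode):
        raise ValueError(f'mode must be one of {get_args(Mode)}, got {mode!r}')
    return mode

print(f'{get_args(Mode) = }')
print(f"{move('left')   = }")
```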

enum.Enum

from enum import Enum
from random import choice

Mode = Enum('Mode', 'Up Down Left Right')

print(f"{Mode['Up']      = }")

m = Mode['Down']
print(f'{m is Mode.Down  = }')

print(f'{[*Mode]         = }')
print(f'{choice([*Mode]) = }')
from enum import Enum, auto

class Mode(Enum):
    Up    = auto()
    Down  = auto()
    Left  = auto()
    Right = auto()

print(f'{[*Mode] = }')
from enum import Enum, auto
from numpy import array

class Mode(Enum):
    Up    = [+1,  0]
    Down  = [-1,  0]
    Left  = [ 0, +1]
    Right = [ 0, -1]
    def __add__(self, other):
        if isinstance(other, Mode):
            return self.value + other.value
        return self.value + other
    __radd__ = __add__

print(f'{[*Mode] = }')

pos = array([0, 0])
moves = [Mode.Up, Mode.Down, Mode.Left, Mode.Left]
print(f'{sum(array(m.value) for m in moves)       = }')
print(f'{pos + sum(array(m.value) for m in moves) = }')
from enum import Enum, auto
from numpy import array

class Mode(Enum):
    Up    = array([+1,  0])
    Down  = array([-1,  0])
    Left  = array([ 0, +1])
    Right = array([ 0, -1])
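
Arrays make awkward Enum values: they are unhashable, and == on them is elementwise rather than a single True or False. One possible alternative, sketched here with the standard library only, keeps hashable tuples as the values and does the arithmetic in a method. The method name step is a hypothetical helper, not part of enum.

```python
from enum import Enum

class Mode(Enum):
    Up    = (+1,  0)
    Down  = (-1,  0)
    Left  = ( 0, +1)
    Right = ( 0, -1)

    def step(self, pos):
        # Add this direction to a position, component by component.
        return tuple(p + d for p, d in zip(pos, self.value))

pos = (0, 0)
for m in [Mode.Up, Mode.Down, Mode.Left, Mode.Left]:
    pos = m.step(pos)

print(f'{pos = }')
print(f'{Mode((0, 1)) = }')  # lookup by value still works with tuples
```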

collections.namedtuple

from collections import namedtuple

objs = [
    ('xyz', 123, {'a', 'b', 'c'}),
    ('def', 456, {'a', 'd'}),
]

for name, score, choices in objs:
    print(f'{name = } has choices {choices = }')

for obj in objs:
    print(f'{obj[0] = } has choices {obj[-1] = }')
from collections import namedtuple

Entrant = namedtuple('Entrant', 'name score choices')
objs = [
    Entrant('xyz', 123, {'a', 'b', 'c'}),
    Entrant('def', 456, {'a', 'd'}),
]

for name, score, choices in objs:
    print(f'{name = } has choices {choices = }')

for obj in objs:
    print(f'{obj[0] = } has choices {obj[-1] = }')

for obj in objs:
    print(f'{obj.name = } has choices {obj.choices = }')
from collections import namedtuple

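class Entrant(namedtuple('Entrant', 'name score choices')):
    def __new__(cls, name, score, choices=None):
        if score < 0:
            raise ValueError('score must be non-negative')
        # build a fresh set per instance rather than sharing one mutable default
        return super().__new__(cls, name, score, set() if choices is None else choices)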

objs = [
    Entrant('xyz', 123, {'a', 'b', 'c'}),
    Entrant('def', 456),
]

for obj in objs:
    print(f'{obj = }')
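
When no validation is needed, subclassing may be more than we want. A small sketch, assuming Python 3.7 or later: namedtuple accepts a defaults argument that applies to the rightmost fields, and the generated class already provides _asdict and _replace.

```python
from collections import namedtuple

# defaults fills the rightmost field(s); an immutable default is safe to share
Entrant = namedtuple('Entrant', 'name score choices', defaults=[frozenset()])

e = Entrant('def', 456)
print(f'{e = }')
print(f'{e._asdict() = }')
print(f'{e._replace(score=789) = }')
```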

json vs simplejson

from collections import namedtuple
from json import dumps

Entrant = namedtuple('Entrant', 'name score choices')
objs = [
    Entrant('xyz', 123, {'a', 'b', 'c'}),
    Entrant('def', 456, {'a', 'd'}),
]

for obj in objs:
    print(f'{dumps(obj) = }')
from collections import namedtuple
from json import dumps

Entrant = namedtuple('Entrant', 'name score choices')
objs = [
    Entrant('xyz', 123, {'a', 'b', 'c'}),
    Entrant('def', 456, {'a', 'd'}),
]

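def default(obj):
    if isinstance(obj, set):
        return [*obj]
    if isinstance(obj, Entrant):
        return obj._asdict()
    # without this, unknown objects would silently serialise as null
    raise TypeError(f'cannot serialise {type(obj).__name__}')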
        
for obj in objs:
    print(f'{dumps(obj, default=default) = }')
from collections import namedtuple
from simplejson import dumps

Entrant = namedtuple('Entrant', 'name score choices')
objs = [
    Entrant('xyz', 123, {'a', 'b', 'c'}),
    Entrant('def', 456, {'a', 'd'}),
]

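def default(obj):
    if isinstance(obj, set):
        return [*obj]
    # without this, unknown objects would silently serialise as null
    raise TypeError(f'cannot serialise {type(obj).__name__}')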
        
for obj in objs:
    print(f'{dumps(obj, default=default) = }')
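
With the standard json module, tuple subclasses appear to be encoded as arrays before default is ever consulted, so a namedtuple comes out as a list no matter what default says about it. A minimal sketch of the difference, and of one workaround (converting with _asdict before dumping), follows; sorted is used only to make the set output deterministic.

```python
from collections import namedtuple
from json import dumps

Entrant = namedtuple('Entrant', 'name score choices')

def default(obj):
    if isinstance(obj, set):
        return sorted(obj)  # deterministic order for the example
    raise TypeError(f'cannot serialise {type(obj).__name__}')

obj = Entrant('xyz', 123, {'a', 'b', 'c'})

as_array  = dumps(obj, default=default)            # field names are lost
as_object = dumps(obj._asdict(), default=default)  # field names are kept

print(f'{as_array  = }')
print(f'{as_object = }')
```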

collections

from collections import deque
xs = deque([1, 2, 3, 4, 5])
xs.append(6)
xs.append(7)

while xs:
    print(f'{xs.popleft() = }')
from collections import deque
xs = deque(maxlen=3)
xs.append(1)
xs.append(2)
xs.append(3)
xs.append(4)
print(f'{xs = }')
from collections import defaultdict

d = defaultdict(int)
print(f"{d['abc'] = }")
print(f"{d['def'] = }")
print(f'{d        = }')
class passthru(dict):
    def __missing__(self, key):
        return key
        
d = passthru({
    'abc': 'ABC',
})
print(f"{d['abc'] = }")
print(f"{d['ABC'] = }")
from collections import ChainMap

layer0 = {'abc': 123,           }
layer1 = {            'def': 456}
layer2 = {'abc': 789,           }

cm = ChainMap(layer2, layer1, layer0)
print(f"{cm['abc'] = }")
print(f"{cm['def'] = }")
print(f'{cm.maps   = }')
from collections import ChainMap, deque

layer0 = {'abc': 123,           }
layer1 = {            'def': 456}
layer2 = {'abc': 789,           }

cm = ChainMap()
cm.maps = deque()
cm.maps.extend([layer2, layer1, layer0])

print(f"{cm['abc'] = }")
cm.maps.popleft()
print(f"{cm['abc'] = }")
from collections import OrderedDict
od1 = OrderedDict({'a': 1, 'b': 2, 'c': 3})
od2 = OrderedDict({'c': 3, 'b': 2, 'a': 1})

print(f'{od1 == od2    = }')
print(f'{od1.popitem() = }')
from collections import Counter

c = Counter('aaabbccddddd')
print(f'{c = }')
from collections import Counter

c1 = Counter('abc')
c2 = Counter('bcd')

print(f'{c1 + c2 = }')
print(f'{c1 & c2 = }')
print(f'{c1 | c2 = }')
from collections import UserDict
class mydict(UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key.upper(), value)

d = mydict({'abc': 123})
d.update({'def': 456})
d['xyz'] = 789
print(f'{d = }')
from collections.abc import MutableMapping

class mydict(MutableMapping):
    def __init__(self, value={}):
        self._d = {}
        for k, v in value.items():
            self[k] = v
    def __getitem__(self, key):
        return self._d[key]
    def __setitem__(self, key, value):
        self._d[key.upper()] = value
    def __delitem__(self, key):
        del self._d[key]
    def __iter__(self):
        return iter(self._d)
    def __len__(self):
        return len(self._d)
    def __repr__(self):
        return f'mydict({self._d!r})'
    
d = mydict({'abc': 123})
d.update({'def': 456})
d['xyz'] = 789
print(f'{d = }')

dataclasses.dataclass

from dataclasses import dataclass

@dataclass
class Entrant:
    name    : str
    score   : int
    choices : set = ()

    def __post_init__(self):
        if self.score < 0:
            raise ValueError('score must be non-negative')
        self.choices = {*self.choices}

objs = [
    Entrant('xyz', 123, {'a', 'b', 'c'}),
    Entrant('def', 456, {}),
]

for obj in objs:
    print(f'{obj = }')
from attr import attrs, attrib, validators
from collections.abc import Iterable

@attrs(order=True)
class Entrant:
    name    : str = attrib(validator=validators.instance_of(str))
    score   : int = attrib(validator=validators.instance_of(int))
    choices : set = attrib(validator=validators.instance_of(Iterable),
                           factory=set, converter=set, kw_only=True)

objs = [
    Entrant('xyz', 123, choices=['a', 'b', 'c']),
    Entrant('def', 456),
]

for obj in objs:
    print(f'{obj = }')
from collections.abc import Iterable

class Entrant:
    def __init__(self, name, score, choices=()):
        if score < 0:
            raise ValueError('score must be non-negative')
        self.name, self.score = name, score
        self.choices = {*choices}
    def __repr__(self):
        return f'Entrant({self.name!r}, {self.score!r}, {self.choices!r})'
    def __eq__(self, other):
        if not isinstance(other, Entrant):
            return NotImplemented
        return (self.name, self.score, self.choices) == (other.name, other.score, other.choices)

objs = [
    Entrant('xyz', 123, choices=['a', 'b', 'c']),
    Entrant('def', 456),
]

for obj in objs:
    print(f'{obj = }')
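
For mutable defaults, dataclasses offers field(default_factory=...), which builds a fresh value for every instance. The sketch below is one way to write the Entrant class with it; the two instances a and b exist only to show that their sets are independent.

```python
from dataclasses import dataclass, field, asdict

@dataclass
class Entrant:
    name    : str
    score   : int
    choices : set = field(default_factory=set)  # a new set per instance

    def __post_init__(self):
        if self.score < 0:
            raise ValueError('score must be non-negative')
        self.choices = set(self.choices)  # accept any iterable

a, b = Entrant('xyz', 123), Entrant('def', 456)
a.choices.add('a')

print(f'{a = }')
print(f'{b = }')
print(f'{asdict(a) = }')
```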

functools.total_ordering

from enum import Enum, auto
from functools import total_ordering

@total_ordering
class Hands(Enum):
    Straight      = auto() 
    Flush         = auto()
    StraightFlush = auto()
    RoyalFlush    = auto()

    def __lt__(self, other):
        return self.value < other.value
    def __eq__(self, other):
        return self.value == other.value

print(f'{Hands.Straight      <  Hands.RoyalFlush = }')
print(f'{Hands.StraightFlush >= Hands.RoyalFlush = }')

functools

functools.reduce

from functools import reduce
from operator import add, mul

xs = [1, 2, 3, 4, 5]
print(f'{reduce(add, xs) = }')
print(f'{reduce(mul, xs) = }')
from numpy import array, prod
from pandas import Series

xs = array([1, 2, 3, 4, 5])
print(f'{xs.sum()  = }')
print(f'{xs.prod() = }')

s = Series([1, 2, 3, 4, 5])
print(
    s.expanding().sum(),
    s.expanding().apply(prod),  # numpy.product was removed in NumPy 2.0; prod is the supported name
    sep='\n\n'
)
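
The standard library has its own first-approximation answer to the expanding sums and products above: itertools.accumulate, which yields every intermediate result of a reduction. A short sketch:

```python
from itertools import accumulate
from operator import mul

xs = [1, 2, 3, 4, 5]

running_sum  = [*accumulate(xs)]       # addition is the default
running_prod = [*accumulate(xs, mul)]

print(f'{running_sum  = }')
print(f'{running_prod = }')
```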

functools.partial

from functools import partial
from functools import wraps

def f(a, b, c):
    return a + b - c

g = partial(f, c=0)
print(f'{g(1, 2) = }')

help(g)

g = wraps(f)(lambda *a, **kw: f(*a, **kw, c=0))
g.__doc__ = '\n'.join([g.__doc__ or '', 'Fixing c = 0'])
print(f'{g(1, 2) = }')

help(g)

functools.wraps

from functools import wraps

def dec(f):
    @wraps(f)
    def inner(*args, **kwargs):
        return f(*args, **kwargs) + 1
    return inner

@dec
def f(x, y):
    ''' adds x and y '''
    return x + y

help(f)

functools.lru_cache

from functools import lru_cache
from time import sleep, perf_counter

@lru_cache
def f(x, y):
    sleep(1)
    return x + y

start = perf_counter()
print(f'{f(1, 1)     = }')
print(f'{f(1, 1)     = }')
print(f'{f(1, 1)     = }')
print(f'{f(x=1, y=1) = }')
print(f'{f(1, y=1)   = }')
stop = perf_counter()
print(f'\N{mathematical bold capital delta}t: {stop - start:.2f}s')
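
The five calls above should take roughly three seconds rather than one, because lru_cache keys on the call as written: f(1, 1), f(x=1, y=1), and f(1, y=1) are treated as three different entries. A sketch that makes this visible through cache_info (no sleep here, so it runs instantly):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def f(x, y):
    return x + y

f(1, 1)       # miss
f(1, 1)       # hit
f(1, 1)       # hit
f(x=1, y=1)   # miss: keyword form is a different cache key
f(1, y=1)     # miss: mixed form is different again

info = f.cache_info()
print(f'{info = }')
```

This is the gap that the signature-binding approach in the next section tries to close.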

inspect.signature

from inspect import signature
from time import sleep, perf_counter

class memoise(dict):
    def __init__(self, f):
        self.f, self.sig = f, signature(f)
    def __call__(self, *args, **kwargs):
        key = self.sig.bind(*args, **kwargs)
        return self[key.args, frozenset(key.kwargs.items())]
    def __missing__(self, key):
        args, kwargs = key
        self[key] = self.f(*args, **dict(kwargs))
        return self[key]

@memoise
def f(x, y):
    sleep(1)
    return x + y

start = perf_counter()
print(f'{f(1, 1)     = }')
print(f'{f(1, 1)     = }')
print(f'{f(1, 1)     = }')
print(f'{f(x=1, y=1) = }')
print(f'{f(1, y=1)   = }')
stop = perf_counter()
print(f'\N{mathematical bold capital delta}t: {stop - start:.2f}s')

inspect

def dec(f):
    pass

@dec
def f():
    pass

def dec(arg):
    def inner_dec(f):
        pass
    return inner_dec

@dec(...)
def f():
    pass
from inspect import getsource
from ast import parse

def dec(f):
    #  print(f'{getsource(f).splitlines()[0] = }')
    print(f'{parse(getsource(f)).body[0].decorator_list = }')
    return f

@dec
def f():
    pass
from inspect import signature
from itertools import chain

print(f'{signature(len)   = }')
print(f'{signature(chain) = }')
from inspect import getsource
from json import loads
from itertools import chain
#  print(f'{getsource(chain) = }')

time.perf_counter, time.perf_counter_ns

from time import perf_counter, perf_counter_ns, sleep

start = perf_counter()
sleep(1)
stop  = perf_counter()
print(f'\N{mathematical bold capital delta}t: {stop - start:.2f}s')

start = perf_counter_ns()
sleep(1)
stop  = perf_counter_ns()
print(f'\N{mathematical bold capital delta}t: {stop - start:.0f}ns')

contextlib.contextmanager

from contextlib import contextmanager
from time import sleep, perf_counter

@contextmanager
def timed(msg=''):
    start = perf_counter()
    try:
        yield
    finally:
        stop  = perf_counter()
        print(f'{msg} \N{mathematical bold capital delta}t: {stop - start:.2f}s')

with timed('one-second nap'):
    sleep(1)
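
Printing inside the context manager is convenient, but sometimes we want the number itself. One variation, sketched under our own naming assumptions (the SimpleNamespace holder and its elapsed attribute are illustrative choices), yields an object that is filled in when the block exits.

```python
from contextlib import contextmanager
from time import perf_counter
from types import SimpleNamespace

@contextmanager
def timed():
    result = SimpleNamespace(elapsed=None)
    start = perf_counter()
    try:
        yield result  # bound to the name after `as`
    finally:
        # runs even if the block raises
        result.elapsed = perf_counter() - start

with timed() as t:
    total = sum(range(1_000))

print(f'{total = }, took {t.elapsed:.6f}s')
```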

abc.ABC

from abc import ABC, abstractmethod

class Interface(ABC):
    @abstractmethod
    def f(self, a, b):
        pass

class BadImplementation(Interface):
    pass

#  obj = BadImplementation()

class BadImplementation(Interface):
    def f(self):
        pass
        
obj = BadImplementation()
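
To spell out what the two cases above do: ABC refuses to instantiate a subclass that leaves an abstract method undefined, but it does not compare signatures. The sketch below uses the hypothetical class names Missing and Mismatched to keep the two cases apart.

```python
from abc import ABC, abstractmethod

class Interface(ABC):
    @abstractmethod
    def f(self, a, b):
        pass

class Missing(Interface):
    pass

class Mismatched(Interface):
    def f(self):  # wrong signature, but ABC only checks that the name is defined
        pass

try:
    Missing()
except TypeError as e:
    print(f'{e = }')

obj = Mismatched()  # no error
print(f'{obj = }')
```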
from inspect import signature
from collections.abc import Callable

def abstractmethod(f):
    f.abstract = True
    return f

class Interface:
    @abstractmethod
    def f(self, a, b):
        pass
        
    def __init_subclass__(cls):
        methods = {name: meth for name in dir(Interface)
                   if isinstance(meth := getattr(Interface, name), Callable)
                       and getattr(meth, 'abstract', False)}

        for name, meth in methods.items():
            if not hasattr(cls, name) or getattr(cls, name) is meth:
                raise TypeError(f'{cls} missing method {name}') 
            if signature(getattr(cls, name)) != signature(meth):
                raise TypeError(f'{cls} mismatched signature on {name}') 

try:
    class BadImplementation(Interface):
        pass
except Exception as e:
    print(f'{e = }')

try:
    class BadImplementation(Interface):
        def f(self):
            pass
except Exception as e:
    print(f'{e = }')

class GoodImplementation(Interface):
    def f(self, a, b):
        pass
from inspect import signature

def make_interface(methods):
    def interface(cls):
        for name, meth in methods.items():
            if not hasattr(cls, name) or getattr(cls, name) is meth:
                raise TypeError(f'{cls} missing method {name}') 
            if signature(getattr(cls, name)) != signature(meth):
                raise TypeError(f'{cls} mismatched signature on {name}') 
        return cls
    return interface

interface = make_interface({
    'f': lambda self, a, b: None
})

try:
    @interface
    class BadImplementation:
        pass
except Exception as e:
    print(f'{e = }')

try:
    @interface
    class BadImplementation:
        def f(self):
            pass
except Exception as e:
    print(f'{e = }')

@interface
class GoodImplementation:
    def f(self, a, b):
        pass

collections.abc

from collections.abc import Callable, Iterable, Container, Sized
from pandas import Series
from numpy import array

def f(): pass
class g: pass
h = lambda: None

print(
    f'{isinstance(f, Callable)   = }',
    f'{isinstance(g, Callable)   = }',
    f'{isinstance(h, Callable)   = }',
    sep='\n',
)

xs = [1, 2, 3]
ys = {1, 2, 3}
zs = Series([1, 2, 3])

print(
    f'{isinstance(xs, Iterable)  = }',
    f'{isinstance(ys, Iterable)  = }',
    f'{isinstance(zs, Iterable)  = }',
    sep='\n',
)

s = 'abc'
print(
    #  f'{isinstance(s, Iterable)  = }',
    #  f'{isinstance(s, Iterable) and not isinstance(s, str) = }',
    sep='\n',
)


xs = array([1, 2, 3])
print(
    f'{isinstance(xs, Container) = }',
    f'{isinstance(xs, Sized)     = }',
    f'{len(xs)                   = }',
    f'{bool(xs)                  = }',
    sep='\n',
)

You’ll see this in the pandas codebase:

xs = [1, 2, 3]
try:
    iter(xs)
    len(xs)
except Exception as e:
    pass
from enum import Enum
from pandas import DataFrame
from numpy import zeros

en = Enum('Enum', 'a b c')
df = DataFrame(zeros((len(en), len(en))))
df.index   = [*en]
df.columns = [*en]
df.index.name = df.columns.name = en
print(
    df,
)

textwrap

class A:
    values = ['a', 'b', 'c', 'd', 'e']
print(f'{A.values = }')

class A:
    values = '''
        a b c d e
    '''.split()
print(f'{A.values = }')
from textwrap import dedent, indent, shorten

class A:
    message = dedent('''
        Some
        Message
    ''').strip()
print(f'{A.message = }')

print(
    indent(A.message, '----'),
)
print(
    shorten('Some long message', 10, placeholder='...'),
)
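
textwrap also handles the task its name suggests. A brief sketch of wrap (a list of lines) and fill (a single string), using a sample sentence of our own; the indent arguments count towards the width.

```python
from textwrap import fill, wrap

text = 'The quick brown fox jumps over the lazy dog'

lines = wrap(text, width=20)
print(f'{lines = }')

# prefix every line, e.g. for quoting a message in an e-mail reply
quoted = fill(text, width=20, initial_indent='> ', subsequent_indent='> ')
print(quoted)
```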

pathlib, tempfile

from pathlib import Path

curdir = Path('.')

for path in curdir.iterdir():
    if path.is_file():
        print(f'{path        = }')
        print(f'{path.suffix = }')
        print(f'{path.with_stem(path.stem.upper()) = }')
        print(f"{path.with_suffix(path.suffix + '.gz') = }")
        #  print(f'{path.stat() = }')
        break

datadir = curdir / 'a/b/c'
datadir.mkdir(parents=True, exist_ok=True)  # mkdir returns None, so keep the Path separately
from tempfile import TemporaryFile, NamedTemporaryFile

with TemporaryFile(mode='w+t') as f:
    f.write('abc')
    f.seek(0)
    print(f'{f.read() = }')

with NamedTemporaryFile(mode='w+t') as f:
    f.write('abc')
    print(f'{f.name = }')
from tempfile import TemporaryDirectory
from pathlib import Path

with TemporaryDirectory(prefix='test-') as d:
    d = Path(d)
    print(f'{d = }')
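
The two modules combine well for scratch work: build a directory tree under a TemporaryDirectory, write and read files with Path methods, and let the context manager clean everything up. A minimal sketch, with file names invented for the example:

```python
from pathlib import Path
from tempfile import TemporaryDirectory

with TemporaryDirectory(prefix='test-') as d:
    root = Path(d)
    datadir = root / 'a' / 'b'
    datadir.mkdir(parents=True, exist_ok=True)

    (datadir / 'notes.txt').write_text('abc')
    (datadir / 'data.csv').write_text('x,y\n1,2\n')

    # search recursively, reporting paths relative to the temporary root
    found = sorted(p.relative_to(root).as_posix() for p in root.rglob('*.txt'))
    contents = (datadir / 'notes.txt').read_text()

exists_after = root.exists()  # the whole tree is removed on exit

print(f'{found = }')
print(f'{contents = }')
print(f'{exists_after = }')
```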

itertools

A generalisation of iteration helpers:

A tool for building simple, powerful iteration helpers!

from itertools import tee, islice, zip_longest, repeat, chain

nwise = lambda g, n=2: zip(*(islice(g, idx, None) for idx, g in enumerate(tee(g, n))))
nwise_longest = lambda g, n=2, fv=object: zip_longest(*(islice(g, idx, None) for idx, g in enumerate(tee(g, n))), fillvalue=fv)
first = lambda g, n=1: zip(g, chain(repeat(True, n), repeat(False)))
last  = lambda g, m=1, s=object(): ((x, y[-1] is s) for x, *y in nwise_longest(g, m+1, fv=s))

for x, y in nwise('abcd'):
    print(f'{x, y = }')
print()

for x, y in nwise_longest('abcd'):
    print(f'{x, y = }')
print()

for x, is_first in first('abcd'):
    print(f'{x, is_first = }')
print()

for x, is_last in last('abcd'):
    print(f'{x, is_last = }')
print()
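
The one-line lambdas are compact but dense. For comparison, here is a sketch of nwise as a named generator built on a bounded deque (the same maxlen behaviour shown in the collections section). It is an alternative formulation, not a drop-in copy of the tee-based version above.

```python
from collections import deque
from itertools import islice

def nwise(iterable, n=2):
    it = iter(iterable)
    # pre-fill with the first n - 1 items; maxlen drops the oldest on append
    window = deque(islice(it, n - 1), maxlen=n)
    for x in it:
        window.append(x)
        yield tuple(window)

for x, y in nwise('abcd'):
    print(f'{x, y = }')

print(f"{[*nwise('abcd', 3)] = }")
```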

Next time!