# Functions

## Premise

- Notebooks are great at some things:
    - Distributable, literate computing environments (mixing code and narrative text).

- Notebooks are poor at some things (but are often used for these purposes):
    - Rapid prototyping environments.
    - Code development environments.

Common:
- code that shouldn't be in a notebook but is.

Rare:
- code that should be in a notebook but isn't.

## Mission

Guide attendees in moving code from their notebooks to proper, deployable, shareable, “production-grade” coding environments.

i.e., notebooks → scripts

Guide attendees toward better use of Python by stressing language fundamentals, including language-provided metaphors, design guidance, and language mechanics.

## Motivating Example

In [1]:
from pandas import DataFrame
from numpy.random import normal
from collections import namedtuple
from IPython.display import display

In [2]:
df = DataFrame({'a': normal(size=(size:=3)), 'b': normal(size=size)})
df

Unnamed: 0,a,b
0,0.675352,-0.795785
1,1.159616,-1.259179
2,-2.605369,-0.331966


In [3]:
df1 = DataFrame({'a': normal(size=(size:=3)), 'b': normal(size=size)})
df2 = DataFrame({'a': normal(size=(size:=3)), 'b': normal(size=size)})

In [4]:
df1['c'] = df1['a'] + df1['b']
df1

Unnamed: 0,a,b,c
0,-0.973914,0.556433,-0.417481
1,1.344512,1.609397,2.953909
2,-1.014098,0.541415,-0.472683


In [5]:
df2['c'] = df2['a'] + df2['b']
df2

Unnamed: 0,a,b,c
0,2.294991,-1.366591,0.9284
1,0.313675,2.350156,2.663831
2,-0.335537,-0.643755,-0.979292


In [6]:
df1_subset = df1[df1['a'] < df1['b']]
df1_subset

Unnamed: 0,a,b,c
0,-0.973914,0.556433,-0.417481
1,1.344512,1.609397,2.953909
2,-1.014098,0.541415,-0.472683


In [7]:
df2_subset = df2[df2['a'] < df2['b']]
df2_subset

Unnamed: 0,a,b,c
1,0.313675,2.350156,2.663831


In [8]:
def subset(df):
    return df[df['a'] < df['b']]

df1_subset = subset(df1)
df2_subset = subset(df2)

In [9]:
Subset = namedtuple('Subset', 'orig subset')

def subset(df):
    return Subset(df, df[df['a'] < df['b']])

df1_subset = subset(df1)
df2_subset = subset(df2)

display(df1_subset.orig)
display(df1_subset.subset)

Unnamed: 0,a,b,c
0,-0.973914,0.556433,-0.417481
1,1.344512,1.609397,2.953909
2,-1.014098,0.541415,-0.472683


Unnamed: 0,a,b,c
0,-0.973914,0.556433,-0.417481
1,1.344512,1.609397,2.953909
2,-1.014098,0.541415,-0.472683


In [13]:
Inputs = namedtuple('Inputs', 'x y')
Subset = namedtuple('Subset', 'orig subset')

def subset(x, y):
    return Inputs(
        Subset(x, x[x['a'] < x['b']]),
        Subset(y, y[y['a'] < y['b']]),
    )

subsets = subset(df1, df2)
display(subsets.x.orig)
display(subsets.y.subset)

Unnamed: 0,a,b,c
0,-0.973914,0.556433,-0.417481
1,1.344512,1.609397,2.953909
2,-1.014098,0.541415,-0.472683


Unnamed: 0,a,b,c
1,0.313675,2.350156,2.663831


In [15]:
class Inputs(namedtuple('InputsBase', 'x y')):
    Subset = namedtuple('Subset', 'orig subset')
    @classmethod
    def from_df(cls, x, y):
        return cls(
            cls.Subset(x, x[x['a'] <= x['b']]),
            cls.Subset(y, y[y['a'] <= y['b']]),            
        )
    
inputs = Inputs.from_df(x=df1, y=df2)

display(inputs.y.subset)

Unnamed: 0,a,b,c
1,0.313675,2.350156,2.663831


## Theory

Functions are our most basic unit of modularity and computational structuring.

Thus, our design of functions should have, as its goal, the addition of useful structuring and useful modularity.

Structuring is the addition of out-of-band metadata—namely, how data interrelates so that it can be programmatically manipulated.
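
As a minimal sketch of this idea, a function can return a named structure that records how its outputs relate to its input, rather than leaving that relationship implicit in variable names such as `df1_subset`. The names `Split` and `split_by` and the sample rows are hypothetical, chosen only for illustration, and plain lists of dicts stand in for DataFrames.

```python
from collections import namedtuple

# Hypothetical structure: records that `subset` was derived from `orig`.
Split = namedtuple('Split', 'orig subset')

def split_by(rows, pred):
    """Return the original rows together with the rows satisfying pred."""
    return Split(rows, [r for r in rows if pred(r)])

rows = [{'a': 1, 'b': 2}, {'a': 3, 'b': 0}]
result = split_by(rows, lambda r: r['a'] < r['b'])

# The relationship is now available to code, not only to the reader.
for name, value in zip(result._fields, result):
    print(name, len(value))
```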

Modularity is typically about avoiding repetition, and thereby avoiding “update anomalies” (one copy of duplicated logic gets changed while another does not).
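
One way to picture an update anomaly, as a rough sketch with hypothetical data and names (`keep`, `subset`): the same condition is written out twice, one copy is edited, and the two results quietly diverge. Moving the condition into a function leaves a single place to edit.

```python
# Repeated logic: the filter condition is written out twice.
xs = [(1, 2), (3, 0), (2, 2)]
ys = [(0, 5), (4, 4), (9, 1)]

xs_subset = [(a, b) for a, b in xs if a < b]
ys_subset = [(a, b) for a, b in ys if a <= b]  # edited here only; xs_subset still uses `<`

# Modular version: the condition lives in one place,
# so a single edit updates every use.
def keep(a, b):
    return a <= b

def subset(pairs):
    return [(a, b) for a, b in pairs if keep(a, b)]

xs_fixed = subset(xs)
ys_fixed = subset(ys)
```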

## Context: `lambda`, `def`-functions, classes with `__call__`, `def`-generators

`lambda` typically connotes either:
- a function that is stateless (a single expression)
- a function that is used for a single, ad hoc purpose (and is not intended for reuse)
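
A small sketch of both points, using made-up sample data: the `lambda` below exists only for one `sorted` call, and an attempt to put a statement (here an assignment) inside a `lambda` body is rejected by the compiler. The variable names are illustrative only.

```python
records = [('b', -2), ('a', 1), ('c', 3)]

# Ad hoc use: the key function exists only for this one call.
by_magnitude = sorted(records, key=lambda kv: abs(kv[1]))

# A lambda body must be a single expression; a statement is a SyntaxError.
try:
    compile('lambda x: (y = x)', '<example>', 'eval')
except SyntaxError:
    statement_allowed = False
else:
    statement_allowed = True
```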

In [16]:
f = lambda df: df.mean()

In [17]:
sorted({'a': 1, 'b': -2, 'c': 3}.items(), key=lambda kv: abs(kv[-1]))

[('a', 1), ('b', -2), ('c', 3)]

A `def`-function typically connotes:
- a function that performs some computation or action (and may be stateful)
- a function that may be reused
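
As one possible illustration of “may be stateful,” the sketch below uses a closure to keep state between calls. The names `make_counter`, `counter`, and `tick` are hypothetical and not part of the original material.

```python
def make_counter():
    """Return a function that counts how many times it has been called."""
    count = 0

    def counter():
        nonlocal count  # state kept between calls
        count += 1
        return count

    return counter

tick = make_counter()
tick()  # first call
tick()  # second call
```

Each call to `make_counter` produces an independent counter, so the same definition can be reused wherever a fresh count is needed.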

In [20]:
f = lambda df: df.mean()

In [19]:
def f(df):
    return df.mean()

In [21]:
class T:
    def __call__(self, df):
        return df.mean()

In [23]:
class T:
    def __init__(self, df):
        self.df = df
    def __call__(self):
        return self.df.mean()

df = DataFrame({'a': normal(size=3)})
x = T(df)
x()

a   -0.450016
dtype: float64

In [22]:
def create_mean(df):
    def mean():
        return df.mean()
    return mean

df = DataFrame({'a': normal(size=3)})
x = create_mean(df)
x()

a    1.095531
dtype: float64

In [24]:
class T:
    def __init__(self, df):
        self.df = df
    def first_pass(self):
        self.df1 = self.df - self.df.mean()
    def second_pass(self):
        self.df2 = self.df1[self.df1 > 0]
    def third_pass(self):
        self.df3 = self.df2 * 10

df = DataFrame({'a': normal(size=3)})
x = T(df)
x.first_pass()
# ...
x.second_pass()
# ...
x.third_pass()
# ...
x.df3

Unnamed: 0,a
0,
1,10.070738
2,0.999803


In [26]:
def g(df):
    yield (df1 := df - df.mean())
    yield (df2 := df1[df1 > 0])
    yield (df3 := df2 * 10)

df = DataFrame({'a': normal(size=3)})
gi = g(df)
next(gi)
# ...
next(gi)
# ...
next(gi)

Unnamed: 0,a
0,10.021744
1,
2,


In [31]:
class T:
    def __init__(self, df):
        self.df = df
    def __getitem__(self, key):
        return self.df[self.df['a'] > abs(key)]
    
x = T(DataFrame({'a': normal(size=3)}))

x[1]

Unnamed: 0,a
2,1.088412


In [32]:
def f():
    pass

## Mechanics of the `def`-function

In [None]:
def f(df):
    df.to_csv('output.csv')
    ...
    ...
    ...
    return None

def f(df):
    df.to_csv('output.csv')
    ...
    ...
    ...
    return

def f(df):
    df.to_csv('output.csv')
    ...
    ...

In [33]:
def f(*, b, a):
    pass

f(a=None, b=None)

In [None]:
from pandas import read_csv
read_csv('file.csv', delimiter=',', header=None)

In [34]:
def f(a, /):
    pass
f(None)

In [35]:
from matplotlib.pyplot import hist
hist?

In [None]:
def f(a, b, c):
    return

f(*'123')
f(*(1, 2, 3))
f(*[1, 2, 3])
f(*{1: 'one', 2: 'two', 3: 'three'})

f(*{1, 2, 3})

In [36]:
def f(a, b, c):
    return a + b + c

In [37]:
def f(nums):
    return sum(nums)

In [38]:
def f(*args):
    pass

f(1, 2, 3)

In [39]:
def f(**kwargs):
    pass

f(a=1, b=2, c=3)

In [None]:
isinstance(x, (int, float))  # takes a type or a tuple of types; a set raises TypeError

In [None]:
# helpful for "wrapper" functions
def f(*args, **kwargs):
    pass

f(1, 2, 3, a=1, b=2, c=3)

In [40]:
def f(field_a, field_b):
    pass
f(1, 2)

Struct = namedtuple('Struct', 'a b')
def f(struct):
    pass

f(Struct(1, 2))

In [45]:
from pandas import DataFrame, to_datetime
df = DataFrame({'a': [1]}, index=to_datetime(['2020-07-04']))
df.loc['2020-07-04']

a    1
Name: 2020-07-04 00:00:00, dtype: int64

https://github.com/python-variants/variants

In [47]:
# convenience layer
from variants import primary 

@primary
def f(field_a, field_b):
    pass

@f.variant('structured')
def f(struct):
    pass

f('a', 'b')
f.structured(('a', 'b'))

In [48]:
def f(x, y):
    # x and y are integers
    pass

def f(x, y):
    '''x and y are integers'''
display(f.__doc__)

def f(x, y):
    '''
    x: int
    y: int
    '''
display({k.strip(): eval(v.strip())
         for line in f.__doc__.strip().splitlines() 
         for k, v in [line.split(':', 1)]
         if ':' in line
        })

def f(x : int, y : int):
    pass
f.__annotations__

'x and y are integers'

{'x': int, 'y': int}

{'x': int, 'y': int}

In [None]:
def f(x, y):
    # x and y are pandas.DataFrames with a column 'a'
    pass

def f(x, y):
    '''
    x and y are pandas.DataFrames with a column 'a'
    '''
display(f.__doc__)

def f(x, y):
    '''
    x: DataFrame(columns=['a'])
    y: DataFrame(columns=['a'])
    '''
display({k.strip(): eval(v.strip())
         for line in f.__doc__.strip().splitlines() 
         for k, v in [line.split(':', 1)]
         if ':' in line
        })

def f(x : DataFrame(columns=['a']), y : DataFrame(columns=['a'])):
    pass
f.__annotations__

In [50]:
def pure(f):
    f.pure = True
    return f

@pure
def f():
    pass

def g():
    pass

f.pure
getattr(g, 'pure', False)

False

In [51]:
from collections.abc import Callable
class PureMeta(type):
    def __instancecheck__(self, inst):
        return isinstance(inst, Callable) and getattr(inst, 'pure', False)
class pure(metaclass=PureMeta):
    def __call__(self, f):
        f.pure = True
        return f
    
@pure()
def f(x, y, *args):
    return ...
    return None

def g():
    pass

isinstance(f, pure)
isinstance(g, pure)

False