Date: Friday, Oct 25, 2024 at 09:30 US/Eastern
Topics: scipy
Sparse Arrays, Optimization, and Interpolation — Oh My!
SciPy is a grab bag of scientific computing objects and functions. But what’s there that’s actually useful? Join us to find out!
In this seminar, we will discuss the origins of SciPy and how the package has evolved to become one of the most commonly used scientific computing packages in the Python ecosystem. We will discuss the core technical problems that SciPy helps solve, including sparse arrays, optimization problems, interpolation, clustering, and much more! We will demonstrate practical SciPy solutions to real-world problems so that you can take these approaches and implement them in your own work. If SciPy has you feeling overwhelmed, then this is a seminar that you won’t want to miss.
python -m pip install numpy scipy pandas pyarrow torch matplotlib xarray
print("Let's take a look!")
What is SciPy, the library?
First, let’s discuss what NumPy is, as a library.
from numpy import ndarray
from numpy import array
from numpy import ones, zeros, empty, full, eye
NumPy is a library that provides one of the most critical data types for data analysis: the n-dimensional array. Historically, almost all analysis tools were written on top of NumPy. However, in the contemporary era, tools like Arrow can provide a better substructure for columnar analyses (as in Pandas) and tools like PyTorch can provide a better substructure for n-dimensional analyses (where we want GPU-acceleration.)
NumPy provides more than just the numpy.ndarray. For example, NumPy also
provides the numpy.ma.core.MaskedArray type that offers an out-of-band
encoding for representing missing data.
from numpy.ma import array
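As a minimal sketch (with made-up values): the mask marks entries as missing without altering the underlying data buffer, and reductions skip the masked entries.
from numpy.ma import array

xs = array([1.0, 2.0, 3.0, 4.0], mask=[False, True, False, False])
print(
    f'{xs = }',
    f'{xs.mean() = }',  # computed over the unmasked entries only
    sep='\n',
)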
Beyond that, NumPy also provides useful, common routines for working with
n-dimensional structures. e.g., numpy.linalg provides some basic linear
algebra routines and numpy.random provides functionality for generating random
values.
from numpy.random import default_rng
from numpy.linalg import det
rng = default_rng(0)
xs = rng.integers(-10, +10, size=(3, 3))
print(f'{det(xs) = }')
SciPy, on the other hand, is first-and-foremost focused on providing useful
routines for operating on n-dimensional structures (generally assuming that
those structures are backed by numpy.ndarray.)
from scipy import cluster      # clustering algorithms
from scipy import constants    # physical and mathematical constants
from scipy import datasets     # sample datasets
from scipy import fft, fftpack # Fourier transforms (fftpack is the legacy interface)
from scipy import integrate    # integration and ODE solvers
from scipy import interpolate  # interpolation and smoothing splines
from scipy import io           # file input/output (e.g., MATLAB .mat files)
from scipy import linalg       # linear algebra
from scipy import misc         # miscellany (deprecated)
from scipy import ndimage      # n-dimensional image processing
from scipy import odr          # orthogonal distance regression
from scipy import optimize     # optimization and root-finding
from scipy import signal       # signal processing
from scipy import sparse       # sparse matrices and arrays
from scipy import special      # special functions
from scipy import stats        # statistical distributions and functions
There is some overlap between functionality found in SciPy and that found in NumPy, or in even more specialized tools like scikit-learn or scikit-image.
from scipy.stats import linregress
from sklearn.linear_model import LinearRegression
from scipy import ndimage
import skimage
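To make the overlap concrete, here is a minimal sketch (on randomly generated dummy data) fitting the same line with scipy.stats.linregress and with scikit-learn’s LinearRegression; the estimated slope and intercept agree.
from numpy.random import default_rng
from scipy.stats import linregress
from sklearn.linear_model import LinearRegression

rng = default_rng(0)
xs = rng.uniform(0, 10, size=100)
ys = 3 * xs + 1 + rng.normal(scale=.5, size=xs.shape)

res = linregress(xs, ys)                              # SciPy: slope, intercept, r, p, stderr
model = LinearRegression().fit(xs.reshape(-1, 1), ys) # sklearn: expects a 2-D feature matrix

print(
    f'{(res.slope, res.intercept) = }',
    f'{(model.coef_[0], model.intercept_) = }',
    sep='\n',
)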
SciPy does provide some core types, most notably sparse matrix and spatial subdivision types, but this can be viewed as a consequence of wanting to provide certain functionality on top of arrays.
Therefore, “What’s Useful in SciPy” can be answered by reviewing core functionality provided by SciPy that is likely to come up in our work and reviewing how some of the assumptions of SciPy (and NumPy) interact with our other requirements.
print("Let's take a look!")
We are huge fans of randomly generated “dummy data.”
from numpy.random import default_rng
rng = default_rng(0)
This isn’t simply because we don’t have access to or don’t want to risk using proprietary data for examples, but also because we strongly believe in tightening the iterative loop as much as possible.
In other words, when performing some complex analysis, we want to be able to write one line of code, see what its effect is, confirm and validate that it is what we want, and then integrate it into our work… and do this in a loop, where each iteration is time-bounded.
We observe non-linearities as iteration slows: a 10× increase in iteration time (from 1s to 10s) may incur a more than 10× slowdown in coding (or research) velocity.
One key aspect of improving the iterative loop is ‘isolation.’ When iterating on a data analysis, it is extremely helpful to isolate ourselves from data loading (esp. if the data is changing.) Therefore, we may choose to use randomly generated data as a quick starting point.
While NumPy provides us with random sampling methods attached to the random number generators in numpy.random, SciPy provides us with a much richer set of distributions, and SciPy allows us to represent these distributions as first-class entities.
In NumPy…
from numpy.random import default_rng
rng = default_rng(0)
print(
f'{rng.normal(loc=100, scale=5, size=10) = }',
f'{rng.normal(loc=100, scale=5, size=10).mean() = }',
f'{rng.normal(loc=100, scale=5, size=10).std() = }',
sep='\n',
)
In SciPy…
from scipy.stats import norm
from numpy.random import default_rng
rng = default_rng(0)
dist = norm(loc=100, scale=5)
print(
f'{dist = }',
# f'{dist.stats("mvsk") = }',
# f'{dist.mean() = }',
# f'{dist.std() = }',
# f'{dist.cdf(dist.mean()) = }',
# f'{dist + dist = }',
f'{dist.rvs(10, random_state=rng) = }',
sep='\n',
)
SciPy also provides us with a much richer set of distributions than NumPy. For example, although the normal distribution is a very easy distribution to manipulate mathematically, it can underestimate the extreme outlying behavior that has historically been observed in financial markets. For this, we may choose to use a Lévy-stable distribution.
from matplotlib.pyplot import subplots, show, legend
from numpy.random import default_rng
from numpy import linspace
from scipy.stats import norm, levy_stable
dists = {
'normal': norm(loc=0, scale=1),
'levy stable (α = 0.5)': levy_stable(alpha=0.5, beta=0),
'levy stable (α = 1.0)': levy_stable(alpha=1.0, beta=0),
'levy stable (α = 1.5)': levy_stable(alpha=1.5, beta=0),
}
xs = linspace(-20, +20, 1_000)
fig, axes = subplots()
for name, dist in dists.items():
axes.plot(xs, dist.pdf(xs), label=name)
legend()
show()
Additionally, SciPy provides statistical operations missing from NumPy, such as higher-order summary statistics (skewness, kurtosis).
from numpy.random import default_rng
from scipy.stats import skew, kurtosis
rng = default_rng(0)
xs = rng.normal(size=10)
print(
f'{xs.mean() = }',
f'{xs.var() = }',
f'{xs.std() = }',
f'{skew(xs) = }',
f'{kurtosis(xs) = }',
sep='\n',
)
There is also other statistical functionality, such as computing z-scores.
from numpy.random import default_rng
from scipy.stats import zscore
rng = default_rng(0)
xs = rng.normal(size=10)
print(
f'{zscore(xs) = }',
sep='\n',
)
Consider that early versions of NumPy and SciPy did not anticipate extensions of these base types; today, however, operations in NumPy and SciPy can work with any object exposing the __array_interface__ protocol or the __array__() method.
from numpy.random import default_rng
from pandas import Series, date_range
from scipy.stats import zscore, kurtosis
rng = default_rng(0)
s = Series(
index=(idx := date_range('2020-01-01', periods=10)),
data=rng.normal(size=len(idx)),
)
print(
# zscore(s),
f'{s.skew() = }',
f'{s.kurt() = }',
kurtosis(s, fisher=False),
sep='\n',
)
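To see the protocol itself, here is a minimal sketch of a hypothetical container (MyData is our own invention, not a library type) that participates in SciPy operations solely by defining __array__().
from numpy import asarray
from scipy.stats import zscore

class MyData:
    """Hypothetical container: not an ndarray, but convertible to one."""
    def __init__(self, values):
        self.values = list(values)
    def __array__(self, dtype=None, copy=None):
        # NumPy/SciPy call this to obtain an ndarray view of our data
        return asarray(self.values, dtype=dtype)

print(f'{zscore(MyData([1., 2., 3., 4.])) = }')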
Of course, this does not guarantee that the operation will be fast with any arbitrary structure.
from itertools import pairwise
from numpy import allclose
from numpy.random import default_rng
from scipy.stats import zscore
from torch import from_numpy
from _lib import timed
rng = default_rng(0)
numpy_xs = rng.random(100_000)
torch_xs = from_numpy(numpy_xs)
results = []
with timed('zscore(numpy_xs)'):
results.append(
zscore(numpy_xs)
)
with timed('zscore(torch_xs)'):
results.append(
zscore(torch_xs)
)
with timed('(torch_xs - μ) / σ'):
results.append(
(torch_xs - torch_xs.mean()) / torch_xs.std(correction=0)
)
assert all(allclose(x, y) for x, y in pairwise(results))
from itertools import pairwise
from numpy import allclose
from numpy.random import default_rng
from scipy.stats import zscore
from torch import from_numpy
from _lib import timed
rng = default_rng(0)
numpy_xs = rng.random((10_000, 10_000))
torch_xs = from_numpy(numpy_xs)
results = []
with timed('zscore(numpy_xs)'):
results.append(
zscore(numpy_xs)
)
with timed('zscore(torch_xs)'):
results.append(
zscore(torch_xs)
)
with timed('(torch_xs - μ) / σ'):
results.append(
(torch_xs - torch_xs.mean(axis=0)) / torch_xs.std(axis=0, correction=0)
)
assert all(allclose(x, y) for x, y in pairwise(results))
We should, of course, be careful to thoroughly understand the design of our restricted computation domain to avoid making poor assumptions about performance.
from itertools import pairwise
from numpy import allclose
from numpy.linalg import matrix_power
from numpy.random import default_rng
from xarray import DataArray
from torch import from_numpy
from _lib import timed
rng = default_rng(0)
numpy_xs = rng.integers(-10, +10, size=(1_000, 1_000))
torch_xs = from_numpy(numpy_xs)
xarray_xs = DataArray(
data=numpy_xs,
)
results = []
with timed('matrix_power(numpy_xs, 10)'):
results.append(
matrix_power(numpy_xs, 10)
)
with timed('matrix_power(xarray_xs, 10)'):
results.append(
matrix_power(xarray_xs, 10)
)
with timed('matrix_power(torch_xs, 10)'):
results.append(
matrix_power(torch_xs, 10)
)
assert all(allclose(x, y) for x, y in pairwise(results))
print("Let's take a look!")
SciPy does provide us with some data types (that are not mere implementations of the NumPy n-dimensional array API.) For example, scipy.spatial includes spatial subdivision structures such as scipy.spatial.KDTree.
Let’s say we need to compute the collisions of n entities in m-dimensional space. The naïve approach, comparing every pair, is clearly O(n²).
from numpy.random import default_rng
from numpy import meshgrid, subtract, sqrt, eye
rng = default_rng(0)
entities = rng.uniform(-1, +1, size=(10_000, 2)) # ℝ²
threshold = .10
print(
    (
        # pairwise Euclidean distances, materialized densely on an n×n grid
        (sqrt(
            subtract(*meshgrid(entities[:, 0], entities[:, 0])) ** 2
            +
            subtract(*meshgrid(entities[:, 1], entities[:, 1])) ** 2
        ) < threshold)
        ^
        # XOR with the identity to drop the (always-True) self-collisions
        eye(entities.shape[0], dtype=bool)
    ).sum() // 2,  # each pair is counted twice: (i, j) and (j, i)
    sep='\n',
)
from scipy.spatial import KDTree
from numpy.random import default_rng
rng = default_rng(0)
entities = rng.uniform(-1, +1, size=(10_000, 2)) # ℝ²
threshold = .10
kdtree = KDTree(entities)
print(f'{len(kdtree.query_pairs(threshold)) = }')
from numpy.random import default_rng
from numpy import meshgrid, subtract, sqrt, eye
from scipy.spatial import KDTree
from _lib import timed
rng = default_rng(0)
entities = rng.uniform(-1, +1, size=(15_000, 2)) # ℝ²
threshold = .10
kdtree = KDTree(entities)
def np_query_pairs(entities, threshold):
return (
(sqrt(
subtract(*meshgrid(entities[:, 0], entities[:, 0])) ** 2
+
subtract(*meshgrid(entities[:, 1], entities[:, 1])) ** 2
) < threshold)
^
eye(entities.shape[0], dtype=bool)
).sum()
with timed('kdtree.query_pairs(threshold)'):
kdtree.query_pairs(threshold)
with timed('np_query_pairs(entities, threshold)'):
np_query_pairs(entities, threshold)
While spatial subdivision may not be particularly interesting for our use cases, scipy.spatial includes functionality for finding convex hulls, which may be useful (e.g., for pricing in electricity markets.)
from matplotlib.pyplot import show, subplots
from numpy.random import default_rng
from scipy.spatial import ConvexHull
rng = default_rng(0)
points = rng.uniform(-10, 10, (10, 2))
hull = ConvexHull(points)
fig, axes = subplots()
axes.plot(*points.T, 'o')
for idx in hull.simplices:
axes.plot(*points[idx].T, 'k-', alpha=.25)
show()
print("Let's take a look!")
Before we discuss sparse matrices, we should be clear that some things that can be modeled as a matrix perhaps should not be.
from string import ascii_lowercase
from numpy import unique
from numpy.random import default_rng
from pandas import date_range, CategoricalIndex
from xarray import DataArray
rng = default_rng(0)
tickers = CategoricalIndex(unique(
rng.choice([*ascii_lowercase], size=(100, 4)).view('<U4').ravel()
), name='ticker')
dates = date_range('2020-01-01', periods=90, name='date')
da = DataArray(
data=rng.normal(size=(len(dates), len(tickers))),
dims='date ticker'.split(),
coords={
'date': dates,
'ticker': tickers,
},
)
print(da)
Instead, we can model these more accurately as one-dimensional structures with sophisticated (e.g., hierarchical) indexing.
from string import ascii_lowercase
from numpy import unique
from numpy.random import default_rng
from pandas import date_range, CategoricalIndex, MultiIndex, Series
from xarray import DataArray
rng = default_rng(0)
tickers = CategoricalIndex(unique(
rng.choice([*ascii_lowercase], size=(100, 4)).view('<U4').ravel()
), name='ticker')
dates = date_range('2020-01-01', periods=90, name='date')
s = Series(
index=(idx := MultiIndex.from_product([dates, tickers])),
data=rng.normal(size=len(idx)),
).pipe(lambda s: s.sample(len(s) - 5, random_state=rng)).sort_index()
print(
# s,
s.unstack(fill_value=0).head(),
sep='\n{}\n'.format('\N{box drawings light horizontal}' * 40),
)
Let’s try to consider only truly n-dimensional structures—e.g., surfaces, volumes, &c.—or those that an established mathematical treatment might genuinely view as n-dimensional—e.g., graphs, Markov/stochastic matrices, &c.
In fact, the graph analogy is quite apropos, as we know that the two most common representations of a graph are either as an adjacency list or an adjacency matrix.
As an adjacency matrix:
from numpy import array
from numpy.linalg import matrix_power
connectivity = array([
# 0 1 2 3 4 5
[1, 0, 0, 0, 1, 0], # 0
[0, 1, 0, 0, 1, 0], # 1
[0, 0, 1, 0, 1, 0], # 2
[0, 0, 0, 1, 0, 0], # 3
[0, 0, 1, 0, 1, 0], # 4
[0, 1, 0, 0, 1, 1], # 5
])
print(
    # entry (i, j) of the squared matrix counts the walks of length 2 from i to j
    matrix_power(connectivity, 2)
)
As an adjacency list:
connectivity = {
0: [0, 4],
1: [1, 4],
2: [2, 4],
3: [3],
4: [2, 4],
5: [1, 4, 5],
}
As an edge list:
connectivity = {
(0, 0),
(0, 4),
(1, 1),
(1, 4),
(2, 2),
(2, 4),
(3, 3),
(4, 2),
(4, 4),
(5, 1),
(5, 4),
(5, 5),
}
As an edge list with edge data:
connectivity = {
(0, 0): 1,
(0, 4): 1,
(1, 1): 1,
(1, 4): 1,
(2, 2): 1,
(2, 4): 1,
(3, 3): 1,
(4, 2): 1,
(4, 4): 1,
(5, 1): 1,
(5, 4): 1,
(5, 5): 1,
}
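Note that the edge-list-with-data form maps directly onto the “coordinate” (COO) construction provided by scipy.sparse; a minimal sketch using the dictionary above:
from scipy.sparse import coo_matrix

rows, cols = zip(*connectivity)        # unzip the (i, j) keys
weights = list(connectivity.values())  # the edge data
print(coo_matrix((weights, (rows, cols)), shape=(6, 6)).toarray())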
As an adjacency list, with a matrix-interface:
from numpy import zeros
from numpy.linalg import matrix_power
from scipy.sparse import csr_matrix
connectivity = csr_matrix([
# 0 1 2 3 4 5
[1, 0, 0, 0, 1, 0], # 0
[0, 1, 0, 0, 1, 0], # 1
[0, 0, 1, 0, 1, 0], # 2
[0, 0, 0, 1, 0, 0], # 3
[0, 0, 1, 0, 1, 0], # 4
[0, 1, 0, 0, 1, 1], # 5
])
print(
connectivity,
# matrix_power(connectivity, 2),
# connectivity ** 2,
# connectivity + 10,
# connectivity + zeros(connectivity.shape),
sep='\n',
)
scipy.sparse provides us with a variety of sparse matrix types, offering various trade-offs between construction, mutation, and arithmetic efficiency.
from numpy import array
from scipy.sparse import (
    bsr_matrix, # Block Sparse Row
coo_matrix, # Coordinate
csc_matrix, # Compressed Sparse Column
csr_matrix, # Compressed Sparse Row
dia_matrix, # Diagonal Storage
dok_matrix, # Dictionary of Keys
lil_matrix, # List of Lists
)
from scipy.sparse import eye
print(
# eye(3, format='bsr'),
# eye(3, format='coo'),
# eye(3, format='csc'),
# eye(3, format='csr'),
eye(3, format='dia'),
# eye(3, format='dok'),
# eye(3, format='lil'),
sep='\n{}\n'.format('\N{box drawings light horizontal}' * 40),
)
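As a rough sketch of those trade-offs: formats like LIL and DOK are designed for cheap incremental construction and element-wise assignment, while CSR and CSC are designed for fast arithmetic, so a common pattern is to build in one format and convert to another.
from scipy.sparse import lil_matrix

m = lil_matrix((1_000, 1_000))  # efficient incremental construction…
m[0, 1] = 1
m[10, 20] = 2
m[500, 999] = 3
m = m.tocsr()                   # …then convert for efficient arithmetic
print(f'{(m @ m.T).nnz = }')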
print("Let's take a look!")
NumPy supports interpolation with numpy.interp, but only one-dimensional
linear interpolation (for monotonically increasing sample points.)
from numpy import linspace, hstack, interp
from matplotlib.pyplot import subplots, show
xs = linspace(-10, 10, 1_000)
ys = xs ** 2
missing = range(250, 750)
# missing = range(550, 750)
sampled_xs = hstack([xs[:missing.start], xs[missing.stop:]])
sampled_ys = sampled_xs ** 2
interpolated_xs = xs
interpolated_ys = interp(interpolated_xs, sampled_xs, sampled_ys)
fig, axes = subplots()
axes.plot(xs, ys, label='original', color='grey', alpha=.25, linewidth=3)
axes.plot(interpolated_xs, interpolated_ys, '--', label='interpolated')
axes.legend()
show()
Of course, we can fit polynomials with NumPy. In the past, this would use numpy.polyfit and numpy.polyval, but we’re encouraged to use numpy.polynomial these days.
from numpy import linspace, hstack
from numpy.polynomial import Polynomial
from matplotlib.pyplot import subplots, show
xs = linspace(-10, 10, 1_000)
ys = xs ** 2
# missing = range(100, 150)
missing = range(500, 750)
sampled_xs = hstack([xs[:missing.start], xs[missing.stop:]])
sampled_ys = sampled_xs ** 2
# poly = Polynomial.fit(sampled_xs, sampled_ys, deg=1)
poly = Polynomial.fit(sampled_xs, sampled_ys, deg=2)
# poly = Polynomial.fit(sampled_xs, sampled_ys, deg=3)
print(f'{poly.coef.round(2) = }')
interpolated_xs = xs
interpolated_ys = poly(xs)
fig, axes = subplots()
axes.plot(xs, ys, label='original', color='grey', alpha=.25, linewidth=3)
axes.plot(interpolated_xs, interpolated_ys, '--', label='interpolated')
axes.legend()
show()
scipy.interpolate provides additional functionality for both univariate and multivariate interpolation.
For univariate interpolation, you can use interp1d.
from numpy import linspace, hstack
from scipy.interpolate import interp1d
from matplotlib.pyplot import subplots, show, legend
xs = linspace(-10, 10, 1_000)
ys = xs ** 2
missing = range(550, 750)
sampled_xs = hstack([xs[:missing.start], xs[missing.stop:]])
sampled_ys = sampled_xs ** 2
interpolated_xs = xs
# interpolated_ys = interp1d(sampled_xs, sampled_ys)(interpolated_xs)
# interpolated_ys = interp1d(sampled_xs, sampled_ys, kind='quadratic')(interpolated_xs)
# interpolated_ys = interp1d(sampled_xs, sampled_ys, kind='previous')(interpolated_xs)
# interpolated_ys = interp1d(sampled_xs, sampled_ys, kind='next')(interpolated_xs)
interpolated_ys = interp1d(sampled_xs, sampled_ys, kind='slinear')(interpolated_xs) # spline of order 1
fig, axes = subplots()
axes.plot(xs, ys, label='original', color='grey', alpha=.25, linewidth=3)
axes.plot(interpolated_xs, interpolated_ys, '--', label='interpolated')
legend()
show()
Note that numpy.interp is limited to one-dimensional linear interpolation, but it is generally faster than scipy.interpolate.interp1d. Note also that the SciPy documentation now describes interp1d as legacy, recommending newer routines such as make_interp_spline.
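A minimal sketch of the same gap-filling exercise with scipy.interpolate.make_interp_spline:
from numpy import linspace, hstack
from scipy.interpolate import make_interp_spline

xs = linspace(-10, 10, 1_000)
missing = range(550, 750)
sampled_xs = hstack([xs[:missing.start], xs[missing.stop:]])
sampled_ys = sampled_xs ** 2

spline = make_interp_spline(sampled_xs, sampled_ys, k=2)  # quadratic B-spline
print(f'{abs(spline(xs) - xs ** 2).max() = }')  # near-exact on a parabola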
Of course, scipy.interpolate has much more sophisticated approaches, such as
piecewise-polynomials or B-spline interpolation.
from numpy import linspace, sin, cos, arange
from numpy.random import default_rng
from matplotlib.pyplot import subplots, show, legend
from scipy.interpolate import splrep, splev, PPoly, CubicHermiteSpline
rng = default_rng(0)
xs = linspace(-10, 10, 1_000)
ys = sin(xs)
sampled_idxs = rng.choice(arange(len(xs)), size=5, replace=False)  # distinct points: splrep requires strictly increasing x
sampled_idxs.sort()
sampled_xs = xs[sampled_idxs]
sampled_ys = ys[sampled_idxs]
spline = splrep(sampled_xs, sampled_ys)
bspline_xs = linspace(sampled_xs.min(), sampled_xs.max(), len(xs))
bspline_ys = splev(bspline_xs, spline)
ppoly_xs = linspace(sampled_xs.min(), sampled_xs.max(), len(xs))
# ppoly_ys = PPoly.from_spline(spline)(ppoly_xs)
ppoly_ys = CubicHermiteSpline(sampled_xs, sampled_ys, cos(sampled_xs))(ppoly_xs)
fig, axes = subplots()
axes.plot(xs, ys, label='original', color='grey', alpha=.25, linewidth=3)
axes.plot(sampled_xs, sampled_ys, 'o', label='samples')
axes.plot(bspline_xs, bspline_ys, label='bspline')
axes.plot(ppoly_xs, ppoly_ys, label='piece-wise poly')
legend()
show()
print("Let's take a look!")
NumPy includes tools for linear algebra, including all of the functionality you would expect.
There is numpy.dot, numpy.inner, and numpy.outer for dot, inner, and
outer products.
from numpy.random import default_rng
from numpy import dot, inner, outer
rng = default_rng(0)
xs = rng.integers(-10, +10, size=10)
ys = rng.integers(-10, +10, size=10)
print(
f'{dot(xs, ys) = }',
# f'{inner(xs, ys) = }',
f'{outer(xs, ys) = }',
sep='\n',
)
There is numpy.linalg.det and numpy.linalg.trace for matrix determinant and
trace.
from numpy.random import default_rng
from numpy.linalg import det, trace
rng = default_rng(0)
xs = rng.integers(-10, +10, size=(3, 3))
print(
f'{det(xs) = }',
f'{trace(xs) = }',
sep='\n',
)
There is numpy.linalg.eig for eigenvalues and eigenvectors.
from numpy.random import default_rng
from numpy.linalg import eig
rng = default_rng(0)
xs = rng.integers(-10, +10, size=(3, 3))
print(
eig(xs),
sep='\n',
)
There are decompositions: numpy.linalg.qr and numpy.linalg.svd.
from numpy.random import default_rng
from numpy.linalg import qr, svd
rng = default_rng(0)
xs = rng.integers(-10, +10, size=(3, 3))
print(
qr(xs),
svd(xs),
sep='\n{}\n'.format('\N{box drawings light horizontal}' * 40),
)
Largely, SciPy offers slightly different or slightly richer functionality than NumPy for these routines. For example, numpy.linalg.pinv and scipy.linalg.pinv both provide the (Moore-Penrose) pseudo-inverse of a matrix, with minor differences in their signatures and return values.
# NumPy’s implementation (abridged):
def pinv(a, rcond=None, hermitian=False, *, rtol=_NoValue):
"""
Compute the (Moore-Penrose) pseudo-inverse of a matrix.
Calculate the generalized inverse of a matrix using its
singular-value decomposition (SVD) and including all
*large* singular values.
"""
a, wrap = _makearray(a)
if rcond is None:
if rtol is _NoValue:
rcond = 1e-15
elif rtol is None:
rcond = max(a.shape[-2:]) * finfo(a.dtype).eps
else:
rcond = rtol
elif rtol is not _NoValue:
raise ValueError("`rtol` and `rcond` can't be both set.")
else:
# NOTE: Deprecate `rcond` in a few versions.
pass
rcond = asarray(rcond)
if _is_empty_2d(a):
m, n = a.shape[-2:]
res = empty(a.shape[:-2] + (n, m), dtype=a.dtype)
return wrap(res)
a = a.conjugate()
u, s, vt = svd(a, full_matrices=False, hermitian=hermitian)
# discard small singular values
cutoff = rcond[..., newaxis] * amax(s, axis=-1, keepdims=True)
large = s > cutoff
s = divide(1, s, where=large, out=s)
s[~large] = 0
res = matmul(transpose(vt), multiply(s[..., newaxis], transpose(u)))
return wrap(res)
# SciPy’s implementation (abridged):
def pinv(a, *, atol=None, rtol=None, return_rank=False, check_finite=True):
"""
Compute the (Moore-Penrose) pseudo-inverse of a matrix.
Calculate a generalized inverse of a matrix using its
singular-value decomposition ``U @ S @ V`` in the economy mode and picking
up only the columns/rows that are associated with significant singular
values.
If ``s`` is the maximum singular value of ``a``, then the
significance cut-off value is determined by ``atol + rtol * s``. Any
singular value below this value is assumed insignificant.
"""
a = _asarray_validated(a, check_finite=check_finite)
u, s, vh = _decomp_svd.svd(a, full_matrices=False, check_finite=False)
t = u.dtype.char.lower()
maxS = np.max(s, initial=0.)
atol = 0. if atol is None else atol
rtol = max(a.shape) * np.finfo(t).eps if (rtol is None) else rtol
if (atol < 0.) or (rtol < 0.):
raise ValueError("atol and rtol values must be positive.")
val = atol + maxS * rtol
rank = np.sum(s > val)
u = u[:, :rank]
u /= s[:rank]
B = (u @ vh[:rank]).conj().T
if return_rank:
return B, rank
else:
return B
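Despite the differing cut-off conventions, the two typically agree numerically; a quick sketch on randomly generated data:
from numpy import allclose
from numpy.linalg import pinv as np_pinv
from numpy.random import default_rng
from scipy.linalg import pinv as sp_pinv

rng = default_rng(0)
xs = rng.normal(size=(5, 3))

# same pseudo-inverse, modulo the cut-off conventions described above
assert allclose(np_pinv(xs), sp_pinv(xs))

# SciPy can additionally report the effective rank
B, rank = sp_pinv(xs, return_rank=True)
print(f'{rank = }')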
print("Let's take a look!")
SciPy includes some (simple) optimization mechanisms for curve fitting, linear least squares, mixed-integer linear programming, and other optimization approaches.
from scipy.optimize import brute, minimize
from numpy import linspace
from numpy.polynomial import Polynomial
from matplotlib.pyplot import subplots, show
poly = Polynomial([-2, -1.5, 0.75, 0.25])
xs = linspace(-5, +5, 100)
ys = poly(xs)
# ranges = [
# slice(-2, +2),
# ]
# min_xs = brute(poly, ranges)
x0 = 0
min_xs = minimize(poly, x0, method='Powell').x # Powell: a derivative-free local method
min_ys = poly(min_xs)
fig, axes = subplots()
axes.set_xlim(xs.min(), xs.max())
axes.set_ylim(ys.min(), ys.max())
axes.plot(xs, ys)
axes.axvline(min_xs, color='red', alpha=.5)
axes.axhline(min_ys, color='red', alpha=.5)
# for rng in ranges:
# axes.axvspan(rng.start, rng.stop, color='grey', alpha=.25)
show()
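Since curve fitting was mentioned above, here is a minimal sketch recovering made-up parameters with scipy.optimize.curve_fit (the model and its parameters are illustrative assumptions, not from the examples above).
from numpy import exp, linspace
from numpy.random import default_rng
from scipy.optimize import curve_fit

rng = default_rng(0)

def model(x, a, b):
    # hypothetical model: exponential decay with scale a and rate b
    return a * exp(-b * x)

xs = linspace(0, 4, 50)
ys = model(xs, 2.5, 1.3) + rng.normal(scale=.05, size=xs.shape)

(a, b), pcov = curve_fit(model, xs, ys, p0=(1, 1))
print(f'{(a.round(2), b.round(2)) = }')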