ts-python

Series II: Python @ Two Sigma (ts-python.dutc.io)

Logo: Don't Use This Code, LLC

Seminar I: “Rear Window” (pandas Window Functions, .rolling, .expanding, .ewm)

Materials:

Lisa (Grace Kelly): I wish I were creative…

Jeff (Jimmy Stewart): You are! You’re great at creating difficult situations.

Rear Window (1954)

Date Time Track Meeting Link
May 7, 2021 9:30 AM EST In Depth with pandas Seminar I: “Rear Window”

Audience

These sessions are designed for a broad audience of modelers and software programmers of all backgrounds and skill levels.

Our expected audience should comprise attendees with at least a…

During this session, we will endeavour to guide our audience to developing…

…and we will share additional tips, tricks, and in-depth guidance on all of these topics!

Abstract

In a previous episode, we looked at .groupby in-depth, as well as the various reduction operations it supports (.apply, .transform, .aggregate.)

Let’s turn our attention to other window functions in pandas—functions which operate on “windows” of multiple rows or multiple columns to perform aggregations or other transformations.

In this episode, we’ll look at .rolling, .expanding, and .ewm, their various options and modalities, as well as the operations available on the Window, Rolling, Expanding, and ExponentialMovingWindow objects they return. We’ll discuss these operations in the context of time series analysis and discuss performance considerations related to the use of each.
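As a rough illustration of the three window types named above, here is a minimal sketch. The series `s` and its values are made up for this example and are not taken from the seminar materials.

```python
import pandas as pd

# Hypothetical daily series, invented purely for illustration.
s = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0],
    index=pd.date_range("2021-05-03", periods=5, freq="D"),
)

# Fixed-size window: mean of the current and two preceding observations.
# The first two results are NaN because min_periods defaults to the window size.
rolling_mean = s.rolling(window=3).mean()

# Time-based window: sum over the trailing two calendar days.
# Offset windows need a datetime-like index and default to min_periods=1.
rolling_2d_sum = s.rolling("2D").sum()

# Growing window: mean of everything observed so far.
expanding_mean = s.expanding().mean()

# Exponentially weighted window: recent observations carry more weight.
ewm_mean = s.ewm(span=3).mean()

print(pd.DataFrame({
    "rolling": rolling_mean,
    "rolling_2D": rolling_2d_sum,
    "expanding": expanding_mean,
    "ewm": ewm_mean,
}))
```

Each call returns an intermediate window object (`Rolling`, `Expanding`, `ExponentialMovingWindow`); the aggregation method chosen afterwards (`.mean()`, `.sum()`, and so on) produces the result.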

To Be Continued…

Did you enjoy this episode? Did you learn something new that will help you as you continue or begin to use window methods in your work?

If so, stay tuned for future episodes, which may…

If there are other related topics you’d like to see covered, please reach out to Diego Torres Quintanilla.

Seminar II: “Artists and Models” (the Python Object Model)

Materials:

Abigail ‘Abby’ Parker (Dorothy Malone): I’m doing an advertising layout. So if you’ll excuse me, I’ll get my models and get to work.

Richard ‘Rick’ Todd (Dean Martin): Models?

Abby: Yes, models. You understand. They’re people who pose.

Artists and Models (1955)

Date Time Track Meeting Link
June 4, 2021 11:00 AM EST Better Use of Python Seminar II: “Artists and Models”

Audience

These sessions are designed for a broad audience of modelers and software programmers of all backgrounds and skill levels.

Our expected audience should comprise attendees with a…

… or greater!

During this session, we will endeavour to guide our audience to developing…

…and we will share additional tips, tricks, and in-depth guidance on all of these topics!

Abstract

In a previous episode, we looked at motivating the use of object orientation in Python in analytical work, looking at boilerplate reduction tools such as dataclasses.dataclass and collections.namedtuple.

Let’s turn our attention to the Python object model itself, the various mechanics it supports, and the protocols it provides.

In this episode, we’ll look at the Python object model and the various “hook points” it provides for users to integrate with the common vocabulary (the “language”) of the language. We’ll look at these in terms of their overall design as protocols, and at the common conventions and rules around the most common protocols that users may implement.
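To make the idea of “hook points” concrete, here is a minimal sketch of a class that opts into a few built-in protocols. The `Portfolio` class and its contents are hypothetical; the seminar may well have used different examples.

```python
class Portfolio:
    """Hypothetical container, used only to illustrate protocol methods."""

    def __init__(self, holdings):
        self._holdings = dict(holdings)

    def __repr__(self):
        # Hook for repr(obj) and the interactive prompt.
        return f"Portfolio({self._holdings!r})"

    def __len__(self):
        # Hook for len(obj).
        return len(self._holdings)

    def __getitem__(self, ticker):
        # Hook for obj[key].
        return self._holdings[ticker]

    def __iter__(self):
        # Hook for iteration: for ticker in obj.
        return iter(self._holdings)

    def __contains__(self, ticker):
        # Hook for the "in" operator.
        return ticker in self._holdings


p = Portfolio({"AAA": 10, "BBB": 5})
print(repr(p), len(p), p["AAA"], list(p), "AAA" in p)
```

Because the class implements these methods, it works with `len`, indexing, `for` loops, and `in` just as the built-in containers do.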

To Be Continued…

Did you enjoy this episode? Did you learn something new that will help you as you continue or begin to use object-oriented approaches and the Python data model in your work?

If so, stay tuned for future episodes, which may…

If there are other related topics you’d like to see covered, please reach out to Diego Torres Quintanilla.

Seminar III: “Sunset Boulevard” (better understanding plotting with matplotlib)

Materials:

Betty Schaefer (Nancy Olson): Oh, the old familiar story. You help a timid little soul cross a crowded street, she turns out to be a multimillionaire and leaves you all her money.

Joe Gillis (William Holden): That’s the trouble with you readers—you know all the plots.

Sunset Boulevard (1950)

Date Time Track Meeting Link
July 2, 2021 9:30 AM EST Fluent, Effective Visualization Seminar III: “Sunset Boulevard”

Audience

These sessions are designed for a broad audience of modelers and software programmers of all backgrounds and skill levels.

Our expected audience should comprise attendees with at least a…

During this session, we will endeavour to guide our audience to developing…

…and we will share additional tips, tricks, and in-depth guidance on all of these topics!

Abstract

Let’s turn our attention to visualizing data with matplotlib and the underlying mechanics and theory of how the library and API are designed.

In this episode, we’ll take a look at matplotlib’s underlying design, tying it to common visualization tasks we want to perform. We’ll look at the underlying conceptual entities involved (e.g., Figure, Axes, subplots, Tick, Patch) to attempt to build a strong conceptual understanding of how matplotlib constructs a visualization. We’ll bridge this conceptual understanding to common customizations we may want to perform and use it to structure and categorize our knowledge of the multitude of “conceptual entities” encountered when attempting a complex visualization.
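The following sketch shows how some of these entities relate: one `Figure` holding two `Axes`, with a line on one and a `Patch` on the other. The data and titles are invented for illustration, and the `Agg` backend is chosen here only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for this sketch

import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# One Figure containing a 1x2 grid of Axes.
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# Plotting adds a Line2D artist to the left Axes.
ax_left.plot([0, 1, 2, 3], [0, 1, 4, 9], label="y = x^2")
ax_left.set_title("A line on an Axes")
ax_left.set_xticks([0, 1, 2, 3])  # tick locations belong to the Axes' x-axis
ax_left.legend()

# A Patch is a 2D shape; here a Rectangle added to the right Axes.
ax_right.add_patch(Rectangle((0.2, 0.2), 0.5, 0.4))
ax_right.set_title("A Patch on an Axes")

print(len(fig.axes), len(ax_left.lines), len(ax_right.patches))
```

Most customizations then amount to finding the entity responsible (the `Figure`, an `Axes`, an axis, or an individual artist) and calling a method on it.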

To Be Continued…

Did you enjoy this episode? Did you learn something new that will help you as you continue or begin to use matplotlib in your work?

If so, stay tuned for future episodes, which may…

If there are other related topics you’d like to see covered, please reach out to Diego Torres Quintanilla.

Seminar IV: “The Caine Mutiny” (effective use of the Python standard library)

Materials:

Lt. Cmdr. Philip Francis Queeg (Humphrey Bogart): Aboard my ship, excellent performance is standard, standard performance is sub-standard, and sub-standard performance is not permitted to exist. That, I warn you.

The Caine Mutiny (1954)

Date Time Track Meeting Link
July 16, 2021 9:30 AM EST Improving Use of Common Libraries Seminar IV: “The Caine Mutiny”

Audience

These sessions are designed for a broad audience of modelers and software programmers of all backgrounds and skill levels.

Our expected audience should comprise attendees with at least a…

During this session, we will endeavour to guide our audience to developing…

…and we will share additional tips, tricks, and in-depth guidance on all of these topics!

Abstract

Let’s turn our attention to the Python standard library and how it provides a set of first-approximation tools for helping us accomplish common scripting tasks. These scripting tasks typically “surround” our analytical use cases—they may not be intimately or directly related to the analysis itself, but they provide functionality that supports some data modeling, data cleaning, or another automation-related capability.

In this episode, we’ll look at the Python standard library, focusing on modules such as pathlib, collections, tempfile, functools, textwrap, itertools, argparse, and others.
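As a small taste of how these pieces can combine in a “surrounding” scripting task, here is a sketch that writes a scratch file and summarizes it. The file name and log contents are hypothetical.

```python
import tempfile
import textwrap
from collections import Counter
from itertools import islice
from pathlib import Path

# textwrap.dedent lets us keep multi-line literals indented in the source.
raw = textwrap.dedent("""\
    load ok
    clean ok
    load failed
    load ok
""")

# tempfile provides a scratch directory that is removed on exit.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "events.log"  # hypothetical file name
    log.write_text(raw)

    # Count how often each step (the first word on a line) appears.
    with log.open() as lines:
        steps = Counter(line.split()[0] for line in lines)

    # islice takes the first two items without slicing a full list.
    first_two = list(islice(log.read_text().splitlines(), 2))

top = steps.most_common(1)
print(top, first_two)
```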

To Be Continued…

Did you enjoy this episode? Did you learn something new that will help you as you continue to add scripting, structuring, and automation surrounding the analytical tasks in your work?

If so, stay tuned for future episodes, which may…

If there are other related topics you’d like to see covered, please reach out to Diego Torres Quintanilla.

Seminar V: “The Abyss” (a deep dive into a pandas problem)

Materials:

Lindsey Brigman (Mary Elizabeth Mastrantonio): I got over four years invested in this project.

Virgil “Bud” Brigman (Ed Harris): Yeah, you only had three years invested in me.

Lindsey Brigman: Well, you have to have priorities.

The Abyss (1989)

Date Time Theme Meeting Link
Fri Aug 6, 2021 9:30 AM EDT A deep dive into a pandas problem Seminar V: “The Abyss”

Audience

These sessions are designed for a broad audience of modelers and software programmers of all backgrounds and skill levels.

Our expected audience should comprise attendees with at least a…

During this session, we will endeavour to guide our audience to developing…

Abstract

In previous episodes, we have looked at many problems involving time-series or columnar-data analysis using pandas. Our goal has been to present these examples in order to understand some broader conceptual point, such as the structure of the pandas.Series and pandas.DataFrame, better use of indices, or better use of windowing operations.

By popular demand, this episode will set aside our goal for deeper thematic or conceptual understanding and instead provide us an opportunity to look at a couple of problems in-depth!

Join us for this episode to look at a number of in-depth pandas examples, drawn from our previous discussions, where we will go line-by-line and look at what constitutes fluent, precise Python and pandas code!

To Be Continued…

Did you enjoy this episode? Did you learn something new that will help you as you try to improve the fluency or the precision of your code?

If so, stay tuned for future episodes, which may…

If there are other related topics you’d like to see covered, please reach out to Diego Torres Quintanilla.

Seminar VI: “Ghostbusters” (Index and MultiIndex in pandas)

Materials:

Dr. Peter Venkman (Bill Murray): Ray has gone bye-bye, Egon… what’ve you got left?

Dr. Egon Spengler (Harold Ramis): Sorry, Venkman, I’m terrified beyond the capacity for rational thought.

Ghostbusters (1984)

Date Time Track Meeting Link
Fri Sep 10, 2021 9:30 AM EDT Working with Indices and MultiIndices Seminar VI: “Ghostbusters”

Audience

These sessions are designed for a broad audience of modelers and software programmers of all backgrounds and skill levels.

Our expected audience should comprise attendees with at least a…

During this session, we will endeavour to guide our audience to developing…

Abstract

In previous episodes, we have talked about the structure of the pandas.Series and pandas.DataFrame. We have shown examples that involve complex pandas.Index structures, including the use of explicitly hierarchical pandas.MultiIndexes.

In this session, we will take a closer look at the pandas.Index, its API, the common variations of indices we work with, and considerations when using an implicitly or explicitly hierarchical index. We will also take a close look at the use of the pandas.MultiIndex and review situations where the MultiIndex might come up, where we may want to embrace use of the MultiIndex, and common situations where we want to steer clear of its complexity!
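For orientation, here is a minimal sketch of an explicitly hierarchical index and a few ways to work with it or step away from it. The tickers, dates, and prices are invented for this example.

```python
import pandas as pd

# Hypothetical two-level index: ticker (outer) and date (inner).
idx = pd.MultiIndex.from_product(
    [["AAA", "BBB"], pd.to_datetime(["2021-09-09", "2021-09-10"])],
    names=["ticker", "date"],
)
prices = pd.Series([10.0, 11.0, 20.0, 19.0], index=idx, name="price")

# Selecting on the outer level drops it; the result is indexed by date.
aaa = prices.loc["AAA"]

# Cross-section on the inner level; the result is indexed by ticker.
on_day = prices.xs(pd.Timestamp("2021-09-10"), level="date")

# Moving a level into the columns gives a "wide" table.
wide = prices.unstack("ticker")

# Dropping the hierarchy entirely turns the levels into ordinary columns.
flat = prices.reset_index()

print(wide)
print(flat)
```

Whether to keep the `MultiIndex`, unstack it, or flatten it depends on the operations that follow; this sketch only shows that all three routes are available.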

To Be Continued…

Did you enjoy this episode? Did you learn something new that will help you as you perform analyses on pandas.DataFrames with more complex structures?

If so, stay tuned for future episodes, which may…

If there are other related topics you’d like to see covered, please reach out to Diego Torres Quintanilla.

Seminar VII: “Patrick Swayze” (serialising data and using pickle)

Materials:

Kevin Scott (Patrick Swayze): I was discharged for striking a radio operator who fell asleep at his post. You’re worried that I have no combat experience. You’re right. There’s no way of proving that I won’t fail in combat. But then again, you can’t prove that I will, either.

Uncommon Valor (1983)

‘Blaster’ (Reb Brown): Most human problems can be solved by an appropriate charge of high explosives.

Uncommon Valor (1983)

Johnny Castle (Patrick Swayze): You just put your pickle on everybody’s plate, college boy, and leave the hard stuff to me.

Dirty Dancing (1987)

James Dalton (Patrick Swayze): All you have to do is follow three simple rules. One: never underestimate your opponent. Expect the unexpected. Two: take it outside. Never start anything inside the bar unless it’s absolutely necessary. And three: be nice.

Road House (1989)

Date Time Track Meeting Link
Fri Oct 1, 2021 9:30 AM EDT Persisting and Serialising Data Seminar VII: “Patrick Swayze”

Audience

These sessions are designed for a broad audience of modelers and software programmers of all backgrounds and skill levels.

Our expected audience should comprise attendees with a…

During this session, we will endeavour to guide our audience to developing…

Abstract

In previous episodes, we have worked with data sets that we constructed, usually from random data. In practice, our data sets are likely to be stored in on-disk formats (e.g., CSV, Excel, HDF5, Parquet) or retrieved over the network from remote servers (e.g., SQL databases, Hadoop/Hive stores). Once we load this data, we may have intermediary or temporary results we want to store, and we may want flexible, fast, and easy ways to store this data.

In this episode, we’ll discuss data serialisation packages and technologies. We’ll discuss this in the context of permanent or long-term storage, as well as in the context of transient, temporary, or short-term storage. We’ll take a close look at simple, common approaches provided by pandas (e.g., to_csv or to_pickle) and discuss the benefits as well as the significant limitations of these approaches. In particular, we will look closely at pickle and discuss why it is a “fantastic tool that you should never use.”
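The sketch below illustrates one trade-off between the two kinds of format using only the standard library. The `record` is hypothetical, and this is not necessarily the argument made in the seminar.

```python
import csv
import io
import pickle
from datetime import date

record = {"when": date(2021, 10, 1), "count": 3}  # hypothetical intermediate result

# pickle round-trips Python objects with their types intact...
payload = pickle.dumps(record)
restored = pickle.loads(payload)

# ...but the payload refers to classes by name rather than storing them,
# so loading depends on compatible code being importable.
mentions_module = b"datetime" in payload

# CSV stores text only, so every value comes back as a string.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["when", "count"])
writer.writeheader()
writer.writerow(record)
buffer.seek(0)
from_csv = next(csv.DictReader(buffer))

print(restored, mentions_module, from_csv)
```

The `pickle` documentation also warns that unpickling can execute arbitrary code, so only data from a trusted source should be loaded this way.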

To Be Continued…

Did you enjoy this episode? Did you learn something new that will help you manage complex analyses that may involve transient or intermediary results that you want to persist between analyses?

If so, stay tuned for future episodes, which may…

If there are other related topics you’d like to see covered, please reach out to Diego Torres Quintanilla.

Seminar VIII: “Back to the Future” (contemporary Python syntax and features)

Marty McFly (Michael J. Fox): Hey, Doc, we better back up. We don’t have enough road to get up to 88.

Dr. Emmett Brown (Christopher Lloyd): Roads? Where we’re going, we don’t need roads.

Back to the Future (1985)

Date Time Track Meeting Link
Fri Oct 15, 2021 9:30 AM EDT New features in Python 3~3.9 Seminar VIII: “Back to the Future”

Audience

These sessions are designed for a broad audience of modelers and software programmers of all backgrounds and skill levels.

Our expected audience should comprise attendees with at least a…

During this session, we will endeavour to guide our audience to developing…

Abstract

In previous episodes, we looked at code examples written using the latest versions of Python and pandas, which featured syntax available only in later Python versions. These examples were carefully written to demonstrate a contemporary, fluent Python style, and to reflect the latest approaches endorsed by the Python and PyData communities.

In this episode, we will take a close look at how Python has evolved both since the Python 2 era and within the Python 3 era, looking closely at changes to the language since the release of Python 3.6. We will take a look at new syntax added to the language, the evolution of approaches and thematic changes made to the language to support new styles of programming, and the introduction of new major features.

For those who first picked up Python in the Python 2 era or the Python 3.3 ~ 3.5 era, this presentation will serve as an opportunity for you to brush up to see “what’s (new and) old in Python 3.9!”
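For a quick flavour, here is a sketch of a few additions from Python 3.6 through 3.9. The selection is ours and may differ from what the session covered; the names and values are invented, and the code needs Python 3.9 or later.

```python
# 3.8: parameters before "/" are positional-only; after "*" keyword-only.
def scale(value, /, *, factor=2):
    return value * factor


# 3.9: built-in collections can be used directly as generic annotations.
def first(items: list[int]) -> int:
    return items[0]


readings = [3, 8, 12]

# 3.6: f-strings; 3.8: the "=" specifier echoes the expression and its value.
total = sum(readings)
message = f"{total=}"

# 3.8: assignment expressions bind a name inside a larger expression.
if (n := len(readings)) > 2:
    summary = f"{n} readings"

# 3.9: dict union with "|" and str.removeprefix.
defaults = {"window": 3, "min_periods": 1}
merged = defaults | {"window": 5}
name = "seminar_viii".removeprefix("seminar_")

print(message, summary, merged, name, scale(4, factor=3), first(readings))
```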

To Be Continued…

Did you enjoy this episode? Did you learn something new that you will use more and more in your code, or did you discover a new technique that accomplishes a common task more fluently, more precisely, or more robustly?

If so, stay tuned for future episodes, which may…

If there are other related topics you’d like to see covered, please reach out to Diego Torres Quintanilla.

Seminar IX: “How to Train Your Dragon” (type-checking with PEP-484 & mypy)

Materials:

Hiccup (Jay Baruchel): Everything we know about you guys is wrong.

How to Train Your Dragon (2010)

Date Time Track Meeting Link
Fri, Nov 12, 2021 9:30 AM EDT Python fundamentals https://primetime.bluejeans.com/a2m/live-event/dwqcpqgq

Audience

These sessions are designed for a broad audience of modelers and software programmers of all backgrounds and skill levels.

Our expected audience should comprise attendees…

During this session, we will endeavour to guide our audience to developing…

…and we will share additional tips, tricks, and in-depth guidance on all of these topics!

Abstract

In this episode, we will talk about new techniques and tools for ensuring the correctness of your code. You may even have seen these tools, mypy in particular, adopted more and more in the code you interact with, and may be curious how to interpret the new syntax they introduce and how to let your code benefit from these approaches.

We will start by broadly discussing the challenges of static verification in Python (in relation to other languages and ecosystems you may be familiar with, such as C, C++, or Java). We will cover new syntax added to Python 3 for annotating functions and variables, and how these fit into third-party tooling like mypy. We’ll discuss where mypy can give good guidance for code improvements and can help reïnforce good design principles in your code. Finally, we’ll discuss areas where mypy may not provide significant benefits and where other checking and verification techniques may be superior.
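Here is a minimal sketch of the annotation syntax. The function names and data are hypothetical. The point it shows is that annotations are not enforced when the code runs; a separate checker such as mypy has to read them.

```python
from typing import Optional


def double(x: int) -> int:
    return x * 2


def find_price(prices: dict[str, float], ticker: str) -> Optional[float]:
    # Optional[float] means "a float or None"; .get returns None when missing.
    return prices.get(ticker)


# Variable annotations use the same syntax as parameter annotations.
price: Optional[float] = find_price({"AAA": 10.0}, "AAA")

# Checking for None first lets a type checker narrow Optional[float] to float.
if price is not None:
    rounded = round(price, 1)

# The interpreter ignores the annotation, so this call runs and returns "abab".
# A static checker such as mypy would be expected to report the str argument.
surprising = double("ab")

print(double.__annotations__, rounded, surprising)
```

Running mypy over a file like this is a separate step from running the program itself.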

To Be Continued…

Did you enjoy this episode? Did you learn something new that will help you as you continue or begin to use type annotations and mypy in your work?

If so, stay tuned for future episodes, which may…

If there are other related topics you’d like to see covered, please reach out to Diego Torres Quintanilla.

Seminar X: “Shrek” (“holiday stocking stuffer”; bizarre Python language warts)

Materials:

Donkey (Eddie Murphy): Hey, what’s your problem, Shrek, what you got against the whole world anyway, huh?

Shrek (Mike Myers): Look, I’m not the one with the problem, okay? It’s the world that seems to have a problem with ME! People take one look at me and go “Aargh! Help! Run! A big stupid ugly ogre!” They judge me before they even know me - that’s why I’m better off alone…

Donkey (Eddie Murphy): You know, Shrek… when we first met, I didn’t think you were a big, stupid, ugly ogre.

Shrek (Mike Myers): Yeah, I know.

Shrek (2001)

Date Time Track Meeting Link
Fri, Dec 3, 2021 9:30 AM EDT Python fundamentals & misc. https://primetime.bluejeans.com/a2m/live-event/xcsugkge

Audience

These sessions are designed for a broad audience of modelers and software programmers of all backgrounds and skill levels.

Our expected audience should comprise attendees…

During this session, we will endeavour to guide our audience to developing…

…and we will share additional tips, tricks, and in-depth guidance on all of these topics!

Abstract

In this special holiday episode, we’re going to have some fun! We’ll discuss design mistakes (“warts”) made in the development of Python and common analysis libraries like pandas. We’ll cover why these exist, how to come to terms with them, how they affect the design and implementation of our own systems and analytical efforts, and what these tell us more broadly about Python and programming.
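Two widely discussed examples of behaviour that surprises newcomers are sketched below. They are our own picks and not necessarily the ones covered in the episode.

```python
# Surprise 1: default argument values are evaluated once, at definition time,
# so a mutable default is shared across calls.
def append_shared(item, bucket=[]):
    bucket.append(item)
    return bucket


first = append_shared(1)
second = append_shared(2)  # same list object as `first`; now [1, 2]


# The usual workaround: use None as a sentinel and build the list per call.
def append_fresh(item, bucket=None):
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


# Surprise 2: closures look up variables when called, not when defined,
# so every lambda here sees the final value of i.
late = [lambda: i for i in range(3)]
late_results = [f() for f in late]

# Binding the current value as a default argument captures it per lambda.
bound = [lambda i=i: i for i in range(3)]
bound_results = [f() for f in bound]

print(second, first is second, late_results, bound_results)
```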

To Be Continued…

Did you enjoy this episode? Did you learn something new that will help you as you work with, and around, the warts of Python and pandas?

If so, stay tuned for future episodes, which may…

If there are other related topics you’d like to see covered, please reach out to Diego Torres Quintanilla.

Seminar XI: “Turbo”

Materials:

White Shadow (Mike Bell): Here one second, gone the next. They call me… the White Shadow! I can move so fast, all you see is my shadow.

Turbo (Ryan Reynolds): I don’t get it.

White Shadow (Mike Bell): I’m fast, like a shadow!

Turbo (Ryan Reynolds): But shadows, they’re not inherently fast.

White Shadow (Mike Bell): White Shadowwww…

Turbo (Ryan Reynolds): I can still see you.

Turbo (2013)

Date Time Track Meeting Link
Fri, Jan 21, 2022 9:30 AM EDT Performance & Tooling https://primetime.bluejeans.com/a2m/live-event/jkvwcrph

Audience

These sessions are designed for a broad audience of modelers and software programmers of all backgrounds and skill levels.

Our expected audience should comprise attendees with…

During this session, we will endeavour to guide our audience to developing…

…and we will share additional tips, tricks, and in-depth guidance on all of these topics!

Abstract

In previous episodes, we’ve performed very simple analyses of code, to help motivate a better understanding of the design and use of tools like pandas and numpy. In this episode, we’ll take a closer look at the task of analysing code for performance—we’ll look at tools and techniques for “profiling” our code.

We’ll discuss traditional tools, like cProfile in the standard library, and show how they can be used to spot and address performance issues. We’ll take this further, and discuss limitations to the traditional profiling approach. We’ll introduce newly popular techniques, like the use of sampling profilers such as scalene and py-spy, and show how they can identify and help resolve performance issues in ways traditional tools might struggle.
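As a starting point, the traditional approach can be sketched with `cProfile` and `pstats` from the standard library. The functions `slow_sum` and `workload` are hypothetical stand-ins for real analysis code.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately naive loop so it shows up in the profile.
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    return [slow_sum(20_000) for _ in range(20)]


# Record every function call made while the profiler is enabled.
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Summarize the recorded calls, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

A deterministic profiler like this instruments each call, which adds overhead of its own. Sampling profilers instead inspect the running program at intervals.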

To Be Continued…

Did you enjoy this episode? Did you learn something new that will help you as you continue or begin to profile the code in your work?

If so, stay tuned for future episodes, which may…

If there are other related topics you’d like to see covered, please reach out to Diego Torres Quintanilla.

Seminar XII: “The Boss Baby” (time & memory performance in pandas)

Materials:

Tim (Miles Bakshi): Even though I never went to business school I did learn to share in kindergarten. And if there isn’t enough love for the two of us then I wanna give you all of mine.

The Boss Baby (2017)

Date Time Track Meeting Link
Fri Feb 11, 2022 9:30 AM EDT pandas https://primetime.bluejeans.com/a2m/live-event/gtcykzxf

Audience

These sessions are designed for a broad audience of modelers and software programmers of all backgrounds and skill levels.

Our expected audience should comprise attendees with a…

During this session, we will endeavour to guide our audience to developing…

…and we will share additional tips, tricks, and in-depth guidance on all of these topics!

Abstract

In a previous episode, we discussed the design of the pandas API, and performance and optimization limitations arising therefrom. In this episode, we’ll take a close look at common performance problems in pandas code, including problems where our code uses too much memory or problems where our code performs computational tasks inefficiently.

We’ll discuss how to spot common sources of performance (time or memory) problems, how to avoid them, and cases where you may need to employ more sophisticated techniques (such as the use of tools like cython or numba) to improve the speed or reduce the memory-usage of your code.
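Two frequently cited sources of trouble can be sketched as follows: repeated strings stored at full size, and row-by-row Python callbacks where a vectorised expression would do. The data is randomly generated for illustration, and the actual savings will vary with the data and the pandas version.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical frame: a low-cardinality string column and a float column.
df = pd.DataFrame({
    "ticker": rng.choice(["AAA", "BBB", "CCC"], size=n),
    "price": rng.normal(100.0, 5.0, size=n),
})

# Memory: a categorical stores each distinct string once plus small integer codes.
before = df["ticker"].memory_usage(deep=True)
after = df["ticker"].astype("category").memory_usage(deep=True)

# Time: the vectorised expression runs in compiled code over the whole column,
# while .apply calls a Python function once per element.
vectorised = df["price"] * 2
row_wise = df["price"].head(1_000).apply(lambda p: p * 2)

print(f"ticker column: {before:,} bytes as strings, {after:,} bytes as category")
```

Timing the two computations (for example with `timeit`) on realistic data is the way to see how large the gap is in a given case.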

To Be Continued…

Did you enjoy this episode? Did you learn something new that will help you as you improve the time and memory performance of the pandas code in your work?

If so, stay tuned for future episodes, which may…

If there are other related topics you’d like to see covered, please reach out to Diego Torres Quintanilla.