ts-python

Python @ Two Sigma

Contents
About
Series I
Series II
Series III
Series IV
Series V

About

Why should you sign up for our newsletter?

Receive Python tips & tricks, brainteasers, code reviews, and other exclusive, original technical content from Don’t Use This Code!
Stay up-to-date with new developments in the projects you love!
Become aware of upcoming events in the global open source community!
Get advance notice and special deals on new offerings from Don’t Use This Code!

Do you want to be a Python expert? Sign up today @ bit.ly/expert-python!

Don’t Use This Code; Training & Consulting

Don’t Use This Code is a professional training, coaching, and consulting company. We are deeply invested in the open source scientific computing community, and we are dedicated to bringing better processes, better tools, and better understanding to the world.

Don’t Use This Code is growing! We are currently seeking new partners, new clients, and new engagements within Two Sigma for our expert consulting and training services.

Our ideal client is a team, large or small, using open source technologies, centering around the PyData stack for scientific and numeric computing. Teams looking to better employ these tools would benefit from the wide range of training courses we offer, ranging from an intensive introduction to Python fundamentals to advanced applications of Python for building large-scale production systems. Working with your team, we can craft targeted curricula to meet your training goals. We are also available for consulting services such as building scientific computing and numerical analysis systems using technologies like Python and React.

We pride ourselves on delivering top-notch training. We are committed to providing quality training, and we do so by investing in three key areas: our content, our processes, and our contributors.

James Powell: Consultant, Instructor, & Presenter

James Powell is a professional Python programmer and enthusiast. He got his start with the language by building reporting and analysis systems for proprietary trading offices; now, he uses his experience as a consultant for a wide range of clients who build data engineering and scientific computing platforms using cutting-edge open source tools like Python and React.

He also currently serves as a Board Director, Chair, and Vice President at NumFOCUS, the 501(c)(3) non-profit that supports all the major tools in the Python data analysis ecosystem (e.g., pandas, NumPy, Jupyter, Matplotlib). At NumFOCUS, he helps build global open source communities for data scientists, data engineers, and business analysts. He helps NumFOCUS run the PyData conference series and has sat on speaker selection and organizing committees for 18 conferences. James is also a prolific speaker: since 2013, he has given over seventy (70) conference talks at over fifty (50) Python events worldwide.

Series I

Series I schedule & materials

applied pandas II: Let’s “get in shape” with groupby! (Fri Jan 22, 2021; 9:30 AM EST)
Python Fundamentals II: Eliminate adhoc naming and repetitive code with objects (Thu Jan 14, 2021; 9:30 AM EST)
Reproducibility II: Better notebooks and beyond (Fri Dec 18, 2020; 9:30 AM EST)
Python Fundamentals I: Turn notebook cells into functions, the right way! (Fri Nov 20, 2020; 9:30 AM EST)
applied pandas I: pandas is well designed, actually! (Fri Nov 6, 2020; 9:30 AM EST)
Reproducibility I: Save 3 hours by writing 30 mins of tests (Fri Dec 4, 2020; 9:30 AM EST)

Series II

Series II schedule & materials

“Rear Window” (pandas Window Functions, .rolling, .expanding, .ewm)
“Artists and Models” (the Python Object Model)
“Sunset Boulevard” (better understanding plotting with matplotlib)
“The Caine Mutiny” (effective use of the Python standard library)
“The Abyss” (a deep dive into a pandas problem)
“Ghostbusters” (Index and MultiIndex in pandas)
“Patrick Swayze” (serialising data and using pickle)
“Back to the Future” (contemporary Python syntax and features)
“How to Train Your Dragon” (type-checking with PEP-484 & mypy)
“Shrek” (“holiday stocking stuffer”; bizarre Python language warts)
“Turbo”
“The Boss Baby” (time & memory performance in pandas)

Series III

Series III schedule & materials

“A Data-Cleaning Deep-Dive with pandas”
“Interactive Visualizations in Python”
“Tabular Data Persistence in Python”
“Debugging in Python”
“Concurrency and Parallelism in Python”
“Memory profiling in Python (and pandas)”
“Dashboarding In Python”
“All About Generators”
“The When & Why of Object Orientation”
“Deep Dive into a Time-Series Problem”
“Seeing pandas in the window: window operations in pandas .rolling, .expanding, .ewm”

Series IV

“An Introduction to Pandas for Analysis”
“Avoiding Anti-Patterns in pandas”
“The Right Tool for the Job: pandas vs NumPy vs Xarray”
“Write Better Reports with Python and pandas”
“The Best of Python’s Standard Library”
“PyTorch”
“What can NumPy solve that pandas can’t?”
“Data persistence showdown: .csv vs .pickle vs .parquet (and more!)”
“How do I remember how to use Matplotlib?”
“Make your own data dashboards in Python! (panel, bokeh, & plotly)”
“Profiling your Python & pandas Code”
“From working Prototype to full-fledged API”

Series V

What’s New in pandas 2?

Date: Friday, May 31st, 2024 at 9:30 AM US/Eastern

Topics: pandas 2.0, Python

Materials

Recently upgraded to the latest version of pandas?

The pandas 2 changelog consists of over 2000 lines of text, code, and bullet points. While the largest changes revolve around the introduction of the PyArrow backend, there are also a plethora of bug fixes, backwards-incompatible changes, deprecations, and much more to discuss. With all of the new features and updates, it is hard to stay up-to-date with pandas best practices without reading the changelog yourself.

Thankfully, we’ve done that reading for you and have distilled the most important updates and where they will impact your day-to-day work. Join us for “What’s New in pandas 2” to keep up with the best practices in the most veteran DataFrame library in the Python ecosystem.

Building Streamlit Dashboards

Date: Friday, June 28th, 2024 at 9:30 AM US/Eastern

Topics: streamlit, dashboard, dashboarding

Materials

Need to share data insights with others? Then try Streamlit!

The Python space is filled with dashboarding tools to incorporate interactive widgets, charts, and displays to bring your data to life. Today we’re going to take a deep dive into the popular end-to-end framework “Streamlit”.

In this seminar, we’ll discuss the core motivations behind Streamlit’s user-friendly API, which allows Python users to create web apps without needing extensive web development knowledge. We’ll focus on practical examples and discuss how Streamlit can optimize performance, including tips for improving responsiveness and scalability. Whether you’re a developer looking to streamline your workflow or a data scientist aiming to share insights through visualizations, this seminar will provide insights into leveraging Streamlit effectively for your projects.

No `.index`, No pandas

Date: Friday, July 26, 2024 at 9:30 AM US/Eastern

Topics: pandas

Materials

Understanding the .index will change your pandas code forever.

pandas has seven built-in unique Index types, including the CategoricalIndex, DatetimeIndex, TimedeltaIndex, PeriodIndex, IntervalIndex, and more! Do you know which one to reach for when facing a given problem?

In this upcoming seminar, we will focus on the importance of understanding the index in pandas. Not only will we explore each of the standard Index types that pandas offers, but we will also develop an intuition around when you would want to use one over the others. Additionally, we will provide guidance on how the Index is used across every single pandas operation and how a better understanding of it can help you make sense of your results. This session is a great opportunity to enhance your skills and gain a deeper understanding of the most widely used DataFrame tool in Python.
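To make the point about the Index concrete, here is a minimal sketch (not taken from the seminar materials) of two behaviors that depend on the index: label alignment in arithmetic and partial-string selection on a DatetimeIndex. The data and variable names are made up for illustration.

```python
import pandas as pd

# Two Series holding the same labels in a different order (made-up data).
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "x"])

# Arithmetic aligns on index labels, not on position:
# "x" pairs 1 with 30, "y" pairs 2 with 20, "z" pairs 3 with 10.
total = a + b

# A DatetimeIndex allows selecting a whole month with a partial date string.
prices = pd.Series(
    range(4),
    index=pd.date_range("2024-01-30", periods=4, freq="D"),
)
february = prices.loc["2024-02"]  # only the rows dated in February 2024
```

If the result of `a + b` looks surprising, comparing the two indexes is usually the first thing worth checking.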

Everything about `.groupby`

Date: Friday, Sep 6, 2024 at 9:30 AM US/Eastern

Topics: pandas

Confidently groupby .aggregate, .apply, and .transform your data!

Materials

Grouped functions are one of the most common operations performed on tabular data. Due to their analytical usefulness, the pandas .groupby operations need to be both performant and flexible, two coding concepts that are often at odds with one another. So how does the most popular Python DataFrame library address this problem? Join us to find out the answer.

In this upcoming seminar, we will discuss the concepts behind pandas .groupby operations so that you will be able to confidently choose the best method for the problems you work on. Not only will this understanding help make your code more declarative & readable, but you will also develop an intuition for which .groupby operations are fast and which are slow. This session is a great opportunity to further your understanding and use of pandas to write more maintainable and performant code than before!
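As a rough illustration of how the three methods differ, the sketch below runs `.agg`, `.transform`, and `.apply` on the same grouped column. The `desk` and `pnl` columns are hypothetical and the example is not drawn from the seminar itself.

```python
import pandas as pd

# Made-up data: profit and loss per trade, grouped by desk.
df = pd.DataFrame({
    "desk": ["a", "a", "b", "b", "b"],
    "pnl": [1.0, 3.0, 2.0, 4.0, 6.0],
})
grouped = df.groupby("desk")["pnl"]

# .agg reduces each group to a single value: one row per group.
mean_per_desk = grouped.agg("mean")

# .transform broadcasts the group result back to the original shape,
# which makes it easy to combine with the ungrouped column.
demeaned = df["pnl"] - grouped.transform("mean")

# .apply accepts arbitrary Python callables; it is the most flexible
# option and usually the slowest, since it cannot use the built-in fast paths.
spread_per_desk = grouped.apply(lambda s: s.max() - s.min())
```

A reasonable rule of thumb is to reach for a named aggregation first and fall back to `.apply` only when nothing else expresses the computation.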

How do I make Matplotlib look good?

Date: Friday, Oct 4, 2024 at 9:30 AM US/Eastern

Topics: matplotlib, data-viz

Communicate your insights with stunning charts in Matplotlib.

Materials

Creating a data visualization is easy, but crafting one that effectively communicates a message and looks great requires deliberate refinement.

Join us for our seminar, “How do I make Matplotlib look good?” where you’ll master Matplotlib’s API and learn how to create charts that impress. We’ll guide you through the essential mechanics of chart building, ensuring your visualizations are not only consistent and aesthetic but also intuitive. Additionally, we’ll explore classic data visualization principles, like the “data-ink” ratio, and demonstrate how to apply these concepts in Matplotlib to craft charts that deliver a clear and concise message.
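One way to apply the idea of removing ink that carries no data is sketched below. It is only an illustration with made-up numbers, not the approach taught in the seminar: it hides the top and right borders and labels the last point directly instead of adding a legend.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up data for illustration.
months = ["Jan", "Feb", "Mar", "Apr"]
returns = [1.2, 0.4, 2.1, 1.7]

fig, ax = plt.subplots(figsize=(5, 3))
ax.plot(months, returns, color="tab:blue", linewidth=2)

# Remove borders that do not encode any data.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# Label the final point directly rather than adding a legend box.
ax.annotate(
    f"{returns[-1]:.1f}%",
    xy=(len(months) - 1, returns[-1]),
    xytext=(5, 0),
    textcoords="offset points",
    va="center",
)
ax.set_title("Monthly return (%)", loc="left")

# Render to an in-memory buffer; use a file path instead to save to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```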

Don’t miss this opportunity to elevate the way you share insights.

What’s Useful in SciPy?

Date: Friday, Oct 25, 2024 at 9:30 AM US/Eastern

Topics: scipy

Sparse Arrays, Optimization, and Interpolation — Oh My!

Materials

SciPy is a grab bag of scientific computing objects and functions. But what’s there that’s actually useful? Join us to find out!

In this seminar, we will explore the origins of SciPy and how the package has evolved to become one of the most commonly used scientific computing packages in the Python ecosystem. We will discuss the core technical problems that SciPy helps solve, including sparse arrays, optimization problems, interpolation, clustering, and much more! We will demonstrate practical SciPy solutions to real-world problems so that you can take these approaches and implement them in your own work. If SciPy has you feeling overwhelmed, then this is a seminar that you won’t want to miss.
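For a quick taste of three of the areas named above, the sketch below touches sparse storage, scalar optimization, and interpolation. The inputs are toy values chosen for illustration and are not examples from the seminar.

```python
import numpy as np
from scipy import sparse
from scipy.interpolate import CubicSpline
from scipy.optimize import minimize_scalar

# Sparse arrays: store only the non-zero entries of a mostly empty matrix.
dense = np.zeros((1000, 1000))
dense[0, 1] = 2.0
dense[5, 5] = 3.0
compact = sparse.csr_matrix(dense)

# Optimization: find the minimum of a one-dimensional function.
result = minimize_scalar(lambda x: (x - 2.0) ** 2 + 1.0)

# Interpolation: fit a smooth curve through known points and
# evaluate it at a point that was not observed.
xs = np.array([0.0, 1.0, 2.0, 3.0])
spline = CubicSpline(xs, xs ** 2)
midpoint = float(spline(1.5))
```

Here `compact.nnz` reports how many values are actually stored, `result.x` holds the location of the minimum, and `midpoint` is the interpolated value between the second and third samples.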

Python Features Modelers Need to Know

Date: Friday, Nov 15, 2024 at 9:30 AM US/Eastern

Topics: decorators, generators, context managers, object orientation, asyncio

Materials

Modelers, you probably know pandas and NumPy like the back of your hand, and your code is solid. But even with that expertise, there are likely some inefficiencies that creep into your code from not making use of some important features available in Python.

In this session, we’ll dive into a few of those features that you might have seen, heard of, or read about but haven’t yet integrated into your analytical code. We’ll cover foundational elements like context managers, decorators, and generators and show how they can appear in your analytical code.

We’ll also explore newer additions to Python, including type annotations, the match statement, breakpoint, dataclasses, keyword-only arguments, and some useful third-party libraries like PyArrow (when paired with pandas) and Hypothesis for unit testing. Join us to boost the efficiency and readability of your code!
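As a small, hypothetical illustration of how these features can show up in analytical code, the sketch below combines a context manager for timing, a decorator that records calls, a generator that yields chunks, and a keyword-only argument. All names are invented for this example.

```python
import time
from contextlib import contextmanager
from functools import wraps


@contextmanager
def timed(label, results):
    """Context manager: record how long the enclosed block takes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def logged(func):
    """Decorator: remember the name of each call made to `func`."""
    calls = []

    @wraps(func)
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)

    wrapper.calls = calls
    return wrapper


def chunks(values, *, size):
    """Generator: lazily yield slices; `size` must be passed by keyword."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


@logged
def total(values):
    return sum(values)


timings = {}
with timed("chunked sum", timings):
    sums = [total(chunk) for chunk in chunks(list(range(10)), size=4)]
```

The generator never builds all of the chunks at once, which matters when `values` is large, and the keyword-only `size` keeps call sites readable.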

pandas Mistakes Everyone Makes

Date: Friday, Dec 13, 2024 at 9:30 AM US/Eastern

“Why do my code reviews keep pointing out pandas issues?”

Materials

If you’ve ever had your code sent back with feedback on pandas usage, this seminar is for you. We’ll dive into the most common mistakes that modelers make with pandas during code review. From mishandling apply functions to inefficient joins and unclear indexing, you’ll learn to spot and fix these pitfalls before your reviewers do.

But it’s not just about avoiding mistakes—it’s about developing an intuition for pandas. We’ll discuss the “why” behind best practices, helping you understand the trade-offs in speed, readability, and maintainability. By the end, you’ll have a stronger grasp of pandas workflows and be better equipped to write production-ready code that won’t raise eyebrows during reviews.
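One frequently cited example of mishandling apply is a row-wise `.apply` used where plain column arithmetic would do. The sketch below uses made-up `price` and `qty` columns to contrast the two; it is an illustration, not a finding from a specific code review.

```python
import pandas as pd

# Made-up order data.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "qty": [1, 2, 3]})

# Row-wise apply: calls a Python function once per row.
notional_slow = df.apply(lambda row: row["price"] * row["qty"], axis=1)

# Vectorized arithmetic: one operation over whole columns,
# shorter to read and typically much faster on large frames.
notional_fast = df["price"] * df["qty"]
```

Both produce the same values; the difference lies in how much work is done in Python versus inside pandas.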

Squeezing More Out of pandas

Date: Friday, Jan 31, 2025 at 9:30 AM US/Eastern

For those moments when you’re slightly over maximum RAM.

Materials

When your data sizes just exceed what fits in memory, pandas doesn’t have to grind to a halt. In this seminar, we’ll explore strategies to push pandas a little further without overhauling your entire workflow. We’ll cover tips to help you improve your pandas memory footprint like using memory-efficient datatypes or even incorporating tools like DuckDB and SQLite into your workflow.

We’ll also cover practical tips for profiling memory usage in pandas and for understanding where the biggest bottlenecks occur. By combining these insights with some additional tooling, you’ll gain the flexibility to process challenging datasets without resorting to big-data frameworks. This seminar is all about working smarter, not harder, while staying in your familiar pandas environment.
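As a minimal sketch of the memory-efficient datatypes idea, the example below measures a frame with `memory_usage(deep=True)` before and after converting a repetitive string column to a categorical and downcasting an integer column. The column names and values are hypothetical.

```python
import pandas as pd

# Made-up data: a repetitive label column and a small-range integer column.
df = pd.DataFrame({
    "symbol": ["AAA", "BBB", "CCC", "DDD"] * 25_000,
    "qty": list(range(100_000)),
})
before = df.memory_usage(deep=True).sum()

slim = df.assign(
    # Few distinct labels repeated many times: store integer codes instead.
    symbol=df["symbol"].astype("category"),
    # Values fit in fewer bytes than the default 64-bit integer.
    qty=pd.to_numeric(df["qty"], downcast="unsigned"),
)
after = slim.memory_usage(deep=True).sum()

saved_fraction = 1 - after / before
```

How much is saved depends on the data: categoricals help most when a column has few distinct values relative to its length.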

What Date/Time is It?

Date: Friday, Feb 28, 2025 at 9:30 AM US/Eastern

Materials

Dates and datetimes are deceptively tricky in Python and pandas. Whether you’re working with timestamps in financial data, scheduling events, or aligning time series, small mistakes can lead to major errors.

In this seminar, we’ll break down the core datetime implementations in Python and pandas, showing how to parse, manipulate, and analyze date-based data effectively. But more importantly, we’ll explore the hidden pitfalls—handling time zones, ambiguous/nonexistent dates, subtle indexing issues, and more—that can cause silent failures in your analysis.

By the end of this session, you’ll walk away with:

A clear understanding of Python’s datetime module and pandas’ datetime handling
Discussion of naive vs aware timestamps when working with timezones
Practical techniques for working with points in time and spans of time
Insights into common datetime mistakes—and how to anticipate and prevent them

If you’ve ever been burned by a timezone bug or an off-by-one-day error, this seminar is for you!
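To illustrate the naive versus aware distinction and the off-by-one-day problem, here is a small sketch using only the standard library. It assumes a fixed UTC-5 offset for simplicity; real code would normally use a named time zone so that daylight saving changes are handled.

```python
from datetime import datetime, timedelta, timezone

# A naive timestamp carries no time zone information.
naive = datetime(2025, 2, 28, 9, 30)

# Assumption for illustration: a fixed UTC-5 offset labeled "EST".
eastern = timezone(timedelta(hours=-5), "EST")
aware = naive.replace(tzinfo=eastern)
as_utc = aware.astimezone(timezone.utc)

# Ordering a naive and an aware timestamp is an error, not a silent guess.
try:
    naive < aware
    comparable = True
except TypeError:
    comparable = False

# The same instant can fall on different calendar days in different zones.
late_trade = datetime(2025, 2, 28, 22, 0, tzinfo=eastern)
local_day = late_trade.date()
utc_day = late_trade.astimezone(timezone.utc).date()
```

Grouping `late_trade` by `local_day` versus `utc_day` would place it in different daily buckets, which is one common source of off-by-one-day errors.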

Readable, Testable, Extensible: Writing Code you Won’t Regret

Date: Friday, Mar 28, 2025 at 9:30 AM US/Eastern

Materials

Writing Python code that merely works is one thing—writing code that remains clear, adaptable, and well-tested over time is another. This seminar is all about designing Python code that you won’t regret later, striking the right balance between readability, extensibility, and testability. Whether you’re working solo or contributing to a team, the principles we’ll cover will help ensure that your code remains maintainable and free of unnecessary complexity.

We’ll start by exploring best practices for writing clear, well-documented code, including how to craft docstrings that actually help, choose meaningful names, and structure modules for easier comprehension. We’ll then dive into modular design, discussing how to break down complex logic into reusable, loosely coupled components that are easier to test and extend. Along the way, we’ll highlight common pitfalls—such as hidden side effects, excessive coupling, and overuse of global state—that make code harder to maintain.

From there, we’ll turn to testing strategies that go beyond the basics of unit tests. In addition to writing traditional test cases, we’ll introduce property-based testing with Hypothesis, a powerful tool that generates test cases dynamically, helping you uncover edge cases you might never have thought to check. We’ll also discuss when to use unit tests, integration tests, and property-based testing to maximize coverage without writing excessive boilerplate.
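As a minimal sketch of property-based testing, assuming the third-party Hypothesis package is installed, the example below checks a round-trip property for a pair of hypothetical helper functions (`encode` and `decode`, invented for this illustration) rather than asserting on a few hand-picked inputs.

```python
from hypothesis import given, strategies as st


def encode(text):
    """Run-length encode a string into (character, count) pairs."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def decode(runs):
    """Rebuild the original string from (character, count) pairs."""
    return "".join(ch * count for ch, count in runs)


@given(st.text())
def test_round_trip(text):
    # Property: decoding an encoded string returns the original string.
    assert decode(encode(text)) == text


# Calling the decorated function runs it against many generated inputs.
test_round_trip()
```

A traditional unit test would still be useful for pinning down a specific known case, such as `encode("aab")`; the property-based test complements it by exploring inputs nobody thought to write down.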

By the end of this session, you’ll have a concrete set of techniques to write Python code that’s easier to read, modify, and test, saving you and your collaborators from future headaches.

From Memory to Disk and Back Again: Persistent Formats

Date: Friday, Apr 25, 2025 at 9:30 AM US/Eastern

Materials

When working with data in Python, choosing the right storage format can make a huge difference in speed, efficiency, and usability. Should you prioritize human readability or fast loading times? How do different formats handle complex data structures? And what’s the best way to ensure data integrity when round-tripping between memory and disk?

In this seminar, we’ll explore the trade-offs between popular data persistence formats, including CSV, Pickle, and Parquet, along with a few others that might surprise you. We’ll discuss their strengths, weaknesses, and ideal use cases—whether you’re working with small datasets, large-scale analytics, or machine learning pipelines. Special attention will be given to Parquet, a powerful columnar format that can dramatically improve storage efficiency and query performance for tabular data.

By the end of this session, you’ll have a solid understanding of when to use each format, how to avoid common pitfalls, and how to make informed, practical decisions about storing and retrieving data efficiently in your projects.
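To show what round-tripping between memory and disk can change, the sketch below writes the same made-up records through CSV and through pickle using only the standard library and in-memory buffers. Parquet is left out here because it needs a third-party package.

```python
import csv
import io
import pickle
from datetime import date

# Made-up records with a date and an integer field.
rows = [
    {"day": date(2025, 4, 25), "qty": 3},
    {"day": date(2025, 4, 26), "qty": 5},
]

# CSV is human-readable, but every value comes back as a string.
text_buffer = io.StringIO()
writer = csv.DictWriter(text_buffer, fieldnames=["day", "qty"])
writer.writeheader()
writer.writerows(rows)
text_buffer.seek(0)
from_csv = list(csv.DictReader(text_buffer))

# pickle preserves Python types exactly, but it is not human-readable
# and should only be loaded from sources you trust.
from_pickle = pickle.loads(pickle.dumps(rows))
```

After the CSV round trip, the caller has to re-parse dates and integers; after the pickle round trip, the objects compare equal to the originals.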
