PythonPro #24: Multivariate Time Series with pandas, Cloudflare's Python Workers, N.Korea's Python Payloads, and Dockerized Python Setup

Bite-sized actionable content, practical tutorials, and resources for Python programmers and data scientists

Apr 10, 2024

Welcome to a brand new issue of PythonPro!

In today’s Expert Insight we bring you an excerpt from the recently published Deep Learning for Time Series Cookbook, which teaches you how to load, transform, and visually represent multivariate time series data using pandas in Python, focusing on handling datasets with interrelated variables over time.

News Highlights: Cloudflare now allows writing Workers in Python; a 571% growth in Python’s usage in Snowflake's libraries highlights its essential role in AI; and Dagworks Inc. introduces the Burr framework to streamline GenAI application development and debugging.

Here are my top 5 picks from our learning resources today:

North Korea’s Post-Infection Python Payloads🕵️‍♂️
Python Project-Local Virtualenv Management Redux📦
📊Using Neo4j and Langchain for Knowledge Graph Creation - A Detailed Guide
Testing with Python (part 2) - moving to pytest⚙️
Setting Up A Dockerized Python Environment — The Elegant Way🐳

Dive in, and let me know what you think about this issue in today’s survey!

Stay awesome!

Divya Anne Selvaraj

Editor-in-Chief

P.S: If you have any food for thought, feedback, or would like us to find you a Python learning resource on a particular subject for our next issue, take the survey.


🐍 Python in the Tech 💻 Jungle 🌳

🗞️News

Cloudflare now allows writing Workers in Python: This update supports popular Python packages like FastAPI and NumPy, enabling the dynamic linking that is essential for efficient WebAssembly use. Read to learn how to integrate Python into Cloudflare Workers.
Python skills ‘increasingly essential’ to dev teams venturing into advanced AI: Python's role as the go-to language for AI and machine learning is evident through a 571% increase in its use within Snowflake's Snowpark libraries, significantly outpacing Scala and Java. Read to learn how Python skills are becoming crucial for teams engaging in advanced AI.
Burr – Develop Stateful AI Applications: Dagworks Inc. has introduced Burr, an open-source Python framework designed to streamline the building and debugging of GenAI applications. Read if you are looking for efficient tools to create and manage GenAI applications.

💼Case Studies and Experiments🔬

Predicting solar eclipses with Python: Leveraging the Astropy package, the author of this article quickly computes the angular separation between the sun and the moon as seen from Earth, which is crucial for identifying eclipses. Read to explore the practical application of programming skills to solve complex astronomical problems.
How AI wasted half my day with the wrong decorator: This article highlights the limitations of relying solely on AI for programming advice, underscoring the importance of human oversight and the gap towards achieving Artificial General Intelligence (AGI). Read for insights into the value of human expertise and validation in the software development process.

📊Analysis

North Korea’s Post-Infection Python Payloads: This article delves into the sophisticated use of Python scripts as part of a malware deployment strategy by a North Korean threat actor group. Read for detailed script analyses and to learn about recent investigations which reveal enhancements in the malware's flexibility.
Streamlit vs. Shiny for Python: Streamlit is easy to start with but hard to expand and customize, whereas Shiny facilitates growth and customization through its structured code and flexibility. Read to learn about the fundamental differences in terms of execution strategy, scalability, and the ability to customize and extend applications.

🎓 Tutorials and Guides 🤓

Easy video transcription and subtitling with Whisper, FFmpeg, and Python: This guide takes you through a step-by-step process for transcribing videos and adding subtitles using OpenAI's Whisper model and FFmpeg, facilitated by Python. Read to learn how to effectively transcribe video content and embed subtitles.
Python Project-Local Virtualenv Management Redux: This guide discusses modern Python virtual environment management strategies, emphasizing the shift towards local virtualenv within project directories, akin to Node's node_modules. Read to learn efficient virtual environment management practices for Python projects.
Using Neo4j and Langchain for Knowledge Graph Creation - A Detailed Guide: Knowledge graphs, representing entities and relationships, enhance ML models and natural language processing by offering a structured knowledge base. Read if you are interested in leveraging knowledge graphs for complex data analysis.
Using Poetry for Python dependency management: Poetry simplifies managing dependencies and publishing Python libraries through a pyproject.toml file for dependency specification and a lock file to ensure consistency. Read to learn how to set up and configure the tool, and troubleshoot common issues.
Backtesting a Trading Strategy in Python With Datalore and AI Assistant: In this tutorial, the author introduces a mean reversion trading strategy using Python in Datalore notebooks, emphasizing the approach’s accessibility for beginners through AI Assistant. Read to learn how Datalore and AI assistance can simplify complex data analysis and investment strategy evaluation.
Make Python DevEx: This article offers insights into overcoming Python development setup challenges by advocating a standardized, automated approach with Make to enhance developer experience and productivity. Read if your team is facing challenges with command memorization, environment configuration, and maintaining a uniform developer experience.
Web scraping HTML tables with Python: This article introduces an efficient method to scrape HTML tables from websites like Yahoo Finance using pandas in Python, without the need for tools like BeautifulSoup or MechanicalSoup. Read to discover an alternative to traditional web scraping tools for specific data extraction tasks.

🔑 Best Practices and Code Optimization 🔏

Testing with Python (part 2) - moving to pytest: This article introduces pytest as a more productive, flexible, and faster framework for Python testing compared to unittest. Read to learn how to use pytest to improve your testing practices.
Install and Execute Python Applications Using pipx: Pipx is a tool that transforms PyPI into an application marketplace, allowing the safe installation and execution of Python applications without interfering with the global Python interpreter. Read to learn how to ensure safe execution and dependency management.
Everything You Need to Know About Python: This reference guide concisely covers essential Python features, from basics like data structures and type conversions to advanced constructs such as loops, comprehensions, and functions. Read if you are looking for a quick guide on Python's core concepts and syntax.
Enforcing conventions in Django projects with introspection: This article discusses leveraging Python's introspection abilities within Django projects to enforce naming conventions, particularly for DateField and DateTimeField to avoid confusion and maintenance issues. Read if you want to learn how to establish and maintain clear, consistent naming conventions in your projects.
Setting Up A Dockerized Python Environment — The Elegant Way: This article presents a detailed guide for setting up a dockerized Python development environment elegantly using VSCode and the Dev Containers extension. Read if you are looking to streamline your Python development workflow with containerization.

Take the Survey, Request a Learning Resource

🧠 Expert insight 📚


Here’s an excerpt from “Chapter 1: Getting Started with Time Series” in the Deep Learning for Time Series Cookbook by Vitor Cerqueira and Luís Roque, published in March 2024.

Loading and visualizing a multivariate time series

…This recipe explores how to load a multivariate time series. Before, we used the pandas Series structure to handle univariate time series. Multivariate time series are better structured as pandas DataFrame objects.

Getting ready

A multivariate time series contains multiple variables. The concepts underlying time series analysis are extended to cases where multiple variables evolve over time and are interrelated with each other. The relationship between the different variables can be difficult to model, especially when the number of these variables is large.

In many real-world applications, multiple variables can influence each other and exhibit a temporal dependency. For example, in weather modeling, the incoming solar radiation is correlated with other meteorological variables, such as air temperature or humidity. Considering these variables with a single multivariate model can be fundamental for modeling the dynamics of the data and getting better predictions.
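As a rough illustration of interrelated variables, the sketch below builds a small synthetic weather-like dataset and measures how the variables move together with DataFrame.corr(). The column names, coefficients, and noise levels are invented for this example and are not taken from the book's dataset.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data; all names and values are illustrative assumptions.
rng = np.random.default_rng(0)
index = pd.date_range("2024-01-01", periods=240, freq="h")
hour = index.hour.to_numpy()

# A daily cycle that peaks at noon and is zero at night.
radiation = np.clip(np.sin((hour - 6) / 12 * np.pi), 0, None) * 800
# Temperature rises with radiation, humidity falls with it, both with noise.
temperature = 10 + 0.01 * radiation + rng.normal(0, 1, len(index))
humidity = 80 - 0.03 * radiation + rng.normal(0, 3, len(index))

weather = pd.DataFrame(
    {"radiation": radiation, "temperature": temperature, "humidity": humidity},
    index=index,
)

# Pairwise Pearson correlations between the variables.
corr = weather.corr()
print(corr.round(2))
```

In this made-up setup, radiation correlates positively with temperature and negatively with humidity, which is the kind of dependency a multivariate model can exploit.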

We’ll continue to study the solar radiation dataset. This time series is extended by including extra meteorological information.

How to do it…

We’ll start by reading a multivariate time series. Like in the Loading a time series using pandas recipe, we resort to pandas and read a .csv file into a DataFrame data structure:

import pandas as pd

data = pd.read_csv('path/to/multivariate_ts.csv',
                   parse_dates=['datetime'],
                   index_col='datetime')

The parse_dates and index_col arguments ensure that the index of the DataFrame is a DatetimeIndex object. This is important so that pandas treats this object as a time series. After loading the time series, we can transform and visualize it using the plot() method:

# LogTransformation is a helper that is not defined in this excerpt
data_log = LogTransformation.transform(data)

sample = data_log.tail(1000)

mv_plot = sample.plot(figsize=(15, 8),
                      title='Multivariate time series',
                      xlabel='',
                      ylabel='Value')

mv_plot.legend(fancybox=True, framealpha=1)

The preceding code follows these steps:

First, we transform the data using the logarithm.
We take the last 1,000 observations to make the visualization less cluttered.
Finally, we use the plot() method to create a visualization. We also call legend to configure the legend of the plot.
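The excerpt calls LogTransformation.transform() without showing its definition. The sketch below is a hypothetical stand-in and not necessarily the book's implementation: it assumes a sign-preserving log(1 + |x|) transform, so that zeros and negative values do not produce errors, and applies it to a small synthetic DataFrame before taking the last rows with tail().

```python
import numpy as np
import pandas as pd


class LogTransformation:
    """Hypothetical stand-in for the helper used in the recipe."""

    @staticmethod
    def transform(x):
        # Sign-preserving log: sign(x) * log(1 + |x|), defined for zeros and negatives.
        return np.sign(x) * np.log1p(np.abs(x))

    @staticmethod
    def inverse_transform(xt):
        # Undo the transform: sign(xt) * (exp(|xt|) - 1).
        return np.sign(xt) * np.expm1(np.abs(xt))


# Synthetic data standing in for the solar radiation dataset (values are made up).
index = pd.date_range("2024-01-01", periods=5, freq="D")
data = pd.DataFrame(
    {"radiation": [0.0, 120.0, 640.0, 310.0, 15.0],
     "temperature": [-2.0, 4.5, 11.0, 8.0, 1.0]},
    index=index,
)

data_log = LogTransformation.transform(data)
sample = data_log.tail(3)  # keep only the last 3 observations
restored = LogTransformation.inverse_transform(data_log)

print(sample)
```

With a helper like this in scope, the plot() and legend() calls from the recipe can be applied to sample unchanged.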

How it works…

A sample of the multivariate time series is displayed in the following figure:

Figure 1.6: Multivariate time series plot

The process of loading a multivariate time series works like the univariate case. The main difference is that a multivariate time series is stored in Python as a DataFrame object rather than a Series one.
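To make the DataFrame versus Series distinction concrete, the sketch below reads a tiny in-memory CSV with the same parse_dates and index_col arguments and then checks the resulting types. The column names and values are invented, and io.StringIO stands in for the file path.

```python
import io

import pandas as pd

# A tiny inline CSV standing in for 'path/to/multivariate_ts.csv'; values are made up.
csv_text = """datetime,radiation,temperature
2024-01-01 10:00:00,310.0,8.2
2024-01-01 11:00:00,455.5,9.1
2024-01-01 12:00:00,520.0,10.4
"""

data = pd.read_csv(io.StringIO(csv_text),
                   parse_dates=["datetime"],
                   index_col="datetime")

print(type(data).__name__)               # DataFrame: all variables together
print(type(data.index).__name__)         # DatetimeIndex
print(type(data["radiation"]).__name__)  # Series: a single variable
```

Selecting one column returns a Series that shares the same DatetimeIndex, so the univariate tools still apply to each variable individually.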

From the preceding plot, we can notice that different variables follow different distributions and have distinct average and dispersion levels.
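These differences in level and spread can also be checked numerically rather than by eye. A minimal sketch, using synthetic columns on deliberately different scales (the names and numbers are assumptions for illustration) and the standard mean and std aggregations:

```python
import numpy as np
import pandas as pd

# Synthetic variables on deliberately different scales; not the book's data.
rng = np.random.default_rng(42)
index = pd.date_range("2024-01-01", periods=500, freq="h")
data = pd.DataFrame(
    {"radiation": rng.gamma(shape=2.0, scale=150.0, size=500),
     "temperature": rng.normal(loc=15.0, scale=4.0, size=500),
     "humidity": rng.normal(loc=70.0, scale=10.0, size=500)},
    index=index,
)

# One row per variable: average level and dispersion.
summary = data.agg(["mean", "std"]).T
print(summary.round(2))
```

A table like this makes it easy to see which variables would dominate a shared axis, which is one reason the recipe applies a log transformation before plotting.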

You can read the entire first chapter for free and buy Deep Learning for Time Series Cookbook by Vitor Cerqueira and Luís Roque here. Packt library subscribers can continue reading the book here.

Get the book!


And that’s a wrap.

We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here. The complete PythonPro archives can be found here.

If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, leave a comment below.