PythonPro #44: Generative AI with PyTorch, uv Update, Choosing the Best Visualization Type, and FastAPI for Rapid Development
Bite-sized actionable content, practical tutorials, and resources for Python programmers and data scientists
Welcome to a brand new issue of PythonPro!
In today’s Expert Insight we bring you an excerpt from the recently published book, Generative AI Foundations in Python, which provides a hands-on guide to implementing generative AI models—GANs, diffusion models, and transformers—using PyTorch and the diffusers library.
News Highlights: The uv Python packaging tool now offers comprehensive project management, tool installation, and support for single-file scripts; and Tach, written in Rust, enforces strict interfaces and dependency management for Python
Here are my top 5 picks from our learning resources today:
And, in today’s Featured Study, we introduce PyRoboCOP, a Python-based package designed for optimizing robotic control and collision avoidance in complex environments.
Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief
P.S.: We have covered all requests made so far this month in this issue. This is the final week to complete this month's survey. Do take the opportunity to tell us what you think of PythonPro, request learning resources, and earn your one Packt Credit for this month.
🐍 Python in the Tech 💻 Jungle 🌳
🗞️News
uv: Unified Python packaging: The tool now offers end-to-end project management, tool installation, Python bootstrapping, and support for single-file scripts with embedded dependencies, all within a unified, fast, and reliable interface.
Tach - Strict interfaces and dep management for Python, written in Rust: Inspired by modular monolithic architecture, Tach allows you to define dependencies and ensures that modules only import from authorized dependencies.
💼Case Studies and Experiments🔬
Using ffmpeg, yt-dlp, and gpt-4o to Automate Extraction and Explanation of Python Code from YouTube Videos: Details downloading video segments, capturing screenshots, extracting code from images using GPT, and then explaining the code with an LLM.
Packaging Python and PyTorch for a Machine Learning Application: Discusses the challenges of packaging Python and PyTorch for the Transformer Lab application, aiming for a seamless user experience across various operating systems.
📊Analysis
🎥Charlie Marsh on Astral, uv, and the Python packaging ecosystem: Discusses the development of Astral's uv, a Cargo-like tool for Python, following a significant upgrade.
CPython Compiler Hardening: Outlines the author’s process of selecting and testing compiler options, addressing challenges like excessive warnings, performance impacts, and developing tools to track and manage these warnings.
🎓 Tutorials and Guides 🤓
Flatten JSON data with different methods using Python: Techniques discussed include using pandas' json_normalize, recursive functions, the flatten_json library, custom functions, and tools like PySpark and SQL (see the json_normalize sketch after this list).
FastAPI Tutorial - Build APIs with Python in Minutes: Guides you through setting up a development environment, creating a FastAPI app, building a logistic regression classifier, defining data models with Pydantic, and setting up API endpoints for predictions (a minimal endpoint sketch follows this list).
What's the deal with setuptools, setup.py, pyproject.toml, and wheels?: Provides a detailed explanation of Python packaging tools and practices, offering insights and recommendations for how to approach packaging in modern projects.
Python's Preprocessor: Debunks the myth that Python lacks a preprocessor by demonstrating how Python can be extended and customized through the use of custom codecs and path configuration files.
📖Open Access Book | Kalman and Bayesian Filters in Python: Addresses the need for a practical introduction to Kalman filtering, offering accessible explanations and examples, along with exercises with answers and supporting libraries.
Python Backend Development - A Complete Guide for Beginners: Provides a step-by-step guide to building web applications, including advanced topics like asynchronous programming, performance optimization, and real-time data handling.
Working with Excel Spreadsheets in Python: Focuses on automating tasks using the openpyxl module. Read to learn about reading, writing, modifying, and formatting Excel files, and advanced features like plotting charts and integrating images.
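To make the JSON-flattening techniques above concrete, here is a minimal sketch of the pandas json_normalize approach; the nested records and column names are invented purely for illustration:

import pandas as pd

# A small nested payload, invented for illustration
records = [
    {"id": 1, "user": {"name": "Ada", "address": {"city": "London"}}},
    {"id": 2, "user": {"name": "Grace", "address": {"city": "Boston"}}},
]

# json_normalize flattens nested dictionaries into dotted column names
df = pd.json_normalize(records, sep=".")
print(df.columns.tolist())
# ['id', 'user.name', 'user.address.city']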
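Similarly, for the FastAPI tutorial, this minimal sketch shows the general pattern it covers, a Pydantic request model plus a prediction endpoint; the IrisFeatures schema and the rule inside predict are placeholders rather than the tutorial's trained logistic regression classifier:

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Hypothetical request schema standing in for the tutorial's data model
class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

@app.post("/predict")
def predict(features: IrisFeatures) -> dict:
    # Placeholder rule standing in for a trained classifier
    label = "setosa" if features.petal_length < 2.5 else "other"
    return {"prediction": label}

# Run locally with: uvicorn main:app --reload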
🔑Best Practices and Advice🔏
Visualisation 101 - Choosing the Best Visualisation Type: Explores how visualizations improve data-driven decisions, focusing on understanding context, audience, and visual perception. Read to learn how to choose and implement the right visualisation for your data and audience.
Simone's Creative Cooking Club • If You Haven't Got a Clue What "Pass by Value" or "Pass by Reference" Mean, Read On…: Demonstrates how Python handles function arguments, particularly mutable and immutable objects (see the short sketch after this list).
How I ask GPT-4 to make tiny Python scripts in practice: Succinctly describes starting with a basic script, then converting it into a command-line interface using click, and adding features like stdin/stdout handling and error logging.
Linear Algebra Concepts Every Data Scientist Should Know: Introduces key concepts such as vectors, vector operations, vector spaces, and matrices, with visual explanations and code examples to demonstrate their application in real-world data science tasks.
🎥Python From a Java Developer's Perspective: Provides guidance for Java developers to write Python code effectively. Watch to learn how to smoothly transition between Java and Python while leveraging your existing Java knowledge.
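To illustrate the "pass by value"/"pass by reference" discussion above, here is a short sketch (not from the article itself) of how Python's pass-by-object-reference behaves with mutable versus immutable arguments:

def append_item(items: list) -> None:
    # Mutates the caller's list object in place
    items.append("new")

def increment(value: int) -> None:
    # Rebinds a local name; the caller's int is unaffected
    value += 1

numbers = ["a"]
count = 1
append_item(numbers)
increment(count)
print(numbers)  # ['a', 'new'] -- the mutation is visible to the caller
print(count)    # 1 -- the immutable argument is unchanged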
🔍Featured Study: Mastering Robotic Control with PyRoboCOP for Complex Tasks💥
In “PyRoboCOP: Python-based Robotic Control & Optimization Package for Manipulation and Collision Avoidance” Raghunathan et al. introduce a Python-based software package designed for the optimisation and control of robotic systems. The package excels in handling complex interactions like contact and collision avoidance, crucial for autonomous robotic manipulation.
Context
Robotic systems often operate in environments with numerous obstacles and objects, making it essential to model and optimise these interactions mathematically. These interactions, defined by complementarity constraints, are challenging to manage because they do not follow standard optimisation assumptions. Most existing physics engines simulate these interactions but do not offer real-time optimisation capabilities. PyRoboCOP addresses this gap by providing a flexible and user-friendly package that allows robots to reason about their environment and optimise their behaviour, which is critical for achieving autonomous manipulation tasks.
Key Features of PyRoboCOP
PyRoboCOP is characterised by its ability to automatically reformulate complex mathematical constraints and integrate seamlessly with powerful optimisation tools. Key features include:
Automatic Reformulation of Complementarity Constraints: Handles difficult constraints that describe object interactions.
Direct Transcription via Orthogonal Collocation: Converts DAEs into a solvable set of nonlinear equations.
Integration with ADOL-C and IPOPT: Supports automatic differentiation and efficient optimisation.
Built-in Support for Contact and Obstacle Avoidance Constraints: Simplifies the setup of complex robotic tasks.
Flexible User Interface: Allows for customisation and adaptation to various robotic systems.
What This Means for You
The package is particularly relevant for researchers, developers, and engineers working in the field of robotics, especially those involved in designing autonomous systems that require precise control and optimisation. PyRoboCOP’s ability to handle complex robotic interactions makes it a valuable tool for developing real-time, model-based control solutions in environments where contact and collision avoidance are critical.
Examining the Details
PyRoboCOP's performance was rigorously tested across several robotic scenarios, including planar pushing, car parking, and belt drive unit assembly. In a planar pushing task, PyRoboCOP optimised the robot's trajectory, balancing a normal force of 0.5 N and a friction coefficient of 0.3, successfully navigating from (0, 0, 0) to (0.5, 0.5, 0) and (−0.1, −0.1, 3π/2). In a car parking scenario, the software optimised movement from (1, 4, 0, 0) to (2, 2.5, π/2, 0), effectively avoiding obstacles. PyRoboCOP also managed the complex task of assembling a belt drive unit, demonstrating its ability to handle intricate manipulations. When benchmarked against CasADi and Pyomo, PyRoboCOP showed comparable performance, solving an acrobot system in a mean time of 2.282 seconds with 1,296 variables, versus CasADi's 1.175 seconds with 900 variables and Pyomo's 2.374 seconds with 909 variables.
You can learn more by reading the entire paper or accessing the package here.
Take the Survey, Get a Packt Credit!
🧠 Expert insight 📚
Here’s an excerpt from “Chapter 2: Surveying GenAI Types and Modes: An Overview of GANs, Diffusers, and Transformers” in the book, Generative AI Foundations in Python by Carlos Rodriguez, published in July 2024.
Applying GAI models – image generation using GANs, diffusers, and transformers
In this hands-on section…you'll get first-hand experience and a deep dive into the actual implementation of generative models, specifically GANs, diffusion models, and transformers….
We’ll be utilizing the highly versatile PyTorch library, a popular choice among machine learning practitioners, to facilitate our operations. PyTorch provides a powerful and dynamic toolset to define and compute gradients, which is central to training these models.
In addition, we'll use the diffusers library. It's a specialized library that provides functionality to implement diffusion models. This library enables us to reproduce state-of-the-art diffusion models directly from our workspace. It underpins the creation, training, and usage of denoising diffusion probabilistic models at an unprecedented level of simplicity, without compromising the models' complexity.
Through this practical session, we’ll explore how to operate and integrate these libraries and implement and manipulate GANs, diffusers, and transformers using the Python programming language. This hands-on experience will complement the theoretical knowledge we have gained in the chapter, enabling us to see these models in action in the real world….
Working with Jupyter Notebook and Google Colab
Jupyter notebooks enable live code execution, visualization, and explanatory text, suitable for prototyping and data analysis. Google Colab, meanwhile, is a cloud-based version of Jupyter Notebook, designed for machine learning prototyping. It provides free GPU resources and integrates with Google Drive for file storage and sharing. We'll leverage Colab as our prototyping environment going forward.
Stable diffusion transformer
We begin with a pre-trained stable diffusion model, a text-to-image latent diffusion model created by researchers and engineers from CompVis, Stability AI, and LAION (Patil et al., 2022). The diffusion process is used to draw samples from complex, high-dimensional distributions, and when it interacts with the text embeddings, it creates a powerful conditional image synthesis model.
The term “stable” in this context refers to the fact that during training, a model maintains certain properties that stabilize the learning process. Stable diffusion models offer rich potential to create entirely new samples from a given data distribution, based on text prompts.
Again, for our practical example, we will use Google Colab to alleviate much of the initial setup. Colab also provides all of the computational resources needed to begin experimenting right away. We start by installing some libraries, and with three simple functions, we will build out a minimal StableDiffusionPipeline using a well-established open-source implementation of the stable diffusion method.
First, let’s navigate to our pre-configured Python environment, Google Colab, and install the diffusers open-source library, which will provide most of the key underlying components we need for our experiment.
In the first cell, we install all dependencies using the following bash command. Note the exclamation point at the beginning of the line, which tells our environment to reach down to its underlying process and install the packages we need:
!pip install pytorch-fid torch diffusers clip transformers accelerate
Next, we import the libraries we’ve just installed to make them available to our Python program:
from typing import List
import torch
import matplotlib.pyplot as plt
from diffusers import StableDiffusionPipeline, DDPMScheduler
Now, we’re ready for our three functions, which will execute the three tasks – loading the pre-trained model, generating the images based on prompting, and rendering the images:
def load_model(model_id: str) -> StableDiffusionPipeline:
    """Load model with provided model_id."""
    return StableDiffusionPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        revision="fp16",
        use_auth_token=False
    ).to("cuda")

def generate_images(
    pipe: StableDiffusionPipeline,
    prompts: List[str]
) -> torch.Tensor:
    """Generate images based on provided prompts."""
    with torch.autocast("cuda"):
        images = pipe(prompts).images
    return images

def render_images(images: torch.Tensor):
    """Plot the generated images."""
    plt.figure(figsize=(10, 5))
    for i, img in enumerate(images):
        plt.subplot(1, 2, i + 1)
        plt.imshow(img)
        plt.axis("off")
    plt.show()
In summary, load_model loads a machine learning model identified by model_id onto a GPU for faster processing. The generate_images function takes this model and a list of prompts to create our images. Within this function, you will notice torch.autocast("cuda"), which is a special command that allows PyTorch (our underlying machine learning library) to perform operations faster while maintaining accuracy. Lastly, the render_images function displays these images in a simple grid format, making use of the matplotlib visualization library to render our output.
With our functions defined, we select our model version, define our pipeline, and execute our image generation process:
# Execution
model_id = "CompVis/stable-diffusion-v1-4"
prompts = [
    "A hyper-realistic photo of a friendly lion",
    "A stylized oil painting of a NYC Brownstone"
]
pipe = load_model(model_id)
images = generate_images(pipe, prompts)
render_images(images)
The output in Figure 2.1 is a vivid example of the imaginativeness and creativity we typically expect from human art, generated entirely by the diffusion process. Except, how do we measure whether the model was faithful to the text provided?
Figure 2.1: Output for the prompts “A hyper-realistic photo of a friendly lion” (left) and “A stylized oil painting of a NYC Brownstone” (right)
The next step is to evaluate the quality and relevance of our generated images in relation to the prompts. This is where CLIP (Contrastive Language-Image Pre-training) comes into play. CLIP is designed to measure the alignment between text and images by analyzing their semantic similarities, giving us a true quantitative measure of the fidelity of our synthetic images to the prompts.
Scoring with the CLIP model
CLIP is trained to understand the relationship between text and images by learning to place similar images and text near each other in a shared space. When evaluating a generated image, CLIP checks how closely the image aligns with the textual description provided. A higher score indicates a better match, meaning the image accurately represents the text. Conversely, a lower score suggests a deviation from the text, indicating a lesser quality or fidelity to the prompt, providing a quantitative measure of how well the generated image adheres to the intended description.
Again, we will import the necessary libraries:
from typing import List, Tuple
from PIL import Image
import requests
from transformers import CLIPProcessor, CLIPModel
import torch
We begin by loading the CLIP model, processor, and necessary parameters:
# Constants
CLIP_REPO = "openai/clip-vit-base-patch32"

def load_model_and_processor(
    model_name: str
) -> Tuple[CLIPModel, CLIPProcessor]:
    """
    Loads the CLIP model and processor.
    """
    model = CLIPModel.from_pretrained(model_name)
    processor = CLIPProcessor.from_pretrained(model_name)
    return model, processor
Next, we define a processing function to adjust the textual prompts and images, ensuring that they are in the correct format for CLIP inference:
def process_inputs(
    processor: CLIPProcessor, prompts: List[str],
    images: List[Image.Image]) -> dict:
    """
    Processes the inputs using the CLIP processor.
    """
    return processor(text=prompts, images=images,
                     return_tensors="pt", padding=True)
In this step, we initiate the evaluation process by inputting the images and textual prompts into the CLIP model. This is done in parallel across multiple devices to optimize performance. The model then computes similarity scores, known as logits, for each image-text pair. These scores indicate how well each image corresponds to the text prompts. To interpret these scores more intuitively, we convert them into probabilities, which indicate the likelihood that an image aligns with any of the given prompts:
def get_probabilities(
    model: CLIPModel, inputs: dict) -> torch.Tensor:
    """
    Computes the probabilities using the CLIP model.
    """
    outputs = model(**inputs)
    logits = outputs.logits_per_image
    # Define temperature - higher temperature will make the distribution more uniform.
    T = 10
    # Apply temperature to the logits
    temp_adjusted_logits = logits / T
    probs = torch.nn.functional.softmax(
        temp_adjusted_logits, dim=1)
    return probs
Lastly, we display the images along with their scores, visually representing how well each image adheres to the provided prompts:
def display_images_with_scores(
    images: List[Image.Image], scores: torch.Tensor) -> None:
    """
    Displays the images alongside their scores.
    """
    # Set print options for readability
    torch.set_printoptions(precision=2, sci_mode=False)
    for i, image in enumerate(images):
        print(f"Image {i + 1}:")
        display(image)
        print(f"Scores: {scores[i, :]}")
        print()
With everything detailed, let’s execute the pipeline as follows:
# Load CLIP model
model, processor = load_model_and_processor(CLIP_REPO)
# Process image and text inputs together
inputs = process_inputs(processor, prompts, images)
# Extract the probabilities
probs = get_probabilities(model, inputs)
# Display each image with corresponding scores
display_images_with_scores(images, probs)
We now have scores for each of our synthetic images that quantify the fidelity of the synthetic image to the text provided, based on the CLIP model, which interprets both image and text data as one combined mathematical representation (or geometric space) and can measure their similarity.
Figure 2.2: CLIP scores
For our “friendly lion,” we computed scores of 83% and 17% for each prompt, which we can interpret as an 83% likelihood that the image aligns with the first prompt.
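As a quick sanity check on that interpretation, here is a minimal sketch of the temperature-scaled softmax from get_probabilities applied to a pair of hypothetical logits (the real values come from logits_per_image):

import torch

# Hypothetical CLIP logits for one image scored against two prompts
logits = torch.tensor([[31.0, 15.0]])

T = 10  # the same temperature used in get_probabilities
probs = torch.nn.functional.softmax(logits / T, dim=1)
print(probs)  # tensor([[0.83, 0.17]]) -- roughly the "friendly lion" split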
Packt library subscribers can continue reading the entire book for free. You can buy Generative AI Foundations in Python by Carlos Rodriguez here.
And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here. The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, leave a comment below!