PythonPro #60: XGBoost Multithreading, Python 3.14 Updates, REST API Frameworks, and Quantum ML with sQUlearn
Happy New Year! We’re back! Did you try your hand at any exciting Python projects over the holidays that you’d like to share? If yes, leave a comment at the end and let me know. If it’s brilliant, we’ll share what you made in next week's issue.
In today’s Expert Insight we bring you an excerpt from the recently published book, XGBoost for Regression Predictive Modeling and Time Series Analysis, which demonstrates the power of XGBoost's multithreaded capabilities. It shows how adjusting the nthread parameter can significantly accelerate model training by utilizing multiple CPU cores, illustrated with a practical example using the California housing dataset.
News Highlights: Python 3.14.0 alpha 4 introduces features like PEP 649 for deferred annotations and improved error messages; Python wins Tiobe's Programming Language of the Year 2024 with a 9.3% popularity surge; and a new PEP proposes SBOMs for better package security and dependency tracking.
My top 5 picks from today’s learning resources:
And, in From the Cutting Edge, we introduce sQUlearn, a Python library for quantum machine learning that integrates seamlessly with classical tools like scikit-learn, offering high-level APIs, low-level customisation, and robust support for NISQ devices.
Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief
🐍 Python in the Tech 💻 Jungle 🌳
🗞️News
Python 3.14.0 alpha 4 is out: This early developer preview showcases new features like PEP 649 (deferred annotation evaluation), PEP 741 (Python configuration C API), and improved error messages.
Python wins Tiobe language of the year honors: Python won Tiobe's Programming Language of the Year 2024 with a 9.3% popularity increase, surpassing other languages like Java and Go.
Software bill-of-materials (SBOMs) docs eyed for Python packages: A new Python Enhancement Proposal (PEP) suggests incorporating SBOM documents into Python packages to improve dependency tracking and vulnerability analysis.
💼Case Studies and Experiments🔬
A Scheme for Network Programmability and Backup Automation Using Python Netmiko Library on Cisco; the Case Study of the Komfo Anokye Teaching Hospital Local Area Network: Presents a Python-based framework that addresses the inefficiencies of manual processes, achieving a 99% reduction in backup time, a 100% success rate, and enhanced resource utilization.
Change Python's syntax with the "# coding:" trick: Demonstrates a playful yet risky manipulation of Python's behavior by creating a codec that converts braces into indentation, enabling alternative syntax like using `{}` for blocks.
📊Analysis
A technical intro to Ibis: The portable Python DataFrame library: Introduces Ibis which simplifies multi-backend data workflows through lazy evaluation, backend-agnostic code, and seamless backend translation.
A Deeper Look into Node.js Docker Images: Help, My Node Image Has Python!: Analyzes various Node.js Docker images, comparing their sizes, security vulnerabilities, and use cases.
🎓 Tutorials and Guides 🤓
Embedding Python in Rust (for tests): Covers setting up a Python interpreter within a Rust project, exposing Rust functions to Python, handling dynamic types, and building a test runner.
How to Visualize your Python Project’s Dependency Graph: Explains how to visualize a Python project's dependency graph using the Tach tool to define module boundaries, sync dependencies, and visualize the graph in a browser or with GraphViz.
Build a chatbot web app under 5min in Python: Provides a beginner-level tutorial for building a chatbot web app in Python using Dash, Dash-Chat, and OpenAI's GPT models.
Deploying the Python Masonite framework on Lambda: Covers creating a Lambda function, configuring the `lambda_handler`, and setting up CI/CD with GitHub Actions to automate deployments.
The Shortest Python `import` Tutorial • A Picture Story: Explains the three main ways of importing in Python: importing the whole module, importing specific items, and importing everything with a wildcard.
How to Split a Python List or Iterable Into Chunks: Techniques covered include using Python’s standard library (`itertools.batched()`), third-party packages like `more_itertools` and NumPy, and custom implementations.
Nine Pico PIO Wats with MicroPython (Part 1): Explores surprising behaviors ("Wats") in Raspberry Pi Pico's Programmable Input/Output (PIO) subsystem through a musical theremin project using MicroPython.
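For a taste of the custom-implementation approach the chunking article covers, here is a minimal stdlib-only sketch (the `chunked` helper and its name are our own, not from the article):

```python
from itertools import islice

def chunked(iterable, n):
    """Yield successive lists of up to n items from any iterable."""
    it = iter(iterable)
    # islice consumes the next n items; an empty chunk signals exhaustion.
    while chunk := list(islice(it, n)):
        yield chunk

print(list(chunked(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Unlike slicing-based approaches, this works on any iterable, including generators with no known length.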
🔑Best Practices and Advice🔏
Choosing your Python REST API framework: Evaluates popular frameworks like FastAPI, Django REST framework, Flask-RESTX, Sanic, and Tornado, offering guidance for selecting the most suitable framework.
The Storeroom: Introduces a fix to the White Room analogy, a teaching method for explaining Python’s variable handling and namespaces, and addresses the analogy's limitation in representing multiple references to the same object.
Python's Mutable vs Immutable Types: What's the Difference?: Discusses key concepts like object identity, type, and value, along with common pitfalls, such as aliasing variables, mutating function arguments, and using mutable default values.
Five Key Lessons for Google Earth Engine Beginners: Offers tips illustrated with real-world examples, including calculating water balance and drought in a water basin in Ecuador.
Using Tree Sitter to extract insights from your code and drive your development metrics: Covers three methods for analyzing code: textual matching, syntax linting, and AST traversal, emphasizing the advantages of the latter for accuracy and extracting node values.
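The mutable-default-value pitfall flagged in the mutable vs immutable piece is easy to demonstrate in a few lines (a minimal sketch; the function names are ours):

```python
def append_bad(item, bucket=[]):
    # The default list is created once, at definition time,
    # and shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):
    # Idiomatic fix: use None as a sentinel and build a fresh list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

append_bad(1)
print(append_bad(2))   # [1, 2] -- state leaked from the first call
append_good(1)
print(append_good(2))  # [2] -- fresh list each time
```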
🔍From the Cutting Edge: sQUlearn – A Python Library for Quantum Machine Learning💥
In the paper, "sQUlearn – A Python Library for Quantum Machine Learning," Kreplin et al. introduce sQUlearn, a Python library for quantum machine learning (QML), designed to integrate seamlessly with classical machine learning tools like scikit-learn.
Context
Quantum Machine Learning (QML) combines quantum computing and machine learning to harness quantum principles for computational efficiency and enhanced algorithmic capabilities. However, many current QML tools demand in-depth quantum computing expertise. Noisy Intermediate-Scale Quantum (NISQ) devices, while promising, pose significant challenges due to their limitations in handling deep quantum circuits. To bridge these gaps, sQUlearn focuses on NISQ-compatibility, usability, and integration with classical ML tools, particularly scikit-learn.
Key Features
sQUlearn offers:
High-Level Interfaces: Provides scikit-learn-compatible APIs for quantum kernel methods (e.g., quantum SVMs) and quantum neural networks (QNNs) for classification and regression tasks.
Low-Level Functionalities: Offers tools for designing quantum circuits, customising encodings, and performing advanced differentiation for QML research.
Quantum Kernel Methods: Supports fidelity-based and projected quantum kernels (FQK and PQK) for enhanced data embedding and efficient computation.
Flexible Execution: Enables seamless transitions between simulations and real quantum hardware using Qiskit and PennyLane backends.
Automation Features: Includes session management, result caching, error handling, and automatic restarts to simplify quantum experiment execution.
Customisation Options: Allows users to create and modify data encoding strategies, observables, and outer kernels for tailored solutions.
What This Means for You
sQUlearn simplifies quantum machine learning for both researchers and practitioners. For researchers, it offers a flexible low-level framework for exploring novel QML algorithms and quantum circuit designs. For practitioners, it simplifies the deployment of QML solutions with minimal quantum-specific knowledge via high-level interfaces and pre-built models built on familiar tools like scikit-learn.
Examining the Details
sQUlearn’s dual-layer architecture enables flexibility, with high-level APIs for seamless integration into machine learning workflows and low-level tools for advanced customisation. The Executor module centralises quantum job execution, handling retries, caching results, and transitioning between simulation and real hardware. It supports quantum kernel methods and neural networks while addressing noise challenges on quantum devices through built-in regularisation techniques. This focus on automation and robustness ensures the library is both reliable for practical applications and adaptable for research needs.
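The scikit-learn compatibility the authors emphasize boils down to the standard estimator contract: `fit` returns `self`, learned state uses trailing-underscore attributes, and `predict` accepts array-like input. A plain-Python stand-in illustrates the shape; `MeanRegressor` is a hypothetical placeholder for illustration, not part of sQUlearn:

```python
class MeanRegressor:
    """Toy estimator following the scikit-learn fit/predict contract."""

    def fit(self, X, y):
        # Learned state gets a trailing underscore, per sklearn convention.
        self.mean_ = sum(y) / len(y)
        return self  # returning self enables MeanRegressor().fit(...) chaining

    def predict(self, X):
        # Predict the learned mean for every input row.
        return [self.mean_ for _ in X]

model = MeanRegressor().fit([[0], [1]], [2.0, 4.0])
print(model.predict([[5], [6]]))  # [3.0, 3.0]
```

Because sQUlearn's high-level classes follow this same contract, they can slot into pipelines and model-selection utilities that expect scikit-learn estimators.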
You can learn more by reading the entire paper or accessing the library on GitHub.
🧠 Expert insight💥
Here’s an excerpt from “Chapter 13: Deploying Your XGBoost Model” in the book, XGBoost for Regression Predictive Modeling and Time Series Analysis by Partha Pritam Deka and Joyce Weiner.
Using XGBoost’s multithreaded features
XGBoost has built-in support for multithreaded computing, which allows you to speed up model training by utilizing multiple CPU cores. You can control this by setting the nthread parameter, which determines the number of threads to use. By default, XGBoost will automatically use the maximum number of available threads.
It’s important to note that if you’re using Dask, any value you set for nthread within XGBoost will take precedence over Dask’s default configuration. The following example demonstrates how the multithreading parameter works. We’ll revisit the California housing dataset that you worked with in Chapter 4:
Create a Python file to demonstrate XGBoost’s multithreaded functionality. We’ve started with a header and named the file multithreaded.py.
Import the necessary modules. You can load the California housing dataset from scikit-learn (sklearn). You’ll also be using pandas, numpy, a module called time to track how long code execution takes, and, of course, xgboost:
import pandas as pd
import numpy as np
import time
import xgboost as xgb
from sklearn.metrics import r2_score
from sklearn import datasets
from sklearn.model_selection import train_test_split
Now, you can load in the California housing dataset and perform the train-test split using scikit-learn, as you did previously:
housingX, housingy = datasets.fetch_california_housing(
    return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    housingX, housingy, test_size=0.2, random_state=17)
Previously, you used the scikit-learn interface for XGBoost. In this example, you’ll use the XGBoost API for Python. One difference is that XGBoost uses a data structure called a DMatrix to manipulate data. So, the first thing you need to do is convert the dataset from numpy or pandas form into DMatrix form by using the DMatrix function and passing in the data and the labels. In this case, we’ll be using dtrain = xgb.DMatrix(X_train, y_train) for the training dataset; do the same for the test dataset:
dtrain = xgb.DMatrix(X_train, y_train)
dtest = xgb.DMatrix(X_test, y_test)
Now, the data is in a format that XGBoost can manipulate with efficiency. As mentioned in Chapter 3, XGBoost does some sorting and performs other operations on the dataset to speed up execution.
At this point, you’re ready to train a model using the XGBoost API and the multithreading feature. By default, XGBoost uses the maximum number of threads available. To see the difference, train the model with just two threads, and then increase the count to the maximum number of logical processors in your computer. You’ll need to use the time module to get the computation time and print it out so that you can compare the results. First, save the start time with the following line of code:
train_start = time.time()
You can set the training parameters for XGBoost by creating a dictionary with the parameters as key-value pairs. You can configure all the parameters listed in the Hyperparameters section of Chapter 5. Here, set eta = 0.3 (the learning rate), booster = gbtree, and nthread = 2:
param = {"eta": 0.3, "booster": "gbtree", "nthread": 2}
Now that the training parameters have been set, you can train the model and save the end of the execution time by using the following code:
housevalue_xgb = xgb.train(param, dtrain)
train_end = time.time()
Print the execution time with a formatted print statement, subtracting train_start from train_end and converting the result into milliseconds by multiplying by 10**3:
print("Training time with 2 threads is: {0:.3f}".format(
    (train_end - train_start) * 10**3), "ms")
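The same step can be written in the more modern f-string style (our suggestion, not the book's code; the timestamps are captured exactly as above):

```python
import time

train_start = time.time()
# ... model training would run here ...
train_end = time.time()

elapsed_ms = (train_end - train_start) * 10**3  # seconds -> milliseconds
print(f"Training time with 2 threads is: {elapsed_ms:.3f} ms")
```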
Now, repeat the code and increase the number of threads XGBoost uses by changing the value of nthread. Since our computer has eight logical processors, I’ve chosen 8:
train_start = time.time()
param = {"eta": 0.3, "booster": "gbtree", "nthread": 8}
housevalue_xgb = xgb.train(param, dtrain)
train_end = time.time()
print("Training time with 8 threads is: {0:.3f}".format(
    (train_end - train_start) * 10**3), "ms")
To ensure the model is working as expected, you can make a prediction and check the R2 value. You can also time the prediction. To make a prediction with the Python API, just call the predict method on your model and pass the test dataset:
pred_start = time.time()
ypred = housevalue_xgb.predict(dtest)
pred_end = time.time()
print("Prediction time is: {0:.3f}".format(
    (pred_end - pred_start) * 10**3), "ms")
xgb_r2 = r2_score(y_true=y_test, y_pred=ypred)
print("XGBoost Rsquared is {0:.2f}".format(xgb_r2))
Running this script results in the following output. Please note that the execution time on your computer will be different:
Training time with 2 threads is: 237.088 ms
Training time with 8 threads is: 130.723 ms
Prediction time is: 2.012 ms
XGBoost Rsquared is 0.76
On our computer, going from two to eight threads sped up training by over 44%. This demonstrates the benefit XGBoost provides with multithreading. Recall that by default, it will use the maximum number of threads available. Next, you’ll learn about using XGBoost with distributed compute by using Dask on Linux.
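If you want to find the sweet spot on your own machine, you can sweep nthread over candidate values up to the logical processor count. The sketch below shows the benchmarking pattern; the `time_ms` helper and the sweep loop are our additions, not from the book, and with xgboost installed you would pass each param dict to xgb.train(param, dtrain):

```python
import os
import time

def time_ms(fn, *args):
    """Run fn(*args) once and return (result, elapsed milliseconds)."""
    start = time.time()
    result = fn(*args)
    return result, (time.time() - start) * 10**3

n_logical = os.cpu_count() or 1
# Mirror the book's settings, sweeping nthread over powers of two
# up to the number of logical processors on this machine.
candidates = [n for n in (1, 2, 4, 8, 16) if n <= n_logical]
params = [{"eta": 0.3, "booster": "gbtree", "nthread": n} for n in candidates]

for param in params:
    # With xgboost installed, replace the print with:
    #   _, elapsed = time_ms(xgb.train, param, dtrain)
    #   print(f"nthread={param['nthread']}: {elapsed:.3f} ms")
    print(param)
```

Beyond the logical processor count, extra threads generally stop helping and can even slow training down, which is why the sweep caps at n_logical.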
XGBoost for Regression Predictive Modeling and Time Series Analysis was published in December 2024.
Get the eBook for $39.99 $27.98
And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here. The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just leave a comment below!