PythonPro #68: Python 3.14 Changes, Google’s Agent Development Kit, Genkit for AI Apps, and Template Strings (PEP 750)
Welcome to a brand new issue of PythonPro!
News Highlights: Python 3.14 is set to bring key changes like PEP 765 and deferred annotations; Google’s ADK enables AI agent development in Python with Cloud integration; Genkit adds Python support for building structured, observable AI apps; and the newly accepted PEP 750 introduces Template Strings for safer, more flexible string processing.
My top 5 picks from today’s learning resources:
And, in From the Cutting Edge, we introduce DataRec, a Python library that standardises dataset handling in recommender systems research, enabling reproducible, transparent, and framework-agnostic data preprocessing, filtering, and splitting.
Stay awesome!
Divya Anne Selvaraj
Editor-in-Chief
🐍 Python in the Tech 💻 Jungle 🌳
🗞️News
🎥Python 3.14 | Upcoming Changes: Previews major changes in Python 3.14, due in a month, including PEP 765 (disallowing return, break, and continue inside finally blocks), deferred evaluation of type annotations (PEP 649), and more.
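To illustrate the PEP 765 change above: a return inside a finally block silently swallows any in-flight exception, which is exactly the pattern the PEP disallows. A minimal sketch (function name is illustrative):

```python
def risky():
    try:
        return 1 / 0  # raises ZeroDivisionError
    finally:
        # This 'return' discards the exception above and returns -1
        # instead; under PEP 765, Python warns about such code.
        return -1

print(risky())  # -1, with no trace of the ZeroDivisionError
```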
Google’s Agent Development Kit (ADK): Google has released an open-source Python toolkit for building, evaluating, and deploying AI agents with fine-grained control, offering code-first orchestration, multi-agent design, and seamless integration with Google Cloud services.
Announcing Genkit for Python and Go: Genkit for Python (Alpha) is an open-source framework for building AI applications with structured output, tool integration, and observability, supporting models from Google, OpenAI, and more.
PEP 750 – Template Strings: PEP 750 has been accepted for Python 3.14. It introduces template strings (t-strings), string literals prefixed with t that evaluate to Template objects rather than str, letting code process static text and interpolated values separately before rendering.
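A minimal sketch of the accepted syntax, assuming Python 3.14’s string.templatelib module as specified in the PEP:

```python
from string.templatelib import Template

name = "World"
greeting = t"Hello {name}!"    # a Template, not a str

assert isinstance(greeting, Template)
print(greeting.strings)        # ('Hello ', '!')
for item in greeting.interpolations:
    print(item.value)          # 'World'
```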
RxInferServer – Remote Bayesian Inference from Python via Julia: This newly released server lets Python clients run remote inference with RxInfer, a Julia package that automates efficient inference in complex probabilistic models, offering performance and extensibility for AI applications with thousands of latent variables.
💼Case Studies and Experiments🔬
"Verified" "Compilation" of "Python" with Knuckledragger, GCC, and Ghidra: Presents a workflow for translating Python functions into C, compiling them, and formally verifying the resulting assembly.
Elliptical Python Programming: A humorous essay exploring Python's flexibility and quirks through intentionally obscure syntax using comparison operations and ellipses to represent integers and executable code.
📊Analysis
Python Performance: Why 'if not list' is 2x Faster Than Using len(): Dissects CPython's bytecode execution, memory layout, and instruction specialisation to explain why the truthiness check outperforms the explicit length comparison.
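A quick way to reproduce the comparison yourself (timings will vary by machine and Python version):

```python
import timeit

data = []

# Truthiness test: a single cheap check on the list object.
print(timeit.timeit("not data", globals=globals()))

# len() comparison: a function call plus an integer comparison.
print(timeit.timeit("len(data) == 0", globals=globals()))
```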
Python at the Speed of Rust: Introduces Function, a compiler that converts Python functions into native code using symbolic tracing and type annotations, achieving near-Rust performance.
🎓 Tutorials and Guides 🤓
Building Transformers from Scratch: Presents a comprehensive, code-driven walkthrough of implementing a GPT-2 style Transformer model entirely from scratch using NumPy, covering tokenization, embeddings, and more.
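The heart of any such from-scratch build is scaled dot-product attention; here is a minimal NumPy sketch of that one operation (not taken from the article itself):

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention for single-head 2-D inputs."""
    scores = q @ k.T / np.sqrt(k.shape[-1])         # similarity matrix
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v

x = np.random.randn(4, 8)        # 4 tokens, 8-dim embeddings
print(attention(x, x, x).shape)  # (4, 8)
```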
Open Access Course | Computational Fluid Dynamics (CFD) with high-performance Python programming: A 20-step online course covering core PDEs, array operations with NumPy, and advanced methods like JAX, implicit solvers, and the Lattice Boltzmann Method.
DNS Server in Python: Details the implementation of a custom local DNS server in Python, featuring caching, blocklist support, and upstream resolution.
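The forwarding core of such a server fits in a few lines; the caching and blocklist checks described in the article would wrap around this loop (upstream address and local port are assumptions):

```python
import socket

UPSTREAM = ("1.1.1.1", 53)       # assumed upstream resolver

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 5353))   # non-privileged port for testing

while True:
    query, client = sock.recvfrom(512)   # raw DNS query bytes
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as upstream:
        upstream.settimeout(2.0)
        upstream.sendto(query, UPSTREAM)
        reply, _ = upstream.recvfrom(512)
    sock.sendto(reply, client)           # relay the answer back
```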
Dropping Values (#2 in The `itertools` Series • `dropwhile()` and `takewhile()`): Explains how to use these functions to efficiently filter elements from the beginning of an iterable based on a condition, offering more concise and performant alternatives to traditional for loops.
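For instance (the sensor-reading data is invented for illustration):

```python
from itertools import dropwhile, takewhile

readings = [0, 0, 3, 7, 2, 0, 5]

# Skip the leading zeros...
active = dropwhile(lambda r: r == 0, readings)
# ...then keep values until the signal drops out again.
burst = list(takewhile(lambda r: r != 0, active))
print(burst)  # [3, 7, 2]
```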
From Unstructured Text to Interactive Knowledge Graphs Using LLMs: Describes how to use LLMs to extract structured subject–predicate–object triples from unstructured text, standardise and infer relationships, and render the results as interactive knowledge graphs in a browser.
The Magic of Manacher’s Algorithm: Explains how Manacher’s Algorithm efficiently finds the longest palindromic substring in O(n) time by transforming the input and leveraging symmetry to minimise redundant computations.
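A compact sketch of the technique, using the usual sentinel transform so even- and odd-length palindromes are handled uniformly:

```python
def longest_palindrome(s: str) -> str:
    # Interleave sentinels: "abba" -> "^#a#b#b#a#$".
    t = "^#" + "#".join(s) + "#$"
    p = [0] * len(t)     # p[i] = palindrome radius centred at t[i]
    centre = right = 0   # centre and right edge of rightmost palindrome
    for i in range(1, len(t) - 1):
        if i < right:
            # Start from the mirror position's radius, clipped to the
            # part of the known palindrome we are still inside.
            p[i] = min(right - i, p[2 * centre - i])
        # Expand around i; the ^/$ sentinels stop the loop safely.
        while t[i + p[i] + 1] == t[i - p[i] - 1]:
            p[i] += 1
        if i + p[i] > right:
            centre, right = i, i + p[i]
    # Map the best centre back to an index range in the original s.
    max_len, centre_idx = max((v, i) for i, v in enumerate(p))
    start = (centre_idx - max_len) // 2
    return s[start:start + max_len]

print(longest_palindrome("babad"))  # "aba" ("bab" ties for longest)
```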
How Much YouTube Is Actually Ads? A Data-Driven Look at Sponsorships: Walks you through using open data and SQL/Python tooling to quantify YouTube sponsor trends, identify high-ad-density channels, and apply time-based algorithms like sweep line to visualise ad placement.
🔑Best Practices and Advice🔏
Speed up exploratory data analysis with Buckaroo: Introduces an open-source Jupyter extension that streamlines exploratory data analysis by displaying histograms, null counts, and summary statistics for DataFrames in a single interactive view.
Graceful API Failure 101 for Data Scientists: Shows how data scientists can use Python decorators to handle API failures in long-running pipelines more cleanly, using retry logic and skip strategies for timeouts and oversized inputs in Gemini API calls.
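A generic sketch of the decorator pattern the article applies; the Gemini-specific exception types and skip strategies are not reproduced here:

```python
import functools
import time

def retry(max_attempts=3, delay=1.0, exceptions=(TimeoutError,)):
    """Retry a flaky call with linear backoff before giving up."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise                  # out of attempts
                    time.sleep(delay * attempt)  # back off, then retry
        return wrapper
    return decorator

@retry(max_attempts=3)
def call_api(payload):
    ...  # the long-running request goes here
```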
Visualizing Recursion Trees: Details the author's iterative process of developing an interactive visualization tool for recursive functions, the challenges faced with various technologies, and the insights gained into effective collaboration with LLMs.
Essential Tips for Python Developers - Crafting Standout Documentation: Explains how to create documentation that improves usability, supports diverse user needs, increases adoption, and reduces support burdens.
Python Best Practices Every Coder Should Know: Outlines best practices such as using ExitStack() for managing multiple contexts, following consistent naming conventions, avoiding hardcoded secrets, safely accessing dictionary keys with .get(), and using match for cleaner conditionals.
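Three of those practices side by side, in a minimal sketch (file paths and config keys are invented):

```python
from contextlib import ExitStack

def concat(paths, out_path):
    # ExitStack safely manages an arbitrary number of open files.
    with ExitStack() as stack:
        files = [stack.enter_context(open(p)) for p in paths]
        with open(out_path, "w") as out:
            for f in files:
                out.write(f.read())

config = {"mode": "fast"}
retries = config.get("retries", 3)  # .get() avoids a KeyError

match config.get("mode"):
    case "fast":
        print("skipping validation")
    case "safe" | None:
        print("running full validation")
```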
🔍From the Cutting Edge: DataRec—A Python Library for Standardized and Reproducible Data Management in Recommender Systems💥
In "DataRec: A Python Library for Standardized and Reproducible
Data Management in Recommender Systems," Mancino et al. from Politecnico di Bari and Université Paris-Saclay introduce a Python library designed to standardise and simplify data handling in recommender system research. This work was accepted at SIGIR 2025.
Context
Recommender systems are central to modern digital platforms, influencing decisions in e-commerce, media, and social networks. Despite substantial progress in algorithms and evaluation, the reproducibility of experiments remains a challenge—particularly due to inconsistencies in data preprocessing, filtering, and splitting. Existing frameworks each handle these processes differently, leading to fragmented methodologies and results that cannot easily be compared. DataRec addresses this gap by providing a unified, reproducible approach to data management.
Key Features of DataRec
Standardised Data Handling: Provides reproducible routines for dataset preparation, filtering, and splitting, based on practices observed in 55 recent recommendation studies.
Built-in Dataset Access: Direct access to 18 widely-used datasets with explicit versioning and referencing to ensure traceability.
Flexible Input/Output: Supports tabular, inline, and JSON formats, and can export datasets in formats compatible with popular frameworks like RecBole, Cornac, Elliot, and more.
Processing Tools: Includes tools such as binarisation, k-core filtering (user, item, and iterative), and rating-based filtering; see the pandas sketch after this list.
Splitting Strategies: Implements multiple splitting methods—random, temporal, leave-one-out, and pre-computed—supporting user-stratified evaluation.
Reproducibility Support: Tracks all operations, allows random seed setting, and generates YAML config files with checksums for full reproducibility.
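To make the iterative k-core filtering mentioned above concrete, here is a standalone pandas sketch of the general technique; it mirrors the idea, not DataRec's actual API, and the column names are assumptions:

```python
import pandas as pd

def iterative_k_core(df: pd.DataFrame, k: int = 5) -> pd.DataFrame:
    """Repeatedly drop users and items with fewer than k interactions
    until the dataset stabilises (the iterative k-core)."""
    while True:
        before = len(df)
        df = df[df.groupby("user")["item"].transform("count") >= k]
        df = df[df.groupby("item")["user"].transform("count") >= k]
        if len(df) == before:   # nothing removed in this pass: done
            return df
```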
What This Means for You
DataRec is particularly valuable for researchers, developers, and students working on recommender systems. If you have struggled with reproducing results across different studies or frameworks, or need to ensure traceable dataset handling in your experiments, DataRec provides a consistent foundation. It also simplifies integration with existing pipelines, whether you are using general-purpose frameworks or domain-specific toolkits.
Examining the Details
The library’s development was driven by an extensive survey of 55 papers published between 2020 and 2024, covering areas such as graph neural networks, contrastive learning, and reinforcement learning. This meta-analysis identified inconsistencies in how datasets are referenced, filtered, and split—issues that DataRec explicitly seeks to correct.
Dataset referencing, for example, was found to be unreliable: only 35% of papers referenced original sources; others linked to modified versions or broken links. DataRec counters this with built-in dataset access and public checksums. It supports transformation of raw data using filtering methods that mirror common practice, and provides traceable exports to major frameworks.
In contrast to monolithic recommendation frameworks, which are often non-interoperable, DataRec is modular and library-focused. This enables it to act as a shared layer for dataset handling, without duplicating model training or evaluation logic. The architecture is centred on a primary DataRec class backed by modules for I/O, processing, and splitting. Version control, detailed logging, and exportable configurations ensure that results can be reliably reproduced across different environments and research groups.
You can learn more by reading the entire paper or accessing the library on GitHub.
And that’s a wrap.
We have an entire range of newsletters with focused content for tech pros. Subscribe to the ones you find the most useful here. The complete PythonPro archives can be found here.
If you have any suggestions or feedback, or would like us to find you a Python learning resource on a particular subject, just leave a comment below!