200 Python Exercises Generated Iteratively with Claude Code
Introduction
As part of my interview prep journey, I needed to practice my Python skills and statistical concepts. Instead of spending dozens of dollars on interview prep platforms or prompting AI for exercises one by one, I decided to scale up my AI‑generated Python questions and create a database of 200 Python exercises I can use to practice.
This database was created with Data Scientists in mind, covering topics relevant to day‑to‑day tasks, less frequent but important work, and deeper technical knowledge that comes up from time to time (we need to flex these muscles now and then, friends).
Hopefully, you'll find this as useful as I have.
What Is Included
200 Python coding exercises. Many of them go beyond writing functions or wrangling data and ask you to perform the kind of tasks a data scientist handles on the job. They are not meant only to prep you for a Python coding interview; they also help you refresh concepts we are expected to know as professionals. The topics included are:
topic | sub_topic | topic_difficulty |
---|---|---|
data manipulation | DataFrame creation and inspection | beginner |
data manipulation | Filtering with boolean masks | beginner |
data manipulation | GroupBy aggregations | beginner |
data manipulation | Joins and merges | beginner |
data manipulation | Pivot and melt | beginner |
data manipulation | Datetime parsing and resampling | beginner |
data manipulation | Missing data handling and imputation | beginner |
data manipulation | Window functions: rolling and expanding | intermediate |
data manipulation | Vectorization vs apply | intermediate |
data manipulation | Memory and performance optimization | expert |
statistics and causal inference | Descriptive stats and distributions | beginner |
statistics and causal inference | CLT and sampling distributions | beginner |
statistics and causal inference | Confidence intervals | beginner |
statistics and causal inference | Hypothesis tests (t and z) | beginner |
statistics and causal inference | Power analysis and MDE | intermediate |
statistics and causal inference | Linear regression (OLS) and diagnostics | intermediate |
statistics and causal inference | Logistic regression and odds ratios | intermediate |
statistics and causal inference | Fixed effects and panel regression | intermediate |
statistics and causal inference | Difference-in-differences | expert |
statistics and causal inference | Propensity score methods | expert |
experimentation | Defining units and exposure | beginner |
experimentation | Randomization and hashing | beginner |
experimentation | Sample size and power planning | beginner |
experimentation | AA tests and SRM checks | intermediate |
experimentation | Guardrail metrics selection | intermediate |
experimentation | CUPED variance reduction | intermediate |
experimentation | Multiple testing control (FDR) | intermediate |
experimentation | Sequential testing and alpha spending | expert |
experimentation | Clustered and geo experiments | expert |
experimentation | Switchback designs for platforms | expert |
How I Built This
I used Claude Code to iterate through combinations of topics and difficulty levels, experimenting with different exercise formats and validation approaches. Iterating this way helped me refine the exercise structure and check coverage across the topics and sub-topics listed above.
Process
- I explained the task and objectives to Claude, fed it the above table of topics and difficulty levels, and asked it to create a set of 15–20 datasets that I could use.
- I created a list of tasks, each specifying how many exercises to create given a topic, subtopic, and exercise difficulty:
  {
    "group_id": 19,
    "topic": "statistics_and_causal_inference",
    "subtopic": "Difference-in-differences",
    "topic_difficulty": "expert",
    "exercise_count": 9,
    "difficulty_split": {"hard": 5, "hells_of_flame": 4},
    "datasets": ["employee_panel", "geo_experiment", "student_performance"],
    "id_range": "stat_056 to stat_064"
  }
- I manually asked Claude to go through the groups, executing the tasks (creating exercises following the guidelines).
- Once I had the initial output, I used the Claude API to review each exercise individually—checking for accuracy, appropriate difficulty calibration, and clear problem statements—which improved the overall quality of the exercises.
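Before handing a task list like the one in step 2 to Claude, it can help to confirm that each task is internally consistent. The following is a minimal sketch, not the script I actually used: the helper names `expand_id_range` and `check_task` are hypothetical, and it assumes every `id_range` follows the `prefix_NNN to prefix_NNN` pattern seen in the example task.

```python
import re

# Task spec copied from the example in step 2.
task = {
    "group_id": 19,
    "topic": "statistics_and_causal_inference",
    "subtopic": "Difference-in-differences",
    "topic_difficulty": "expert",
    "exercise_count": 9,
    "difficulty_split": {"hard": 5, "hells_of_flame": 4},
    "datasets": ["employee_panel", "geo_experiment", "student_performance"],
    "id_range": "stat_056 to stat_064",
}


def expand_id_range(id_range):
    """Turn 'stat_056 to stat_064' into ['stat_056', ..., 'stat_064'].

    Hypothetical helper: assumes the 'prefix_NNN to prefix_NNN' pattern.
    """
    match = re.fullmatch(r"([A-Za-z]+)_(\d+) to ([A-Za-z]+)_(\d+)", id_range)
    if match is None:
        raise ValueError(f"unrecognized id_range: {id_range!r}")
    prefix, start, end_prefix, end = match.groups()
    if prefix != end_prefix:
        raise ValueError(f"mismatched prefixes in id_range: {id_range!r}")
    width = len(start)  # keep the zero padding used in the spec
    return [f"{prefix}_{n:0{width}d}" for n in range(int(start), int(end) + 1)]


def check_task(task):
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    expected = task["exercise_count"]
    split_total = sum(task["difficulty_split"].values())
    if split_total != expected:
        problems.append(
            f"difficulty_split sums to {split_total}, expected {expected}"
        )
    id_count = len(expand_id_range(task["id_range"]))
    if id_count != expected:
        problems.append(f"id_range covers {id_count} ids, expected {expected}")
    return problems


print(check_task(task))
```

For the example task the split (5 + 4) and the ID range (056 through 064) both account for nine exercises, so the check reports no problems.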
Instructions
- Initialize the instructor (ensure JSON files are in the same directory):
instructor = PythonInstructor()
- Get an exercise by topic and difficulty:
exercise = instructor.get_exercise(topic='loops', difficulty='beginner')
- Display the exercise prompt:
print(exercise.get('exercise'))
- Attempt to solve the problem:
# ... your solution code here ...
- View the solution when ready:
print(exercise.get('solution'))
print(exercise.get('expl'))
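To make the workflow above concrete, here is a rough sketch of how a class with this interface could work. It is not the package's actual code: `ExerciseBank` is a hypothetical stand-in for `PythonInstructor`, and the JSON layout (a list of records with `topic` and `difficulty` keys alongside the `exercise`, `solution`, and `expl` keys used above) is an assumption. The sample record is a placeholder, not a real exercise from the database.

```python
import json
import random
import tempfile
from pathlib import Path


class ExerciseBank:
    """Hypothetical stand-in for the instructor class (assumed design)."""

    def __init__(self, directory="."):
        # Load every JSON file in the directory; each file is assumed
        # to hold a list of exercise records.
        self.exercises = []
        for path in sorted(Path(directory).glob("*.json")):
            with open(path, encoding="utf-8") as handle:
                self.exercises.extend(json.load(handle))

    def get_exercise(self, topic, difficulty):
        """Return one random exercise matching the topic and difficulty."""
        matches = [
            record
            for record in self.exercises
            if record.get("topic") == topic
            and record.get("difficulty") == difficulty
        ]
        if not matches:
            raise LookupError(f"no exercise for {topic!r} / {difficulty!r}")
        return random.choice(matches)


# Demo with a placeholder record written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = [
        {
            "topic": "data manipulation",
            "difficulty": "beginner",
            "exercise": "placeholder prompt",
            "solution": "placeholder solution",
            "expl": "placeholder explanation",
        }
    ]
    Path(tmp, "sample.json").write_text(json.dumps(sample), encoding="utf-8")
    bank = ExerciseBank(tmp)
    exercise = bank.get_exercise(topic="data manipulation", difficulty="beginner")

print(exercise.get("exercise"))
```

Raising `LookupError` on an unknown topic or difficulty is a design choice of this sketch; the real class may behave differently, for example by returning `None`.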
How to Get the Package?
Send me a LinkedIn invite and slide into my DMs!