Alright, so yesterday I was messing around trying to figure out how to pull some historical MLB game data – specifically, the Toronto Blue Jays versus the Milwaukee Brewers. Figured it would be a fun little side project.
First things first, I started by googling. Obvious, right? I searched for “MLB API” and “historical baseball data API.” Landed on a few potentially useful pages, but most of them were either outdated or required some serious cash to get access. I’m trying to do this on the cheap, you know?
Then, I remembered hearing about some open-source baseball data repositories. So, I shifted my search to “open source MLB data.” Found a few GitHub repos that looked promising. One that stood out had CSV files with game data going back years. Jackpot!
Next, I cloned the repo to my local machine. Gotta have the data at my fingertips, right? After poking around, I found the CSV files I needed, which had game-by-game results. It was a bit messy – lots of columns I didn’t need, and the formatting wasn’t exactly consistent across all the files, but hey, it was free.
Time to get my hands dirty with Python and Pandas. Loaded up the CSVs into Pandas DataFrames. This is where the fun began. Had to do some cleaning. I started by filtering the data to only include games where the Blue Jays and Brewers played each other. This meant checking the ‘home_team’ and ‘away_team’ columns in both directions, since either team could be the home side.
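Here's roughly what that filter looks like. Treat it as a sketch: the 'home_team' and 'away_team' columns are the ones mentioned above, but the team codes ("TOR", "MIL"), the sample rows, and the filter_matchup helper are made up for illustration, and the real files may label teams differently.

```python
import pandas as pd

# Made-up sample rows; the team codes and dates are placeholders,
# not values taken from the actual repo.
games = pd.DataFrame({
    "date": ["2019-04-01", "2019-04-02", "2019-04-03", "2019-04-04"],
    "home_team": ["TOR", "MIL", "TOR", "NYY"],
    "away_team": ["MIL", "TOR", "BOS", "MIL"],
})


def filter_matchup(df, team_a, team_b):
    """Keep only rows where team_a and team_b play each other, at either park."""
    a_hosts = (df["home_team"] == team_a) & (df["away_team"] == team_b)
    b_hosts = (df["home_team"] == team_b) & (df["away_team"] == team_a)
    return df[a_hosts | b_hosts].reset_index(drop=True)


matchups = filter_matchup(games, "TOR", "MIL")
print(matchups)
```

Checking both directions is the important bit. A filter on only one home/away combination quietly drops half the series.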
Then came the data wrangling. The dates were in a weird format, so I used Pandas to convert them to datetime objects, which makes them way easier to sort, filter, and group by year. I also extracted the scores for each team from the relevant columns. Lots of trial and error here, figuring out which columns actually contained the data I needed.
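A minimal sketch of that cleanup step, with some loud assumptions: I'm pretending the dates are YYYYMMDD strings (swap in whatever format your files actually use), and the 'home_score' and 'away_score' column names and the clean_games helper are hypothetical.

```python
import pandas as pd

# Made-up rows, including two deliberately bad values.
raw = pd.DataFrame({
    "date": ["20190401", "20190402", "not a date"],
    "home_score": ["5", "3", "2"],
    "away_score": ["4", "x", "1"],
})


def clean_games(df, date_format="%Y%m%d"):
    """Parse dates and scores, dropping rows that fail to parse."""
    out = df.copy()
    # errors="coerce" turns unparseable values into NaT/NaN instead of raising.
    out["date"] = pd.to_datetime(out["date"], format=date_format, errors="coerce")
    for col in ("home_score", "away_score"):
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out.dropna(subset=["date", "home_score", "away_score"]).reset_index(drop=True)


cleaned = clean_games(raw)
print(cleaned)
```

Coercing and then dropping is the lazy option. If a lot of rows disappear, it's worth looking at the rejects before throwing them away.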

After that, I calculated some basic stats. Stuff like total wins for each team, average score per game, and win percentages when playing at home versus away. Just simple descriptive stuff to get a feel for the data.
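Something like this covers those stats. Again, a sketch under assumptions: the scores and team codes are invented, the score column names are hypothetical, the summarize helper is mine, and it expects a frame that's already been filtered down to games involving the team you pass in.

```python
import pandas as pd

# Invented results for illustration only.
games = pd.DataFrame({
    "home_team": ["TOR", "MIL", "TOR", "MIL"],
    "away_team": ["MIL", "TOR", "MIL", "TOR"],
    "home_score": [5, 3, 2, 1],
    "away_score": [4, 6, 7, 0],
})


def summarize(df, team):
    """Wins, average runs, and home/away win rates for one team."""
    is_home = df["home_team"] == team
    # Pick the team's own runs from whichever column applies to each game.
    runs = df["home_score"].where(is_home, df["away_score"])
    opp_runs = df["away_score"].where(is_home, df["home_score"])
    won = runs > opp_runs
    return {
        "wins": int(won.sum()),
        "avg_runs": float(runs.mean()),
        "home_win_pct": float(won[is_home].mean()),   # NaN if no home games
        "away_win_pct": float(won[~is_home].mean()),  # NaN if no away games
    }


tor = summarize(games, "TOR")
mil = summarize(games, "MIL")
print(tor)
print(mil)
```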
Finally, I visualized some of the results. Used Matplotlib to create a few bar charts showing the win distribution and line graphs showing the trend of average scores over the years. Nothing fancy, but it helped to see the data in a more digestible format.
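For the charts, here's a rough version. The data is invented, the column names are hypothetical, and I'm reading "average scores" as combined runs per game averaged by year, which is just one way to slice it.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented results for illustration only.
games = pd.DataFrame({
    "date": pd.to_datetime(["2017-04-11", "2017-04-12", "2019-06-01", "2019-06-02"]),
    "home_team": ["TOR", "MIL", "TOR", "MIL"],
    "away_team": ["MIL", "TOR", "MIL", "TOR"],
    "home_score": [5, 3, 2, 1],
    "away_score": [4, 6, 7, 2],
})

# Winner is the home team when it outscored the visitors, else the away team.
home_won = games["home_score"] > games["away_score"]
games["winner"] = games["home_team"].where(home_won, games["away_team"])
wins = games["winner"].value_counts()

# Average combined runs per game, grouped by season year.
games["total_runs"] = games["home_score"] + games["away_score"]
yearly = games.groupby(games["date"].dt.year)["total_runs"].mean()

fig, (ax_wins, ax_trend) = plt.subplots(1, 2, figsize=(10, 4))
ax_wins.bar(list(wins.index), wins.values)
ax_wins.set_title("Wins by team")
ax_wins.set_ylabel("Wins")
ax_trend.plot(list(yearly.index), yearly.values, marker="o")
ax_trend.set_title("Average runs per game by year")
ax_trend.set_xlabel("Year")
ax_trend.set_ylabel("Runs per game")
fig.tight_layout()
```

From there, fig.savefig("some_name.png") writes the figure to disk, or drop the Agg line and call plt.show() if you're working interactively.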
End result? I got a decent overview of the Toronto Blue Jays versus Milwaukee Brewers matchups over the years. It was a bit of a grind cleaning the data, but hey, that’s always part of the process. Definitely learned a thing or two about data manipulation in Pandas. Next step? Maybe try to predict the outcome of future games using some machine learning. But that’s a project for another day!