2024 Paris Olympics Time Series Predictions with Snowflake ML

Published on August 28, 2024

We Predicted the 2024 Olympic Medal Outcomes with Snowflake ML: Here’s What Happened

Snowflake ML is a new public offering that allows for no-code predictive modeling directly from the Snowsight UI, and new features are constantly being introduced. We tested out Snowflake ML’s forecasting abilities with some simple time series models aiming to answer one question: What percentage of medals will go to each country competing in the 2024 Paris Summer Olympics?

Note: You can access a quickstart guide, or look for the Snowflake AI & ML Studio under AI & ML in the Snowsight UI (no coding required!).

Data

The data that we used to train our predictive models was sourced from Kaggle. We applied basic data cleaning such as converting Olympic dates into usable time values, null handling, and accounting for countries & sports that no longer exist or were not present in the 2024 Olympics.
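As a rough illustration of the cleaning steps described above, here is a minimal Python sketch. The column names, the sample rows, and the list of defunct countries are all assumptions made for illustration; they are not the Kaggle schema or our actual pipeline.

```python
from datetime import date

# Hypothetical raw rows; the field names are assumptions, not the Kaggle schema.
raw_rows = [
    {"year": "2016", "country": "Japan", "medal": "Gold"},
    {"year": "2020", "country": "Japan", "medal": None},  # no medal recorded
    {"year": "1988", "country": "Soviet Union", "medal": "Gold"},
]

# Illustrative list of countries that no longer compete under that name.
DEFUNCT = {"Soviet Union", "East Germany"}


def clean(rows):
    """Drop defunct countries and null medals; turn the year into a date."""
    cleaned = []
    for row in rows:
        if row["country"] in DEFUNCT:
            continue  # not present in the 2024 Olympics
        if row["medal"] is None:
            continue  # null handling: skip rows without a medal
        cleaned.append({
            # Represent each Games as a date so it can act as a timestamp.
            "ts": date(int(row["year"]), 1, 1),
            "country": row["country"],
            "medal": row["medal"],
        })
    return cleaned


cleaned_rows = clean(raw_rows)
print(cleaned_rows)
```

Using January 1 of the Games year as the timestamp is an arbitrary choice here; any consistent date per Games would serve the same purpose.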

Predictions

These predictions come from four separate models that forecast per-country medal outcomes: one for each medal type (gold, silver, and bronze) and one for total medals. The training dataset consists only of medal count time series data from previous Olympics for each country, without any exogenous variables (additional independent variables, like the sport or competing athletes, that help inform predictions). We aimed to keep the models as simple as possible, anticipating that the current iteration of Snowflake ML would excel at straightforward time series modeling with minimal hyperparameter tuning (adjusting the model specifications to influence how it generates predictions).

Total medal counts are not always consistent from year to year due to certain sports entering and exiting the Olympic rotation, so percentages are preferable for displaying this sort of data. Displayed below are the countries with the highest total medal forecasts, with the notable addition of South Korea (we’ll explain why in the Results section).

Rank (Total Medals)	Country	Gold Medals Forecast	Silver Medals Forecast	Bronze Medals Forecast	Total Medals Forecast
1	United States of America	11.7%	12.3%	10.2%	11.3%
2	People’s Republic of China	12.3%	10.3%	8.9%	10.4%
3	Great Britain	8.8%	8.1%	5.2%	7.2%
4	Japan	5.7%	3.5%	5.2%	4.9%
5	Germany	5.0%	2.9%	4.7%	4.3%
6	Italy	4.1%	3.5%	4.2%	4.0%
7	Australia	3.2%	2.6%	5.0%	3.7%
8	France	2.5%	5.5%	3.1%	3.7%
9	Netherlands	2.5%	3.5%	2.4%	2.8%
10	Canada	1.6%	2.3%	3.4%	2.5%
16	South Korea	2.5%	0.0%	2.6%	1.8%
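
The percentages in the table above are shares of all medals awarded. A minimal sketch of that conversion, using made-up counts rather than our data, could look like this:

```python
def medal_shares(counts):
    """Convert per-country medal counts for one Games into percentage shares."""
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero when no medals are recorded.
        return {country: 0.0 for country in counts}
    return {country: 100.0 * n / total for country, n in counts.items()}


# Made-up counts for three hypothetical countries, for illustration only.
example_counts = {"A": 30, "B": 15, "C": 5}
shares = medal_shares(example_counts)
print(shares)
```

Because each share is divided by that year's total, the values stay comparable across Games even when the number of events changes.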

Results

Though the final placements weren’t always spot on, our relatively basic models correctly identified nine of the ten countries that finished in the top ten by total medals. The exception was South Korea, which we ranked 16th and which finished 10th, ahead of Canada. The silver medal model forecast zero silver medals for South Korea, with an upper bound of 15, even though South Korea has not finished without a silver medal since the 1960 Rome Olympics. This forecast is likely a result of training on all-time Olympic results: South Korea hasn’t won fewer than three silver medals since the 1984 Los Angeles Olympics, but it earned no more than two at any earlier Olympic Games following the country’s establishment in 1945. This explains much of the variance we saw for South Korea compared to other frontrunner countries. If we were to iterate on this model, we would introduce date filters and other independent variables, such as the sports category, to reduce the variation in these predictions.

In summary, this model may not be perfectly accurate and doesn’t have the strongest evaluation metrics, which are statistical values that give us more information on how well the model can generate predictions. However, with minimal data preparation and the user-friendly Snowflake AI & ML Studio, we were able to generate solid baseline predictions that closely aligned with the results of the 2024 Olympics. This is where I believe Snowflake ML solutions truly shine.

Rank (Total Medals)	Country	Gold Medals Actual (± pts vs. forecast)	Silver Medals Actual (± pts vs. forecast)	Bronze Medals Actual (± pts vs. forecast)	Total Medals Actual (± pts vs. forecast)
1	United States of America	12.2% (+0.5%)	13.5% (+1.2%)	11.0% (+0.8%)	12.2% (+0.9%)
2	People’s Republic of China	12.2% (-0.1%)	8.3% (-2.0%)	6.3% (-2.6%)	8.8% (-1.6%)
3	Great Britain	4.3% (-4.5%)	6.7% (-1.4%)	7.6% (+2.4%)	6.3% (-0.9%)
4	France	4.9% (+2.4%)	8.0% (+2.5%)	5.8% (+2.7%)	6.2% (+2.5%)
5	Australia	5.5% (+2.3%)	5.8% (+3.2%)	4.2% (-0.8%)	5.1% (+1.4%)
6	Japan	6.1% (+0.4%)	3.7% (+0.2%)	3.4% (-1.8%)	4.3% (-0.6%)
7	Italy	3.7% (-0.4%)	4.0% (+0.5%)	3.9% (-0.3%)	3.9% (-0.1%)
8	Netherlands	4.6% (+2.1%)	2.1% (-1.4%)	3.1% (+0.7%)	3.3% (+0.5%)
9	Germany	3.7% (-1.3%)	4.0% (+1.1%)	2.1% (-2.6%)	3.2% (-1.1%)
10	South Korea	4.0% (+1.5%)	2.8% (+2.8%)	2.6% (0.0%)	3.1% (+1.5%)
11	Canada	2.7% (+1.1%)	2.1% (-0.2%)	2.9% (-0.5%)	2.6% (+0.1%)
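
The differences in parentheses are simply the actual share minus the forecast share, in percentage points. The sketch below recomputes them for the gold medal column using the values from the two tables in this post, and also averages their absolute values. That average (mean absolute error) is one common way to summarize forecast error; it is our own illustration and not necessarily a metric that Snowflake ML reports.

```python
# Gold medal shares (percent), copied from the forecast and results tables.
gold_forecast = {
    "United States of America": 11.7, "People's Republic of China": 12.3,
    "Great Britain": 8.8, "Japan": 5.7, "Germany": 5.0, "Italy": 4.1,
    "Australia": 3.2, "France": 2.5, "Netherlands": 2.5, "Canada": 1.6,
    "South Korea": 2.5,
}
gold_actual = {
    "United States of America": 12.2, "People's Republic of China": 12.2,
    "Great Britain": 4.3, "Japan": 6.1, "Germany": 3.7, "Italy": 3.7,
    "Australia": 5.5, "France": 4.9, "Netherlands": 4.6, "Canada": 2.7,
    "South Korea": 4.0,
}


def point_differences(actual, forecast):
    """Actual minus forecast, in percentage points, rounded to one decimal."""
    return {c: round(actual[c] - forecast[c], 1) for c in actual}


def mean_absolute_error(diffs):
    """Average size of the miss, ignoring direction."""
    return sum(abs(d) for d in diffs.values()) / len(diffs)


gold_diffs = point_differences(gold_actual, gold_forecast)
gold_mae = mean_absolute_error(gold_diffs)
print(gold_diffs)
print(round(gold_mae, 2))
```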

Limitations

While we were able to generate relatively accurate predictions with limited user input, the data that we based our predictions on was simplistic. It consisted of historical Olympic medal counts by medal type and total for each country participating in the 2024 Summer Olympics. We excluded additional variables because they complicated data preparation within the constraints of Snowflake ML.

For example, we encountered problems predicting medal counts by sport and country because Snowflake ML currently does not accept multiple rows that share the same categorical value and the same time value. Specifically, our data had multiple rows for Rowing at the 2020 Summer Olympics, one for each country that competed, and Snowflake ML currently struggles to handle those “duplicates.” There are simple ways that we could get around this, such as filtering to each individual sport and generating a different model for each, but that was a rabbit hole we wanted to avoid going down for this project.
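
To make the “duplicates” problem concrete, here is a small Python sketch with made-up rows. It only illustrates the shape of the data; it does not call Snowflake ML, and the tuple layout and helper names are assumptions of ours.

```python
from collections import Counter, defaultdict

# Made-up rows: (timestamp, sport, country, medals).
rows = [
    ("2020", "Rowing", "Country A", 5),
    ("2020", "Rowing", "Country B", 4),
    ("2020", "Rowing", "Country C", 2),
    ("2016", "Rowing", "Country A", 3),
]


def duplicate_keys(rows, key):
    """Return the (series, timestamp) keys that appear more than once."""
    counts = Counter(key(r) for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


# Treating sport as the series: several countries share one timestamp.
by_sport = duplicate_keys(rows, key=lambda r: (r[1], r[0]))

# Treating (sport, country) as the series: each timestamp appears once.
by_sport_country = duplicate_keys(rows, key=lambda r: ((r[1], r[2]), r[0]))

# The workaround mentioned above: one dataset (and one model) per sport.
per_sport = defaultdict(list)
for ts, sport, country, medals in rows:
    per_sport[sport].append((ts, country, medals))

print(by_sport)
print(by_sport_country)
```

Running a check like this before training would show which rows collide on the same series and timestamp, which is the situation we ran into.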

Forecasting models with Snowflake ML gave us a solid prediction baseline and allowed us to calculate feature importance (i.e., how much each input variable affects our final prediction) and model evaluation metrics, while also allowing us to tune the cross validation method. For the non-data-scientists, cross validation is when the data is split into groups, with each group taking a turn as the test set while the rest of the data is used for training, so that the model can be evaluated, and the results averaged, based on how it handles unseen data.
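
The idea behind cross validation can be sketched in a few lines of Python. This is the generic k-fold version of the technique, written by us for illustration; it is not Snowflake ML's implementation, and time series work typically uses splits that respect time order instead.

```python
def k_fold_splits(n_rows, k):
    """Yield (train_indices, test_indices) for k roughly equal folds."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]  # held-out rows for this fold
        train = indices[:start] + indices[start + size:]  # everything else
        yield train, test
        start += size


splits = list(k_fold_splits(10, 3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Each row lands in exactly one test fold, so every observation is used once to check predictions the model did not see during training.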

However, this model operates more like a black box compared to those generated by other methods, and its flexibility remains very limited. The forecasting documentation outlines how the algorithm functions, but users cannot adjust the algorithm, account for seasonality, or iterate on the models. To make updates or add new data, users must generate a new model. That being said, this is still a relatively new offering, and I anticipate that future releases will address these limitations. Snowflake ML is likely to become a powerful tool for time series modeling, even for those without a strong background in machine learning.

Conclusions and Takeaways

Snowflake ML is a robust new tool that does a great job of creating baseline models from simple data while requiring little to no experience with machine learning. The key limitation of this solution, especially when compared to other no-code platforms like Dataiku and DataRobot, is the absence of model selection and hyperparameter tuning options. Different model types come with their own advantages and disadvantages for each application, and hyperparameter tuning allows for adjusting how the model splits the data and makes predictions.

This is still a new offering that will continue to evolve and improve over time. Despite its current limitations, having easy-to-use ML capabilities directly within the Snowsight UI is a significant advancement. With new features such as Cortex Analyst and Cortex Search being added each month, I feel that the future of Snowflake ML remains bright.

DAS42: Your Partner in Leveraging Snowflake’s Innovations

Need assistance with the latest Snowflake features and how to utilize them effectively? As a Snowflake Elite Services Partner, DAS42 can help you unlock the full potential of your data, optimize your current workflows, and transition from legacy platforms to a comprehensive, secure, and tailored cloud-based solution.

DAS42 is a premier data and analytics consultancy with a modern point of view. We specialize in solving some of the most complex business challenges for the world’s most successful companies. As a Snowflake Elite Partner, DAS42 crafts customized strategies that create a single source of truth and enable enhanced and faster decision-making. DAS42 has a presence across the U.S. with primary offices in New York City and Denver. Connect with us at das42.com and stay updated on LinkedIn. Join us today on our journey to help you realize the possibilities of transforming your business through data and analytics.