Jake Kolliari: Scaling Football Data Analytics in 2026

Elite football clubs increasingly rely on complex mathematical modelling to identify undervalued talent. They have moved far beyond basic scouting intuition. Bridging the gap between raw match data and actionable recruitment strategies requires robust technical infrastructure. This transition from theory to the pitch is exactly where modern sports technologists operate. By examining the career trajectory and open-source Python scripts of Jake Kolliari, we can map the exact technical requirements needed to succeed in modern sports technology. His journey from modelling aerodynamics at BAE Systems to developing predictive frameworks at Hudl provides a clear blueprint for aspiring analysts.
The Professional Evolution of a UK Data Scientist
Transitioning into elite sports technology requires a strong foundation in applied mathematics and computer science. The pathway is rarely linear. It often involves adapting complex algorithms from other highly technical industries to solve unique problems on the football pitch.
From BAE Systems to Global Football Analytics
A background in aeronautical engineering translates exceptionally well to sports analytics. According to verified records on his Kaggle professional profile, Jake Kolliari began his career as a BAE Systems aeronautical engineer based in London. At BAE Systems, his work involved processing large datasets and simulating physical outcomes.
Simulating airflow over an aircraft wing and running Monte Carlo match simulations draw on a similar mindset: both estimate uncertain outcomes from measured inputs. When he moved into a dedicated UK data scientist role within professional football, he brought this rigorous simulation approach to Swansea City's data infrastructure. Instead of modelling physical stress, the focus shifted to expected points (xP) simulations and squad optimisation.
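To make the idea of an expected points simulation concrete, here is a minimal sketch rather than a reconstruction of any club's model. It assumes each shot is an independent trial whose success probability equals its xG value. The function name `simulate_expected_points` and the shot values are hypothetical.

```python
import random


def simulate_expected_points(home_xg, away_xg, n_sims=10_000, seed=42):
    """Estimate expected points (xP) for both sides by Monte Carlo simulation.

    Assumption: every shot is an independent Bernoulli trial whose
    probability of scoring is its xG value.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    home_points = 0
    away_points = 0
    for _ in range(n_sims):
        # Replay the match once: each shot either scores or it does not.
        home_goals = sum(rng.random() < xg for xg in home_xg)
        away_goals = sum(rng.random() < xg for xg in away_xg)
        if home_goals > away_goals:
            home_points += 3
        elif home_goals < away_goals:
            away_points += 3
        else:
            home_points += 1
            away_points += 1
    return home_points / n_sims, away_points / n_sims


# Hypothetical shot xG values for one match (illustrative only).
home_shots = [0.35, 0.10, 0.08, 0.45]
away_shots = [0.05, 0.12, 0.20]

home_xp, away_xp = simulate_expected_points(home_shots, away_shots)
print(f"Home xP: {home_xp:.2f}, Away xP: {away_xp:.2f}")
```

Averaging points over many simulated replays rewards a side for the quality of chances it created, not only for the scoreline it happened to get.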
Driving Strategy as a Solutions Consultant at Hudl
The role of a data scientist changes significantly with a move from club level to a global technology provider. Operating as a Solutions Consultant in global football at Hudl requires a dual skill set: deep technical knowledge alongside strong project management capabilities in sports technology.
In this capacity, the focus shifts from building isolated models to deploying scalable infrastructure for multiple elite organisations. A Solutions Consultant evaluates a club’s existing technical stack and recommends integrations. This might involve setting up automated pipelines that feed directly into a scouting department’s primary database, so that decision makers receive accurate, real-time insights without needing to write their own code.
Engineering Player Recruitment Models
Building reliable player recruitment models is the cornerstone of modern football analytics. Raw event data provides the foundation, but the true value lies in how that data is structured, queried, and visualised for sporting directors.
Building Machine Learning Pipelines for Talent ID
A successful recruitment model relies on consistent and clean data inputs. Analysts frequently handle StatsBomb event data or manage Wyscout data integration. These providers deliver thousands of data points per match, tracking every pass, tackle, and shot.
Building machine learning pipelines involves extracting this raw data and passing it through algorithms to find similar player profiles. First-team talent ID departments use these pipelines to scout replacements for departing players. If a club loses a high-volume progressive passer, the pipeline queries the global database to flag players in secondary leagues who replicate that specific output.
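One simple way to express "find similar player profiles" is a nearest-neighbour search over standardised metrics. The sketch below is illustrative only: the player names, metric names, and numbers are hypothetical, and a production pipeline would likely use many more features and a dedicated library.

```python
import math
from statistics import mean, pstdev

# Hypothetical per-90 metrics; names and numbers are illustrative only.
players = {
    "Departing Player": {"progressive_passes": 9.1, "pass_completion": 86.0, "key_passes": 1.8},
    "Candidate A": {"progressive_passes": 8.7, "pass_completion": 84.5, "key_passes": 1.6},
    "Candidate B": {"progressive_passes": 4.2, "pass_completion": 90.1, "key_passes": 0.6},
    "Candidate C": {"progressive_passes": 9.5, "pass_completion": 78.0, "key_passes": 2.4},
    "Candidate D": {"progressive_passes": 3.1, "pass_completion": 71.0, "key_passes": 0.4},
}


def standardise(profiles):
    """Convert each metric to a z-score so no single scale dominates the distance."""
    metrics = list(next(iter(profiles.values())))
    scaled = {name: {} for name in profiles}
    for metric in metrics:
        values = [profile[metric] for profile in profiles.values()]
        mu = mean(values)
        sigma = pstdev(values) or 1.0  # guard against a constant metric
        for name, profile in profiles.items():
            scaled[name][metric] = (profile[metric] - mu) / sigma
    return scaled


def most_similar(profiles, target, top_n=3):
    """Rank the other players by Euclidean distance to the target profile."""
    scaled = standardise(profiles)
    reference = list(scaled[target].values())
    distances = []
    for name, vector in scaled.items():
        if name == target:
            continue
        distances.append((math.dist(reference, list(vector.values())), name))
    return [(name, round(distance, 3)) for distance, name in sorted(distances)[:top_n]]


for name, distance in most_similar(players, "Departing Player"):
    print(f"{name}: distance {distance}")
```

Standardising first matters because pass completion is measured in percentage points while key passes are small counts. Without it, the largest-scale metric would decide the ranking on its own.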
Passing Networks and Expected Threat (xT) Analysis
Evaluating a player’s impact requires looking beyond basic statistics like goals and assists. Expected goals (xG) modelling is standard practice, but evaluating passing and spatial control offers a deeper understanding of game dominance. Opta passing networks and pass cluster models help visualise how a team moves the ball through different thirds of the pitch.
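A passing network reduces to two ingredients: how often each pair of players combines, and where each player typically passes from. The following sketch shows that aggregation on a 100×100 pitch. The tuple layout, player labels, and the `min_passes` threshold are assumptions for illustration, not any provider's schema.

```python
from collections import Counter, defaultdict

# Hypothetical completed passes: (passer, receiver, start_x, start_y).
passes = [
    ("Player 4", "Player 6", 30, 50),
    ("Player 4", "Player 6", 34, 46),
    ("Player 4", "Player 6", 32, 54),
    ("Player 6", "Player 8", 50, 40),
    ("Player 6", "Player 8", 54, 44),
    ("Player 8", "Player 9", 70, 30),
    ("Player 6", "Player 4", 48, 52),
]


def build_passing_network(passes, min_passes=2):
    """Return node positions and weighted edges for a passing network."""
    # Edge weight = number of completed passes from passer to receiver.
    edge_counts = Counter((passer, receiver) for passer, receiver, _, _ in passes)

    # Node position = average location the player passed from.
    locations = defaultdict(list)
    for passer, _, x, y in passes:
        locations[passer].append((x, y))
    nodes = {
        player: (
            sum(x for x, _ in points) / len(points),
            sum(y for _, y in points) / len(points),
        )
        for player, points in locations.items()
    }

    # Drop rare combinations so the plotted network stays readable.
    edges = {pair: count for pair, count in edge_counts.items() if count >= min_passes}
    return nodes, edges


nodes, edges = build_passing_network(passes)
for (passer, receiver), count in edges.items():
    print(f"{passer} -> {receiver}: {count} passes")
```

The resulting `nodes` and `edges` dictionaries can be handed to any plotting tool, with line thickness scaled by pass count.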
Analyst’s Pro-Tip: Many junior analysts make the mistake of relying solely on basic xG models to evaluate midfielders. To truly measure a player’s influence, you must also evaluate their Expected Threat (xT). While xG measures the probability of a shot scoring, xT measures how much a specific pass increases the team’s probability of scoring in the next few actions. Applying K-means clustering to passing data allows you to group similar passing sequences and identify players who consistently move the ball into high-value zones.
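The xT value of a pass can be sketched as the difference between the threat of the zone where the pass ends and the zone where it starts. The grid below is a deliberately coarse, invented surface used only to show the arithmetic; a real xT surface would be fitted from historical event data on a much finer grid.

```python
# Hypothetical xT surface: 3 rows (pitch width) x 4 columns (pitch length),
# attacking left to right. Values are invented for illustration.
XT_GRID = [
    [0.005, 0.012, 0.035, 0.090],
    [0.006, 0.015, 0.050, 0.250],
    [0.005, 0.012, 0.035, 0.090],
]


def zone_value(x, y, grid=XT_GRID, pitch_size=100.0):
    """Look up the xT value of the zone containing (x, y) on a 100x100 pitch."""
    rows, cols = len(grid), len(grid[0])
    # Clamp to the last row/column so x=100 or y=100 stays inside the grid.
    col = min(int(x / pitch_size * cols), cols - 1)
    row = min(int(y / pitch_size * rows), rows - 1)
    return grid[row][col]


def pass_xt(start, end):
    """xT added by a pass = threat at the end zone minus threat at the start zone."""
    return zone_value(*end) - zone_value(*start)


# Hypothetical completed passes with (x, y) start and end locations.
passes = [
    {"player": "Player 6", "start": (30, 50), "end": (60, 50)},
    {"player": "Player 6", "start": (55, 20), "end": (80, 45)},
    {"player": "Player 4", "start": (40, 50), "end": (20, 50)},
]

# Sum xT added per player across all of their passes.
xt_by_player = {}
for p in passes:
    added = pass_xt(p["start"], p["end"])
    xt_by_player[p["player"]] = xt_by_player.get(p["player"], 0.0) + added

for player, total in xt_by_player.items():
    print(f"{player}: {total:+.3f} xT")
```

A backwards pass produces a negative value here, which is why xT totals are usually read alongside pass volume rather than in isolation.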
An independent analysis published by FootSci on Medium highlighted Jake Kolliari specifically for his work in this area. The publication cited his Expected Threat network visualisations from his tenure at Swansea City as an industry benchmark for evaluating progressive passing.
A Practical Guide to Jake Kolliari GitHub Repositories
For developers looking to enter the sports data industry, reviewing open-source code is the fastest way to learn. The Jake Kolliari GitHub repositories provide a structured look at how professional analytics pipelines are built and maintained.
Reverse-Engineering the Python Football Scripts
A public repository titled ‘Getting Started: Football Data Analytics with Python’ serves as a technical foundation. By reviewing these Python football scripts, we can reverse engineer the exact extraction, transformation, and loading process used in professional environments.
Here is the step-by-step workflow demonstrated in the codebase:
- Data Ingestion: The scripts begin by importing raw JSON files from major data providers via API endpoints.
- Data Normalisation: The code uses pandas to flatten the nested JSON structures into readable two-dimensional dataframes.
- Coordinate Mapping: Pitch coordinates differ between providers. The scripts standardise these X and Y coordinates to a unified 100×100 pitch map.
- Metric Calculation: Custom functions apply mathematical formulas to the raw data, calculating metrics such as pass completion percentage under pressure or set-piece effectiveness.
- Visualisation Export: The final cleaned datasets are formatted for export. These clean files are then connected to Tableau sports dashboards for end-user interaction.
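The normalisation, coordinate mapping, and metric steps above can be sketched as follows. This is not the repository's code: the event schema, the 120×80 source pitch, and all function names are assumptions made for illustration. The flattening is written in plain Python so the example has no dependencies; in pandas, `pandas.json_normalize` produces the same dot-separated column names.

```python
import json

# Hypothetical provider payload; the schema is invented for illustration.
RAW_EVENTS = json.loads("""
[
  {"id": 1, "type": {"name": "Pass"}, "location": [60.0, 40.0],
   "under_pressure": true, "pass": {"outcome": {"name": "Complete"}}},
  {"id": 2, "type": {"name": "Pass"}, "location": [90.0, 20.0],
   "under_pressure": true, "pass": {"outcome": {"name": "Incomplete"}}},
  {"id": 3, "type": {"name": "Pass"}, "location": [30.0, 60.0],
   "under_pressure": false, "pass": {"outcome": {"name": "Complete"}}},
  {"id": 4, "type": {"name": "Shot"}, "location": [108.0, 40.0],
   "under_pressure": true}
]
""")


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one level, e.g. {'type': {'name': ...}} -> 'type.name'."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def to_unified_pitch(x, y, source_length=120.0, source_width=80.0):
    """Rescale provider coordinates to a 100x100 pitch (source size is assumed)."""
    return round(x / source_length * 100, 2), round(y / source_width * 100, 2)


def pressured_pass_completion(rows):
    """Percentage of passes attempted under pressure that were completed."""
    pressured = [r for r in rows if r["type.name"] == "Pass" and r.get("under_pressure")]
    if not pressured:
        return None
    completed = sum(r.get("pass.outcome.name") == "Complete" for r in pressured)
    return 100 * completed / len(pressured)


# Normalise every event, then map its location onto the unified pitch.
rows = []
for event in RAW_EVENTS:
    row = flatten(event)
    row["x"], row["y"] = to_unified_pitch(*row.pop("location"))
    rows.append(row)

print(rows[0])
print("Pass completion under pressure (%):", pressured_pass_completion(rows))
```

From here, writing `rows` to CSV with the standard `csv` module would give a flat file that a dashboard tool can read.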
Navigating the Sports Technology Project Management Landscape
Building the model is only half the job. The true challenge in sports analytics is communication. The most advanced K-means clustering model is entirely useless if the manager does not understand it or trust it.
Translating Analytics for First-Team Decision Makers
Data professionals must bridge the communication gap between the analytics department and the coaching staff. This requires distilling complex mathematical concepts into actionable football terminology. First-team coaches operate on limited time. They need immediate, visual takeaways rather than lengthy statistical reports.
This emphasis on communication is a recognised industry priority. In January 2026, Jake Kolliari was featured as a Performance Analysis speaker representing Hudl at a Talk Sport careers event hosted by Loughborough University's sports analytics department. Verified official directories from the event confirm that his presentation focused heavily on bridging this gap. The ability to present complex data clearly is a defining trait of a successful Solutions Consultant.
Frequently Asked Questions (FAQs)
What does a football data scientist do?
A football data scientist processes large volumes of match and tracking data to build predictive models. They assist clubs with tactical analysis, injury prevention, and identifying undervalued players in the transfer market.
How do you start a career in sports analytics?
Starting a career requires strong proficiency in programming languages like Python or R. Aspiring analysts should build public portfolios on platforms like GitHub to showcase their ability to manipulate open-source football data.
What programming languages are best for analysing football data?
Python is the industry standard due to its powerful data manipulation libraries like pandas and machine learning frameworks like scikit-learn. R is also widely used for statistical modelling and visualisations.
How do Expected Goals (xG) and Expected Threat (xT) models work?
Expected Goals (xG) assigns a probability value to a shot resulting in a goal based on historical data. Expected Threat (xT) evaluates the danger of a pass or dribble by measuring how much it moves the ball into a zone with a higher probability of scoring.
What is the role of a Solutions Consultant in sports technology?
A Solutions Consultant acts as a technical advisor for sports organisations. They evaluate a club’s current software and data architecture and implement new technologies to improve their analytical workflows.
How is machine learning used in modern player recruitment?
Clubs use machine learning to cluster players with similar statistical profiles. This allows scouting departments to quickly filter thousands of global players and identify viable transfer targets that fit the manager’s specific tactical requirements.
Where can I find open-source football data to analyse?
Companies like StatsBomb provide free, open-source event data for specific competitions. Additionally, resources on GitHub offer starter datasets and scripts to help beginners learn the basics of data extraction.
How do engineering skills translate to sports analytics?
Engineers are trained to use mathematics to solve complex, real-world problems. Skills such as statistical modelling, processing large datasets, and predictive simulation transfer readily into evaluating football performance metrics.



