Project Methodology

About GridironIQ

GridironIQ is an NFL analytics dashboard built around an automated Python data pipeline, MySQL storage layer, PHP API endpoints, and interactive Plotly.js visualisations.

Dataset

The project uses NFL data accessed through nflreadpy, the newer nflverse Python package used to retrieve structured NFL datasets.

GridironIQ currently covers the 2015–2024 NFL seasons, providing a decade-long dataset for team, player, comparison, and game-level analysis. The final hosted database contains 219,105 loaded dashboard rows after cleaning, enrichment, validation, and loading.

  • Schedules and game results
  • Team game statistics
  • Derived team season summaries
  • Player weekly statistics
  • Derived player season summaries
  • Game-level performance metrics
  • Validation results stored as data quality evidence

Technology Stack

Pipeline Python, Pandas, Polars
Source nflreadpy
Pipeline Runtime Krystal Python App Support
Database MySQL
Backend PHP APIs
Web Server LiteSpeed
Frontend HTML, CSS, Bootstrap
Charts Plotly.js

Processing Pipeline

The dataset is processed through a repeatable ETL-style workflow before being loaded into MySQL.

1

Ingest

Pull schedule, team statistics, and player statistics data from nflreadpy into Pandas DataFrames.

2

Clean

Standardise column names, data types, season types, missing values, and historical team abbreviations.

3

Enrich

Join schedules with team statistics and derive dashboard-ready metrics such as point differential, win percentage, yardage totals, player role groups, and season summaries.

4

Validate

Run automated checks for required columns, missing keys, duplicate records, invalid results, impossible percentage values, and unexpected negative values.

5

Load

Load validated outputs into MySQL in memory-safe chunks for the PHP API layer and dashboard to consume.

The full ingestion, cleaning, enrichment, validation, and loading process is repeatable from Python. Due to shared-hosting memory limits, large database inserts are loaded in chunks, particularly for the player weekly statistics table.

Data Quality Decisions

A key cleaning issue was the use of different team abbreviations across historical NFL data. Franchise relocations and rebrands meant that older records could use identifiers such as OAK, SD, or STL, while modern datasets use LV, LAC, and LA. These abbreviations were standardised to support consistent franchise-level comparison across seasons.

Validation was also made domain-aware. Some american football metrics can be negative, such as point differential, rushing yards, receiving yards, EPA, and fantasy points. These values are preserved rather than incorrectly cleaned away.

  • Required identifiers checked before loading.
  • Duplicate game, team-game, and player keys checked.
  • Numeric fields converted and validated.
  • Result values restricted to win, loss, or tie.
  • Percentage fields checked for valid ranges.
  • Historical team abbreviations standardised for long-term comparison.
  • Validation outputs stored as evidence in data_quality_checks.

Visualisation Design

The dashboard is designed around exploratory analysis rather than static reporting. The Overview page provides league-wide data, the Teams page provides drill-down detail, the Players page supports player-level analysis, the Compare page supports same-season and cross-season team comparisons, and the Games page allows game-level filtering.

  • Overview: league-wide rankings, KPIs, filters, and offence/defence comparison.
  • Teams: details-on-demand for individual teams across regular season and postseason.
  • Players: leaderboards, radar profiles, weekly trends, and heatmap exploration.
  • Compare: side-by-side and cross-season team analysis.
  • Games: game-level exploration by week, team, scoreline, and venue.

The dashboard adapts some visualisations based on the selected data. Bye weeks are shown as gaps in weekly line charts rather than connected values, avoiding the implication that a team played during a non-game week.

Postseason selections often contain very small samples, so selected charts switch to compact comparison views where a trend chart would be misleading.