How to Get Historical Football Data via API (2026 Guide)

Fetch years of historical football match data via API for analytics and prediction models. TheStatsAPI covers 20+ years of data. Python tutorial included.

Last updated: March 30, 2026 | 8 min read

Historical football data is the foundation of prediction models, academic research, betting algorithms, and long-form sports journalism. If you want to know how a team performs away in December, whether a striker's output declines after 30, or how league competitiveness has shifted over the last two decades, you need years of structured match data, not just last weekend's results.

This guide shows you how to fetch historical football data via API, bulk-download multiple seasons, handle rate limits properly, and load everything into Pandas for analysis. We will use TheStatsAPI, which covers 20+ years of historical match data across 1,196 competitions and 84,000+ players.

Who Needs Historical Football Data

Historical data is not just for data scientists. Here are the most common use cases:

Prediction model builders - training machine learning models on match outcomes, goal totals, and player performance requires thousands of historical matches as training data.
Fantasy football platforms - projecting player value requires understanding historical performance trends, injury patterns, and seasonal form.
Academic researchers - sports economics, network analysis of passing, and competitive balance studies all depend on longitudinal data.
Sports journalists - "first team to win here since 2014" requires a reliable data source, not manual Googling.
Betting analysts - backtesting strategies against historical odds and outcomes is the basis of any quantitative betting approach.

What Historical Data Is Available

TheStatsAPI provides historical data going back 20+ years for major competitions. This includes:

Match results - home team, away team, scores, date, venue, and match status for every fixture
Player statistics by season - goals, assists, appearances, minutes, cards, and more, broken down by year and competition
Competition records - standings, seasons, and participating teams for each year
Team histories - which teams participated in which competitions in which seasons

Coverage is deepest for the top European leagues (Premier League, La Liga, Bundesliga, Serie A, Ligue 1), the Champions League, and major South American competitions. Smaller leagues may have shorter historical windows, but the API currently spans 1,196 competitions overall.

Fetching Matches by Season

The core of any historical data pull is fetching matches for a specific competition and season. Here is how to do it in Python:

import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://api.thestatsapi.com/api"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Accept": "application/json"
}

def fetch_season_matches(competition_id, season):
    """Fetch all matches for a competition in a given season."""
    all_matches = []
    page = 1

    while True:
        response = requests.get(
            f"{BASE_URL}/football/matches",
            headers=headers,
            params={
                "competition_id": competition_id,
                "season": season,
                "page": page
            },
            timeout=30  # raise an error instead of hanging on a stalled connection
        )

        if response.status_code != 200:
            print(f"Error {response.status_code} on page {page}")
            break

        data = response.json()
        matches = data.get("data", [])
        all_matches.extend(matches)

        last_page = data["meta"]["last_page"]
        if page >= last_page:
            break

        page += 1

    return all_matches


# Fetch 2023-24 Premier League matches
matches = fetch_season_matches(competition_id=1, season=2023)
print(f"Fetched {len(matches)} matches")

for match in matches[:5]:
    date = match["date"][:10]
    home = match["home_team"]["name"]
    away = match["away_team"]["name"]
    print(f"[{date}] {home} {match['home_score']}-{match['away_score']} {away}")

The season parameter takes the year in which the season starts, so season=2023 returns the 2023-24 season.
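If your inputs are season labels such as "2023-24" or "2014/2015", a small helper can convert them to the start-year value the season parameter expects. This is a minimal sketch; season_start_year is a hypothetical name, not part of TheStatsAPI.

```python
def season_start_year(label: str) -> int:
    """Convert a season label like '2023-24' or '2014/2015' to its start year.

    A bare year such as '2019' is returned unchanged.
    """
    # Normalise '/' to '-' and keep the part before the first separator
    start = label.strip().replace("/", "-").split("-")[0].strip()
    if not (len(start) == 4 and start.isdigit()):
        raise ValueError(f"Unrecognised season label: {label!r}")
    return int(start)


for label in ["2023-24", "2014/2015", "2019"]:
    print(label, "->", season_start_year(label))
```

The returned integer can be passed straight to fetch_season_matches as its season argument.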

Bulk Fetching Multiple Seasons and Saving to CSV

For serious analysis, you need multiple seasons. Here is a script that fetches several years of data and saves it to a CSV file:

import requests
import time
import csv

API_KEY = "your_api_key_here"
BASE_URL = "https://api.thestatsapi.com/api"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Accept": "application/json"
}

def fetch_season_matches(competition_id, season):
    """Fetch all matches for a competition in a given season."""
    all_matches = []
    page = 1

    while True:
        response = requests.get(
            f"{BASE_URL}/football/matches",
            headers=headers,
            params={
                "competition_id": competition_id,
                "season": season,
                "page": page
            },
            timeout=30  # raise an error instead of hanging on a stalled connection
        )

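        if response.status_code == 429:
            # Retry-After is normally a number of seconds; fall back to 60s if it is absent or not numeric
            try:
                retry_after = int(response.headers.get("Retry-After", 60))
            except ValueError:
                retry_after = 60
            print(f"Rate limited. Waiting {retry_after} seconds...")
            time.sleep(retry_after)
            continue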

        if response.status_code != 200:
            print(f"Error {response.status_code} fetching season {season}, page {page}")
            break

        data = response.json()
        all_matches.extend(data.get("data", []))

        if page >= data["meta"]["last_page"]:
            break

        page += 1
        time.sleep(2)  # Respect rate limits

    return all_matches


def save_to_csv(matches, filename):
    """Save match data to a CSV file."""
    if not matches:
        print("No matches to save.")
        return

    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([
            "match_id", "date", "season", "competition",
            "home_team", "away_team", "home_score", "away_score", "status"
        ])

        for match in matches:
            writer.writerow([
                match["id"],
                match["date"],
                match.get("season", ""),
                match["competition"]["name"],
                match["home_team"]["name"],
                match["away_team"]["name"],
                match.get("home_score", ""),
                match.get("away_score", ""),
                match.get("status", "")
            ])

    print(f"Saved {len(matches)} matches to {filename}")


# Fetch 10 seasons of Premier League data
competition_id = 1
seasons = range(2014, 2024)  # 2014-15 through 2023-24
all_matches = []

for season in seasons:
    print(f"Fetching {season}-{season+1} season...")
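    matches = fetch_season_matches(competition_id, season)
    # Tag each match with the requested season in case the payload omits it,
    # so the CSV's season column is populated for the Pandas analysis
    for match in matches:
        match.setdefault("season", season)
    all_matches.extend(matches)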
    print(f"  Got {len(matches)} matches (total: {len(all_matches)})")
    time.sleep(5)  # Pause between seasons

save_to_csv(all_matches, "premier_league_2014_2024.csv")

This script fetches 10 seasons of Premier League data and saves it as a clean CSV. For a typical top-flight league with ~380 matches per season, that is about 3,800 matches across roughly 40-80 API requests (depending on page size).
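The request count follows from rounding matches per season divided by page size up to whole pages, then multiplying by the number of seasons. This guide does not state TheStatsAPI's page size, so the 50 and 100 below are illustrative assumptions; check the pagination metadata of a real response for the actual value.

```python
import math


def estimate_requests(matches_per_season: int, seasons: int, page_size: int) -> int:
    """Paginated requests needed: one request per page, per season."""
    pages_per_season = math.ceil(matches_per_season / page_size)
    return pages_per_season * seasons


# 380 matches per season over 10 seasons, at two assumed page sizes
for page_size in (50, 100):
    print(f"page size {page_size}: {estimate_requests(380, 10, page_size)} requests")
```

With 50 matches per page that is 8 pages per season (80 requests); with 100 per page it is 4 pages per season (40 requests), which is where the 40-80 range comes from.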

Rate Limit Strategy

When bulk-fetching historical data, you will hit rate limits if you are not careful. Here is how to handle them properly.

Understand your plan limits

Plan	Requests/month	Rate limit
Starter ($50/mo)	100,000	30/min
Growth ($129/mo)	500,000	60/min
Scale ($379/mo)	5,000,000	300/min
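To turn a per-minute cap into a delay, divide 60 seconds by the cap: Starter's 30 requests per minute works out to one request every 2 seconds, which matches the time.sleep(2) in the bulk script. The sketch below does that arithmetic for each plan in the table and also shows how long it would take to use up each monthly allowance running flat out. It assumes the cap is a simple per-minute count; the exact windowing TheStatsAPI uses is not described in this guide.

```python
# Plan limits copied from the table above: (requests per month, requests per minute)
PLANS = {
    "Starter": (100_000, 30),
    "Growth": (500_000, 60),
    "Scale": (5_000_000, 300),
}


def min_interval_seconds(per_minute: int) -> float:
    """Smallest even spacing between requests that stays within a per-minute cap."""
    return 60 / per_minute


def hours_to_exhaust(per_month: int, per_minute: int) -> float:
    """Hours of continuous requests at the cap needed to use the monthly allowance."""
    return per_month / per_minute / 60


for name, (per_month, per_minute) in PLANS.items():
    print(f"{name}: one request every {min_interval_seconds(per_minute):.2f}s; "
          f"monthly quota lasts {hours_to_exhaust(per_month, per_minute):.1f}h at full speed")
```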

Exponential backoff

Instead of a fixed delay after a 429 error, use exponential backoff:

import time

import requests

def fetch_with_backoff(url, headers, params, max_retries=5):
    """Fetch with exponential backoff on rate limit errors."""
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers, params=params, timeout=30)

        if response.status_code == 200:
            return response

        if response.status_code == 429:
            wait_time = min(2 ** attempt * 10, 120)  # 10s, 20s, 40s, 80s, 120s
            print(f"Rate limited. Retry {attempt + 1}/{max_retries} in {wait_time}s")
            time.sleep(wait_time)
        else:
            # Other errors (404, 500, ...) are not retried; the caller decides what to do
            print(f"Unexpected error: {response.status_code}")
            return response

    raise RuntimeError(f"Max retries ({max_retries}) exceeded for {url}")
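The bulk script reads Retry-After with int(), which cannot interpret the other form the HTTP standard allows: an HTTP date such as "Wed, 21 Oct 2015 07:28:00 GMT". Whether TheStatsAPI sends this header, and in which form, is not documented in this guide, so treat the helper below as a defensive sketch. parse_retry_after is a hypothetical name; it returns the number of seconds to wait and falls back to a default, so when the header is present its result could be used in place of the computed exponential wait.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], default: float = 60.0,
                      now: Optional[datetime] = None) -> float:
    """Seconds to wait, from a Retry-After value in seconds or HTTP-date form."""
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():  # delta-seconds form, e.g. "120"
        return float(value)
    try:
        retry_at = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):  # older Pythons raise TypeError on bad input
        return default
    if retry_at.tzinfo is None:  # treat dates without a zone as UTC
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative wait
    return max(0.0, (retry_at - now).total_seconds())
```

Inside the 429 branch this would look like wait_time = parse_retry_after(response.headers.get("Retry-After"), default=wait_time).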

Plan selection for bulk downloads

If you are doing a one-time historical data pull across multiple competitions and decades, the Scale plan ($379/month) is the most practical choice. At 300 requests per minute and 5 million requests per month, you can download the entire historical dataset for all 1,196 competitions in a few days. Once the download is complete, you can cancel or downgrade; your local copy of the data does not expire.

For smaller pulls (one league, a few seasons), the Starter plan at $50/month is more than sufficient. Ten seasons of a single league needs roughly 40-100 requests depending on page size, a tiny fraction of the 100,000 monthly limit.
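To sanity-check a plan before a large pull, multiply competitions by seasons by pages per season, then compare the total with the plan's per-minute and monthly limits. The helper below is a rough sketch: it counts match-listing pages only (player statistics and other endpoints add their own requests), assumes no retries, and the season and page counts you pass in are your own estimates, not published coverage figures.

```python
import math


def bulk_pull_estimate(competitions: int, seasons_per_competition: int,
                       pages_per_season: int, per_minute: int, per_month: int) -> dict:
    """Rough request total and run time for a bulk download of match listings.

    Assumes one request per page, no retries, and continuous running at the
    plan's per-minute cap.
    """
    total = competitions * seasons_per_competition * pages_per_season
    return {
        "requests": total,
        "minutes_at_cap": round(total / per_minute, 1),
        # How many monthly allowances the pull would consume
        "months_of_quota": math.ceil(total / per_month),
    }


# One league, ten seasons, 8 pages per season (380 matches at an assumed 50 per page),
# Starter plan limits from the table above
print(bulk_pull_estimate(1, 10, 8, per_minute=30, per_month=100_000))
```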

Loading into Pandas

Once you have your CSV, analysis is straightforward with Pandas:

import pandas as pd

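df = pd.read_csv("premier_league_2014_2024.csv")

# Drop fixtures without a recorded score (e.g. postponed or not yet played);
# empty score cells would otherwise count as matches the home side did not win
df = df.dropna(subset=["home_score", "away_score"])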

print(f"Total matches: {len(df)}")
print(f"Seasons: {df['season'].nunique()}")
print(f"Teams: {df['home_team'].nunique()}")

# Average goals per match by season
df["total_goals"] = df["home_score"] + df["away_score"]
goals_by_season = df.groupby("season")["total_goals"].mean()
print("\nAverage goals per match by season:")
print(goals_by_season.round(2))

# Home win percentage
df["home_win"] = df["home_score"] > df["away_score"]
home_win_pct = df.groupby("season")["home_win"].mean()
print("\nHome win percentage by season:")
print((home_win_pct * 100).round(1))

# Top scoring teams (total home + away goals)
home_goals = df.groupby("home_team")["home_score"].sum()
away_goals = df.groupby("away_team")["away_score"].sum()
total_goals = (home_goals.add(away_goals, fill_value=0)
               .sort_values(ascending=False))
print("\nTop 10 scoring teams (all seasons):")
print(total_goals.head(10))

This gives you immediate insights: are average goals per match trending up? Is home advantage declining? Which teams have scored the most across a decade? These are the starting points for deeper analysis and model building.
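The home_win column above lumps draws and away wins together, so a falling home-win rate alone does not tell you whether the lost share went to draws or to away sides. Splitting each season into three outcomes answers that. result_shares is a hypothetical helper that expects the season, home_score, and away_score columns written by save_to_csv; it ignores rows without a score.

```python
import numpy as np
import pandas as pd


def result_shares(df: pd.DataFrame) -> pd.DataFrame:
    """Percentage of home wins, draws, and away wins in each season."""
    # Ignore fixtures without a recorded score (postponed or not yet played)
    played = df.dropna(subset=["home_score", "away_score"])
    goal_diff = played["home_score"] - played["away_score"]
    # Sign of the goal difference: 1 = home win, 0 = draw, -1 = away win
    outcome = np.sign(goal_diff).astype(int).map({1: "home_win", 0: "draw", -1: "away_win"})
    shares = pd.crosstab(played["season"], outcome, normalize="index") * 100
    # Fixed column order; an outcome never seen in the data shows up as 0
    return shares.reindex(columns=["home_win", "draw", "away_win"], fill_value=0).round(1)
```

With the DataFrame loaded above, print(result_shares(df)) gives one row per season.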

Frequently Asked Questions

How far back does the historical data go?

TheStatsAPI provides over 20 years of historical data for major competitions. The exact start year varies by league - top European leagues (Premier League, La Liga, Bundesliga, Serie A, Ligue 1) have the deepest history, while smaller or newer competitions may have shorter windows. The API covers 1,196 competitions in total.

Can I download all historical data at once?

There is no single "download everything" endpoint. You fetch data by competition and season, paginating through results. This is by design - it lets you pull exactly what you need without downloading terabytes of data you will never use. The bulk-fetch script in this guide automates the process for multiple seasons.
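Because downloads are split by competition and season, a multi-competition pull can run for a long time, and restarting from scratch after an interruption wastes requests. One option is to cache each competition-season on disk and skip it on the next run. This is a sketch: fetch_with_cache is a hypothetical helper that takes your fetch function (for example fetch_season_matches from this guide) as an argument, and the cache/<competition_id>_<season>.json layout is an arbitrary choice.

```python
import json
from pathlib import Path
from typing import Callable, List


def fetch_with_cache(fetch_fn: Callable[[int, int], List[dict]],
                     competition_id: int, season: int,
                     cache_dir: str = "cache") -> List[dict]:
    """Return one competition-season of matches, calling fetch_fn only on a cache miss."""
    path = Path(cache_dir) / f"{competition_id}_{season}.json"
    if path.exists():
        # Already downloaded on an earlier run: reuse it without spending requests
        return json.loads(path.read_text(encoding="utf-8"))

    matches = fetch_fn(competition_id, season)
    if matches:  # do not cache empty results, so failed seasons are retried next run
        path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temporary file and rename it, so an interrupted write
        # never leaves a truncated JSON file behind
        tmp_path = path.with_suffix(".tmp")
        tmp_path.write_text(json.dumps(matches), encoding="utf-8")
        tmp_path.replace(path)
    return matches
```

A season that stopped partway through on an error would still be cached, so delete its file if the fetch printed an error for it.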

Will I hit rate limits during a bulk download?

Yes, if you do not add delays between requests. The Starter plan allows 30 requests per minute. Adding a 2-second delay between paginated requests keeps you safely under this limit. For very large downloads (many competitions, many seasons), the Scale plan at 300 requests per minute makes the process significantly faster.
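A fixed time.sleep(2) after each response adds 2 seconds on top of however long the request itself took, so the real pace ends up a little slower than the cap allows. An alternative is to space request start times evenly with a monotonic clock. The Throttle class below is a hypothetical helper, not part of TheStatsAPI or requests; it only paces calls on the client side and knows nothing about how the server counts its window. The clock and sleep arguments exist so it can be tested without real waiting.

```python
import time
from typing import Callable


class Throttle:
    """Keep the start times of successive calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 60.0 / per_minute
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = float("-inf")  # no call has been made yet

    def wait(self) -> None:
        """Block until the next call may start, then reserve the slot after it."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

To use it, create throttle = Throttle(per_minute=30) once and call throttle.wait() immediately before each requests.get(...) in the pagination loop, in place of time.sleep(2).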

Is the historical data updated retroactively?

Match data is finalized within 1-2 hours of full time and generally does not change after that. In rare cases, such as an administrative decision that reverses a result, historical records may be updated. For the vast majority of analysis use cases, you can treat downloaded historical data as stable and immutable.
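If you do re-fetch a recent season to pick up a late correction, you can merge it into your saved data by match_id and let the fresh copy win. This is a sketch using pandas; merge_refresh is a hypothetical helper, and it assumes match_id is unique per fixture, as in the CSV written by save_to_csv.

```python
import pandas as pd


def merge_refresh(existing: pd.DataFrame, refreshed: pd.DataFrame) -> pd.DataFrame:
    """Combine a saved match table with newly fetched rows, matched on match_id.

    Where both versions of a match exist, the refreshed row replaces the old one;
    matches present only in `existing` are kept unchanged.
    """
    combined = pd.concat([existing, refreshed], ignore_index=True)
    # Rows from `refreshed` come last, so keep="last" prefers them for duplicated IDs
    merged = combined.drop_duplicates(subset="match_id", keep="last")
    return merged.sort_values(["date", "match_id"]).reset_index(drop=True)
```

For example, save a re-fetched season with save_to_csv to its own file, read both files with pd.read_csv, and write merge_refresh(old, new) back out with to_csv.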

TheStatsAPI offers a 7-day free trial on all plans, giving full access to all 1,196 competitions and 84,000+ players - making it the best way to evaluate a premium football API before committing. Start your trial at thestatsapi.com and begin building your historical football dataset today.
