Filling Missing Values in Pandas using fillna and interpolate

Missing data is a common problem in real-world datasets, and handling this problem effectively is a crucial step in any data analysis or machine-learning pipeline. In this guide, we’ll take a deep dive into filling missing values in Pandas using fillna and interpolate, two powerful methods the Pandas library provides to handle missing data gracefully.

Whether you’re cleaning survey data, preparing time series for forecasting, or just ensuring your dataset is model-ready, this guide will show you how to intelligently fill missing values in Pandas and why each method matters.

Why Missing Values Occur

Before we explore how to use fillna() and interpolate(), it’s important to understand why missing values occur in datasets:

  • Data Corruption during collection or transmission
  • Incomplete surveys or forms
  • System errors during data logging
  • Manual entry errors
  • Data filtering or merging with incompatible datasets

Pandas represents missing values as NaN (Not a Number) in numeric columns; None and NaT (for datetimes) are also treated as missing. These values can interfere with computations, visualizations, and machine-learning models. That’s why filling them appropriately is essential.
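
To make that interference concrete, here is a minimal sketch of how NaN behaves in comparisons and aggregations. The Series `s` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values; None is converted to NaN in a float Series
s = pd.Series([10.0, None, 30.0])

nan_equals_itself = (np.nan == np.nan)   # NaN never compares equal, even to itself
total_skipping = s.sum()                 # pandas skips NaN by default
total_strict = s.sum(skipna=False)       # NaN propagates when it is not skipped

print(nan_equals_itself, total_skipping, total_strict)
```

Because NaN never equals itself, use isna() or notna() rather than == to detect missing entries.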

Understanding Missing Values in Pandas

Before filling in missing values, we need to see what they look like, so let’s create a sample DataFrame that contains some:

import pandas as pd
import numpy as np

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, np.nan, 30, np.nan, 22],
    'Salary': [50000, 54000, np.nan, 58000, np.nan]
}

df = pd.DataFrame(data)
print(df)
Python

#Output of Above Code

      Name   Age   Salary
0    Alice  25.0  50000.0
1      Bob   NaN  54000.0
2  Charlie  30.0      NaN
3    David   NaN  58000.0
4      Eve  22.0      NaN

Checking for Missing Values

Pandas provides several ways to check for null (NaN) values in a DataFrame:

# Check if DataFrame has any missing values
print(df.isna().any())

# Count total missing values across columns
print(df.isna().sum().sum())

# Count missing values per column
print(df.isna().sum())
Python

# Output

# Check Missing Values
Name      False
Age        True
Salary     True
dtype: bool

# Count Total Missing Values
4

# Count Missing Values per column
Name      0
Age       2
Salary    2
dtype: int64
Python
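
Counts are easier to judge as proportions. The sketch below rebuilds the sample DataFrame from above and computes the share of missing entries per column, plus the rows that contain at least one gap. The variable names `missing_share` and `rows_with_gaps` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, np.nan, 30, np.nan, 22],
    'Salary': [50000, 54000, np.nan, 58000, np.nan],
})

# isna() yields booleans, so the column mean is the fraction of missing entries
missing_share = df.isna().mean()

# Keep only the rows that contain at least one missing value
rows_with_gaps = df[df.isna().any(axis=1)]

print(missing_share)
print(rows_with_gaps)
```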

Filling Missing Values with fillna()

The fillna() function is one of the most straightforward and widely used methods in Pandas for replacing missing values.

Syntax is:

DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None)
Python

Parameters:

  • value: The value utilized to replace any missing entries. This can be a scalar, dictionary, Series, or DataFrame. The default setting is None.
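  • method: A fill strategy such as 'ffill' or 'bfill', not an interpolation technique. It is deprecated since pandas 2.1; call ffill() or bfill() directly instead. The default option is None.
  • axis: The axis along which to perform the filling operation. Use 0 (or 'index') to fill down each column and 1 (or 'columns') to fill across each row. The default is None, which behaves like 0 for a DataFrame.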
  • inplace: Indicates whether to alter the DataFrame directly or return a new copy. The default is set to False.
  • limit: For both forward and backward filling, this specifies the maximum number of consecutive periods to fill.
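
The value and limit parameters can be sketched as follows. This is a minimal example on the sample data; the choice of 0 for Age and the median for Salary is an assumption made purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, np.nan, 30, np.nan, 22],
    'Salary': [50000, 54000, np.nan, 58000, np.nan],
})

# A dict maps each column name to its own replacement value
filled = df.fillna({'Age': 0, 'Salary': df['Salary'].median()})

# limit caps how many consecutive NaN values are filled in each run
s = pd.Series([1.0, np.nan, np.nan, np.nan, 5.0])
capped = s.ffill(limit=1)

print(filled)
print(capped)
```

With limit=1, only the first NaN after each valid value is filled; the rest of the run stays missing.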

Filling with a Specific Value

df['Age'] = df['Age'].fillna(0)
print(df)
Python

This replaces all NaN values in the Age column with 0. You can use any other default value based on domain knowledge.

# Output

      Name   Age   Salary
0    Alice  25.0  50000.0
1      Bob   0.0  54000.0
2  Charlie  30.0      NaN
3    David   0.0  58000.0
4      Eve  22.0      NaN

Forward Fill (Propagation of Last Valid Observation)

df['Age'] = df['Age'].ffill()
print(df)
Python

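Forward fill replaces each missing value with the last known non-null value, which makes it ideal for time-series data. This example and the ones that follow each start again from the original DataFrame.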

# Output

      Name   Age   Salary
0    Alice  25.0  50000.0
1      Bob  25.0  54000.0
2  Charlie  30.0      NaN
3    David  30.0  58000.0
4      Eve  22.0      NaN
Python

Backward Fill

df['Age'] = df['Age'].bfill()
print(df)
Python

# Output

      Name   Age   Salary
0    Alice  25.0  50000.0
1      Bob  30.0  54000.0
2  Charlie  30.0      NaN
3    David  22.0  58000.0
4      Eve  22.0      NaN
Python
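
Backward fill is the mirror image of forward fill: each gap takes the next valid value instead of the previous one. Neither direction can fill a gap at the "wrong" end of the data, so the two are sometimes chained. A small sketch on a made-up Series:

```python
import numpy as np
import pandas as pd

# Hypothetical series with a gap at each end and one in the middle
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

forward_only = s.ffill()     # leading NaN remains: nothing precedes it
backward_only = s.bfill()    # trailing NaN remains: nothing follows it
both = s.ffill().bfill()     # forward fill first, then backfill the leading gap

print(forward_only.tolist())
print(backward_only.tolist())
print(both.tolist())
```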

Column-wise Mean, Median, or Mode Imputation

df['Salary'] = df['Salary'].fillna(df['Salary'].mean())

print(df)
Python

You can also use .median() or .mode()[0] to replace missing values based on statistical calculations.

# Output

      Name   Age   Salary
0    Alice  25.0  50000.0
1      Bob   NaN  54000.0
2  Charlie  30.0  54000.0
3    David   NaN  58000.0
4      Eve  22.0  54000.0
Python
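
The choice between mean, median, and mode matters most when the data is skewed or categorical. The following sketch uses a made-up DataFrame with one outlier salary and a hypothetical Dept column to show how the three statistics differ.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 250000 is an outlier, and 'Dept' is categorical
df = pd.DataFrame({
    'Salary': [50000, 54000, np.nan, 58000, 250000],
    'Dept': ['IT', 'HR', None, 'IT', 'IT'],
})

mean_fill = df['Salary'].fillna(df['Salary'].mean())      # pulled up by the outlier
median_fill = df['Salary'].fillna(df['Salary'].median())  # robust to the outlier
mode_fill = df['Dept'].fillna(df['Dept'].mode()[0])       # most frequent category

print(mean_fill[2], median_fill[2], mode_fill[2])
```

mode() returns a Series because several values can tie for most frequent; [0] picks the first of them.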

Interpolating Missing Values with interpolate()

Unlike fillna(), which replaces NaN values with constant or derived values, interpolate() performs estimation based on existing data patterns, which is especially useful for numerical or time-series data.

Syntax:

DataFrame.interpolate(method='linear', axis=0, limit=None, inplace=False, limit_direction='forward', limit_area=None)
Python

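interpolate() shares axis, limit, and inplace with fillna(). The key additions are method, which selects the interpolation technique, and limit_direction and limit_area, which control which NaN values may be filled.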
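
As a minimal sketch of how limit_direction and limit_area change the result, consider a made-up Series with a gap at the start, in the middle, and at the end:

```python
import numpy as np
import pandas as pd

# Hypothetical series: leading gap, interior gap, trailing gap
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

forward = s.interpolate()                      # default limit_direction='forward'
both = s.interpolate(limit_direction='both')   # also fills the leading gap
inside = s.interpolate(limit_area='inside')    # only gaps between valid values

print(forward.tolist())
print(both.tolist())
print(inside.tolist())
```

With the default linear method, gaps outside the range of valid values are filled with the nearest valid value rather than extrapolated along the trend.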

Linear Interpolation (default method)

df['Age'] = df['Age'].interpolate()

print(df)
Python

This estimates each missing value on a straight line between its nearest valid neighbours. The linear method ignores the index and treats the values as evenly spaced, so it suits regularly sampled numeric sequences.

# Output

      Name   Age   Salary
0    Alice  25.0  50000.0
1      Bob  27.5  54000.0
2  Charlie  30.0      NaN
3    David  26.0  58000.0
4      Eve  22.0      NaN
Python

Time-based Interpolation

Time-based interpolation is useful when your index is a DatetimeIndex.

df['Date'] = pd.date_range('2023-01-01', periods=5, freq='D')
df.set_index('Date', inplace=True)
df['Salary'] = df['Salary'].interpolate(method='time')

print(df)
Python

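This approach considers the time gaps between values and interpolates accordingly. The trailing NaN for Eve has no later value to interpolate toward, so it is filled with the last valid value, 58000.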

# Output

               Name   Age   Salary
Date                              
2023-01-01    Alice  25.0  50000.0
2023-01-02      Bob   NaN  54000.0
2023-01-03  Charlie  30.0  56000.0
2023-01-04    David   NaN  58000.0
2023-01-05      Eve  22.0  58000.0
Python

Polynomial Interpolation

df['Salary'] = df['Salary'].interpolate(method='polynomial', order=2)

print(df)
Python

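Suitable for data that follows a nonlinear pattern. This method requires SciPy and an explicit order. It does not extrapolate, so the trailing NaN for Eve remains.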

# Output

      Name   Age        Salary
0    Alice  25.0  50000.000000
1      Bob   NaN  54000.000000
2  Charlie  30.0  56666.666667
3    David   NaN  58000.000000
4      Eve  22.0           NaN
Python

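Spline Interpolation

Spline interpolation is another smooth method, well suited to continuous curves. Like the polynomial method, it requires SciPy and an explicit order. Unlike the polynomial method, it also extrapolates past the last valid value, which is why Eve’s salary is filled here.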

df['Salary'] = df['Salary'].interpolate(method='spline', order=2)

print(df)
Python

# Output

      Name   Age        Salary
0    Alice  25.0  50000.000000
1      Bob   NaN  54000.000000
2  Charlie  30.0  56666.666667
3    David   NaN  58000.000000
4      Eve  22.0  58000.000000
Python

Real-life Examples of Handling Missing Values in Pandas

Let’s now put theory into practice and explore some real-life examples of filling missing values with fillna() and interpolate(). These examples reflect common scenarios encountered by data analysts and data scientists when working with messy, real-world data.

Cleaning a Messy Dataset with Mixed Missing Values

In many practical cases, missing values aren’t always represented as NaN or None. Sometimes they appear as strings like 'NaN', 'NULL', or even blanks (''). Before you can fill these values, you need to standardize them.

import pandas as pd
import numpy as np

data = {
    'A': [1, np.nan, 'NaN', 4],
    'B': [5, np.nan, 'NaN', 8],
    'C': ['a', 'b', None, 'd']
}
df = pd.DataFrame(data)
print(df)
Python

# Initial Output:

     A    B     C
0    1    5     a
1  NaN  NaN     b
2  NaN  NaN  None
3    4    8     d
Python

Here, we have a mix of real NaN values, string-based "NaN" entries, and None values. Let’s clean it:

# Convert "NaN" strings to actual np.nan values
df = df.replace('NaN', np.nan)

# Convert the numeric columns to a numeric dtype, then forward fill them
num_cols = ['A', 'B']
df[num_cols] = df[num_cols].apply(pd.to_numeric).ffill()

# Fill the string column 'C' with a placeholder for missing values
df['C'] = df['C'].fillna('Missing')
print(df)
Python

# Cleaned Output

     A    B        C
0  1.0  5.0        a
1  1.0  5.0        b
2  1.0  5.0  Missing
3  4.0  8.0        d
Python

Always standardize your missing values first, then use fillna() to fill based on the context.
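
Real files often mix several placeholder spellings. The sketch below standardizes a made-up DataFrame in one pass; the `sentinels` list is an assumption and should be adapted to whatever placeholders your own data actually contains.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data in which every value arrived as text
raw = pd.DataFrame({
    'score': ['12', 'NULL', '', '7', 'n/a'],
    'city': ['Oslo', 'NaN', 'Rome', '', 'Lima'],
})

# Assumed list of placeholder strings that should count as missing
sentinels = ['NaN', 'NULL', 'n/a', '']

# Replace every placeholder with a real NaN
clean = raw.replace(sentinels, np.nan)

# Convert the numeric column; anything unparseable also becomes NaN
clean['score'] = pd.to_numeric(clean['score'], errors='coerce')

print(clean)
print(clean.isna().sum())
```

When loading from CSV, the na_values argument of pd.read_csv can perform the same standardization at read time.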

Filling Time-Series Gaps Using Time-based Interpolation

When working with time-series data, gaps are common due to missing entries or irregular time intervals. The best way to handle this is by using interpolate(method='time').

dates = pd.date_range('2022-01-01', periods=10, freq='W')
values = [1.5, np.nan, 2.1, np.nan, 6.3, np.nan, 4.6, 5.1, np.nan, 8.9]
ser = pd.Series(values, index=dates)

print(ser)
Python

#  Initial Output

2022-01-02    1.5
2022-01-09    NaN
2022-01-16    2.1
2022-01-23    NaN
2022-01-30    6.3
2022-02-06    NaN
2022-02-13    4.6
2022-02-20    5.1
2022-02-27    NaN
2022-03-06    8.9
Freq: W-SUN, dtype: float64

Let’s interpolate missing values based on time:

ser = ser.interpolate(method='time')
print(ser)
Python

# Interpolated Output

2022-01-02    1.50
2022-01-09    1.80
2022-01-16    2.10
2022-01-23    4.20
2022-01-30    6.30
2022-02-06    5.45
2022-02-13    4.60
2022-02-20    5.10
2022-02-27    7.00
2022-03-06    8.90
Freq: W-SUN, dtype: float64
Python

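Time-aware interpolation weights each estimate by the actual time gap between observations. On an evenly spaced index like this weekly one, it gives the same result as linear interpolation; the two differ only when the timestamps are irregularly spaced.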
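
The difference between the two methods only shows up on an irregular index. Here is a minimal sketch with made-up dates, where the missing reading sits one day after the first observation and eight days before the last:

```python
import numpy as np
import pandas as pd

# Hypothetical irregular timestamps: 1 day, then 8 days apart
idx = pd.to_datetime(['2022-01-01', '2022-01-02', '2022-01-10'])
ser = pd.Series([0.0, np.nan, 9.0], index=idx)

linear = ser.interpolate(method='linear')  # ignores the index: midpoint of 0 and 9
timed = ser.interpolate(method='time')     # 1 of 9 days elapsed: one ninth of the way

print(linear.iloc[1], timed.iloc[1])
```

The linear method places the estimate halfway between the neighbours, while the time method keeps it close to the first observation because only one of the nine days has passed.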

Filling Missing Weather Data Using Domain Knowledge

Domain-specific logic often improves the quality of imputation. Let’s look at a weather dataset with missing temperature readings and a categorical weather event column.

weather = {
    'Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'],
    'Temperature': [28.5, np.nan, 27.2, 26.4, np.nan, 25.1, 29.7],
    'Event': ['Sunny', 'Rain', 'Rain', 'Clouds', 'Rain', 'Sunny', 'Sunny']
}

df = pd.DataFrame(weather)
print(df)
Python

# Initial Output

    Day  Temperature   Event
0   Mon        28.5   Sunny
1   Tue         NaN    Rain
2   Wed        27.2    Rain
3   Thu        26.4  Clouds
4   Fri         NaN    Rain
5   Sat        25.1   Sunny
6   Sun        29.7   Sunny

Now apply intelligent filling strategies:

# Fill temperature gaps with the nearest valid reading (method='nearest' requires SciPy);
# when a gap is equally close to both neighbours, the earlier reading is used
df['Temperature'] = df['Temperature'].interpolate(method='nearest')

# Fill any missing event with a domain-informed default (e.g., 'Clouds' in rainy seasons);
# this sample has no missing events, so the call leaves the column unchanged
df['Event'] = df['Event'].fillna('Clouds')

print(df)
Python

# Cleaned Output

    Day  Temperature   Event
0   Mon        28.5   Sunny
1   Tue        28.5    Rain
2   Wed        27.2    Rain
3   Thu        26.4  Clouds
4   Fri        26.4    Rain
5   Sat        25.1   Sunny
6   Sun        29.7   Sunny
Python

Use real-world context when choosing values to fill missing data, especially for categorical variables.
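
One way to encode such context is group-wise imputation. The sketch below fills each missing temperature with the mean temperature of days that share the same event. It rests on the assumption, made here only for illustration, that days with the same weather event have similar temperatures.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'],
    'Temperature': [28.5, np.nan, 27.2, 26.4, np.nan, 25.1, 29.7],
    'Event': ['Sunny', 'Rain', 'Rain', 'Clouds', 'Rain', 'Sunny', 'Sunny'],
})

# Mean temperature of each event group, aligned row by row with df
event_mean = df.groupby('Event')['Temperature'].transform('mean')

# Use the group mean only where the temperature is missing
df['Temperature'] = df['Temperature'].fillna(event_mean)

print(df)
```

Both gaps fall on rainy days, so they take the mean of the one observed rainy reading. A group with no observed values at all would stay NaN and need a fallback.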

These real-world examples demonstrate the practical power of Pandas’ fillna() and interpolate() functions. Whether you’re dealing with messy survey results, incomplete logs, or patchy time series, these tools help ensure your dataset is clean, consistent, and analysis-ready.

Difference Between fillna() and interpolate()

Feature              | fillna()                                  | interpolate()
---------------------+-------------------------------------------+-------------------------------------
Method of Imputation | Static (constant value or strategy)       | Dynamic (based on data pattern)
Best For             | Categorical data, consistent replacements | Numeric or time-series data
Customizability      | High (custom values, ffill, bfill, etc.)  | Moderate (requires numeric context)
Accuracy             | Less accurate but more predictable        | More accurate if data follows a trend
Table 1: Filling Missing Values in Pandas using fillna and interpolate

Conclusion: Filling Missing Values in Pandas using fillna and interpolate

Handling missing data is a fundamental step in data preprocessing, and choosing the right method can significantly influence your analysis or machine learning results. In this blog, we explored how to fill missing values in Pandas using fillna() and interpolate(), both essential tools for every data analyst and data scientist.

  • Use fillna() for simpler, rule-based imputation.
  • Use interpolate() for data that has patterns or trends, especially time-series data.
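
As a rough illustration of these two rules of thumb, the sketch below applies interpolate() to numeric columns and a mode-based fillna() to everything else. The helper name `fill_by_dtype` is hypothetical, and a real pipeline would usually choose a strategy per column rather than per dtype.

```python
import numpy as np
import pandas as pd


def fill_by_dtype(frame):
    """Interpolate numeric columns; fill other columns with their mode."""
    out = frame.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            # Trend-based estimate, also covering gaps at either end
            out[col] = out[col].interpolate(limit_direction='both')
        else:
            modes = out[col].mode()
            if not modes.empty:
                # Rule-based fill with the most frequent value
                out[col] = out[col].fillna(modes[0])
    return out


# Hypothetical sample with one numeric and one categorical column
sample = pd.DataFrame({
    'reading': [1.0, np.nan, 3.0, np.nan],
    'label': ['a', None, 'a', 'b'],
})
result = fill_by_dtype(sample)
print(result)
```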

Now that you know the difference and how to apply both, start experimenting with your datasets. Proper imputation = better insights and better models.
