How to Work with TAR Files in Python

We’ve all been there: you’re knee-deep in a project and run into a tar file that needs to be opened. Without the right tool, it’s like finding a treasure chest with no key. Don’t worry; Python has the key. Its standard library can open a tar file and expose everything inside. So grab your coding gear, and let’s dive into this guide on how to work with tar files in Python.

In data management and software development, handling archives efficiently matters. TAR files, which bundle multiple files into a single file, are common in Unix and Linux environments. Python’s built-in tarfile module offers a robust way to create, read, and extract TAR archives.

This guide explores how to work with TAR files in Python. It offers practical examples and tips to improve your workflow.

What is a TAR File?

TAR stands for Tape Archive. A tar file bundles a set of files into a single file, which is helpful when archiving older files or sending many files over a network. A plain tar file is not compressed; compression such as gzip is applied on top of it.

Python has a standard tarfile module that can be used to work with tar files. This module supports gzip, lzma, and bz2 compression.

Python’s tarfile Module

Python’s tarfile module provides a straightforward interface to create, read, and extract TAR archives. It supports several compression methods, including gzip, bzip2, and lzma, which makes it versatile for different use cases.

Key Features:

  • Read and write support for uncompressed and compressed TAR files.
  • Support for various compression methods: gzip (.gz), bzip2 (.bz2), lzma (.xz).
  • Extraction filters to enhance security during extraction.
  • Command-line interface for quick operations.

How to Create a TAR File in Python

Creating a TAR file in Python involves opening a file in write mode and adding files or directories to it.

Example:

import tarfile

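# 'w' creates a new, uncompressed archive (overwriting any existing file)
with tarfile.open('archive.tar', 'w') as tar:
    tar.add('file1.txt')   # add a single file
    tar.add('folder/')     # add a directory and, recursively, its contents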

Here, archive.tar is an uncompressed TAR file. To compress it, open the archive with mode 'w:gz' instead (and name it archive.tar.gz), or use 'w:bz2' or 'w:xz' for bzip2 or lzma compression, respectively.
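As a minimal sketch of the compressed variant, the snippet below writes a gzip-compressed archive and reads its member names back. The file names (notes.txt, archive.tar.gz) and the temporary working directory are placeholders chosen for illustration.

```python
import os
import tarfile
import tempfile

# Work in a temporary directory so the sketch does not touch real files.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "notes.txt")  # hypothetical sample file
with open(source, "w") as fh:
    fh.write("hello tar")

archive_path = os.path.join(workdir, "archive.tar.gz")

# 'w:gz' writes a gzip-compressed archive; 'w:bz2' and 'w:xz' work the same way.
with tarfile.open(archive_path, "w:gz") as tar:
    # arcname sets the name stored in the archive, so the temporary
    # directory's absolute path is not recorded.
    tar.add(source, arcname="notes.txt")

# Reopen the archive to confirm what was stored.
with tarfile.open(archive_path, "r:gz") as tar:
    stored_names = tar.getnames()

print(stored_names)
```

Passing arcname is optional, but without it the member is stored under the path you passed to add, which is rarely what you want for files outside the current directory.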

How to Read a TAR File in Python

To read a tar file in Python, call tarfile.open, which returns a tarfile.TarFile object. Its main arguments are the file name and the mode, such as 'r' for reading or 'w' for writing.

Example:

import tarfile

# 'r' opens an existing archive for reading
with tarfile.open("firstSample.tar", "r") as tf:
    print("Opened Tar File")
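Once an archive is open, you can iterate over it to inspect its members. The sketch below assumes a small gzip-compressed archive that it builds itself (the file and member names are made up). It relies on the fact that mode "r" detects the compression transparently.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()

# Build a small sample archive so the example is self-contained.
sample_file = os.path.join(workdir, "sample_txt1.txt")
with open(sample_file, "w") as fh:
    fh.write("sample text")  # 11 characters, no newline

archive_path = os.path.join(workdir, "firstSample.tar.gz")
with tarfile.open(archive_path, "w:gz") as tf:
    tf.add(sample_file, arcname="sample/sample_txt1.txt")

# Plain "r" opens the archive with transparent compression detection.
listing = []
with tarfile.open(archive_path, "r") as tf:
    # Iterating over a TarFile yields one TarInfo object per member.
    for member in tf:
        listing.append((member.name, member.size, member.isfile()))

print(listing)
```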

How to Extract TAR File Content in Python

Now that we know how to open a tar file using Python, let’s learn how to extract its content. Extraction is done with the tarfile.TarFile.extractall method, which accepts path and members arguments.

path: the directory into which the tar file should be extracted.

members: the files to be extracted; it should be a subset of the output of tarfile.TarFile.getmembers().

Example:

import tarfile

with tarfile.open("firstSample.tar", "r") as tf:
    print("Opened tarfile")
    tf.extractall(path="./extraction_directory")
    print("All files extracted")
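The members argument can be used to extract only part of an archive. Here is a rough sketch, assuming an archive with three made-up members of which only the .txt files are wanted. For archives from untrusted sources you would also pass an extraction filter, as covered in the secure extraction section.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()

# Create three small hypothetical files and archive them under "sample/".
archive_path = os.path.join(workdir, "firstSample.tar")
with tarfile.open(archive_path, "w") as tf:
    for name in ("a.txt", "b.txt", "c.log"):
        file_path = os.path.join(workdir, name)
        with open(file_path, "w") as fh:
            fh.write(name)
        tf.add(file_path, arcname=f"sample/{name}")

target = os.path.join(workdir, "extraction_directory")
with tarfile.open(archive_path, "r") as tf:
    # Keep only the .txt members; the list is a subset of getmembers().
    txt_members = [m for m in tf.getmembers() if m.name.endswith(".txt")]
    tf.extractall(path=target, members=txt_members)

extracted = sorted(os.listdir(os.path.join(target, "sample")))
print(extracted)
```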

How to Extract a Single TAR File in Python

To extract a specific file from a tar file, pass either its tarfile.TarInfo object or its name inside the archive as a string to the tarfile.TarFile.extract method.

To see a list of all files inside a tar file, use the tarfile.TarFile.getmembers method. It returns a list of tarfile.TarInfo instances, each representing a member of the archive.

import tarfile

with tarfile.open("./firstSample.tar", "r") as tf:
    print("Opened tarfile")
    print(tf.getmembers())
    print("Members listed")

Output:

Opened tarfile

[<TarInfo 'sample' at 0x7fe14b53a048>, <TarInfo 'sample/sample_txt1.txt' at 0x7fe14b53a110>, <TarInfo 'sample/sample_txt2.txt' at 0x7fe14b53a1d8>, <TarInfo 'sample/sample_txt3.txt' at 0x7fe14b53a2a0>, <TarInfo 'sample/sample_txt4.txt' at 0x7fe14b53a368>]

Single File Extraction

import tarfile

# The name must match a member exactly as it is listed by getmembers()
file_name = "sample/sample_txt1.txt"
with tarfile.open("firstSample.tar", "r") as tf:
    print("Opened tarfile")
    tf.extract(member=file_name, path="./extraction_directory")
    print(f"{file_name} extracted")
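If you only need a member's content and not a file on disk, tarfile.TarFile.extractfile returns a readable file object instead. The following is a small sketch with a made-up member name; it builds its own sample archive first.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()

# Write the sample in binary mode so the bytes are identical on every platform.
sample_file = os.path.join(workdir, "sample_txt1.txt")
with open(sample_file, "wb") as fh:
    fh.write(b"first line\nsecond line\n")

archive_path = os.path.join(workdir, "firstSample.tar")
with tarfile.open(archive_path, "w") as tf:
    tf.add(sample_file, arcname="sample/sample_txt1.txt")

text = None
with tarfile.open(archive_path, "r") as tf:
    # extractfile() accepts a member name or a TarInfo object.
    reader = tf.extractfile("sample/sample_txt1.txt")
    # It returns None for members that are not regular files or links.
    if reader is not None:
        with reader:
            text = reader.read().decode("utf-8")

print(text)
```

Nothing is written to the file system in the reading step, which makes this approach handy for peeking into large archives.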

Ensuring Secure Extraction

Extracting TAR files from untrusted sources can pose security risks, such as path traversal attacks. Python 3.12 introduced extraction filters to mitigate these risks.

import tarfile

with tarfile.open('archive.tar.gz', 'r:gz') as tar:
    tar.extractall(path='safe_extract', filter='data')

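The ‘data’ filter is meant for archives that contain ordinary data files. It rejects members with absolute paths or paths that would land outside the destination directory, refuses links that point outside the destination, and refuses special files such as device nodes and pipes. It also strips risky permission bits such as setuid. Regular files, directories, and links that stay inside the destination are still extracted.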

How to Write a TAR File in Python

To add a file to an existing tar file, open the archive in append mode ('a') and call the tarfile.TarFile.add method, which takes the path of the file to add as a parameter. Append mode works only with uncompressed archives.

import tarfile

file_name = "firstSample1.txt"
with tarfile.open("./firstSample.tar", "a") as tf:
    print("Opened tarfile")
    print(f"Members before addition of {file_name}")
    print(tf.getmembers())
    # arcname sets the path the file gets inside the archive
    tf.add(file_name, arcname=f"sample/{file_name}")
    print(f"Members after addition of {file_name}")
    print(tf.getmembers())
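A self-contained way to check the effect of append mode is to create an archive, append to it, and then reopen it for reading. This sketch uses made-up file names in a temporary directory.

```python
import os
import tarfile
import tempfile

workdir = tempfile.mkdtemp()

# Two hypothetical files: one archived up front, one appended later.
first = os.path.join(workdir, "first.txt")
second = os.path.join(workdir, "second.txt")
for path in (first, second):
    with open(path, "w") as fh:
        fh.write(os.path.basename(path))

archive_path = os.path.join(workdir, "firstSample.tar")

# Create the (uncompressed) archive with one member.
with tarfile.open(archive_path, "w") as tf:
    tf.add(first, arcname="sample/first.txt")

# Reopen in append mode and add the second file at the end.
with tarfile.open(archive_path, "a") as tf:
    tf.add(second, arcname="sample/second.txt")

# Read the archive back to confirm both members are present.
with tarfile.open(archive_path, "r") as tf:
    names = tf.getnames()

print(names)
```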

Cleaning Up: Removing Files from TAR Archives

Python’s tarfile module doesn’t support removing files from an archive directly. To remove a file, you need to create a new archive excluding the unwanted file.

import tarfile

with tarfile.open('archive.tar', 'r') as tar:
    members = [m for m in tar.getmembers() if m.name != 'file_to_remove.txt']
    with tarfile.open('new_archive.tar', 'w') as new_tar:
        for member in members:
            # extractfile() returns None for directories; addfile() then stores only the header
            file = tar.extractfile(member)
            new_tar.addfile(member, file)

This script creates a new archive, new_archive.tar, without file_to_remove.txt.
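If you would rather keep the original file name, one option is to write the filtered archive to a temporary file and then swap it into place. The helper below, remove_member, is a hypothetical function sketched for illustration. It assumes an uncompressed archive and always writes an uncompressed result.

```python
import os
import tarfile
import tempfile


def remove_member(archive_path, unwanted_name):
    """Rewrite archive_path without the member named unwanted_name."""
    tmp_path = archive_path + ".tmp"  # hypothetical temporary file name
    with tarfile.open(archive_path, "r") as src, tarfile.open(tmp_path, "w") as dst:
        for member in src.getmembers():
            if member.name == unwanted_name:
                continue
            # Only regular files carry data; other members are copied header-only.
            fileobj = src.extractfile(member) if member.isfile() else None
            dst.addfile(member, fileobj)
    # Replace the original only after the new archive is fully written.
    os.replace(tmp_path, archive_path)


# Demonstration on a throwaway archive with two made-up members.
workdir = tempfile.mkdtemp()
archive_path = os.path.join(workdir, "archive.tar")
with tarfile.open(archive_path, "w") as tar:
    for name in ("keep.txt", "file_to_remove.txt"):
        file_path = os.path.join(workdir, name)
        with open(file_path, "w") as fh:
            fh.write(name)
        tar.add(file_path, arcname=name)

remove_member(archive_path, "file_to_remove.txt")

with tarfile.open(archive_path, "r") as tar:
    remaining = tar.getnames()

print(remaining)
```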

Conclusion: How to Work with TAR Files in Python

Python’s tarfile module is a powerful tool for managing TAR archives, offering a range of functionality from basic creation and extraction to more advanced operations like secure extraction and archive manipulation. By understanding and using these features, you can efficiently handle archives in your Python projects.

Now that you know how to work with tar files in Python, you are prepared! The tarfile module makes it easy to build your own tar archives, read and extract existing ones, add files to them, and access a member’s content.

So the next time you need to open a TAR file, remember that Python has the key to unlock your treasure.

Happy Coding!
