We’ve all been there: you’re knee-deep in a project, and see a tar file that needs to be opened. Without the key, it’s like discovering a treasure trove! Don’t worry; Python is a reliable companion prepared to hold the key to your riches! The helper will assist you in opening a tar file and exposing its hidden treasures. So, grab your coding gear, and let’s dive into the guide on how to work with tar files in Python.
In the realm of data management and software development, handling compressed files efficiently is crucial. TAR files, commonly used for archiving multiple files into a single file, are prevalent in Unix and Linux environments. Python’s built-in tarfile
The module offers a robust solution for creating, reading, and extracting TAR archives.
This guide explores how to work with TAR files in Python. It offers practical examples and tips to improve your workflow.
Table of Contents
What is a TAR File?
TAR stands for Tape Archive Files, and this tar file is used to bundle a set of files into a single file. This is helpful when archiving older files or sending bulk files over the bank.
Python has a standard tarfile module that can be used to work with tar files. This module supports gzip, lzma, and bz2 compression.
Python’s tarfile
Module
Python’s tarfile
The module provides a straightforward interface to create, read, and extract TAR archives. It supports various compression methods, including gzip, bzip2, and lzma, making it versatile for different use cases.Python documentation
Key Features:
- Read and write support for uncompressed and compressed TAR files.
- Support for various compression methods: gzip (
.gz
), bzip2 (.bz2
), lzma (.xz
). - Extraction filters to enhance security during extraction.
- Command-line interface for quick operations.
How to Create a TAR File in Python
Creating a TAR file in Python involves opening a file in write mode and adding files or directories to it.
Example:
import tarfile
with tarfile.open('archive.tar', 'w') as tar:
tar.add('file1.txt')
tar.add('folder/')
PythonHere, archive.tar.gz
is a gzip-compressed TAR file. You can replace 'w:gz'
with 'w:bz2'
or 'w:xz'
for bzip2 or lzma compression, respectively
How to Read a TAR File in Python
For reading tar files in Python, we have to use tarfile.open
that returns a tarfile.TarFile object. This function’s arguments are the filename and operation modes, like write and read modes.
Example:
import tarfile
with tarfile.open("firstSample.tar", "r") as tf:
print("Opened Tar File")
PythonHow to Extract TAR File Content in Python
Now that we know how to open a tar file using Python, let’s learn how to extract its content. Extraction of the tar file is done using tarfile.Tarfile.extractall
the method. Path and members are arguments that are accepted by this tarfile.Tarfile.extracttall
method.
Path: Path to a directory to which a tar file should be extracted.
Members: specify the files to be extracted; it should be a subset of tarfile.TarFile.getmembers()
the output.
Example:
import tarfile
with tarfile.open("firstSample.tar", "r") as tf:
print("Opened tarfile")
tf.extractall(path="./extraction_directory")
print("All files extracted")
PythonHow to Extract a Single TAR File in Python
To extract specific files from a tar file, simply pass a reference to the file object or the file path as a string to the tarfile.TarFile.extract
method.
If you want to see a list of all files inside a tar file, use this tarfile.TarFile.getmembers
method. This will return a list of tarfile.TarInfo
instances, each representing a file in the archive.
import tarfile
with tarfile.open("./firstSample.tar", "r") as tf:
print("Opened tarfile")
print(tf.getmembers())
print("Members listed")
PythonOutput:
Opened tarfile
[<TarInfo 'sample' at 0x7fe14b53a048>, <TarInfo 'sample/sample_txt1.txt' at 0x7fe14b53a110>, <TarInfo 'sample/sample_txt2.txt' at 0x7fe14b53a1d8>, <TarInfo 'sample/sample_txt3.txt' at 0x7fe14b53a2a0>, <TarInfo 'sample/sample_txt4.txt' at 0x7fe14b53a368>]
Single File Extraction
import tarfile
file_name = "firstSample/firstSample1.txt"
with tarfile.open("firstSample.tar", "r") as tf:
print("Opened tarfile")
tf.extract(member=file_name, path="./extraction_directory")
print(f"{file_name} extracted")
PythonEnsuring Secure Extraction
Extracting TAR files from untrusted sources can pose security risks, such as path traversal attacks. Python 3.12 introduced extraction filters to mitigate these risks.
import tarfile
with tarfile.open('archive.tar.gz', 'r:gz') as tar:
tar.extractall(path='safe_extract', filter='data')
PythonThe ‘data’ filter makes sure that only regular files are extracted. This stops symbolic links or special files from being created, which could harm system security
How to Write a TAR File in Python
If you want to merge a file into a tar file, then we will open the file in append mode and use tarfile.TarFile.add method takes the file path to be added as a parameter.
import tarfile
file_name = "firstSample1.txt"
with tarfile.open(f"./firstSample.tar", "a") as tf:
print("Opened tarfile")
print(f"Members before addition of {file_name}")
print(tf.getmembers())
tf.add(f"{file_name}", arcname="sample")
print(f"Members after addition of {file_name}")
print(tf.getmembers())
PythonCleaning Up: Removing Files from TAR Archives
Python’s tarfile
module doesn’t support removing files from an archive directly. To remove a file, you need to create a new archive excluding the unwanted file.
import tarfile
with tarfile.open('archive.tar', 'r') as tar:
members = [m for m in tar.getmembers() if m.name != 'file_to_remove.txt']
with tarfile.open('new_archive.tar', 'w') as new_tar:
for member in members:
file = tar.extractfile(member)
new_tar.addfile(member, file)
PythonThis script creates a new archive new_archive.tar
without file_to_remove.txt
.
Related Post:
>> How to Setup SFTP Server on Ubuntu – 9 Steps
>> Encoding and Decoding Using Base64 Strings in Python
Conclusion: How to Work with TAR File in Python
Python’s tarfile Module is a powerful tool for managing TAR archives, offering a range of functionalities from basic creation and extraction to advanced operations like secure extraction and archive manipulation. By understanding and utilizing these features, you can efficiently handle compressed files in your Python projects.
Now that you know how to interact with tar files in Python, you are prepared! Tarfile modules in Python facilitate the process of reading, extracting, or even adding files to tar archives. Tarfile is a powerful tool that may be used to construct your tar archives, extract files, and access a file’s content. This article covers it.
So next time when you want to play with a TAR file, remember that Python has the perfect solution to unlock your treasure.
Happy Coding!