Python Binary Files
Introduction
While text files store information in human-readable form, binary files store data in the same format that the computer uses internally. Binary files can contain any type of data - images, audio, video, application data, or serialized objects.
Working with binary files is essential when:
- Dealing with non-text data (images, audio, etc.)
- Implementing efficient data storage
- Interacting with external applications or systems
- Processing data that contains bytes that aren't valid in text encodings
In this tutorial, you'll learn how to read and write binary files in Python, understand binary data concepts, and see practical applications of binary file handling.
Understanding Binary Files
Binary files store data as a sequence of bytes rather than text. Unlike text files, where each byte or sequence of bytes represents a character, binary files can contain any value from 0 to 255 in each byte.
Key differences between text and binary files:
- Text files are encoded (e.g., UTF-8, ASCII) and human-readable
- Binary files store raw binary data and are not directly human-readable
- Text files can have platform-dependent line endings; binary files don't have this issue
- Binary files typically require special programs to view or interpret their content
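The difference between the two modes is easiest to see by reading the same file both ways. The following is a minimal sketch; the file name and the temporary directory are assumptions made for the demonstration.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this demo
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Write two lines of text; newline="\n" keeps the line endings identical on every platform
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write("café\nnaïve\n")

# Text mode decodes the stored bytes into a str
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Binary mode returns the raw bytes exactly as they are stored on disk
with open(path, "rb") as f:
    raw = f.read()

print(repr(text))            # 'café\nnaïve\n'
print(raw)                   # b'caf\xc3\xa9\nna\xc3\xafve\n'
print(len(text), len(raw))   # 11 13
```

The text has 11 characters but occupies 13 bytes, because UTF-8 encodes each accented letter as two bytes.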
Opening Binary Files in Python
To work with binary files in Python, you use the same open() function but with a mode that includes 'b':
Mode | Description |
---|---|
'rb' | Read binary |
'wb' | Write binary |
'ab' | Append binary |
'rb+' | Read and write binary |
Let's look at a basic example:
# Writing binary data to a file
with open('example.bin', 'wb') as file:
    file.write(b'Hello, binary world!')  # Note the 'b' prefix
# Reading binary data from a file
with open('example.bin', 'rb') as file:
    data = file.read()
    print(data)  # Output: b'Hello, binary world!'
    print(type(data))  # Output: <class 'bytes'>
Notice the b prefix before the string in b'Hello, binary world!'. This creates a bytes literal instead of a string. When working with binary files, you'll be dealing with bytes and bytearray objects, not strings.
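The basic example covers 'wb' and 'rb'. The remaining modes from the table can be sketched as follows; the file name is hypothetical and the file is created in a temporary directory.

```python
import os
import tempfile

# Hypothetical file in a temporary directory, used only for this demo
path = os.path.join(tempfile.mkdtemp(), "modes.bin")

# 'wb' creates the file, or truncates it if it already exists
with open(path, "wb") as f:
    f.write(b"AAAA")

# 'ab' always writes at the end of the file
with open(path, "ab") as f:
    f.write(b"BBBB")

# 'rb+' opens an existing file for reading and writing without truncating it
with open(path, "rb+") as f:
    f.seek(2)
    f.write(b"zz")  # overwrites bytes 2 and 3 in place

with open(path, "rb") as f:
    result = f.read()

print(result)  # b'AAzzBBBB'
```

Note that 'rb+' overwrites existing bytes rather than inserting new ones, so the file length stays the same here.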
Binary Data Types in Python
Python provides several types for working with binary data:
1. bytes
An immutable sequence of bytes:
# Creating bytes objects
byte_data = bytes([65, 66, 67, 68, 69]) # ASCII values for A, B, C, D, E
print(byte_data) # Output: b'ABCDE'
# Another way to create bytes
byte_data2 = b'ABCDE'
print(byte_data2[0]) # Output: 65 (ASCII value of 'A')
2. bytearray
A mutable sequence of bytes:
# Creating a bytearray
byte_array = bytearray([65, 66, 67, 68, 69])
print(byte_array) # Output: bytearray(b'ABCDE')
# Modifying a bytearray
byte_array[0] = 90 # ASCII value for 'Z'
print(byte_array) # Output: bytearray(b'ZBCDE')
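A few more operations on these two types come up often: converting to and from strings, slicing, and hex representation. The sketch below uses only built-in methods; the variable names are illustrative.

```python
data = b"ABCDE"

# bytes is immutable: item assignment raises TypeError
try:
    data[0] = 90
except TypeError as exc:
    print(f"TypeError: {exc}")

# Indexing returns an int, slicing returns a new bytes object
print(data[0])    # 65
print(data[:2])   # b'AB'

# Converting between str and bytes requires an explicit encoding
encoded = "héllo".encode("utf-8")
print(encoded)                  # b'h\xc3\xa9llo'
print(encoded.decode("utf-8"))  # héllo

# Hex representation is handy for inspecting binary data
print(data.hex())                    # 4142434445
print(bytes.fromhex("4142434445"))   # b'ABCDE'

# Copy into a bytearray to edit, then freeze the result back into bytes
buf = bytearray(data)
buf[0] = 90
frozen = bytes(buf)
print(frozen)  # b'ZBCDE'
```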
Reading and Writing Binary Files
Writing Binary Data
Here's how to write different types of binary data:
with open('binary_data.bin', 'wb') as file:
    # Write a sequence of bytes
    file.write(b'Hello')
    # Write integers as bytes (need to convert to bytes first)
    value = 42
    file.write(value.to_bytes(2, byteorder='big'))  # 2-byte integer, big-endian
    # Write a bytearray
    file.write(bytearray([10, 20, 30, 40, 50]))
Reading Binary Data
with open('binary_data.bin', 'rb') as file:
    # Read first 5 bytes (Hello)
    header = file.read(5)
    print(header)  # Output: b'Hello'
    # Read next 2 bytes and convert to integer
    value_bytes = file.read(2)
    value = int.from_bytes(value_bytes, byteorder='big')
    print(value)  # Output: 42
    # Read the rest
    remaining = file.read()
    print(list(remaining))  # Output: [10, 20, 30, 40, 50]
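The integer conversion above assumes a non-negative value that fits in the chosen number of bytes. The following sketch shows what happens otherwise, using the signed parameter of int.to_bytes and int.from_bytes; the values are arbitrary.

```python
value = -42

# Negative numbers need signed=True (two's complement encoding)
encoded = value.to_bytes(2, byteorder="big", signed=True)
print(encoded)  # b'\xff\xd6'

# The reader must use the same signedness as the writer
as_signed = int.from_bytes(encoded, byteorder="big", signed=True)
as_unsigned = int.from_bytes(encoded, byteorder="big")
print(as_signed)    # -42
print(as_unsigned)  # 65494

# A value that does not fit in the requested size raises OverflowError
try:
    (70000).to_bytes(2, byteorder="big")
except OverflowError as exc:
    print(f"OverflowError: {exc}")
```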
Binary File Navigation
When working with binary files, you might need to navigate through the file:
with open('example.bin', 'rb') as file:
    # Get current position
    print(file.tell())  # Output: 0
    # Move to a specific position (bytes from the start)
    file.seek(5)
    print(file.tell())  # Output: 5
    # Read 3 bytes from current position
    data = file.read(3)
    # Seek relative to current position (2 bytes forward)
    file.seek(2, 1)  # 1 = current position
    # Seek relative to end of file (5 bytes before end)
    file.seek(-5, 2)  # 2 = end of file
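The numeric whence values 0, 1, and 2 have named equivalents in the os module, which can make seek calls easier to read. A minimal sketch, assuming a 20-byte demo file created in a temporary directory:

```python
import os
import tempfile

# Hypothetical demo file with known content (20 bytes)
path = os.path.join(tempfile.mkdtemp(), "example.bin")
with open(path, "wb") as f:
    f.write(b"Hello, binary world!")

with open(path, "rb") as f:
    # seek() returns the new absolute position, so seeking to the end gives the file size
    size = f.seek(0, os.SEEK_END)
    print(size)  # 20

    # Read the last 6 bytes
    f.seek(-6, os.SEEK_END)
    tail = f.read()
    print(tail)  # b'world!'

    # Jump to an absolute offset, then step back relative to the current position
    f.seek(7, os.SEEK_SET)
    f.seek(-7, os.SEEK_CUR)
    head = f.read(5)
    print(head)  # b'Hello'
```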
Practical Examples
Example 1: Copying an Image File
This example demonstrates how to copy an image file in binary mode:
def copy_image(source_file, destination_file):
    try:
        with open(source_file, 'rb') as source:
            with open(destination_file, 'wb') as destination:
                # Read in chunks of 1MB
                chunk_size = 1024 * 1024
                while True:
                    chunk = source.read(chunk_size)
                    if not chunk:
                        break
                    destination.write(chunk)
        print(f"Successfully copied {source_file} to {destination_file}")
    except Exception as e:
        print(f"Error copying file: {e}")
# Usage
copy_image('original.jpg', 'copy.jpg')
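The standard library already provides the same chunked loop as shutil.copyfileobj. The sketch below is an alternative to the hand-written loop, not a replacement for understanding it; the function name copy_binary and the demo file names are assumptions.

```python
import os
import shutil
import tempfile

def copy_binary(source_file, destination_file, chunk_size=1024 * 1024):
    """Copy a file in binary mode, chunk_size bytes at a time."""
    with open(source_file, "rb") as src, open(destination_file, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)

# Demo with a small generated file in a temporary directory
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "original.bin")
dst_path = os.path.join(workdir, "copy.bin")

with open(src_path, "wb") as f:
    f.write(bytes(range(256)) * 4)  # 1024 bytes covering every byte value

copy_binary(src_path, dst_path)

with open(src_path, "rb") as a, open(dst_path, "rb") as b:
    identical = a.read() == b.read()

print(identical)  # True
```

When only a plain copy is needed, shutil.copyfile(src_path, dst_path) does the opening and closing as well.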
Example 2: Creating a Simple Binary File Format
Let's create a simple binary file format to store a list of people with their names and ages:
def write_people_data(filename, people):
    """
    Write people data to a binary file.
    The file starts with a 4-byte count of people.
    Each person is then stored as:
    - 1 byte: length of name
    - n bytes: name
    - 1 byte: age
    """
    with open(filename, 'wb') as file:
        # Write number of people (4 bytes)
        file.write(len(people).to_bytes(4, byteorder='big'))
        for name, age in people:
            # Write name length and name
            name_bytes = name.encode('utf-8')  # Convert name to bytes
            file.write(len(name_bytes).to_bytes(1, byteorder='big'))
            file.write(name_bytes)
            # Write age
            file.write(bytes([age]))
def read_people_data(filename):
    """Read people data from our custom binary format."""
    people = []
    with open(filename, 'rb') as file:
        # Read number of people
        count_bytes = file.read(4)
        count = int.from_bytes(count_bytes, byteorder='big')
        for _ in range(count):
            # Read name length
            name_length = int.from_bytes(file.read(1), byteorder='big')
            # Read name
            name_bytes = file.read(name_length)
            name = name_bytes.decode('utf-8')
            # Read age
            age = ord(file.read(1))
            people.append((name, age))
    return people
# Usage
people_data = [
    ("Alice", 28),
    ("Bob", 32),
    ("Charlie", 22)
]
write_people_data("people.bin", people_data)
loaded_data = read_people_data("people.bin")
print("People loaded from binary file:")
for name, age in loaded_data:
    print(f"{name}, {age} years old")
Output:
People loaded from binary file:
Alice, 28 years old
Bob, 32 years old
Charlie, 22 years old
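Because this format stores the name length and the age in one byte each, both values are limited to the range 0 to 255. A minimal sketch of checking those limits before writing, assuming a hypothetical helper named encode_person that is not part of the example above:

```python
def encode_person(name, age):
    """Encode one person as: 1-byte name length, UTF-8 name, 1-byte age.

    Hypothetical helper for illustration; it raises ValueError instead of
    letting to_bytes() or bytes() fail with a less descriptive error.
    """
    name_bytes = name.encode("utf-8")
    if len(name_bytes) > 255:
        raise ValueError(f"name is {len(name_bytes)} bytes; the format allows at most 255")
    if not 0 <= age <= 255:
        raise ValueError(f"age {age} does not fit in one byte")
    return bytes([len(name_bytes)]) + name_bytes + bytes([age])

record = encode_person("Alice", 28)
print(record)  # b'\x05Alice\x1c'

try:
    encode_person("x" * 300, 20)
except ValueError as exc:
    print(f"Rejected: {exc}")
```

The length check is done on the encoded bytes rather than the string, since non-ASCII characters take more than one byte in UTF-8.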
Example 3: Using struct Module for Binary Data
The struct module is very useful for working with structured binary data:
import struct
# Pack data: a 4-byte unsigned integer, a 4-byte float, and a 10-byte string
format_string = ">I f 10s"  # > means big-endian, I=unsigned int, f=float, 10s=10-byte string
packed_data = struct.pack(format_string,
                          123456,  # unsigned int
                          3.14159,  # float
                          b"Python\0\0\0\0")  # 10-byte string, padded with nulls
print(packed_data)
print(f"Size of packed data: {len(packed_data)} bytes")
# Write to file
with open("struct_data.bin", "wb") as f:
    f.write(packed_data)
# Read from file and unpack
with open("struct_data.bin", "rb") as f:
    binary_data = f.read()
unpacked = struct.unpack(format_string, binary_data)
print(unpacked)
# Convert the byte string to a regular string (strip null bytes)
text = unpacked[2].decode('ascii').rstrip('\0')
print(f"Integer: {unpacked[0]}, Float: {unpacked[1]}, String: '{text}'")
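Output:
b'\x00\x01\xe2@@I\x0f\xd0Python\x00\x00\x00\x00'
Size of packed data: 18 bytes
(123456, 3.141590118408203, b'Python\x00\x00\x00\x00')
Integer: 123456, Float: 3.141590118408203, String: 'Python'
The float prints as 3.141590118408203 rather than 3.14159 because the f format stores a 4-byte single-precision value, which cannot represent 3.14159 exactly. Use d (an 8-byte double) when the full precision of a Python float must survive the round trip.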
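When a file holds many records with the same layout, a precompiled struct.Struct object avoids repeating the format string, and its iter_unpack method walks through the records one at a time. The following sketch reuses the format string from the example; the record values are made up for illustration.

```python
import struct

# Same layout as above: unsigned int, float, 10-byte string, big-endian
record = struct.Struct(">I f 10s")
print(record.size)  # 18

# Made-up sample rows; the 's' format pads short byte strings with null bytes
rows = [(1, 1.5, b"alpha"), (2, 2.25, b"beta")]
blob = b"".join(record.pack(*row) for row in rows)
print(len(blob))  # 36

# iter_unpack() requires the buffer length to be a multiple of record.size
decoded = []
for number, value, label in record.iter_unpack(blob):
    decoded.append((number, value, label.rstrip(b"\0").decode("ascii")))

print(decoded)  # [(1, 1.5, 'alpha'), (2, 2.25, 'beta')]
```

The values 1.5 and 2.25 were chosen because they are exactly representable in single precision, so they come back unchanged.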
When to Use Binary Files
Binary files are particularly useful in these situations:
- Storing non-textual data: Images, audio, video, etc.
- Optimizing space: Binary formats often require less storage than their text counterparts
- Performance-critical applications: Reading and writing binary data is typically faster because no text encoding, decoding, or parsing step is needed
- Preserving exact data formats: When you need to maintain bit-level precision
- Interfacing with external systems/applications: Many file formats are binary
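The space argument can be checked with a quick comparison. This sketch stores the same 1,000 six-digit integers as newline-separated text and as packed 4-byte values; the numbers are arbitrary, and the result depends on the data (small numbers written as text can be shorter than a fixed 4-byte encoding).

```python
import struct

values = list(range(100_000, 101_000))  # 1,000 six-digit integers

# Text form: decimal digits separated by newlines
as_text = "\n".join(str(v) for v in values).encode("ascii")

# Binary form: each value as a 4-byte big-endian unsigned int
as_binary = struct.pack(f">{len(values)}I", *values)

print(len(as_text))    # 6999
print(len(as_binary))  # 4000

# The binary form decodes back to the same values
restored = list(struct.unpack(f">{len(values)}I", as_binary))
print(restored == values)  # True
```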
Common Challenges and Best Practices
Challenges:
- Platform Compatibility: Different systems may interpret binary data differently (e.g., endianness)
- Data Corruption: Binary files are sensitive to corruption; a single wrong byte can make the whole file unreadable
- Debugging Difficulty: It's harder to inspect and debug binary files
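The endianness problem can be reproduced in a few lines. This sketch encodes one 4-byte value in both byte orders and shows what a reader gets when it assumes the wrong one; the value is arbitrary.

```python
import struct
import sys

value = 0x12345678

big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")
print(big.hex())     # 12345678
print(little.hex())  # 78563412

# Reading big-endian bytes as little-endian silently yields a different number
wrong = int.from_bytes(big, byteorder="little")
print(hex(wrong))  # 0x78563412

# struct uses '>' for big-endian and '<' for little-endian
print(struct.pack(">I", value) == big)     # True
print(struct.pack("<I", value) == little)  # True

# Byte order of the machine running this code ('little' on most desktop CPUs)
print(sys.byteorder)
```

No error is raised in the mismatched case, which is why the byte order has to be part of the file format's definition.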
Best Practices:
- Use Context Managers: Always use with statements to ensure files are properly closed
- Specify Endianness: When dealing with multi-byte values, always specify the byte order (endianness)
- Handle Exceptions: Binary operations can fail in many ways; implement proper error handling
- Document Your Format: If creating a custom binary format, document its structure thoroughly
- Consider Standard Formats: When possible, use established binary formats or libraries
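Several of these practices can be combined in one small reader and writer. The following is a sketch under stated assumptions: the 4-byte signature DEMO, the header layout, and the function names write_blob and read_blob are all invented for illustration, and zlib.crc32 is used as one possible way to detect corruption.

```python
import os
import struct
import tempfile
import zlib

MAGIC = b"DEMO"                   # hypothetical 4-byte file signature
HEADER = struct.Struct(">4sII")   # signature, payload length, CRC-32 of payload

def write_blob(path, payload):
    """Write a header followed by the payload bytes."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(MAGIC, len(payload), zlib.crc32(payload)))
        f.write(payload)

def read_blob(path):
    """Read the payload back, raising ValueError if the file is not valid."""
    with open(path, "rb") as f:
        header = f.read(HEADER.size)
        if len(header) != HEADER.size:
            raise ValueError("file is too short to contain a header")
        magic, length, checksum = HEADER.unpack(header)
        if magic != MAGIC:
            raise ValueError("unexpected file signature")
        payload = f.read(length)
    if len(payload) != length:
        raise ValueError("file is truncated")
    if zlib.crc32(payload) != checksum:
        raise ValueError("checksum mismatch: file is corrupted")
    return payload

# Demo in a temporary directory
path = os.path.join(tempfile.mkdtemp(), "data.demo")
write_blob(path, b"hello, binary world")
print(read_blob(path))  # b'hello, binary world'

# Corrupt a single payload byte and read again
with open(path, "rb+") as f:
    f.seek(HEADER.size)
    f.write(b"X")

try:
    read_blob(path)
except ValueError as exc:
    print(f"Rejected: {exc}")
```

The header documents itself in one place (the HEADER format string), fixes the byte order, and lets the reader fail with a clear message instead of returning damaged data.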
Summary
In this tutorial, you've learned:
- The fundamental differences between text and binary files
- How to open, read from, and write to binary files in Python
- Working with Python's binary data types (bytes and bytearray)
- Navigating through binary files with seek/tell
- Practical applications including copying binary files, creating custom formats, and using the struct module
- Best practices for binary file handling
Binary files are powerful and efficient, but they require careful handling. With the skills you've learned in this tutorial, you're now equipped to work confidently with binary data in Python!
Exercises
- Write a program that creates a binary file containing 100 random integers (1-100) and then reads the file to calculate their average.
- Create a simple binary file format that can store a list of students with their name, ID number, and GPA. Implement functions to write and read this format.
- Write a program that can determine if a file is a PNG image by checking its file signature (the first 8 bytes of a PNG file are: 89 50 4E 47 0D 0A 1A 0A).
- Create a binary file viewer that displays the hexadecimal representation of any binary file, along with its ASCII interpretation.
- Implement a simple XOR encryption/decryption function that works on binary files.