Python File Operations
Working with files is a fundamental skill in Python programming. Whether you're loading datasets for machine learning with PyTorch, saving model outputs, or processing text data, understanding how Python handles file operations is essential.
Introduction to File Operations
In Python, file operations follow a simple pattern:
- Open a file
- Read from or write to the file
- Close the file
This process ensures that you can access external data and store your program's results persistently on disk.
Opening and Closing Files
The open() Function
The built-in open() function is your gateway to file operations in Python:
file = open('example.txt', 'r')
# Perform operations
file.close()
The open() function takes two primary arguments:
- File path: The location of the file you want to access
- Mode: Specifies whether you want to read ('r'), write ('w'), append ('a'), etc.
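Beyond the path and the mode, open() also accepts an encoding keyword argument for text files. The following is a minimal sketch, assuming UTF-8 text; the file name notes.txt is hypothetical, and a temporary directory is used only so the example leaves nothing behind:

```python
import os
import tempfile

# Hypothetical file inside a temporary directory, removed automatically afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'notes.txt')

    # Passing encoding explicitly avoids relying on the platform default.
    file = open(path, 'w', encoding='utf-8')
    file.write('café\n')
    file.close()

    # Read the text back with the same encoding.
    file = open(path, 'r', encoding='utf-8')
    text = file.read()
    file.close()

print(repr(text))
```

Using the same encoding for writing and reading is what keeps non-ASCII characters intact.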
Common File Modes
Mode | Description |
---|---|
'r' | Read (default) - Opens file for reading |
'w' | Write - Opens file for writing (creates new file or truncates existing) |
'a' | Append - Opens file for writing, appending to the end |
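'b' | Binary mode - Combined with another mode (e.g., 'rb' for reading binary) |
't' | Text mode (default) - Combined with another mode (e.g., 'rt') |
'+' | Update mode - Adds reading and writing to another mode (e.g., 'r+') |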
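To see how these modes behave in practice, here is a small sketch that runs several of them against the same file. The file name modes.txt is hypothetical, and the temporary directory is only there to keep the example self-contained:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes.txt')  # hypothetical file name

    with open(path, 'w') as f:    # 'w' creates the file
        f.write('first\n')
    with open(path, 'w') as f:    # 'w' again truncates the old content
        f.write('second\n')
    with open(path, 'a') as f:    # 'a' keeps the content and adds to the end
        f.write('third\n')

    with open(path, 'r') as f:    # 'r' reads text and returns str
        text = f.read()
    with open(path, 'rb') as f:   # 'rb' reads raw bytes
        raw = f.read()

    with open(path, 'r+') as f:   # 'r+' reads and writes without truncating
        f.write('SECOND')         # overwrites the first six characters in place
        f.seek(0)                 # go back to the start before reading
        updated = f.read()

print(repr(text))
print(type(raw).__name__)
print(repr(updated))
```

Note that 'first' never appears in the results, because the second 'w' open discarded it.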
Using with Statements (Context Managers)
A better practice is to use the with statement, which automatically closes files even if errors occur:
with open('example.txt', 'r') as file:
    # Operations here
    content = file.read()
# File is automatically closed when exiting the with block
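One way to check the "even if errors occur" claim is to inspect the file object's closed attribute. This sketch simulates a failure inside the with block; the ValueError and the temporary directory are assumptions made purely for illustration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')
    with open(path, 'w') as file:
        file.write('some text')

    try:
        with open(path, 'r') as file:
            open_inside = file.closed   # False: the file is still open here
            raise ValueError('simulated failure')
    except ValueError:
        pass

    closed_after = file.closed          # True: closed despite the exception

print(open_inside, closed_after)
```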
Reading Files
Reading Entire File Content
with open('example.txt', 'r') as file:
    content = file.read()
    print(content)
Output:
This is the content of example.txt.
It has multiple lines.
Python file operations are powerful!
Reading Line by Line
with open('example.txt', 'r') as file:
    for line in file:
        print(line.strip())  # strip() removes surrounding whitespace, including the trailing newline
Output:
This is the content of example.txt.
It has multiple lines.
Python file operations are powerful!
Reading All Lines as a List
with open('example.txt', 'r') as file:
    lines = file.readlines()
    print(lines)
Output:
['This is the content of example.txt.\n', 'It has multiple lines.\n', 'Python file operations are powerful!']
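The readlines() result keeps the newline character at the end of each line. If that is unwanted, two common alternatives are read().splitlines() and a list comprehension with rstrip('\n'). A minimal sketch, using a hypothetical three-line file in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')
    with open(path, 'w') as file:
        file.write('first line\nsecond line\nthird line')

    # Option 1: read everything, then split on line boundaries.
    with open(path, 'r') as file:
        lines = file.read().splitlines()

    # Option 2: iterate and drop only the trailing newline of each line.
    with open(path, 'r') as file:
        stripped = [line.rstrip('\n') for line in file]

print(lines)
print(stripped)
```

Both lists hold the same three strings without newline characters.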
Writing Files
Writing Text
with open('output.txt', 'w') as file:
    file.write("Hello, PyTorch learners!\n")
    file.write("File operations are essential for data handling.")
This creates (or overwrites) 'output.txt' with the content:
Hello, PyTorch learners!
File operations are essential for data handling.
Appending to Files
with open('output.txt', 'a') as file:
    file.write("\nThis line is appended to the file.")
Now 'output.txt' contains:
Hello, PyTorch learners!
File operations are essential for data handling.
This line is appended to the file.
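When the text is already held in a list, writelines() can write all of it in one call. It does not add newline characters itself, so they have to be supplied. A small sketch, with a hypothetical list of strings and file name:

```python
import os
import tempfile

items = ['alpha', 'beta', 'gamma']  # hypothetical data

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'items.txt')

    with open(path, 'w') as file:
        # writelines() accepts any iterable of strings and writes them as-is,
        # so the newline is appended to each item here.
        file.writelines(item + '\n' for item in items)

    with open(path, 'r') as file:
        content = file.read()

print(repr(content))
```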
Working with Different File Types
CSV Files
CSV (Comma-Separated Values) files are common for storing tabular data:
import csv
# Writing CSV
with open('data.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Name', 'Age', 'Score'])
    writer.writerow(['Alice', 25, 95])
    writer.writerow(['Bob', 30, 88])
    writer.writerow(['Charlie', 22, 92])
# Reading CSV (newline='' lets the csv module handle line endings itself)
with open('data.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)
Output:
['Name', 'Age', 'Score']
['Alice', '25', '95']
['Bob', '30', '88']
['Charlie', '22', '92']
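As the output shows, csv.reader returns every value as a string. One possible way to get named, typed fields is csv.DictReader combined with explicit conversion. The sketch below assumes the same Name, Age, Score columns; the lower-case keys and the average calculation are illustrative choices:

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'data.csv')

    with open(path, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(['Name', 'Age', 'Score'])
        writer.writerow(['Alice', 25, 95])
        writer.writerow(['Bob', 30, 88])

    with open(path, 'r', newline='') as file:
        # DictReader uses the header row as keys for each following row.
        reader = csv.DictReader(file)
        records = [
            {'name': row['Name'], 'age': int(row['Age']), 'score': int(row['Score'])}
            for row in reader
        ]

average = sum(record['score'] for record in records) / len(records)
print(records)
print(f"Average score: {average}")
```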
JSON Files
JSON is popular for configuration files and data exchange:
import json
# Data to be written
data = {
    'name': 'Alice',
    'age': 25,
    'scores': [95, 89, 92],
    'contact': {
        'email': '[email protected]',
        'phone': '555-1234'
    }
}
# Writing JSON
with open('data.json', 'w') as file:
    json.dump(data, file, indent=4)
# Reading JSON
with open('data.json', 'r') as file:
    loaded_data = json.load(file)
print(loaded_data)
print(f"Name: {loaded_data['name']}")
print(f"First score: {loaded_data['scores'][0]}")
Output:
{'name': 'Alice', 'age': 25, 'scores': [95, 89, 92], 'contact': {'email': '[email protected]', 'phone': '555-1234'}}
Name: Alice
First score: 95
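A JSON round trip does not always return exactly the object that was saved, because JSON has fewer types than Python. The following sketch uses a hypothetical dictionary to show two common cases: tuples come back as lists, and non-string keys come back as strings.

```python
import json
import os
import tempfile

# Hypothetical data containing a tuple value and an integer key.
original = {'shape': (3, 3), 1: 'one'}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'roundtrip.json')

    with open(path, 'w') as file:
        json.dump(original, file)

    with open(path, 'r') as file:
        restored = json.load(file)

print(restored)
print(restored == original)
```

This matters when saved settings are compared with in-memory values after loading.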
File System Operations
Python's os module provides functions for interacting with the file system:
import os
# List files in a directory
files = os.listdir('.')
print(f"Files in current directory: {files}")
# Check if a file exists before querying it
if os.path.exists('example.txt'):
    print("example.txt exists!")
    # Get file size (getsize raises an error if the file is missing)
    size = os.path.getsize('example.txt')
    print(f"File size: {size} bytes")
# Create a new directory
os.makedirs('new_folder', exist_ok=True)
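A few more functions from the os module are often used together with the ones above: os.path.join to build paths, os.path.splitext to inspect extensions, and os.remove to delete a file. The sketch below assumes a hypothetical data/raw folder layout inside a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Build a nested path in a portable way and create it.
    data_dir = os.path.join(tmp_dir, 'data', 'raw')
    os.makedirs(data_dir, exist_ok=True)

    # Create a few hypothetical files.
    for name in ('a.txt', 'b.csv', 'c.txt'):
        with open(os.path.join(data_dir, name), 'w') as file:
            file.write(name)

    # Keep only the names whose extension is .txt.
    txt_files = sorted(
        name for name in os.listdir(data_dir)
        if os.path.splitext(name)[1] == '.txt'
    )

    # Delete one file and list what is left.
    os.remove(os.path.join(data_dir, 'b.csv'))
    remaining = sorted(os.listdir(data_dir))

print(txt_files)
print(remaining)
```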
Practical Example: Data Processing for PyTorch
Let's create a simple example of processing data from a file for use with PyTorch:
import torch
# First, create a data file
with open('tensor_data.txt', 'w') as file:
    file.write("1.5 2.3 3.1\n")
    file.write("4.0 5.2 6.7\n")
    file.write("7.3 8.1 9.4")
# Now read and process the data
data_list = []
with open('tensor_data.txt', 'r') as file:
    for line in file:
        # Convert each line to a list of floats
        values = [float(x) for x in line.strip().split()]
        data_list.append(values)
# Convert to a PyTorch tensor
data_tensor = torch.tensor(data_list)
print(data_tensor)
print(f"Shape: {data_tensor.shape}")
print(f"Data type: {data_tensor.dtype}")
Output:
tensor([[1.5000, 2.3000, 3.1000],
[4.0000, 5.2000, 6.7000],
[7.3000, 8.1000, 9.4000]])
Shape: torch.Size([3, 3])
Data type: torch.float32
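Real data files often contain blank lines or rows of uneven length, which would either add empty rows or make torch.tensor fail. The parsing step could be made a little more defensive along these lines. This is only a sketch: load_numeric_rows is a hypothetical helper, and it deliberately stops at a plain list of lists so that it runs without PyTorch installed.

```python
import os
import tempfile

def load_numeric_rows(path):
    """Hypothetical helper: parse whitespace-separated floats, one row per line."""
    rows = []
    with open(path, 'r') as file:
        for line_number, line in enumerate(file, start=1):
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            try:
                rows.append([float(x) for x in fields])
            except ValueError as error:
                raise ValueError(f"{path}, line {line_number}: {error}") from error
    # A rectangular tensor needs every row to have the same length.
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"rows have different lengths: {sorted(widths)}")
    return rows

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'tensor_data.txt')
    with open(path, 'w') as file:
        file.write("1.5 2.3 3.1\n\n4.0 5.2 6.7\n")
    rows = load_numeric_rows(path)

print(rows)
# The result can then be passed to torch.tensor(rows) as in the example above.
```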
Error Handling in File Operations
Always handle potential errors when working with files:
try:
    with open('nonexistent_file.txt', 'r') as file:
        content = file.read()
except FileNotFoundError:
    print("File not found! Please check the file path.")
except PermissionError:
    print("You don't have permission to access this file.")
except Exception as e:
    print(f"An error occurred: {e}")
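In larger programs, the same pattern is often wrapped in a small function so that a missing file yields a fallback value instead of a printed message. A minimal sketch, where read_text_or_default and its default parameter are hypothetical names chosen for this example:

```python
import os
import tempfile

def read_text_or_default(path, default=''):
    """Hypothetical helper: return the file's text, or a default if it is missing."""
    try:
        with open(path, 'r') as file:
            return file.read()
    except FileNotFoundError:
        return default

with tempfile.TemporaryDirectory() as tmp_dir:
    # This file is never created, so the fallback value is returned.
    missing = os.path.join(tmp_dir, 'nonexistent_file.txt')
    content = read_text_or_default(missing, default='(no data)')

print(content)
```

Only FileNotFoundError is caught here, so other problems such as PermissionError still surface to the caller.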
Summary
File operations in Python are straightforward yet powerful:
- Use the open() function to access files, preferably with context managers (with statements)
- Read files with read(), readlines(), or by iteration
- Write to files with write() and writelines()
- Handle different file formats with modules like csv and json
- Manage files and directories with the os module
- Always handle potential errors with try-except blocks
Mastering file operations will help you work with datasets, save models, and process inputs/outputs effectively when using PyTorch for machine learning projects.
Exercises
- Create a program that reads a text file and counts the occurrences of each word
- Write a script to merge multiple CSV files into one
- Create a function that loads numeric data from a text file and converts it into a PyTorch tensor
- Write a program that can save and load a simple PyTorch model's parameters to/from a file
- Create a data preprocessing script that reads raw text data, cleans it, and saves it in a format ready for PyTorch training