Python File Operations

Working with files is a fundamental skill in Python programming. Whether you're loading datasets for machine learning with PyTorch, saving model outputs, or processing text data, understanding how Python handles file operations is essential.

Introduction to File Operations

In Python, file operations follow a simple pattern:

  1. Open a file
  2. Read from or write to the file
  3. Close the file

Following this pattern lets your program read external data and store its results persistently on disk. Closing the file flushes any buffered writes and releases the underlying file handle.

Opening and Closing Files

The open() Function

The built-in open() function is your gateway to file operations in Python:

python
file = open('example.txt', 'r')
# Perform operations
file.close()

The open() function takes two primary arguments:

  • File path: The location of the file you want to access
  • Mode: Specifies how the file is opened, such as read ('r'), write ('w'), or append ('a'). If omitted, it defaults to 'r'.

Common File Modes

Mode    Description
'r'     Read (default) - Opens file for reading
'w'     Write - Opens file for writing (creates new file or truncates existing)
'a'     Append - Opens file for writing, appending to the end
'b'     Binary mode (e.g., 'rb' for reading binary)
't'     Text mode (default)
'+'     Read and write mode (e.g., 'r+')
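
To make the modes above concrete, here is a minimal sketch that exercises several of them on a scratch file. The file name modes_demo.txt and the temporary directory are assumptions made for illustration only.

```python
import os
import tempfile

# Work in a throwaway directory so the demo leaves no files behind.
# 'modes_demo.txt' is a hypothetical file name used only for illustration.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes_demo.txt')

    # 'w' creates the file (or truncates an existing one)
    with open(path, 'w') as f:
        f.write('first\n')

    # 'a' keeps the existing content and writes at the end
    with open(path, 'a') as f:
        f.write('second\n')

    # 'r+' opens an existing file for both reading and writing
    with open(path, 'r+') as f:
        text_before = f.read()   # reading moves the position to the end
        f.write('third\n')       # so this write lands after 'second'

    # 'rb' returns raw bytes instead of decoded text
    with open(path, 'rb') as f:
        raw = f.read()

    # 'w' again: the previous content is discarded
    with open(path, 'w') as f:
        f.write('replaced\n')
    with open(path, 'r') as f:
        text_after = f.read()

print(repr(text_before))
print(type(raw).__name__)
print(repr(text_after))
```

Note that 'r+' requires the file to exist already, whereas 'w' and 'a' create it when it is missing.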

Using with Statements (Context Managers)

A better practice is to use the with statement, which automatically closes files even if errors occur:

python
with open('example.txt', 'r') as file:
    # Operations here
    content = file.read()

# File is automatically closed when exiting the with block
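
As a quick check of that claim, the following sketch inspects the file object's closed attribute after a normal exit and after an exception. The scratch file closing_demo.txt and the simulated ValueError are assumptions for demonstration purposes.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'closing_demo.txt')
with open(path, 'w') as f:
    f.write('some text\n')

# Normal exit: the file is closed as soon as the block ends.
with open(path, 'r') as f:
    f.read()
closed_after_normal_exit = f.closed

# Exit via an exception: the file is still closed before the error propagates.
try:
    with open(path, 'r') as g:
        raise ValueError('simulated failure while the file is open')
except ValueError:
    closed_after_error = g.closed

print(closed_after_normal_exit, closed_after_error)

# Clean up the scratch file and directory.
os.remove(path)
os.rmdir(tmp_dir)
```

Both flags should be True, which is the behavior you would otherwise have to reproduce by hand with try/finally and file.close().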

Reading Files

Reading Entire File Content

python
with open('example.txt', 'r') as file:
    content = file.read()
    print(content)

Output:

This is the content of example.txt.
It has multiple lines.
Python file operations are powerful!

Reading Line by Line

python
with open('example.txt', 'r') as file:
    for line in file:
        print(line.strip())  # strip() removes surrounding whitespace, including the trailing newline

Output:

This is the content of example.txt.
It has multiple lines.
Python file operations are powerful!

Reading All Lines as a List

python
with open('example.txt', 'r') as file:
    lines = file.readlines()
    print(lines)

Output:

['This is the content of example.txt.\n', 'It has multiple lines.\n', 'Python file operations are powerful!']
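
Notice that readlines() keeps the '\n' at the end of each line. If you would rather not have them, one option (a sketch, not the only approach) is to read the whole file and call splitlines() on the result. The example recreates the sample text in a scratch directory, which is an assumption made so the snippet runs on its own.

```python
import os
import tempfile

# Recreate the three-line sample in a scratch file (hypothetical location).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'example.txt')
with open(path, 'w') as f:
    f.write('This is the content of example.txt.\n'
            'It has multiple lines.\n'
            'Python file operations are powerful!')

with open(path, 'r') as f:
    with_newlines = f.readlines()             # keeps '\n' on each line

with open(path, 'r') as f:
    without_newlines = f.read().splitlines()  # drops the line endings

print(with_newlines)
print(without_newlines)

# Clean up the scratch file and directory.
os.remove(path)
os.rmdir(tmp_dir)
```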

Writing Files

Writing Text

python
with open('output.txt', 'w') as file:
    file.write("Hello, PyTorch learners!\n")
    file.write("File operations are essential for data handling.")

This creates (or overwrites) 'output.txt' with the content:

Hello, PyTorch learners!
File operations are essential for data handling.

Appending to Files

python
with open('output.txt', 'a') as file:
    file.write("\nThis line is appended to the file.")

Now 'output.txt' contains:

Hello, PyTorch learners!
File operations are essential for data handling.
This line is appended to the file.
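
Writing Multiple Lines with writelines()

When the text is already in a list, writelines() can write it in one call. The sketch below assumes a hypothetical list of strings and a scratch file named lines.txt. Keep in mind that writelines() does not add newline characters for you.

```python
import os
import tempfile

lines = ['alpha', 'beta', 'gamma']  # hypothetical sample data

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'lines.txt')

# writelines() writes each string exactly as given; it does not add
# newlines itself, so they are appended here explicitly.
with open(path, 'w') as f:
    f.writelines(line + '\n' for line in lines)

with open(path, 'r') as f:
    written = f.read()

print(repr(written))

# Clean up the scratch file and directory.
os.remove(path)
os.rmdir(tmp_dir)
```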

Working with Different File Types

CSV Files

CSV (Comma-Separated Values) files are common for storing tabular data:

python
import csv

# Writing CSV (newline='' lets the csv module control line endings)
with open('data.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerow(['Name', 'Age', 'Score'])
    writer.writerow(['Alice', 25, 95])
    writer.writerow(['Bob', 30, 88])
    writer.writerow(['Charlie', 22, 92])

# Reading CSV (newline='' is also recommended when reading)
with open('data.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

Output:

['Name', 'Age', 'Score']
['Alice', '25', '95']
['Bob', '30', '88']
['Charlie', '22', '92']
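
Note that every value comes back as a string, even though Age and Score were written as integers. A minimal sketch of converting them back, assuming the same column names as above and using csv.DictReader to access fields by header name:

```python
import csv
import os
import tempfile

# Scratch location for the demo (an assumption; any writable path works).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'data.csv')

with open(path, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'Age', 'Score'])
    writer.writerow(['Alice', 25, 95])
    writer.writerow(['Bob', 30, 88])

records = []
with open(path, 'r', newline='') as f:
    # DictReader uses the first row as keys for every following row.
    for row in csv.DictReader(f):
        records.append({
            'Name': row['Name'],
            'Age': int(row['Age']),      # convert text back to a number
            'Score': int(row['Score']),
        })

average_score = sum(r['Score'] for r in records) / len(records)
print(records)
print(average_score)

# Clean up the scratch file and directory.
os.remove(path)
os.rmdir(tmp_dir)
```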

JSON Files

JSON is popular for configuration files and data exchange:

python
import json

# Data to be written
data = {
    'name': 'Alice',
    'age': 25,
    'scores': [95, 89, 92],
    'contact': {
        'email': '[email protected]',
        'phone': '555-1234'
    }
}

# Writing JSON
with open('data.json', 'w') as file:
    json.dump(data, file, indent=4)

# Reading JSON
with open('data.json', 'r') as file:
    loaded_data = json.load(file)
    print(loaded_data)
    print(f"Name: {loaded_data['name']}")
    print(f"First score: {loaded_data['scores'][0]}")

Output:

{'name': 'Alice', 'age': 25, 'scores': [95, 89, 92], 'contact': {'email': '[email protected]', 'phone': '555-1234'}}
Name: Alice
First score: 95
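
The round trip above is lossless because the data uses only strings, numbers, lists, and dictionaries with string keys. As a hedged illustration of where that stops being true, the sketch below saves a hypothetical dictionary containing an integer key and a tuple. JSON has neither, so they come back as a string key and a list.

```python
import json
import os
import tempfile

# Hypothetical data chosen to show what JSON cannot represent directly.
original = {1: 'one', 'point': (3, 4)}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'roundtrip.json')

    with open(path, 'w') as f:
        json.dump(original, f)

    with open(path, 'r') as f:
        restored = json.load(f)

print(restored)              # the key is now '1' and the tuple is a list
print(restored == original)
```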

File System Operations

Python's os module provides functions for interacting with the file system:

python
import os

# List files in a directory
files = os.listdir('.')
print(f"Files in current directory: {files}")

# Check if a file exists
if os.path.exists('example.txt'):
    print("example.txt exists!")

    # Get file size (only attempted once we know the file exists)
    size = os.path.getsize('example.txt')
    print(f"File size: {size} bytes")

# Create a new directory (no error if it already exists)
os.makedirs('new_folder', exist_ok=True)
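
These functions combine naturally. As a sketch, the snippet below builds paths with os.path.join() and keeps only the regular files that end in .txt. The file and folder names are made up for the demo and live in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Create a few hypothetical entries to search through.
    for name in ('a.txt', 'b.csv', 'c.txt'):
        with open(os.path.join(tmp_dir, name), 'w') as f:
            f.write('x')
    os.makedirs(os.path.join(tmp_dir, 'subdir'), exist_ok=True)

    # Keep regular files only, then filter by extension.
    txt_files = sorted(
        name for name in os.listdir(tmp_dir)
        if os.path.isfile(os.path.join(tmp_dir, name))
        and os.path.splitext(name)[1] == '.txt'
    )

print(txt_files)
```

Using os.path.join() rather than concatenating strings keeps the path separators correct across operating systems.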

Practical Example: Data Processing for PyTorch

Let's create a simple example of processing data from a file for use with PyTorch:

python
import torch

# First, create a data file
with open('tensor_data.txt', 'w') as file:
    file.write("1.5 2.3 3.1\n")
    file.write("4.0 5.2 6.7\n")
    file.write("7.3 8.1 9.4")

# Now read and process the data
data_list = []
with open('tensor_data.txt', 'r') as file:
    for line in file:
        # Convert each line to a list of floats
        values = [float(x) for x in line.strip().split()]
        data_list.append(values)

# Convert to a PyTorch tensor
data_tensor = torch.tensor(data_list)
print(data_tensor)
print(f"Shape: {data_tensor.shape}")
print(f"Data type: {data_tensor.dtype}")

Output:

tensor([[1.5000, 2.3000, 3.1000],
        [4.0000, 5.2000, 6.7000],
        [7.3000, 8.1000, 9.4000]])
Shape: torch.Size([3, 3])
Data type: torch.float32
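
Real data files are often less tidy than this one. A blank line, for instance, would add an empty row to data_list, and rows of unequal length cannot form a rectangular tensor. The sketch below shows one possible way to guard the parsing step. The function name parse_numeric_rows is hypothetical, and the tensor conversion is left out so the snippet runs without PyTorch installed.

```python
import io

def parse_numeric_rows(lines):
    """Parse whitespace-separated floats, skipping blank lines.

    Raises ValueError if a row's length differs from the first row's.
    """
    rows = []
    for line_number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank or whitespace-only lines
        row = [float(x) for x in fields]
        if rows and len(row) != len(rows[0]):
            raise ValueError(
                f"line {line_number}: expected {len(rows[0])} values, "
                f"got {len(row)}"
            )
        rows.append(row)
    return rows

# io.StringIO stands in for an open text file; note the blank second line.
sample = io.StringIO("1.5 2.3 3.1\n\n4.0 5.2 6.7\n7.3 8.1 9.4\n")
rows = parse_numeric_rows(sample)
print(rows)
```

The resulting list of lists can then be passed to torch.tensor() exactly as data_list is in the example above.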

Error Handling in File Operations

Always handle potential errors when working with files:

python
try:
    with open('nonexistent_file.txt', 'r') as file:
        content = file.read()
except FileNotFoundError:
    print("File not found! Please check the file path.")
except PermissionError:
    print("You don't have permission to access this file.")
except Exception as e:
    print(f"An error occurred: {e}")
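
If the same handling is needed in several places, it can be wrapped in a small helper. The following is a sketch under stated assumptions: the name read_text_or_default and its default parameter are hypothetical, and falling back to a default value is only one of several reasonable policies.

```python
import os
import tempfile

def read_text_or_default(path, default=''):
    """Return the file's text, or `default` if it is missing or unreadable."""
    try:
        with open(path, 'r', encoding='utf-8') as f:
            return f.read()
    except (FileNotFoundError, PermissionError):
        return default

# Usage: the path points into a fresh, empty directory, so the file
# is guaranteed not to exist.
with tempfile.TemporaryDirectory() as tmp_dir:
    missing = os.path.join(tmp_dir, 'nonexistent_file.txt')
    result = read_text_or_default(missing, default='(no data)')

print(result)
```

Catching only the exceptions you expect, rather than every Exception, keeps unrelated bugs from being silently swallowed.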

Summary

File operations in Python are straightforward yet powerful:

  • Use the open() function to access files, preferably with context managers (with statements)
  • Read files with read(), readlines(), or by iteration
  • Write to files with write() and writelines()
  • Handle different file formats with modules like csv and json
  • Manage files and directories with the os module
  • Always handle potential errors with try-except blocks

Mastering file operations will help you work with datasets, save models, and process inputs/outputs effectively when using PyTorch for machine learning projects.

Exercises

  1. Create a program that reads a text file and counts the occurrences of each word
  2. Write a script to merge multiple CSV files into one
  3. Create a function that loads numeric data from a text file and converts it into a PyTorch tensor
  4. Write a program that can save and load a simple PyTorch model's parameters to/from a file
  5. Create a data preprocessing script that reads raw text data, cleans it, and saves it in a format ready for PyTorch training
