Express File Streams

When working with large files in Express applications, streams provide significant performance and memory benefits. Instead of loading an entire file into memory, a stream processes the data in chunks, which keeps memory usage low and makes your application more scalable.

What are File Streams?

In Node.js and Express, streams are objects that let you read data from a source or write data to a destination continuously. Think of them as channels where data flows piece by piece, rather than being loaded all at once.

For file operations, streams are particularly valuable when:

  • Processing large files that would consume too much memory if loaded entirely
  • Building real-time applications where data needs to be processed as it arrives
  • Creating efficient APIs that transfer files between clients and servers
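
To make the "piece by piece" idea concrete, here is a small standalone sketch (separate from the Express app we build below) that reads a file chunk by chunk, so only one chunk is in memory at any time. The file name big-file.txt is just a placeholder:

javascript
const fs = require('fs');

// Read a (hypothetical) large file chunk by chunk instead of all at once
const stream = fs.createReadStream('big-file.txt', { highWaterMark: 64 * 1024 }); // 64 KB chunks

let bytes = 0;

stream.on('data', (chunk) => {
  bytes += chunk.length; // only this chunk is held in memory right now
  console.log(`Received ${chunk.length} bytes (total so far: ${bytes})`);
});

stream.on('end', () => console.log('Done reading'));
stream.on('error', (err) => console.error('Read failed:', err));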

Types of Streams in Node.js

Before diving into Express implementations, let's understand the four fundamental types of streams:

  1. Readable - Sources from which data can be consumed (e.g., reading a file)
  2. Writable - Destinations to which data can be written (e.g., writing to a file)
  3. Duplex - Both readable and writable (e.g., network sockets)
  4. Transform - Duplex streams that modify data as it passes through (e.g., compression); the sketch after this list shows a Readable, a Transform, and a Writable working together
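
A compact way to see three of these types in one place is a gzip copy: fs.createReadStream() gives a Readable, zlib.createGzip() is a Transform, and fs.createWriteStream() is a Writable (a Duplex, such as a TCP socket from the net module, can sit anywhere both reading and writing are needed). A minimal sketch with placeholder file names:

javascript
const fs = require('fs');
const zlib = require('zlib');

// Readable (file) -> Transform (gzip) -> Writable (file)
fs.createReadStream('input.txt')                 // Readable: source of data
  .pipe(zlib.createGzip())                       // Transform: compresses each chunk as it passes through
  .pipe(fs.createWriteStream('input.txt.gz'))    // Writable: destination
  .on('finish', () => console.log('Compression finished'));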

Setting Up Express for File Streaming

First, let's create a basic Express application that we'll use to demonstrate file streaming:

javascript
const express = require('express');
const fs = require('fs');
const path = require('path');

const app = express();
const port = 3000;

// Basic middleware
app.use(express.json());

app.listen(port, () => {
  console.log(`Server running on port ${port}`);
});

Streaming Files for Download

One of the most common use cases for streams in Express is sending files to clients. Instead of loading the entire file into memory, we can stream it directly:

javascript
// Streaming a file download
app.get('/download/:filename', (req, res) => {
  const filename = req.params.filename;
  const filePath = path.join(__dirname, 'files', filename);

  // Check if file exists
  fs.access(filePath, fs.constants.F_OK, (err) => {
    if (err) {
      return res.status(404).send('File not found');
    }

    // Get file stats (including size)
    fs.stat(filePath, (err, stats) => {
      if (err) {
        return res.status(500).send('Error accessing file');
      }

      // Set appropriate headers
      res.setHeader('Content-Length', stats.size);
      res.setHeader('Content-Type', 'application/octet-stream');
      res.setHeader('Content-Disposition', `attachment; filename="${filename}"`);

      // Create read stream and pipe to response
      const fileStream = fs.createReadStream(filePath);
      fileStream.pipe(res);

      // Handle stream errors
      fileStream.on('error', (error) => {
        console.error('Stream error:', error);
        res.status(500).end('File stream error');
      });
    });
  });
});

How This Works:

  1. We create a route that accepts a filename parameter
  2. We check if the file exists using fs.access()
  3. We get the file's information using fs.stat()
  4. We set appropriate HTTP headers for the download
  5. We create a readable stream from the file using fs.createReadStream()
  6. We pipe that stream directly to the response object
  7. We add error handling to manage any streaming issues

This approach never loads the entire file into memory, making it efficient even for very large files.

Uploading Files with Streams

For file uploads, we can take a streaming approach using a library such as multer with disk storage (which writes the upload to disk as it arrives) or by handling the multipart stream ourselves:

javascript
const multer = require('multer');

const storage = multer.diskStorage({
  destination: (req, file, cb) => {
    cb(null, path.join(__dirname, 'uploads'));
  },
  filename: (req, file, cb) => {
    cb(null, Date.now() + '-' + file.originalname);
  }
});

const upload = multer({ storage });

app.post('/upload', upload.single('file'), (req, res) => {
  if (!req.file) {
    return res.status(400).send('No file uploaded');
  }

  res.send({
    message: 'File uploaded successfully',
    filename: req.file.filename,
    size: req.file.size
  });
});
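
To exercise this endpoint, a client just needs to send a multipart/form-data request whose field name matches upload.single('file'). Below is a hypothetical client sketch using the fetch, FormData, and Blob globals available in Node 18+; the file is read into memory here purely for brevity (it is the server side that streams to disk), and the file name is a placeholder:

javascript
const { readFile } = require('fs/promises');

// Hypothetical client for the /upload route above (Node 18+)
async function uploadExample(filePath) {
  const form = new FormData();
  // The field name 'file' must match upload.single('file') on the server
  form.append('file', new Blob([await readFile(filePath)]), 'example.txt');

  const response = await fetch('http://localhost:3000/upload', {
    method: 'POST',
    body: form
  });

  console.log(await response.json()); // { message, filename, size }
}

uploadExample('./example.txt').catch(console.error);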

Manual File Upload with Streams

For more control, you might want to handle the streaming manually:

javascript
const busboy = require('busboy');

app.post('/upload-stream', (req, res) => {
  // Create busboy instance with request headers
  const bb = busboy({ headers: req.headers });

  // Handle file streams (busboy v1+ passes an info object with filename, encoding, mimeType)
  bb.on('file', (fieldname, fileStream, info) => {
    const { filename } = info;
    console.log(`Processing upload: ${filename}`);

    // Create write stream (basename() guards against path traversal in the client-supplied name)
    const saveTo = path.join(__dirname, 'uploads', path.basename(filename));
    const writeStream = fs.createWriteStream(saveTo);

    // Pipe file data to write stream
    fileStream.pipe(writeStream);

    // Handle completion of the incoming file stream
    fileStream.on('end', () => {
      console.log(`Upload of ${filename} completed`);
    });

    // Handle write stream completion
    writeStream.on('close', () => {
      console.log(`File saved: ${saveTo}`);
    });
  });

  // Handle form field data
  bb.on('field', (fieldname, val) => {
    console.log(`Field [${fieldname}]: value: ${val}`);
  });

  // Handle upload completion ('close' fires once busboy has parsed the whole request)
  bb.on('close', () => {
    res.send('Upload processed successfully');
  });

  // Pipe request to busboy for processing
  req.pipe(bb);
});

Video Streaming Example

A classic use case for streams is video streaming. Here's how to implement a simple video stream endpoint:

javascript
app.get('/stream/video/:filename', (req, res) => {
  const filename = req.params.filename;
  const videoPath = path.join(__dirname, 'videos', filename);

  // Check if file exists
  fs.access(videoPath, fs.constants.F_OK, (err) => {
    if (err) {
      return res.status(404).send('Video not found');
    }

    // Get video stats
    const stat = fs.statSync(videoPath);
    const fileSize = stat.size;
    const range = req.headers.range;

    // Handle range request (partial content)
    if (range) {
      // Parse range header, e.g. "bytes=0-1023" or "bytes=32324-"
      const parts = range.replace(/bytes=/, '').split('-');
      const start = parseInt(parts[0], 10);
      const end = parts[1] ? parseInt(parts[1], 10) : fileSize - 1;

      // Reject ranges that fall outside the file
      if (Number.isNaN(start) || start >= fileSize) {
        return res.status(416).send('Requested range not satisfiable');
      }

      const chunkSize = (end - start) + 1;

      // Create read stream for the specific range
      const stream = fs.createReadStream(videoPath, { start, end });

      // Set headers for range response
      res.writeHead(206, {
        'Content-Range': `bytes ${start}-${end}/${fileSize}`,
        'Accept-Ranges': 'bytes',
        'Content-Length': chunkSize,
        'Content-Type': 'video/mp4',
      });

      // Pipe the video chunk
      stream.pipe(res);
    }
    // Handle full video request
    else {
      // Set headers for full response
      res.writeHead(200, {
        'Content-Length': fileSize,
        'Content-Type': 'video/mp4',
      });

      // Stream the full video
      fs.createReadStream(videoPath).pipe(res);
    }
  });
});

This streaming implementation supports both full video and range requests (partial content), which is essential for video players that allow seeking to different positions.
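
To see the range handling in action, a client can ask for a specific byte window with a Range header; video players do this automatically when the user seeks. A small sketch using Node 18+'s built-in fetch, assuming the server above is running locally and a file named sample.mp4 (a placeholder) exists in the videos directory:

javascript
// Request only the first kilobyte of the (hypothetical) sample.mp4
async function fetchFirstChunk() {
  const response = await fetch('http://localhost:3000/stream/video/sample.mp4', {
    headers: { Range: 'bytes=0-1023' }
  });

  console.log(response.status);                       // 206 (Partial Content)
  console.log(response.headers.get('content-range')); // e.g. "bytes 0-1023/10485760"

  const chunk = Buffer.from(await response.arrayBuffer());
  console.log(`Received ${chunk.length} bytes`);       // 1024
}

fetchFirstChunk().catch(console.error);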

Transform Streams: Processing Data On-the-fly

Transform streams are powerful for modifying data as it flows. Here's an example that converts a text file to uppercase while streaming:

javascript
const { Transform } = require('stream');

// Factory for an uppercase transform stream; a fresh instance is created per request
// because a stream cannot be reused once it has ended
const createUpperCaseTransform = () => new Transform({
  transform(chunk, encoding, callback) {
    // Convert buffer chunk to string, uppercase it, then back to buffer
    const upperChunk = chunk.toString().toUpperCase();
    this.push(Buffer.from(upperChunk));
    callback();
  }
});

app.get('/uppercase/:filename', (req, res) => {
  const filename = req.params.filename;
  const filePath = path.join(__dirname, 'files', filename);

  fs.access(filePath, fs.constants.F_OK, (err) => {
    if (err) {
      return res.status(404).send('File not found');
    }

    res.setHeader('Content-Type', 'text/plain');
    res.setHeader('Content-Disposition', `attachment; filename="uppercase-${filename}"`);

    // Create the pipeline: read file -> transform to uppercase -> send response
    const readStream = fs.createReadStream(filePath);
    readStream
      .pipe(createUpperCaseTransform())
      .pipe(res);

    readStream.on('error', (error) => {
      console.error('Stream error:', error);
      res.status(500).end('File stream error');
    });
  });
});

Error Handling in Streams

Proper error handling is crucial when working with streams. Here's a more complete example showing how to handle various stream errors:

javascript
app.get('/download-safe/:filename', (req, res) => {
  const filename = req.params.filename;
  const filePath = path.join(__dirname, 'files', filename);

  const readStream = fs.createReadStream(filePath);

  // Set content headers
  res.setHeader('Content-Disposition', `attachment; filename="${filename}"`);
  res.setHeader('Content-Type', 'application/octet-stream');

  // Pipe the file stream to response
  readStream.pipe(res);

  // Handle stream errors
  readStream.on('error', (error) => {
    console.error('Stream error:', error);

    // Check if headers have been sent
    if (!res.headersSent) {
      if (error.code === 'ENOENT') {
        return res.status(404).send('File not found');
      } else {
        return res.status(500).send('Internal server error');
      }
    } else {
      // If headers were already sent, we can only terminate the response
      res.end();
    }
  });

  // Handle client disconnect
  req.on('close', () => {
    readStream.destroy(); // Clean up the stream
    console.log('Client disconnected, stream destroyed');
  });
});

Streaming Large CSV Data Processing

Here's a practical example of streaming a large CSV file for processing:

javascript
const csv = require('csv-parser');

app.get('/process-csv/:filename', (req, res) => {
  const filename = req.params.filename;
  const filePath = path.join(__dirname, 'data', filename);

  const results = [];
  let rowCount = 0;

  res.setHeader('Content-Type', 'application/json');

  // Create readable stream
  const readStream = fs.createReadStream(filePath)
    .on('error', (error) => {
      if (error.code === 'ENOENT') {
        return res.status(404).send({ error: 'CSV file not found' });
      }
      return res.status(500).send({ error: 'Error reading CSV file' });
    });

  // Process the CSV stream
  readStream
    .pipe(csv())
    .on('data', (data) => {
      rowCount++;
      // For very large files, we might just want to count or process
      // without storing everything in memory
      if (results.length < 100) { // Store only the first 100 rows for a preview
        results.push(data);
      }
    })
    .on('end', () => {
      res.send({
        totalRows: rowCount,
        preview: results
      });
    })
    .on('error', (error) => {
      console.error('CSV parsing error:', error);
      res.status(500).send({ error: 'Error parsing CSV data' });
    });
});

Performance Considerations

When implementing file streams in Express, remember these best practices:

  1. Stream backpressure: Ensure your streams handle backpressure properly (when the destination can't process data as fast as the source produces it)
  2. Memory usage: Monitor memory usage during stream operations
  3. Stream cleanup: Always destroy streams when errors occur or connections close; the stream.pipeline() sketch after this list handles this automatically
  4. Chunk size: For custom stream implementations, choose appropriate chunk sizes for your use case
  5. Error handling: Implement comprehensive error handling for all stream events
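
Several of these points can be addressed at once with Node's built-in stream.pipeline(), which respects backpressure, propagates errors to a single callback, and destroys every stream in the chain when any of them fails or the client disconnects. Below is a minimal sketch that reworks the earlier download route around pipeline(); the route path /download-pipeline/:filename is just an illustrative name, not one of the routes defined above:

javascript
const { pipeline } = require('stream');

// Hypothetical variant of the download route built on stream.pipeline()
app.get('/download-pipeline/:filename', (req, res) => {
  const filePath = path.join(__dirname, 'files', req.params.filename);

  // Check existence up front so we can still send a clean 404
  fs.access(filePath, fs.constants.F_OK, (err) => {
    if (err) {
      return res.status(404).send('File not found');
    }

    res.setHeader('Content-Disposition', `attachment; filename="${req.params.filename}"`);
    res.setHeader('Content-Type', 'application/octet-stream');

    // pipeline() forwards backpressure and destroys both streams
    // if either side fails or the client disconnects mid-transfer
    pipeline(fs.createReadStream(filePath), res, (err) => {
      if (err) {
        console.error('Pipeline error:', err);
      }
    });
  });
});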

Summary

File streams in Express provide an efficient way to handle large files without overwhelming your server's memory. By processing data in chunks, you can build more scalable applications capable of handling files of any size.

We've covered:

  • Basic file streaming concepts
  • Implementing file downloads with streams
  • File uploads using streams
  • Video streaming with range support
  • Transform streams for on-the-fly data processing
  • Error handling patterns for robust applications
  • Real-world examples like CSV processing

Exercises

  1. Implement a file compression endpoint that uses streams to compress files on-the-fly using the zlib module
  2. Create a streaming image resizing service using the Sharp library
  3. Build a log file analyzer that streams large log files and extracts specific patterns
  4. Implement a stream-based file encryption/decryption service

By mastering file streams in Express, you'll be able to build high-performance applications that handle data of any size while keeping memory usage low.


