Git Data Recovery
Introduction
Ever accidentally deleted a branch? Lost commits after a bad merge? Overwrote important changes? Git's design as a content-addressable filesystem means that while data may appear lost, it often remains in your repository. This guide will walk you through techniques to recover seemingly lost data in Git.
Understanding Git's Data Storage
Before learning recovery techniques, it helps to understand how Git stores data. Git doesn't just track changes—it preserves a complete history of your project.
When you commit changes, Git:
- Creates immutable objects with unique hashes
- Maintains reference pointers to these objects
- Records all reference changes in the reflog
This means that even when you can't see a commit in git log
, the data often still exists in the repository.
Common Recovery Scenarios
1. Recovering Lost Commits
Using the Reflog
Git's reflog records where your HEAD and branch references have pointed during the past 90 days (by default).
# View the reflog
git reflog
Example output:
734713b HEAD@{0}: commit: Add payment feature
a6f30b1 HEAD@{1}: pull origin main: Fast-forward
c59d056 HEAD@{2}: commit: Fix rendering issue in navbar
3e7b21d HEAD@{3}: reset --hard HEAD~1
8a92d3f HEAD@{4}: commit: Add user authentication
To recover a commit that's no longer referenced:
# Find the lost commit in the reflog
git reflog
# Create a new branch pointing to the lost commit
git branch recovery-branch 8a92d3f
# Or directly check out the commit
git checkout 8a92d3f
Finding Dangling Commits
If a commit is truly "lost" (not in the reflog), you can search for dangling commits:
# Find dangling commits
git fsck --lost-found
Example output:
Checking object directories: 100% (256/256), done.
dangling commit 7a3fb3d02d88c4eb85c0f99195e7a1a4996db32f
dangling blob 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
dangling commit fe43631bf52e348db3e1264e5a6bb7bd657c803d
Examine the content of a dangling commit:
git show 7a3fb3d02d88c4eb85c0f99195e7a1a4996db32f
To recover it:
git branch recovered-branch 7a3fb3d02d88c4eb85c0f99195e7a1a4996db32f
2. Recovering Deleted Branches
When you delete a branch, Git doesn't immediately delete its commits. They remain accessible through the reflog.
# List all entries in the reflog
git reflog
# Recreate the branch at its last position
git branch recovered-feature-branch a6f30b1
3. Recovering After a Hard Reset
If you've run git reset --hard
and lost changes:
# Check the reflog to find the commit before the reset
git reflog
# Create a new branch from that point
git branch pre-reset-state HEAD@{1}
4. Recovering Uncommitted Changes
Using git stash
If you accidentally ran commands that cleared your working directory:
# Check for stashed changes (if you remembered to stash)
git stash list
# Apply the latest stash
git stash apply
Using git fsck for unstaged blobs
Sometimes Git keeps deleted file contents as dangling blobs:
# Find dangling blobs
git fsck --lost-found
To examine a blob's content:
git cat-file -p 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
Advanced Recovery Techniques
Recovering Files from Any Commit
You can extract files from any commit, even if they've been deleted:
# Find when the file existed
git log --all -- path/to/deleted/file.js
# Restore the file from a specific commit
git checkout abc1234 -- path/to/deleted/file.js
Excavating Lost Commits with git-reflog-walk
For complex situations, you can create a script to walk through the reflog:
#!/bin/bash
# save-all-dangling-commits.sh
for commit in $(git fsck --no-reflog | grep "dangling commit" | awk '{print $3}')
do
git log -1 $commit > /tmp/message
message=$(cat /tmp/message | grep -v "^commit" | tr -d '
')
date=$(cat /tmp/message | grep "^Date:" | sed "s/Date: //")
branch_name="recovered/$(date -d "$date" +%Y-%m-%d_%H-%M-%S)"
git branch $branch_name $commit
echo "Saved $commit as $branch_name: $message"
done
Make it executable and run:
chmod +x save-all-dangling-commits.sh
./save-all-dangling-commits.sh
Recovering from Corrupted Repositories
If .git
becomes corrupted, you can often recover using:
# Try automatic repair
git fsck --full
# If index is corrupted
rm .git/index
git reset
Practical Real-World Examples
Example 1: Recovering After an Accidental Force Push
Imagine someone force-pushed to the main branch, overwriting commits:
# Find the last known good state
git reflog show origin/main@{1}
# Create a recovery branch
git checkout -b recovery-main a6f30b1
# Push the recovery branch
git push -u origin recovery-main
Example 2: Retrieving an Earlier Version of a File
To retrieve a file as it existed 5 commits ago:
# View the file's history
git log -p -- path/to/file.js
# Restore the file from 5 commits ago
git checkout HEAD~5 -- path/to/file.js
# Review changes and commit if needed
git diff --staged
git commit -m "Restore file.js to earlier version"
Example 3: Recovering After a Botched Rebase
If you've rebased incorrectly and lost commits:
# Check the reflog for pre-rebase state
git reflog
# Find the last commit before rebase started
git checkout HEAD@{5}
# Create a new branch at this point
git checkout -b pre-rebase-backup
Best Practices to Prevent Data Loss
-
Make backups before risky operations:
bashgit branch backup-before-rebase main
-
Push to remote frequently:
bashgit push origin feature-branch
-
Use Git aliases for safer commands:
bashgit config --global alias.safe-reset 'reset --keep'
-
Extend reflog expiration time:
bashgit config --global gc.reflogExpire "365 days"
Summary
Git's design provides several layers of data protection that help you recover from mistakes:
- Reflog: Your first line of defense, tracking reference changes
- Object database: Stores all commit data until garbage collection
- Remote repositories: Act as distributed backups
When working with Git, remember that "lost" usually means "not currently referenced" rather than "permanently deleted." With the techniques in this guide, you can recover from most common data loss scenarios.
Additional Resources
Exercises
- Create a test repository and practice intentionally "losing" and recovering commits using the reflog.
- Try to recover a file that was deleted 5 commits ago without using the reflog.
- Set up a script that automatically creates backup branches before performing risky operations.
If you spot any mistakes on this website, please let me know at [email protected]. I’d greatly appreciate your feedback! :)