Purging history

Many of my repositories, including this site, contain binary files. This makes repos heavier over time: with version after version, every copy ever pushed is still around for me to dig back and recover from, which is the general light and magic of git. That said, most repos I store tend to be in forward motion, and I rarely need to revert to an old copy, especially after the first few days of adding something new.

Now, I do not know if this is the best way to manage old baggage, but once in a while I find purging git history handy, particularly after a repo has stabilized in fixes and updates and is carrying obsolete binary files. It makes for faster pulls, uses less disk space, and generally gets faster responses from the server. To automate this, I wrote a generic script. Here it is.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

# ph.py -- purge (git) history, 2015 ckunte

import os

def main():
    fullpath = os.getcwd()
    foldername = os.path.basename(os.path.normpath(fullpath))
    repo = "git@github.com:ckunte/" + foldername + ".git"
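    # Chain with && so that a failed step stops the rest; rm -rf .git/ is
    # destructive and should not run if the reset before it failed.
    cmd1 = 'git reset --hard && rm -rf .git/ && git init && git add . && git commit -m "first commit." && '
    cmd2 = 'git remote add origin ' + repo + ' && git push --force origin master'
    os.system(cmd1 + cmd2)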

if __name__ == '__main__':
    main()

It needs to be run from the repo’s root. For example, if the repo is at ~/Projects/hcalc, then I run the script from within the hcalc folder.
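For anyone wary of running a destructive one-liner blind, here is a minimal sketch of the same steps with a dry-run mode. It is a variation, not the script above: it assumes Python 3.5 or later, and the names build_commands, purge, remote_base, branch and dry_run are made up for illustration. It prints the commands by default and only runs them when dry_run is set to False.

```python
import os
import subprocess

def build_commands(repo_path, remote_base="git@github.com:ckunte/", branch="master"):
    """Return the purge steps as argument lists, without running anything."""
    # The repo name is assumed to match the folder name, as in the script above.
    name = os.path.basename(os.path.normpath(repo_path))
    remote = remote_base + name + ".git"
    return [
        ["git", "reset", "--hard"],
        ["rm", "-rf", ".git"],
        ["git", "init"],
        ["git", "add", "."],
        ["git", "commit", "-m", "first commit."],
        ["git", "remote", "add", "origin", remote],
        ["git", "push", "--force", "origin", branch],
    ]

def purge(repo_path, dry_run=True):
    """Print the steps (dry run) or execute them, stopping at the first failure."""
    for cmd in build_commands(repo_path):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if a step exits non-zero.
            subprocess.run(cmd, cwd=repo_path, check=True)

if __name__ == "__main__":
    purge(os.getcwd(), dry_run=True)
```

Run from within the hcalc folder, this prints the seven commands, ending with the forced push to git@github.com:ckunte/hcalc.git, without touching the repo.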

Another way to do this is as follows:

#!/usr/bin/env zsh
git checkout --orphan newBranch
git add -A
git commit -m 'first commit'
# Deletes the master branch
git branch -D master
# Rename the current branch to master
git branch -m master
# Force push master branch to github
git push -f origin master
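# Expire reflog entries that still point at the old commits
git reflog expire --expire=now --all
# Remove the old, now unreachable objects
git gc --aggressive --prune=all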
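To see whether a purge actually made the local repo leaner, one option is to compare the output of git count-objects -v before and after. The following is a rough sketch, assuming Python 3.7 or later; the helper names parse_count_objects and repo_size_kib are my own, and the figure covers only the local object store, not what the server keeps.

```python
import subprocess

def parse_count_objects(text):
    """Turn the 'key: value' lines of git count-objects -v into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = value.strip()
    return stats

def repo_size_kib(repo_path="."):
    """Loose plus packed object size in KiB, as reported by git."""
    out = subprocess.run(
        ["git", "count-objects", "-v"],
        cwd=repo_path, check=True, capture_output=True, text=True,
    ).stdout
    stats = parse_count_objects(out)
    # Without -H, git reports 'size' and 'size-pack' in KiB.
    return int(stats.get("size", 0)) + int(stats.get("size-pack", 0))
```

Calling repo_size_kib() from the repo's root before and after the purge gives two numbers to compare.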

Warning: I do not recommend any method of purging for repos with valuable and well-documented history, particularly open source projects that may have a following and receive contributions from others who may also be interested in the project’s history. I use this for my own side projects that are either heavy on binary files or have uninteresting commit messages.