Purging history

Many of my repositories, including this site, contain binary files. This makes repos heavier over time: with version after version, every copy ever pushed is still around for me to dig back and recover from, which is the general light and magic of git. That said, most repos I store tend to be in forward motion, and I rarely find the need to revert to an old copy, especially after the initial few days of adding something new.

Now, I do not know if this is the best way to manage old baggage, but once in a while I find purging git history handy, particularly for repos with obsolete binary files, once a repo has stabilized in fixes and updates. It makes for faster pulls, less disk usage, and generally faster responses from the server. To automate this, I wrote a generic script. Here it is.

#!/usr/bin/env python
# -*- coding: UTF-8 -*-

# ph.py -- purge (git) history, 2015 ckunte

import os

def main():
    # The remote repo name is taken from the current folder's name
    fullpath = os.getcwd()
    foldername = os.path.basename(os.path.normpath(fullpath))
    repo = "git@github.com:ckunte/" + foldername + ".git"
    # Discard uncommitted changes, drop the old history, and start afresh
    cmd1 = 'git reset --hard; rm -rf .git/; git init; git add .; git commit -m "first commit.";'
    # Point at the remote again and overwrite its history
    cmd2 = 'git remote add origin ' + repo + '; git push --force origin master;'
    os.system(cmd1 + cmd2)

if __name__ == '__main__':
    main()


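It needs to be run from the repo’s root, since it builds the remote’s address from the name of the current folder. For example, if the repo is at ~/Projects/hcalc, then I run the script from within the hcalc folder, and it pushes to the hcalc repo on GitHub.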
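Because the script chains its commands with semicolons, a failed step does not stop the ones after it. Below is a minimal sketch of the same steps using subprocess, where each step must succeed before the next runs. The function names (remote_url, purge_commands, purge_history) and the dry_run flag are my own for illustration and are not part of ph.py; the owner and branch defaults are simply carried over from the script above. The branch is a parameter because a fresh git init may not name its branch master on every setup.

```python
import os
import shutil
import subprocess


def remote_url(folder, owner="ckunte"):
    """Build the SSH remote URL from the folder name, as ph.py does."""
    return "git@github.com:" + owner + "/" + folder + ".git"


def purge_commands(folder, owner="ckunte", branch="master"):
    """Return the git commands (as argument lists) that rebuild the history."""
    return [
        ["git", "init"],
        ["git", "add", "."],
        ["git", "commit", "-m", "first commit."],
        ["git", "remote", "add", "origin", remote_url(folder, owner)],
        ["git", "push", "--force", "origin", branch],
    ]


def purge_history(path=".", dry_run=True):
    """Discard local changes, drop .git, and push one fresh commit.

    With dry_run=True (the default in this sketch), only print the steps.
    """
    path = os.path.abspath(path)
    folder = os.path.basename(os.path.normpath(path))
    commands = purge_commands(folder)
    git_dir = os.path.join(path, ".git")
    if dry_run:
        print("git reset --hard")
        print("remove " + git_dir)
        for cmd in commands:
            print(" ".join(cmd))
        return commands
    # check=True raises CalledProcessError, so a failed step stops the run
    subprocess.run(["git", "reset", "--hard"], cwd=path, check=True)
    shutil.rmtree(git_dir)
    for cmd in commands:
        subprocess.run(cmd, cwd=path, check=True)
    return commands
```

Calling purge_history() from the repo's root prints the steps without touching anything; purge_history(dry_run=False) actually runs them, and like the original script it irreversibly discards both local and remote history.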

Another way to do this is as follows:

#!/usr/bin/env zsh
git checkout --orphan newBranch