Python backup/snapshot script with compression (BZ2, 7zip)

While writing my master's thesis in LaTeX, I frequently change source code files, graphics, and so on. I need to keep track of the major changes, and I have to avoid losing the whole work to some data loss incident. To solve both problems, I wrote a convenient Python backup script.

On invocation, it creates a snapshot of my thesis directory's contents. It writes everything to another directory with a timestamp in its name. With these snapshots, I can look into older versions of my work later on, just to be prepared if worst comes to worst.
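
The timestamped snapshot idea can be sketched in a few lines. This is only an illustration in Python 3 syntax (the full script further down targets Python 2.6); the function names `snapshot_name` and `make_snapshot` are made up for this example, while the timestamp format is the one the script uses.

```python
import os
import shutil
import time


def snapshot_name(prefix, when=None):
    """Return prefix plus a timestamp such as '_240131_154500'."""
    if when is None:
        when = time.localtime()
    return prefix + time.strftime("_%y%m%d_%H%M%S", when)


def make_snapshot(src_dir, backup_dir, prefix):
    """Copy src_dir into a new timestamped subdirectory of backup_dir."""
    dst = os.path.join(backup_dir, snapshot_name(prefix))
    if os.path.exists(dst):
        # Never overwrite an existing snapshot.
        raise FileExistsError("snapshot already exists: %s" % dst)
    shutil.copytree(src_dir, dst)
    return dst
```

Because the name is precise to the second, two snapshots taken within the same second would collide, which is why the sketch refuses to overwrite an existing destination.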

But… what if the hard drive holding the snapshots crashes? For that case, I added the option to copy the finished snapshot to an additional destination as well. If this destination is on another physical storage device, it’s pretty unlikely to lose both snapshot copies at the same time. Feels good.
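
The second copy is just a file-system copy of whatever the snapshot step produced, either a directory tree or a single archive file. A minimal sketch in Python 3 syntax, assuming a hypothetical helper name `copy_to_second_location` (the script itself does this inline):

```python
import os
import shutil


def copy_to_second_location(snapshot_path, second_dir):
    """Copy a finished snapshot (directory or archive) into second_dir."""
    # normpath strips a trailing slash so that basename is never empty.
    name = os.path.basename(os.path.normpath(snapshot_path))
    dst = os.path.join(second_dir, name)
    if os.path.isdir(snapshot_path):
        shutil.copytree(snapshot_path, dst)  # snapshot is a directory tree
    else:
        shutil.copy2(snapshot_path, dst)     # snapshot is a single archive
    return dst
```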

Over time, I created more and more files within my project. For convenience, I added an option to compress the directory's snapshot into an archive. The user can choose between Python’s built-in .tar.bz2 compression and 7zip, which the script calls as an external program. With maximum compression enabled, 7zip produced very small archives for my files and was faster than BZ2.
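
For the built-in variant, Python's `tarfile` module is all that is needed. The following is a small sketch in Python 3 syntax, not the script's exact code; the function name `archive_bz2` is chosen for illustration only.

```python
import os
import tarfile


def archive_bz2(src_dir, tar_path):
    """Write the contents of src_dir into a bzip2-compressed tar archive."""
    with tarfile.open(tar_path, "w:bz2") as tar:
        for name in sorted(os.listdir(src_dir)):
            # arcname keeps paths inside the archive relative to src_dir.
            tar.add(os.path.join(src_dir, name), arcname=name)
    return tar_path
```

Adding the entries one by one with `arcname` means the archive unpacks to the project's files directly instead of to a nested copy of the absolute source path.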

After only a few seconds of configuration (directory definition, backup method selection), you only have to double-click the script file every time you want to feel safe :)

For Windows users:
0) Download and install Python 2.6.x: http://python.org/download/
1) Copy the script (below), edit the “SETTINGS” section, and save it as a .py file
2) Double-click it as often as you want.

#!/usr/bin/env python
#********************************* README **************************************
#
# SNAPSHOT / BACKUP script for Windows/Linux
# script by Jan-Philip Gehrcke -- jgehrcke@gmail.com -- http://gehrcke.de
#
# 0) DESCRIPTION:
# ===============
# On invocation, the script creates a snapshot of ORIG_DIR's contents and writes
# it to BACKUP_DIR into 1) a new subdirectory or 2) a .tar.bz2 archive or 3) a
# 7zip archive (choose it!). The time of snapshot creation is written into the
# subdirectory's name / archive file name. An optional second location can
# be defined to which the snapshot will be written additionally.
#
# This script is useful to manually and quickly create snapshots of a multi-file
# project you're working on, enabling _rollbacks_ to an older version of your
# project's files. Furthermore, using the additional backup location on another
# physical storage, the script prevents _data loss_.
#
# Primarily, this script is written for Windows users: simple double-click .py
# file invocation is considered. Should work on Linux systems, too, but the
# "press any key to continue" dialogue is quite un-unixoid.
#
# 1) USAGE:
# =========
#
# Download and install Python 2.6.x: http://python.org/download/
# For 7zip method, download http://www.7-zip.org/download.html
#
# Put the script file into the directory containing the directory you want to
# back up, adjust settings (below) and then run the script (double-click on Win).
#
# The snapshot/backup of
#  ./ORIG_DIR/*
# will go to
#  ./BACKUP_DIR/BACKUP_PREFIX_timestring/*       (SIMPLE method, built-in)
# OR to the archive
#  ./BACKUP_DIR/BACKUP_PREFIX_timestring.tar.bz2 (BZ2 method, built-in)
# OR to the archive
#  ./BACKUP_DIR/BACKUP_PREFIX_timestring.7z      (7zip method; ultra strong
#                                                 compression; faster than BZ2;
#                                                 requires 7zip to be available)
#
# Of course, ORIG_DIR and BACKUP_DIR can be absolute paths, too. Then, the
# location of this script does not matter.
#
# 2) SETTINGS:
# ============
# always use SLASHES ("/") in paths, even on Windows -> don't use "\"
ORIG_DIR = "C:/project"          # e.g. "." or "C:/project"
BACKUP_DIR = "C:/backups"        # e.g. "C:/backups"
BACKUP_PREFIX = "thesis_bckp"      # e.g. "thesis_bckp"

# choose backup method: 'SIMPLE' OR 'BZ2' OR '7zip':
METHOD = '7zip'     # quick and really strong compression, 7zip.exe required
#METHOD = 'SIMPLE'  # copy directory tree; e.g. if you don't have many files..
#METHOD = 'BZ2'     # builtin method; if you like compression, but no 7zip.

# in case of 7zip, specify 7z executable path:
SEVENZIPPATH = "c:/Programs/7-Zip/7z.exe" # e.g. "c:/Programs/7-Zip/7z.exe"

# set ADDITIONAL_BACKUP_DIR to double-save backup (e.g. on another hard disk)
# (comment out the line if this is undesired behavior)
ADDITIONAL_BACKUP_DIR = "F:/backups"   # e.g. "F:/backups"
#*******************************************************************************

import os, time, shutil, sys, tarfile, subprocess, traceback

def backup_directory_simple(srcdir,dstdir):
    if os.path.exists(dstdir):
        exit_stop("backup path %s already exists!" % dstdir)
    try:
        shutil.copytree(srcdir,dstdir)
    except:
        print "Error while copying tree in %s to %s" % (srcdir,dstdir)
        print "Traceback:\n%s"%traceback.format_exc()
        return False
    return dstdir

def backup_directory_bz2(srcdir,tarpath):
    if os.path.exists(tarpath):
        exit_stop("backup path %s already exists!" % tarpath)
    try:
        tar = tarfile.open(tarpath, "w:bz2")
        for filedir in os.listdir(srcdir):
            tar.add(os.path.join(srcdir,filedir),arcname=filedir)
        tar.close()
    except:
        print "Error while creating tar archive: %s" % tarpath
        print "Traceback:\n%s"%traceback.format_exc()
        return False
    return tarpath

def backup_directory_7zip(srcdir,arcpath):
    if os.path.exists(arcpath):
        exit_stop("backup path %s already exists!" % arcpath)
    try:
        # -mx9 means maximum compression
        arglist = [SEVENZIPPATH,"a",arcpath,"*","-r","-mx9"]
        print ("try running cmd:\n %s\nin directory\n %s" %
            (' '.join(arglist),srcdir))
        # run 7zip (in the directory to be backed up!)
        sp = subprocess.Popen(
            args=arglist,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            cwd=srcdir)
    except:
        print "Error while running 7zip subprocess."
        print "Traceback:\n%s"%traceback.format_exc()
        return False
    # wait for process to terminate, get stdout and stderr
    stdout, stderr = sp.communicate()
    if stdout:
        print ("\n>>> 7zip subprocess STDOUT START:\n%s"
                ">>> 7zip subprocess STDOUT END\n" % stdout)
    if stderr:
        print "7zip STDERR:\n%s" % stderr
        return False
    return arcpath

def any_key():
    print "Press any key to continue."
    getch()

def exit_stop(exitstring):
    print exitstring
    any_key()
    sys.exit(exitstring)

def so_flushwr(string):
    sys.stdout.write(string)
    sys.stdout.flush()

# provide getch() method
# (http://stackoverflow.com/questions/1394956/how-to-do-hit-any-key-in-python)
try:
    # Win32
    from msvcrt import getch
except ImportError:
    # UNIX
    def getch():
        import sys, tty, termios
        fd = sys.stdin.fileno()
        old = termios.tcgetattr(fd)
        try:
            tty.setraw(fd)
            return sys.stdin.read(1)
        finally:
            termios.tcsetattr(fd, termios.TCSADRAIN, old)

# build timestring, check settings and invoke corresponding backup function
print "*********************************************************************"
print "* snapshot backup script by Jan-Philip Gehrcke -- http://gehrcke.de *"
print "*********************************************************************\n"

timestr = time.strftime("_%y%m%d_%H%M%S",time.localtime())
if METHOD not in ["SIMPLE", "BZ2", "7zip"]:
    exit_stop("METHOD not 'SIMPLE' OR 'BZ2' OR '7zip'")
if not os.path.exists(ORIG_DIR):
    exit_stop("ORIG_DIR does not exist: %s" % os.path.abspath(ORIG_DIR))
if not os.path.exists(BACKUP_DIR):
    exit_stop("BACKUP_DIR does not exist: %s" % os.path.abspath(BACKUP_DIR))
else:
    print ("write snapshot of\n  %s\nto\n  %s\nusing the %s method...\n" %
            (os.path.abspath(ORIG_DIR),os.path.abspath(BACKUP_DIR),METHOD))
    if METHOD == "SIMPLE":
        rv = backup_directory_simple(srcdir=ORIG_DIR,
            dstdir=os.path.join(BACKUP_DIR, BACKUP_PREFIX + timestr))
    elif METHOD == "BZ2":
        rv = backup_directory_bz2(srcdir=ORIG_DIR,
            tarpath=os.path.join(BACKUP_DIR,
                BACKUP_PREFIX + timestr + ".tar.bz2"))
    else:
        try:
            if not os.path.exists(SEVENZIPPATH):
                exit_stop("7zip executable not found: %s" % SEVENZIPPATH)
        except NameError:
            exit_stop("variable SEVENZIPPATH not defined")
        rv = backup_directory_7zip(srcdir=os.path.abspath(ORIG_DIR),
            arcpath=os.path.abspath(os.path.join(BACKUP_DIR,
                BACKUP_PREFIX + timestr + ".7z")))

if rv:
    print "Snapshot successfully written to\n  %s" % os.path.abspath(rv)
else:
    print "Failure during backup :-("

if 'ADDITIONAL_BACKUP_DIR' in globals() and rv:
    if not os.path.exists(ADDITIONAL_BACKUP_DIR):
        exit_stop(("ADDITIONAL_BACKUP_DIR does not exist: %s"
            % os.path.abspath(ADDITIONAL_BACKUP_DIR)))
    so_flushwr("\nwrite additional backup to %s.." % ADDITIONAL_BACKUP_DIR)
    try:
        dst = os.path.join(ADDITIONAL_BACKUP_DIR,os.path.basename(rv))
        if os.path.isdir(rv):
            shutil.copytree(rv,dst) # simple method, copy directory tree
        else:
            shutil.copy(rv,dst) # copy 7zip or bz2 archive
        so_flushwr("success\n")
    except:
        print "Traceback:\n%s"%traceback.format_exc()
        print "Additional backup not written. For diagnosis look above."

any_key()
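
The 7zip call in the script above hard-codes the file pattern `*` and the `-r` switch. A variant that takes both as parameters also covers archiving a single file. The following is only a sketch in Python 3 syntax (the script itself targets Python 2.6). The names `build_7z_args` and `run_7z` are made up for illustration, and treating any non-zero exit code as a failure is an assumption of this sketch, not something the original script checks.

```python
import subprocess


def build_7z_args(sevenzip_path, archive_path, pattern="*", recursive=True):
    """Build the argument list for '7z a' (hypothetical helper)."""
    args = [sevenzip_path, "a", archive_path, pattern]
    if recursive:
        args.append("-r")
    args.append("-mx9")  # maximum compression, as in the script
    return args


def run_7z(args, src_dir):
    """Run 7zip inside src_dir and report whether it looked successful."""
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        cwd=src_dir)
    stdout, stderr = proc.communicate()
    # Assumption: exit code 0 and empty stderr mean the archive is fine.
    return proc.returncode == 0 and not stderr
```

For a whole directory, `build_7z_args(SEVENZIPPATH, arcpath)` reproduces the script's argument list; for one file, pass the file name as `pattern` and set `recursive=False`.
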
  • Artem Harutyunyan

    Hi Jan-Philip,

    Check out dropbox (www.dropbox.com) for backups. It creates a directory (you can use it as an ‘additional destination’) under your filesystem which is automatically synchronized with their server (as well as with your other dropbox installations on other machines).

    Cheers,
    Artem.

  • Mel

    I really enjoy this little utility. Once I understood what -r meant, I was able to create a backup utility that ran only on one file. I simply dropped the -r and instead of “*” I put in my file name. This worked like a charm.

    Thanks for the code!

  • Tom

    Hi,

    Very good script, I like it.
    I am just starting with Python programming and I would like to add a GUI to this script.
    Can you help me? Or can you give me some ideas on how to do that?

    Thanks a lot.

    • Hey Tom,

      it’s always good to have motivation and a distinct idea when learning to program. But GUI programming is something one should start only after gaining proper knowledge of a specific programming language. Anyway, if you insist.. :-) you should use PyQt as the GUI module. Just search for it on the web. You’ll find tons of tutorials.

      Have fun,

      Jan-Philip

  • Hmm.. copying the source does not work that well anymore, because most browsers introduce two newlines instead of one. Just leave me a comment if you want me to provide the source file.

  • Thanatos

    I like this script. But I checked out the man page for 7z, and it states that 7z should not be used for backups of Linux/Unix systems because it does not store the owner/group of a file. It also gives a cryptic ‘Do not use -r. It does not do what you think it does.’ Otherwise, great job. I am going to use a modified version of the tar backup function.

  • awesome!! thanx a ton for this wonderful script! I added this to my crontab and it worked like a charm!

  • Hadeel JIhad

    When I press F5, I get the following error, even though I have downloaded Python and 7zip:
    Traceback (most recent call last):
      File "C:\backup.py", line 93, in <module>
        if METHOD not in ["SIMPLE", "BZ2", "7zip"]:
    NameError: name 'METHOD' is not defined