Convert all PDF files in a directory to PNG images

I just needed to convert several PDF graphics to PNG raster graphics. The cleanest way to do this is via Ghostscript (example: gs -sDEVICE=png16m -sOutputFile=tiger.png tiger.pdf). For convenience I wrote a Python script that converts all PDF files in the working directory to PNG files via Ghostscript. I use it under Windows, but it should work on “all” operating systems. Let me share it with you.

Python script:
  1. import subprocess
  2. import os
  3. import traceback
  4.  
  5. GHOSTSCRIPTPATH = "C:/Program Files (x86)/gs/gs8.70/bin/gswin32c.exe"
  6.  
  7. def main():
  8.     cwd = os.getcwd()
  9.     for filename in os.listdir(cwd):
  10.         (name, ext) = os.path.splitext(filename)
  11.         if ext.lower() == ".pdf":
  12.             print "*** FOUND %s" % filename
  13.             convert_pdf_file_to_png(GHOSTSCRIPTPATH,
  14.                 os.path.join(cwd, filename))
  15.  
  16. def convert_pdf_file_to_png(ghostscriptpath, pdffilepath):
  17.     if not os.path.isfile(pdffilepath):
  18.         print "%s is not a file" % pdffilepath
  19.         return False
  20.     if not os.path.isfile(ghostscriptpath):
  21.         print "%s is not a file" % ghostscriptpath
  22.         return False
  23.  
  24.     pdffiledir = os.path.dirname(pdffilepath)
  25.     pdffilename = os.path.basename(pdffilepath)
  26.     pdfname, ext = os.path.splitext(pdffilepath)
  27.  
  28.     try:    
  29.         # change the "-rXXX" option to set the PNG's resolution.
  30.         # -> http://ghostscript.com/doc/current/Devices.htm#File_formats
  31.         # for other commandline options see
  32.         # http://ghostscript.com/doc/current/Use.htm#Options
  33.         arglist = [ghostscriptpath,
  34.                   "-dBATCH",
  35.                   "-dNOPAUSE",
  36.                   "-sOutputFile=%s.png" % pdfname,
  37.                   "-sDEVICE=png16m",
  38.                   "-r800",
  39.                   pdffilepath]
  40.  
  41.         print "try running cmd:\n%s" % arglist
  42.  
  43.         sp = subprocess.Popen(
  44.             args=arglist,
  45.             stdout=subprocess.PIPE,
  46.             stderr=subprocess.PIPE)
  47.     except:
  48.         print "Error while running Ghostscript subprocess."
  49.         print "Traceback:\n%s"%traceback.format_exc()
  50.         return False
  51.     # wait for process to terminate, get stdout and stderr
  52.     stdout, stderr = sp.communicate()
  53.     print "Ghostscript subprocess STDOUT:\n%s" % stdout
  54.     if stderr:
  55.         print "Ghostscript STDERR:\n%s" % stderr
  56.         return False
  57.     return True
  58.  
  59.  
  60. if __name__ == "__main__":
  61.     main()
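
The listing above is Python 2 code, and it reports a failure whenever Ghostscript writes anything to stderr. As a rough sketch only, and not part of the original script, the same call could look like this under Python 3, judging success by the exit code instead. This assumes Ghostscript signals errors through a non-zero exit code; the helper names build_gs_args and run_ghostscript are hypothetical.

```python
import os
import subprocess


def build_gs_args(ghostscriptpath, pdffilepath, resolution=800):
    """Build the same Ghostscript argument list the script above uses."""
    pdfname, _ext = os.path.splitext(pdffilepath)
    return [ghostscriptpath,
            "-dBATCH",
            "-dNOPAUSE",
            "-sOutputFile=%s.png" % pdfname,
            "-sDEVICE=png16m",
            "-r%d" % resolution,
            pdffilepath]


def run_ghostscript(arglist):
    """Run the command; return True if the process exits with code 0."""
    try:
        result = subprocess.run(arglist,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
    except OSError as exc:
        # Raised e.g. when the executable cannot be found or started.
        print("Could not start subprocess: %s" % exc)
        return False
    if result.stderr:
        # Show stderr for information, but do not treat it as a failure.
        print("STDERR:\n%s" % result.stderr.decode(errors="replace"))
    return result.returncode == 0
```

A call would then read run_ghostscript(build_gs_args(GHOSTSCRIPTPATH, pdffilepath)), with GHOSTSCRIPTPATH set as in the listing.
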
Usage:
  • Download and install Python 2.6.x: http://python.org/download/ (the script should work with Python 2.4 and above, but not with Python 3)
  • Download and install Ghostscript: http://mirror.cs.wisc.edu/pub/mirrors/ghost/GPL/current/
  • Copy the script source from above and save it as e.g. make_all_pdfs_to_png.pyw
  • Adjust the variable GHOSTSCRIPTPATH in line 5 of the script to the location of your Ghostscript executable. Even on Windows, it’s okay to use forward slashes in the path (or escaped backslashes: \\).
  • If needed, change the resolution option “-rXXX” in line 38 of the script. This determines the resolution of the PNGs (see the Ghostscript documentation for the exact meaning of the options; links to the docs are given within the source above).
  • Now the Python script is ready to use. Copy it to the directory containing the PDF files you want to convert and start it (under Windows: simply double-click the script file). The corresponding PNG files should then be created next to the PDFs.
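
To get a feeling for the “-rXXX” option before changing it: the value is a resolution in dots per inch, and PDF page sizes are given in points (1/72 inch), so the pixel size of the PNG grows linearly with it. The following is only a back-of-the-envelope sketch (written for Python 3); the function name png_pixel_size is made up for illustration, and the exact rounding Ghostscript applies may differ.

```python
def png_pixel_size(width_pt, height_pt, dpi):
    """Approximate PNG size in pixels for a page given in PDF points.

    One PDF point is 1/72 inch, so pixels = points / 72 * dpi.
    """
    width_px = int(round(width_pt / 72.0 * dpi))
    height_px = int(round(height_pt / 72.0 * dpi))
    return width_px, height_px


def raw_rgb_bytes(width_px, height_px):
    """Uncompressed size of a 24-bit RGB image (3 bytes per pixel)."""
    return width_px * height_px * 3


if __name__ == "__main__":
    # An A4 page is roughly 595 x 842 points.
    for dpi in (72, 150, 300, 800):
        w, h = png_pixel_size(595, 842, dpi)
        print("%4d dpi -> %d x %d px, %.1f MiB uncompressed"
              % (dpi, w, h, raw_rgb_bytes(w, h) / (1024.0 * 1024.0)))
```

For small graphics -r800 is fine, but for full pages a lower value keeps the PNGs at a manageable size.
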

3 comments to Convert all PDF files in a directory to PNG images

  • jonathan

    hi jan-philip. i’ve tried your code for another purpose and it doesn’t seem to work. my aim is to convert color pdf to grayscale pdf, to reduce filesize. i’ve edited this portion to as such:

    arglist = [ghostscriptpath,
    "-dBATCH",
    "-dNOPAUSE",
    "-sOutputFile=%s.pdf" % pdfname,
    "-sDEVICE=pdfwrite",
    "-sColorConversionStrategy=Gray",
    "-dProcessColorModel=/DeviceGray",
    "-dCompatibilityLevel=1.4",
    pdffilepath]

    the python script runs but the converted pdf files don’t show up. also, there should be a delay during each conversion, all of which don’t exist. any thoughts on this? your input’s much appreciated! :D

  • jonathan

    i’ve noticed that the converted pdf file cannot have the same name as the source pdf file else it’ll end up as 3kb file size. if i were to have a folder called “converted” in the same directory, for the line:

    "-sOutputFile=%s.pdf" % pdfname,

    how can i reference that to place the converted files to “converted” folder? i’ve tried many different methods but to no avail. :(

    • Hey Jonathan,

      I am not exactly sure what the problem is, but here are a few suggestions that could help:

      1) First, try running Ghostscript from the command line (cmd.exe in Windows) yourself, and make sure that the set of command-line parameters you want to use works as expected.

      2) Once you are sure which command-line parameters you need, put them into the Python script (as you have already tried).

      3) I would suggest simply appending something like ‘_grayscale’ to each filename. For that, you could use:

      "-sOutputFile=%s_grayscale.pdf" % pdfname,

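      This would definitely work. For a subdirectory, keep in mind that pdfname in the script already holds the full path of the source file (without the extension), so the directory has to go between the source directory and the file name. Using the variables from the script, you could try

      "-sOutputFile=%s" % os.path.join(pdffiledir, "converted", pdffilename),

      I have not tested this. Furthermore, in case of a subdirectory you would have to add some code that checks whether this directory already exists (and creates it otherwise).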

      Hope this helps,

      Jan-Philip
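
      The directory check mentioned in the reply could be sketched roughly as follows. This is an untested illustration and not part of the original script; the helper name output_path_in_subdir and its create parameter are made up, and “converted” is simply the folder name from the question.

```python
import os


def output_path_in_subdir(pdffilepath, subdir="converted", create=True):
    """Return <source dir>/<subdir>/<source file name>.

    With create=True the subdirectory is created if it is missing.
    """
    pdffiledir = os.path.dirname(pdffilepath)
    pdffilename = os.path.basename(pdffilepath)
    outdir = os.path.join(pdffiledir, subdir)
    if create and not os.path.isdir(outdir):
        os.makedirs(outdir)
    return os.path.join(outdir, pdffilename)
```

      In the argument list this would be used as "-sOutputFile=%s" % output_path_in_subdir(pdffilepath), so the converted file keeps its name but never overwrites the source PDF.
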
