Convert all PDF files in a directory to PNG images

I recently needed to convert several PDF vector graphics to PNG images. A clean way to do this is via Ghostscript (example: gs -sDEVICE=png16m -sOutputFile=tiger.png tiger.pdf). For convenience, I wrote a Python script that converts PDF files to PNG files via Ghostscript. I use it on Linux and Windows.

Update (December 2012): It just occurred to me that I have a more consolidated version of the script in my code repository at Bitbucket. I have updated the code below with the current version.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
#   Copyright (C) 2009-2012 Jan-Philip Gehrcke
#
#   Licensed under the Apache License, Version 2.0 (the "License");
#   you may not use this file except in compliance with the License.
#   You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
#   Unless required by applicable law or agreed to in writing, software
#   distributed under the License is distributed on an "AS IS" BASIS,
#   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
#   See the License for the specific language governing permissions and
#   limitations under the License.
 
 
import subprocess
import os
import traceback
import sys
 
 
# Absolute path to Ghostscript executable here or command name if Ghostscript is
# in your PATH.
GHOSTSCRIPTCMD = "gs"
 
 
def usage_exit():
    sys.exit("Usage: %s png_resolution pdffile1 pdffile2 ..." %
        os.path.basename(sys.argv[0]))
 
 
def main():
    if len(sys.argv) < 3:
        usage_exit()
    try:
        resolution = int(sys.argv[1])
    except ValueError:
        usage_exit()
    # sys.argv[1] is the resolution; the PDF paths start at index 2.
    for filepath in sys.argv[2:]:
        (name, ext) = os.path.splitext(filepath)
        if ext.lower().endswith("pdf"):
            print("*** Converting %s..." % filepath)
            gs_pdf_to_png(os.path.join(os.getcwd(), filepath), resolution)
 
 
def gs_pdf_to_png(pdffilepath, resolution):
    if not os.path.isfile(pdffilepath):
        print("'%s' is not a file. Skip." % pdffilepath)
        return
    # The PNG is written next to the PDF, with the same base name.
    pdfname, ext = os.path.splitext(pdffilepath)

    # Change the "-rXXX" option to set the PNG's resolution.
    # http://ghostscript.com/doc/current/Devices.htm#File_formats
    # For other commandline options see
    # http://ghostscript.com/doc/current/Use.htm#Options
    arglist = [GHOSTSCRIPTCMD,
              "-dBATCH",
              "-dNOPAUSE",
              "-sOutputFile=%s.png" % pdfname,
              "-sDEVICE=png16m",
              "-r%s" % resolution,
              pdffilepath]
    print("Running command:\n%s" % ' '.join(arglist))
    try:
        sp = subprocess.Popen(
            args=arglist,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            # Return text instead of bytes when run under Python 3.
            universal_newlines=True)
    except OSError:
        sys.exit("Error executing Ghostscript ('%s'). Is it in your PATH?" %
            GHOSTSCRIPTCMD)
    except Exception:
        # No subprocess was started, so there is no output to collect.
        print("Error while running Ghostscript subprocess. Traceback:")
        print(traceback.format_exc())
        return

    stdout, stderr = sp.communicate()
    print("Ghostscript stdout:\n'%s'" % stdout)
    if stderr:
        print("Ghostscript stderr:\n'%s'" % stderr)
 
 
if __name__ == "__main__":
    main()
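
The script expects the PDF paths as arguments, so converting a whole directory means handing it every PDF in that directory. Below is a minimal Python 3 sketch of that step. The helper names find_pdfs, build_gs_command and convert_directory are made up for illustration and are not part of the script above; the Ghostscript options are the same ones the script uses.

```python
import os
import subprocess


def find_pdfs(directory):
    """Return the sorted paths of all regular files in `directory` ending in .pdf."""
    return sorted(
        os.path.join(directory, entry)
        for entry in os.listdir(directory)
        if entry.lower().endswith(".pdf")
        and os.path.isfile(os.path.join(directory, entry))
    )


def build_gs_command(pdf_path, resolution, gs_cmd="gs"):
    """Build the Ghostscript argument list for one PDF (hypothetical helper)."""
    base, _ext = os.path.splitext(pdf_path)
    return [
        gs_cmd,
        "-dBATCH",
        "-dNOPAUSE",
        "-sOutputFile=%s.png" % base,
        "-sDEVICE=png16m",
        "-r%d" % resolution,
        pdf_path,
    ]


def convert_directory(directory, resolution, gs_cmd="gs"):
    """Run Ghostscript once per PDF found in `directory`."""
    for pdf_path in find_pdfs(directory):
        # check=True raises CalledProcessError if Ghostscript exits non-zero.
        subprocess.run(build_gs_command(pdf_path, resolution, gs_cmd), check=True)
```

Calling convert_directory(".", 150) would then convert every PDF in the current directory at 150 dpi, assuming gs is on the PATH.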
  • jonathan

    hi jan-philip. i’ve tried your code for another purpose and it doesn’t seem to work. my aim is to convert color pdf to grayscale pdf, to reduce filesize. i’ve edited this portion to as such:

    arglist = [ghostscriptpath,
    "-dBATCH",
    "-dNOPAUSE",
    "-sOutputFile=%s.pdf" % pdfname,
    "-sDEVICE=pdfwrite",
    "-sColorConversionStrategy=Gray",
    "-dProcessColorModel=/DeviceGray",
    "-dCompatibilityLevel=1.4",
    pdffilepath]

    the python script runs but the converted pdf files don’t show up. also, there should be a delay during each conversion, all of which don’t exist. any thoughts on this? your input’s much appreciated! :D

    • vivek

      @jonathan: I am facing the same problem. Did you find a solution? My code runs properly, but I am unable to generate images. Thanks.

  • jonathan

    i’ve noticed that the converted pdf file cannot have the same name as the source pdf file else it’ll end up as 3kb file size. if i were to have a folder called “converted” in the same directory, for the line:

    "-sOutputFile=%s.pdf" % pdfname,

    how can i reference that to place the converted files to “converted” folder? i’ve tried many different methods but to no avail. :(

    • Hey Jonathan,

      I do not fully understand what the problem is, but here are a few suggestions that could help:

      1) At first, try running ghostscript from the commandline (cmd.exe in Windows) yourself. While doing so, convince yourself that the set of commandline parameters you want to use is working as expected.

      2) Once you are sure which commandline parameters you need, put them into the Python script (as you have already tried).

      3) I would suggest simply appending something like '_grayscale' to each filename. For that, you could use:

      "-sOutputFile=%s_grayscale.pdf" % pdfname,

      This would definitely work. For a subdirectory, you could try

      "-sOutputFile=converted/%s.pdf" % pdfname,

      But I am not sure whether it works like this or whether you would need a full path instead. Furthermore, in the case of a subdirectory you would have to add some code that checks whether this directory already exists (and creates it otherwise).
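
The subdirectory variant can be sketched as follows in Python 3. Note that pdfname in the script is built from the full input path, so the file name has to be split off before the subdirectory is inserted. The function name output_path_in_subdir is made up for illustration, and "converted" is simply the folder name from the question.

```python
import os


def output_path_in_subdir(pdf_path, subdir="converted"):
    """Return <dir of pdf_path>/<subdir>/<file name>, creating <subdir> if needed."""
    source_dir, filename = os.path.split(os.path.abspath(pdf_path))
    target_dir = os.path.join(source_dir, subdir)
    # exist_ok=True makes this a no-op when the directory already exists.
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, filename)
    # Refuse to write the output over the input file itself.
    if os.path.abspath(target) == os.path.abspath(pdf_path):
        raise ValueError("output would overwrite input: %s" % pdf_path)
    return target
```

The result could then be passed to Ghostscript as "-sOutputFile=%s" % output_path_in_subdir(pdffilepath).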

      Hope this helps,

      Jan-Philip

  • I used it to convert all eps files to png. I think yours is the only script of this kind online. Thanks for sharing.

  • James

    Hi Jan-Philip – thank you for posting this script.

    I was attempting to use your script in my work, but I’m getting an error: “Error executing Ghostscript (‘gs’). Is it in your PATH?” Am I supposed to add the Ghostscript directory to my PYTHONPATH?

    I’m on Windows. I’ve downloaded and installed Ghostscript. Used the link at the top; I installed the 32 bit version.

    Thanks again for whatever help you can offer,
    best,
    James

    • James,

      it has nothing to do with PYTHONPATH.

      The critical line in the script is GHOSTSCRIPTCMD = "gs". On Windows, either change this to the full path of the gswin32 executable (e.g. C:/Programs/ghostscript/bin/gswin32.exe), or change it to just gswin32. The latter will only work if Ghostscript’s bin directory is in your PATH environment variable (the Ghostscript installer might have added it there).

      Note that you can edit the PATH environment variable yourself (Google will help :-)). A good way to check whether your PATH modification was successful is to (i) open cmd.exe, (ii) enter the program’s name, and (iii) see whether you get the error 'gswin32' is not recognized as an internal or external command, or whether Ghostscript starts up.
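
The same PATH check can be done from Python 3 with shutil.which, which returns the full path of a command if it is found on the PATH and None otherwise. The sketch below is only an illustration: the function name find_ghostscript is made up, and the list of candidate command names is an assumption about what a given installation provides.

```python
import shutil

# Command names to try, in order. "gs" is the usual name on Linux; the
# gswin* names are assumptions about what a Windows installation provides.
CANDIDATES = ("gs", "gswin64c", "gswin32c", "gswin64", "gswin32")


def find_ghostscript(candidates=CANDIDATES):
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None
```

If find_ghostscript() returns None, Ghostscript's bin directory is most likely not in PATH, and GHOSTSCRIPTCMD has to be set to the full path of the executable instead.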

      Hope that helps,

      Jan-Philip

  • sheila

    Hi, what are the arguments that I should include when running the code?

    • Hey, as stated in the help message, the calling signature is

      script.py png-resolution pdffile1 pdffile2 ...
      

      The resolution argument is passed directly to Ghostscript as the value of the -r option. It is documented here:
      http://ghostscript.com/doc/current/Devices.htm

      It says:

      This option sets the resolution of the output file in dots per inch.
      The default value if you don't specify this options is usually 72 dpi.
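
To get a feeling for what the resolution means for the output size: PDF page dimensions are given in points (1/72 inch), so the pixel size is roughly the page size in inches multiplied by the dpi value. Here is a small sketch of that arithmetic in Python 3. The function name is made up, and Ghostscript's exact rounding may differ by a pixel.

```python
def approx_png_size(width_pt, height_pt, dpi):
    """Approximate pixel dimensions of a page rendered at `dpi` dots per inch."""
    points_per_inch = 72.0
    return (round(width_pt / points_per_inch * dpi),
            round(height_pt / points_per_inch * dpi))


# An A4 page is about 595 x 842 points.
print(approx_png_size(595, 842, 72))   # (595, 842)
print(approx_png_size(595, 842, 300))  # (2479, 3508)
```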
      

      Hope this helps,

      Jan-Philip