Convert an Uploaded PDF to Thumbnail

I recently had a client with a large number of PDFs that they wanted linked to specific case studies. Usually you would just put a text link wherever it was needed, but my client also wanted each link to display the first page of its PDF as an image.

That request gave me three options:

  1. Screenshot the first page of each PDF and convert it to the right size and image format.
  2. Figure out a way to automate the step above.
  3. Tell them a flat “No!”

Me being me, I went with the more interesting second option, not only because it is more maintainable in the long run (I don’t have to repeat the process for hundreds of PDFs) but also because it’s far more fun!


So, to make this UDT (User Defined Tag), begin by making sure your server has “ImageMagick” and “GhostScript” installed: ImageMagick does the conversion, and it relies on GhostScript to read the PDF.

The UDT is as follows:

// Get default CMSMS variables (done first so the Smarty assignments at the end always have $smarty)
 $gCms = cmsms();
 $smarty = $gCms->GetSmarty();

 // Check parameters
 $file = (!empty($params['file'])) ? $params['file'] : '';
 $thumbExt = (!empty($params['thumb_ext'])) ? $params['thumb_ext'] : '.jpg';
 // Accept both width=113 and width="113"
 $width = (!empty($params['width']) && is_numeric($params['width'])) ? (int)$params['width'] : 100;
 $height = (!empty($params['height']) && is_numeric($params['height'])) ? (int)$params['height'] : 100;
 // Split comma separated lists and trim the spaces around each entry
 $onlyExts = (!empty($params['only_exts'])) ? array_map('trim', explode(",", $params['only_exts'])) : array();
 $notExts = (!empty($params['not_exts'])) ? array_map('trim', explode(",", $params['not_exts'])) : array();
 $overWrite = (!empty($params['overwrite'])) ? true : false;

 // Set error variable
 $thumbError = false;

 // Make sure the thumb extension starts with exactly one dot ("jpg" and ".jpg" both become ".jpg")
 $thumbExt = ".".ltrim($thumbExt, '.');

 // If file parameter isn't empty
 if (!empty($file)) {
  $contains = false;

  // Check if the file matches one of the allowed extensions (if any were given)
  if (!empty($onlyExts)) {
   foreach ($onlyExts as $ext) {
    if ($ext !== '' && stripos($file, $ext) !== false) $contains = true;
   }
  } else $contains = true;

  if ($contains) {
   $contains = false;

   // Check if the file matches one of the excluded extensions (if any were given)
   if (!empty($notExts)) {
    foreach ($notExts as $ext) {
     if ($ext !== '' && stripos($file, $ext) !== false) $contains = true;
    }
   }

   if (!$contains) {
    // PDF and thumb directory paths (change these to suit your site)
    $pdfDirectory = $gCms->config['uploads_path'].DIRECTORY_SEPARATOR."downloads".DIRECTORY_SEPARATOR;
    $thumbDirectory = $gCms->config['uploads_path'].DIRECTORY_SEPARATOR."downloadThumbs".DIRECTORY_SEPARATOR;

    // The path to the PDF file
    $pdfWithPath = $pdfDirectory.$file;

    if (file_exists($pdfWithPath)) {
     // Get the file's extension (pathinfo returns it without the dot)
     $fileExt = pathinfo($file, PATHINFO_EXTENSION);

     // Get the name of the file without its extension
     $fileName = basename($file, ".".$fileExt);

     // Name the thumbnail image the same as the PDF file
     $thumb = $fileName;

     // If the PDF's filename includes spaces, replace them with - and rename the file in place
     if (strpos($fileName, ' ') !== false) {
      $newFileName = str_replace(' ', '-', $fileName);
      // Put the dot back between the new name and the extension
      $newPdfWithPath = dirname($pdfWithPath).DIRECTORY_SEPARATOR.$newFileName.".".$fileExt;

      if (rename($pdfWithPath, $newPdfWithPath)) {
       $pdfWithPath = $newPdfWithPath;
       $thumb = $newFileName;
      }
     }

     // Add the desired extension to the thumbnail and build its full path
     $thumb = $thumb.$thumbExt;
     $thumbWithPath = $thumbDirectory.$thumb;

     // Get ImageMagick convert path and GhostScript path (empty if they aren't installed)
     $convertPath = exec("which convert");
     $ghostScriptPath = exec("which gs");

     if (empty($convertPath) || empty($ghostScriptPath)) {
      $thumbError = true;

     // If the thumb doesn't exist yet, or it exists, overwrite is set and the old one was deleted,
     // run ImageMagick's 'convert' on page [0], setting the colour space to RGB and the size to the requested width/height.
     // escapeshellarg() quotes the paths so spaces or quotes in file names can't break the command.
     } else if (!file_exists($thumbWithPath) || ($overWrite && unlink($thumbWithPath))) {
      exec(escapeshellarg($convertPath)." ".escapeshellarg($pdfWithPath."[0]")." -colorspace RGB -geometry {$width}x{$height} ".escapeshellarg($thumbWithPath), $out, $outCode);

      if ($outCode != 0) $thumbError = true;
     }
     // Otherwise the thumb already exists (overwrite isn't set, or the old one couldn't be deleted), so it is kept
    } else $thumbError = true;
   } else $thumbError = true;
  } else $thumbError = true;
 } else $thumbError = true;

 // Tell the template whether a thumbnail is available
 $smarty->assign('thumbError', $thumbError);
 $smarty->assign('thumbExists', !$thumbError);

So in this UDT we start by reading the parameters into their PHP variable equivalents. We then make sure a file was given and that its extension is allowed (and not excluded); if it passes these initial checks, the main process starts.
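The extension check in the UDT uses stripos(), which matches the extension text anywhere in the file name, so only_exts="pdf" would also accept a file called pdf-notes.doc. A minimal sketch of a stricter alternative that compares only the file's real extension, ignoring case and any leading dot in the lists; extension_allowed() is a hypothetical helper, not part of the UDT:

```php
<?php
// Hypothetical helper: decide whether $file passes the only_exts / not_exts filters
// by comparing its actual extension (case-insensitive) instead of searching the whole name.
function extension_allowed(string $file, array $onlyExts, array $notExts): bool
{
    // Normalise list entries such as " .PDF" to "pdf" and drop empty entries
    $normalise = function (string $ext): string {
        return strtolower(ltrim(trim($ext), '.'));
    };
    $onlyExts = array_filter(array_map($normalise, $onlyExts), 'strlen');
    $notExts = array_filter(array_map($normalise, $notExts), 'strlen');

    $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));

    // An empty only_exts list means "allow everything"
    if (!empty($onlyExts) && !in_array($ext, $onlyExts, true)) {
        return false;
    }
    return !in_array($ext, $notExts, true);
}

var_dump(extension_allowed('report.pdf', ['pdf'], ['.jpg', '.gif'])); // bool(true)
var_dump(extension_allowed('pdf-notes.doc', ['pdf'], []));            // bool(false)
var_dump(extension_allowed('photo.JPG', [], ['.jpg']));               // bool(false)
```

Because whole extensions are compared, entries in only_exts and not_exts work with or without a leading dot.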

The UDT then sets the path to the directory holding the PDFs and the directory where you would like the thumbnails to go (downloads and downloadThumbs inside the uploads folder), so please change these paths to suit your site. It also uses “which” to look for the “convert” command from “ImageMagick” and the “gs” command from “GhostScript”, to make sure both are installed.
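If you would rather not start a shell just to run which, the same lookup can be done in plain PHP by walking the directories listed in the PATH environment variable. A sketch, assuming a Unix-like server (on Windows the executable names would need their .exe suffix); find_executable() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: return the full path of an executable found on PATH, or null.
// $pathEnv lets you pass a custom search path; by default the PATH environment variable is used.
function find_executable(string $name, ?string $pathEnv = null): ?string
{
    $pathEnv = $pathEnv ?? (string) getenv('PATH');
    foreach (explode(PATH_SEPARATOR, $pathEnv) as $dir) {
        if ($dir === '') {
            continue;
        }
        $candidate = rtrim($dir, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }
    return null;
}

// Both tools are needed: ImageMagick's convert hands PDF rendering to GhostScript (gs)
$convertPath = find_executable('convert');
$ghostScriptPath = find_executable('gs');
$toolsReady = ($convertPath !== null && $ghostScriptPath !== null);
```

Under a web server PHP's PATH can be shorter than your login shell's, so if a tool isn't found you may need to pass a directory list such as /usr/local/bin:/usr/bin explicitly.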

Using these paths and the file name, the UDT checks that the PDF exists and stores the file’s name (without its extension) in the $thumb variable for later. If the name contains spaces, the UDT will also try to rename the PDF, replacing each space with a dash.
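One detail that is easy to get wrong when renaming: pathinfo() returns the extension without its dot, so the dot has to be added back when the new file name is built. A sketch of the naming step as a standalone function, returning both the dash-separated PDF name and the matching thumbnail name; dashed_names() is a hypothetical helper:

```php
<?php
// Hypothetical helper: given a PDF file name and a thumbnail extension such as ".jpg",
// return the PDF name with spaces replaced by dashes and the matching thumbnail name.
// Any directory part of $file is dropped.
function dashed_names(string $file, string $thumbExt): array
{
    $fileExt = pathinfo($file, PATHINFO_EXTENSION);   // e.g. "pdf" (no dot)
    $fileName = pathinfo($file, PATHINFO_FILENAME);   // name without the extension
    $newFileName = str_replace(' ', '-', $fileName);

    // Re-insert the dot that pathinfo() strips from the extension
    $pdfName = $fileExt === '' ? $newFileName : $newFileName . '.' . $fileExt;
    $thumbName = $newFileName . '.' . ltrim($thumbExt, '.');

    return [$pdfName, $thumbName];
}

[$pdfName, $thumbName] = dashed_names('Case Study 2.pdf', 'jpg');
// $pdfName is "Case-Study-2.pdf", $thumbName is "Case-Study-2.jpg"
```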

The main step is next: the UDT checks whether the thumbnail already exists, and if it doesn’t, it runs the “ImageMagick” convert command. If the thumbnail exists and the overwrite parameter is set, the UDT deletes the existing thumbnail and runs the convert command again.
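The decision described above can be pulled out into a small pure function, which makes it easy to test separately from the exec() call. A sketch with thumb_action() as a hypothetical name; note that in this sketch an existing thumbnail with overwrite unset is simply kept and treated as usable:

```php
<?php
// Hypothetical helper: decide what to do with the thumbnail.
// Returns "create" (no thumbnail yet), "replace" (exists and overwrite is set)
// or "keep" (exists and overwrite is not set; this sketch treats it as usable).
function thumb_action(bool $thumbExists, bool $overwrite): string
{
    if (!$thumbExists) {
        return 'create';
    }
    return $overwrite ? 'replace' : 'keep';
}

foreach ([[false, false], [false, true], [true, true], [true, false]] as [$exists, $overwrite]) {
    printf("exists=%d overwrite=%d -> %s\n", $exists, $overwrite, thumb_action($exists, $overwrite));
}
```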

The “ImageMagick” convert command is:

 exec("convert \"{$pdfWithPath}[0]\" -colorspace RGB -geometry {$width}x{$height} $thumbDirectory$thumb")

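In plain English this means:
 Run “convert” on page [0] of the PDF (the index is zero-based, so [0] is the first page, [1] the second and so on), set the colour space to RGB, resize it to fit within the given width and height (keeping its aspect ratio), and write the result to the thumbnail file, whose extension decides the output format.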
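The command above drops the file paths straight into the shell string, so a file name containing a quote or other shell characters could break it. A sketch of building the same command with PHP's escapeshellarg() around every path; build_convert_command() is a hypothetical helper, and the options are the ones used in the UDT:

```php
<?php
// Hypothetical helper: build the ImageMagick command used by the UDT, quoting every path
// with escapeshellarg() so spaces or shell characters in file names are passed through safely.
// $page is zero-based: 0 is the first page of the PDF.
function build_convert_command(string $convertPath, string $pdfPath, string $thumbPath,
                               int $width, int $height, int $page = 0): string
{
    return implode(' ', [
        escapeshellarg($convertPath),
        escapeshellarg($pdfPath . '[' . $page . ']'),   // e.g. file.pdf[0]
        '-colorspace RGB',
        '-geometry ' . $width . 'x' . $height,          // fit within width x height
        escapeshellarg($thumbPath),
    ]);
}

$cmd = build_convert_command('/usr/bin/convert', '/var/www/uploads/downloads/report.pdf',
                             '/var/www/uploads/downloadThumbs/report.jpg', 113, 162);
echo $cmd, "\n";
// On Linux: '/usr/bin/convert' '/var/www/uploads/downloads/report.pdf[0]' -colorspace RGB -geometry 113x162 '/var/www/uploads/downloadThumbs/report.jpg'
// You would then run it with exec($cmd, $out, $outCode) and treat a non-zero $outCode as an error.
```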

You can find out more about “ImageMagick” commands here: http://www.imagemagick.org/script/command-line-tools.php

The final part checks whether the thumbnail was created without errors and passes two variables, $thumbError and $thumbExists, to the template so you can check them there. I deliberately avoided outputting error messages, as I didn’t want them appearing all over the site, and I have a fall-back image for whenever the UDT fails.

Save this UDT as: create_thumb_from_pdf

To use the UDT in your template, use the following Smarty tag:

{create_thumb_from_pdf file="pdffile.pdf" thumb_ext="jpg" width=113 height=162 only_exts="pdf" not_exts=".jpg,.gif" overwrite=1}

You don’t need to use all of the options above; apart from file and thumb_ext (and possibly overwrite), they can all be left out.

The options use the following variable types:

  • file – STRING – the name of the PDF inside the downloads directory
  • thumb_ext – STRING
  • width – INT – DEFAULTS TO 100
  • height – INT – DEFAULTS TO 100
  • only_exts – COMMA SEPARATED STRING
  • not_exts – COMMA SEPARATED STRING
  • overwrite – INT – DEFAULTS TO 0
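A sketch of reading the options according to the types and defaults listed above, as a standalone function you could test outside CMS Made Simple. parse_thumb_options() is a hypothetical name; it accepts numeric strings as well as integers for width and height (a quoted width="113" arrives as a string), trims the spaces around list entries, and makes sure thumb_ext starts with exactly one dot:

```php
<?php
// Hypothetical helper: turn the raw tag parameters into typed values with the documented defaults.
function parse_thumb_options(array $params): array
{
    // Split a comma separated string into trimmed, non-empty entries
    $list = function ($value): array {
        if (empty($value)) {
            return [];
        }
        return array_values(array_filter(array_map('trim', explode(',', (string) $value)), 'strlen'));
    };
    // Accept 113 or "113"; anything else (or a non-positive number) falls back to 100
    $size = function ($value): int {
        return (is_numeric($value) && (int) $value > 0) ? (int) $value : 100;
    };

    $thumbExt = !empty($params['thumb_ext']) ? (string) $params['thumb_ext'] : '.jpg';

    return [
        'file'      => isset($params['file']) ? (string) $params['file'] : '',
        'thumb_ext' => '.' . ltrim($thumbExt, '.'),   // "jpg" and ".jpg" both become ".jpg"
        'width'     => $size($params['width'] ?? null),
        'height'    => $size($params['height'] ?? null),
        'only_exts' => $list($params['only_exts'] ?? ''),
        'not_exts'  => $list($params['not_exts'] ?? ''),
        'overwrite' => !empty($params['overwrite']),
    ];
}

print_r(parse_thumb_options(['file' => 'pdffile.pdf', 'thumb_ext' => 'jpg', 'width' => 113,
                             'height' => '162', 'not_exts' => '.jpg, .gif', 'overwrite' => 1]));
```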

You can see the results below (with some overlay images added for a page effect):

Example of Converted PDFs 

NOTE: In theory this UDT could convert not just PDFs but any format ImageMagick can read, probably including PSDs! To create an image of a different PDF page, just change the [0] after the PDF path in the “convert” command, or create an image of every page by removing the [0] completely.
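A sketch of that page selection as a hypothetical helper, page_input(): it turns a 1-based page number (the way people count pages) into ImageMagick's zero-based [n] suffix, or leaves the suffix off to convert every page. When every page is converted to a single-image format such as JPEG, ImageMagick writes one numbered file per page (for example name-0.jpg, name-1.jpg), so the single thumbnail name the UDT checks for would need adjusting too.

```php
<?php
// Hypothetical helper: build the ImageMagick input specification for a PDF.
// $page is 1-based (page 1 is the first page); null means "every page".
function page_input(string $pdfPath, ?int $page = null): string
{
    if ($page === null) {
        return $pdfPath;                        // no suffix: ImageMagick reads all pages
    }
    if ($page < 1) {
        throw new InvalidArgumentException('Page numbers start at 1');
    }
    return $pdfPath . '[' . ($page - 1) . ']';  // ImageMagick counts pages from 0
}

echo page_input('report.pdf', 1), "\n";  // report.pdf[0]
echo page_input('report.pdf', 3), "\n";  // report.pdf[2]
echo page_input('report.pdf'), "\n";     // report.pdf
```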

Also, please test this UDT thoroughly, and remember that its security can be improved further, for example to help prevent “directory traversal attacks” (a file parameter such as ../../config.php reaching outside the downloads directory).
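One common way to guard against directory traversal is to resolve the requested path with realpath() and check that it still lies inside the downloads directory. A minimal sketch, assuming the file must already exist (realpath() returns false otherwise); safe_pdf_path() is a hypothetical helper, not part of the UDT:

```php
<?php
// Hypothetical helper: return the real path of $file inside $baseDir, or null if the file
// doesn't exist or resolves to somewhere outside $baseDir (e.g. "../../config.php").
function safe_pdf_path(string $baseDir, string $file): ?string
{
    $base = realpath($baseDir);
    $path = realpath($baseDir . DIRECTORY_SEPARATOR . $file);
    if ($base === false || $path === false) {
        return null;
    }
    // Require the resolved path to start with the base directory plus a separator,
    // so a sibling such as "downloads-old" doesn't pass as "downloads"
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    if (strncmp($path, $prefix, strlen($prefix)) !== 0 || !is_file($path)) {
        return null;
    }
    return $path;
}
```

In the UDT you could call this in place of building $pdfDirectory.$file by hand and treat a null result as an error.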
