Scanning Manuals

Overview

Hey everyone … I figured I’d share with you guys the process I use to scan documentation / manuals / schematics. This post assumes you have knowledge of the Linux command line. I scan manuals for several reasons:

  • I often have to deal with old and outdated equipment
  • Paper copies get lost
  • It lets people read documentation remotely
  • OCR’d, and therefore searchable, manuals are more useful

The process

I start by using an Epson GT-S80 scanner. Yes, it is expensive; however, it’s also very fast. It scans in duplex, meaning the front and back of each sheet are scanned automatically in a single pass. I scan at 400 DPI, and set the filename to use a counter, starting from 1. So, the naming convention starts at img_0001.tif, and goes on from there.

I save all of the files in the TIFF format, which is a lossless file format.
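Because the later steps rely on the counter in each filename, it can be worth checking the sequence for gaps before processing anything. The following is only a minimal sketch and not part of my scripts: the function name find_gaps is made up for illustration, and it assumes the zero-padded img_NNNN.tif naming described above.

```sh
#!/bin/sh
# Hypothetical helper: print every page number missing from a directory of
# scans named img_0001.tif, img_0002.tif, ... (zero-padded, starting at 1).
find_gaps() {
    dir=$1
    expected=1
    for f in "$dir"/img_*.tif; do
        [ -f "$f" ] || continue              # the glob matched nothing
        n=${f##*_}                           # e.g. "0004.tif"
        n=${n%%.*}                           # e.g. "0004"
        case $n in
            ''|*[!0-9]*) continue ;;         # skip names without a numeric counter
        esac
        n=$(echo "$n" | sed 's/^0*//')       # e.g. "4"
        n=${n:-0}
        # Report every number between the last page seen and this one
        while [ "$expected" -lt "$n" ]; do
            echo "$expected"
            expected=$((expected + 1))
        done
        expected=$((n + 1))
    done
}

# Example: find_gaps .
```

If this prints anything, a page is missing, and every even/odd decision made after that point would be applied to the wrong side of the sheet.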

Once I have the files scanned, I process them on a Linux box. This allows me to script removing the hole-punch marks, as well as compressing the resulting TIFF files using LZW. Uncompressed TIFFs are huge, and when you have large manuals, this is significant. Furthermore, scanned pages tend to compress very well, since much of each page is plain background.
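If you want to see how much space the compression actually saves on your own scans, something like the following could be used. It is a rough sketch under my own assumptions: percent_saved is a hypothetical helper name, and it simply compares the byte counts of two files.

```sh
#!/bin/sh
# Hypothetical helper: print the whole-number percentage by which the
# second file (compressed) is smaller than the first (original).
percent_saved() {
    # The arithmetic expansion strips any padding that wc adds to its output
    orig_size=$(( $(wc -c < "$1") ))
    new_size=$(( $(wc -c < "$2") ))
    if [ "$orig_size" -eq 0 ]; then
        echo 0
        return
    fi
    echo $(( (orig_size - new_size) * 100 / orig_size ))
}

# Example, using the same tiffcp call as conv-ims.sh:
#   tiffcp -c lzw img_0001.tif cmp_img_0001.tif
#   percent_saved img_0001.tif cmp_img_0001.tif
```

The result is rounded down to a whole percent, which is plenty for deciding whether the extra processing step is worth it.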

The Script

The script starts by using the Linux “find” command to find all files ending with .tif. For each one, it runs a secondary script, which compresses the file and blanks out the left or right portion of the page, depending on which side the binding is on. The side is chosen by whether the number in the filename is even or odd (right or left, respectively). find-ims.sh is intended to be run from the directory containing all of your images, and expects to be able to put the resulting output in the “./out” subdirectory of that directory.
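The even/odd decision can be tried on its own before touching any images. This is a small sketch of that logic only: strip_side is a hypothetical name, and it assumes filenames of the form img_NNNN.tif.

```sh
#!/bin/sh
# Hypothetical helper: print which side of the page would be blanked for a
# given scan, based on the counter in its filename.
strip_side() {
    # img_0007.tif -> 0007.tif -> 0007 -> 7
    # Leading zeros are removed so the shell does not read the number as octal.
    num=$(basename "$1" | sed 's/.*_//' | sed 's/\..*//' | sed 's/^0*//')
    num=${num:-0}
    if [ $((num % 2)) -eq 0 ]; then
        echo right
    else
        echo left
    fi
}

strip_side img_0001.tif   # prints "left"  (odd page)
strip_side img_0002.tif   # prints "right" (even page)
```

If your binding is on the opposite side, swapping the two echo lines here (and the two branches in the real script) is all that should be needed.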

Requirements

The code is not necessarily pretty, and will likely have to be modified to suit your needs. It assumes you have preexisting knowledge of Linux and the command line. These things are beyond the scope of this article. You’ll also need the following packages. On CentOS 7, the following should satisfy the dependencies:

yum install ImageMagick libtiff-tools -y
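To confirm the tools are actually available before running anything, a quick check along these lines may help. It is a sketch, not part of the original scripts: missing_tools is a hypothetical helper, and the list of commands is simply the ones the scripts call (convert comes from ImageMagick, tiffcp from libtiff-tools).

```sh
#!/bin/sh
# Hypothetical helper: print the name of each listed command that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools find sed convert tiffcp)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on one line
    echo "Missing tools:" $missing
else
    echo "All required tools found"
fi
```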

The code

find-ims.sh

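#!/bin/sh
# Run from the directory holding the scans. Adjust the path to conv-ims.sh for your system.
# Files starting with cmp_ are skipped, since conv-ims.sh writes its compressed copies here.
find . -maxdepth 1 -type f -iname "*.tif" ! -iname "cmp_*" -exec /root/conv-ims.sh {} \;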

conv-ims.sh

#!/bin/sh
outdir=`pwd`/out
#horizontal=240
#vertical=150

# Width and height (in pixels) of the region blanked on odd pages (left side)
odd_horizontal=235
odd_vertical=118

# Width and height (in pixels) of the region blanked on even pages (right side)
even_horizontal=285
even_vertical=110

bgcolor=white

echo "$1"
if [ -f "$1" ]; then
 # make sure the output directory exists
 mkdir -p "$outdir"
 base=`basename "$1"`
 # write an LZW-compressed copy into the current directory, prefixed with cmp_
 newname=cmp_${base}
 tiffcp -c lzw "$1" "${newname}"
 #try to get the page number, so we can determine whether the binding that we're trying to strip is on the left or the right side.
 num=`echo "$1" | sed 's/.*_//' | sed 's/\..*//'| sed 's/^0*//'`
 # a counter of all zeros ends up empty after stripping; treat it as 0
 [ -n "$num" ] || num=0
 echo "$num"
 if [ $((num%2)) -eq 0 ]; then
  echo "Even Number Page - Strip Right"
  #convert -gravity SouthEast -chop 120x65 $1 $outdir/`basename $1`
  convert -background "${bgcolor}" -gravity SouthEast -chop "${even_horizontal}x${even_vertical}" -splice "${even_horizontal}x${even_vertical}" "${newname}" "$outdir/$base"
 else
  echo "Odd Number Page - Strip Left"
  #convert -gravity SouthWest -chop 120x65 $1 $outdir/`basename $1`
  convert -background "${bgcolor}" -gravity SouthWest -chop "${odd_horizontal}x${odd_vertical}" -splice "${odd_horizontal}x${odd_vertical}" "${newname}" "$outdir/$base"
 fi

else
 echo "no file"
fi

printf '\n\n'

PDF Generation

I use Adobe Acrobat to combine the processed images into a PDF. This is great because it will automatically deskew and OCR the source images.

Printing & Binding (Optional)

So, killing trees isn’t an ideal thing, but sometimes, print copies are useful. To bind them, I use a GBC Comb Binder. If you print large amounts of these, punching all of the sheets may make you crazy… In this case, prepunched paper is the way to go. Finally, you’ll need some binding combs.

Helpful tips

Please Note

We use Amazon referral links only for tools that we have personally vetted. That means we have them, use them ourselves, and can recommend them strongly to others. We have absolutely no interest in advertising any tools or equipment that we do not own, have no experience with, or have not personally vouched for. Furthermore, it would likely be illegal for us to do so, given the non-profit model of our space. Proceeds from these links go into a general fund, which helps pay for more tools and equipment used within the community.