[blindlaw] Batch Recognition of OCR

Tai Tomasi ttomasi at driowa.org
Tue Sep 5 14:32:24 UTC 2017


I am hoping for something more than mere batch scans and saving of images. I would like the batch scans to preserve the original PDFs and make Word copies of those documents, or at the very least to make the original PDF documents into PDF/a (accessible PDF) files.
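
For anyone who wants to script this, here is a rough Python sketch of the workflow described above: poll a folder, find each PDF that has no Word copy yet, and write a copy with the same file name and a .docx extension while leaving the original PDF in place. It is only a sketch under stated assumptions. The `convert` argument is a hypothetical placeholder for whatever OCR command or program you actually use (it is not an API of Kurzweil, OmniPage, or FineReader), and the function names and 30-second interval are made up for illustration.

```python
import time
from pathlib import Path
from typing import Callable, List


def pending_pdfs(folder: Path, out_ext: str = ".docx") -> List[Path]:
    """Return PDFs in `folder` that have no sibling file with the same name and `out_ext`."""
    return sorted(
        p for p in folder.iterdir()
        if p.is_file()
        and p.suffix.lower() == ".pdf"
        and not p.with_suffix(out_ext).exists()
    )


def convert_pending(folder: Path,
                    convert: Callable[[Path, Path], None],
                    out_ext: str = ".docx") -> List[Path]:
    """Run the hypothetical `convert(source_pdf, target)` once for each pending PDF."""
    written = []
    for pdf in pending_pdfs(folder, out_ext):
        target = pdf.with_suffix(out_ext)  # same file name, new extension
        convert(pdf, target)               # the original PDF is never moved or renamed
        written.append(target)
    return written


def watch(folder: Path,
          convert: Callable[[Path, Path], None],
          interval_seconds: float = 30.0) -> None:
    """Poll `folder` forever, converting any new PDFs on each pass."""
    while True:
        convert_pending(folder, convert)
        time.sleep(interval_seconds)
```

A real version would also need to skip PDFs that are still being copied into the folder, and to decide what to do when a conversion fails. Neither is handled here.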
Ms. Tai Tomasi, J.D.
Pronouns: she/her/hers
Staff Attorney



400 East Court Ave., Ste. 300
Des Moines, Iowa 50309
Tel: 515-278-2502; Toll Free: 1-800-779-2502
FAX: 515-278-0539; Relay 711
E-mail: ttomasi at driowa.org
www.driowa.org

Our Mission:  To defend and promote the human and legal rights of Iowans with disabilities

CONFIDENTIALITY NOTICE

This e-mail and any attachments contain information from the law firm of Disability Rights Iowa and are intended solely for the use of the named recipient(s). This e-mail may contain privileged attorney-client communications or work product. Any dissemination by anyone other than an intended recipient is prohibited. If you are not a named recipient, you are prohibited from any further viewing of the e-mail or any attachments or from making any use of the e-mail or attachments. If you have received this e-mail in error, notify the sender immediately and delete the e-mail, any attachments, and all copies from any drives or storage media and destroy any printouts.



-----Original Message-----
From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of Aser Tolentino via BlindLaw
Sent: Sunday, September 03, 2017 12:12 AM
To: steve.jacobson at visi.com; 'Blind Law Mailing List' <blindlaw at nfbnet.org>
Cc: Aser Tolentino <agtolentino at gmail.com>
Subject: Re: [blindlaw] Batch Recognition of OCR

Hello,

Unfortunately, it looks like OmniPage is no longer packaged with K1000 after 14.09. Batch scanning is explained in the manual on Page 63. I've pasted the relevant sections below.

Batch Scanning.

Batch scanning lets you scan a number of pages at once without recognizing or reading them. Instead, you can store them as image files, which are like snapshots of the original document, then perform the recognition process later.

Batch scanning saves time during the actual scanning process, as the system does not recognize each page as it is scanned. Since the recognition process is completely automated, Kurzweil 1000 can perform this step while the system is unattended.

To perform a batch scan using menus: 

1. Open the Settings menu and choose Scanning. In the tab page that appears, press TAB to go to the Mode option. Use the arrow keys to choose Image Scanning Only, then press ENTER.
2. Place your document on the scanner. Open the Scan menu and choose Start New Scan, or press the F9 key. Instead of scanning to documents that you can read, the system scans to image files.

3. When you have finished scanning to image files, you can perform any other tasks that you wish. You can change settings, read documents, and even leave the Kurzweil 1000 altogether. However, you cannot read the image files until the system has recognized them, as described in the next step.

4. Open the Settings menu and choose Scanning. On the Scanning tab page, press TAB to go to the Mode option. Use the arrow keys to choose Recognize Image Files, then press ENTER.

5. Open the Scan menu and choose Start New Scan, or press the F9 key. The system starts recognizing the image files in the Images folder, one after another. As each image file is recognized, it is deleted. Choose Start New Scan or press F9 again at any time to stop recognition.

If you stop recognizing at any point, you should save the current file. You can later reopen the saved file, return to Recognize Image File mode, and begin recognizing again from that point.

KOCRUtil for Automatic File Recognition.

If you have a multi-core processor on your machine, you can use KOCRUtil to recognize individual files, or all of the image files in a folder, automatically and silently.

If the OCR engine can keep the recognized data for the entire document and then convert it, it will attempt to unify its formatting decisions so that the final document is more consistent.

Before using KOCRUtil, however, consider the tradeoffs. Corrections, for example, will not be applied to each page. You won't be able to edit or read the document as it is recognized. Bookmarks will not be captured for PDF files. And the resulting document won't be in KES format (though most of the choices will produce output that can be converted to KES by opening them within Kurzweil 1000).

To run KOCRUtil: 

1. Select a file, a list of files, or a folder in Windows Explorer.

2. Bring up the context menu, and select the appropriate menu item. For a folder, that menu item would be "Recognize Images with Kurzweil." For image files (either TIFF, PDF, JPEG, or PNG), you can pick either "Recognize Images with Kurzweil Automatically", or "Recognize Images with Kurzweil Interactively".

When you recognize images automatically, KOCRUtil.exe will run without bringing up a window. It will use current default settings to recognize the selected image files, or to recognize all of the image files found within the selected folder, and then exit. When it exits, you will hear a wave file "KOCRUtil.wav," if that file exists.

If you selected a folder and then activated the "Recognize Images with Kurzweil" menu item, KOCRUtil would look for all image files within that folder (but not, note, within subfolders). These files would be organized into one or more groups. Files are in the same group if their file names are identical except for digits. So, for example, "Image001.tif", "Image2.tif", and "Image43.tif" are all in the same group, but "Imagea3.tif" is not. Groups of image files are sorted by their name, recognized together, and output into one resulting document.
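
As an illustration only (this is not taken from the Kurzweil manual), the grouping rule above could be sketched in Python roughly as follows. The function names are made up, the comparison is assumed to be case-sensitive, and the sort is assumed to be a plain alphabetical one. The manual does not say how KOCRUtil handles either point.

```python
import re
from collections import defaultdict
from pathlib import PurePath
from typing import Iterable, List

# The image types the manual lists for KOCRUtil, with common spelling variants.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".pdf", ".jpg", ".jpeg", ".png"}


def group_key(filename: str) -> str:
    """Return the file name with every digit removed."""
    return re.sub(r"\d", "", filename)


def group_image_files(filenames: Iterable[str]) -> List[List[str]]:
    """Group image files whose names are identical except for digits.

    Non-image files are ignored. Each group is sorted by name
    (assumed here to be a plain string sort).
    """
    groups = defaultdict(list)
    for name in filenames:
        if PurePath(name).suffix.lower() in IMAGE_EXTENSIONS:
            groups[group_key(name)].append(name)
    return [sorted(members) for _, members in sorted(groups.items())]


example = group_image_files(
    ["Image43.tif", "Imagea3.tif", "Image001.tif", "notes.txt", "Image2.tif"]
)
print(example)
```

With the file names from the manual's example, "Image001.tif", "Image2.tif", and "Image43.tif" end up in one group and "Imagea3.tif" in a group of its own.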

Output file names are based on the name of the first image file in a group of image files, along with an extension that is appropriate for the output format. Depending upon settings, the output files can be in the same folder as the image files, or can be sent to a specified folder.

The default is for KOCRUtil to use FineReader Engine with English as the only recognition language, creating an RTF file that will be placed in the same folder as the image file or files.

To exit KOCRUtil:

Press Escape or TAB to the Exit button and press Enter.

To change KOCRUtil settings:

Either run KOCRUtil.exe without command line arguments, or use the "Recognize Images with Kurzweil Interactively" context menu. This will bring up KOCRUtil, which has a single dialog.

The dialog controls are described below in tab order. Where applicable, the mnemonic follows.

Image Files group has a text box, ALT+I, where you can specify one or more image files, separated by semicolons. There is also a Browse button which brings up a file open dialog so you can select the desired image files from your system.

Output File group has a text box, ALT+O and a Browse button. In the text box you specify the output file. Note that it can be blank, in which case the output file name is constructed using the first image file name. If no path is specified, the source folder will be used, or the default destination folder will be used, depending on that setting (see below). You can also click the Browse button to bring up a file save dialog in which you can specify the output file.
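
To make the output-file rules a bit more concrete, here is a small Python sketch of how the destination might be worked out from the settings described in this section. It is an interpretation, not KOCRUtil's actual code. The function and parameter names are hypothetical, the output name is assumed to be the first image's name with its extension swapped, and ".rtf" is used only because the manual gives RTF as the default format.

```python
from pathlib import PureWindowsPath
from typing import Optional


def resolve_output_path(output_field: str,
                        first_image: str,
                        extension: str = ".rtf",
                        use_source_folder: bool = True,
                        default_destination: Optional[str] = None) -> PureWindowsPath:
    """Work out where the recognized document would be written.

    output_field        -- contents of the Output File text box (may be blank)
    first_image         -- full path of the first image file in the group
    extension           -- extension matching the chosen output format
    use_source_folder   -- state of the Use Source Folder check box
    default_destination -- folder typed into the Default Destination text box
    """
    image = PureWindowsPath(first_image)
    text = output_field.strip()

    if text:
        requested = PureWindowsPath(text)
        if requested.parent != PureWindowsPath("."):
            return requested            # an explicit path wins outright
        filename = requested.name       # a bare file name: the folder is chosen below
    else:
        filename = image.stem + extension  # blank: build the name from the first image

    if use_source_folder or default_destination is None:
        folder = image.parent
    else:
        folder = PureWindowsPath(default_destination)
    return folder / filename
```

For example, a blank Output File box with C:\scans\Image001.tif as the first image would give C:\scans\Image001.rtf under these assumptions.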

Format list box, ALT+O, lets you choose the format of the output file. The list of possible formats changes depending on the recognition engine used.

Note: As of October 2016, with a full install of K1000 V14.09 and above, FineReader will be used.


Details button, ALT+D, brings up the Format Details dialog, in which you can change format settings. The dialog contains three lists: the Layout list (ALT+L), where you can opt to Retain Layout, Formatted Text, or Plain Text; the Paper Size list (ALT+P), where you choose Automatic, A3, A4, A5, Letter, or Legal; and the Pictures list (ALT+C), where you choose Remove Pictures, Low Resolution (for Web), Medium Resolution (for screen), or High Resolution (for printing).

Four check boxes follow the lists. You can opt to Keep Page Breaks (ALT+G), Keep Line Breaks (ALT+N), Keep Text Color (ALT+T), and Keep Headers and Footers (ALT+H). By default, Kurzweil 1000 keeps Formatted Text for the layout, uses Automatic paper size selection, Removes Pictures, Keeps Page Breaks, Text Color, and Headers and Footers, but does not Keep Line Breaks.

These Format Details settings are retained for future sessions until you change them again.

Recognition Engine list box, ALT+R. Choose the recognition engine, FineReader Engine or OmniPage Engine.
Note: As of October 2016, the OmniPage Engine will no longer be available with a full install of Kurzweil 1000 V14.09 and above.

Recognition Languages list view, ALT+L. Check one or more of the recognition languages. The list changes depending on the recognition engine.
Note: As of October 2016, with a full install of K1000 V14.09 and above the FineReader language list will be used.


Start Recognition button, ALT+S. Use it to start recognition if everything else is set up properly.

The next three controls are in a group box labeled Default Destination.

Use Source Folder check box, ALT+U. If set, the folder of the image file will be used to specify the default destination folder (i.e., the folder used if none is specified explicitly along with the output file name).

Unlabeled text box. This is disabled if Use Source Folder is checked. Otherwise, it allows you to specify a default destination folder.

Browse button, which brings up a dialog that allows you to select a default destination folder.

Save Defaults button, ALT+V. Use it to save your current settings as default settings. Once you have done this, these are the settings that will be used when you choose to recognize a file or folder automatically.

Status, ALT+S, is a read-only text box that tells you when recognition of a page is completed and will include recognition hints if you are using FineReader.

-----Original Message-----
From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of Steve Jacobson via BlindLaw
Sent: Saturday, September 2, 2017 8:10 PM
To: 'Blind Law Mailing List' <blindlaw at nfbnet.org>
Cc: Steve Jacobson <steve.jacobson at visi.com>
Subject: Re: [blindlaw] Batch Recognition of OCR

Actually, Kurzweil 1000 gives one the choice of using the FineReader engine or the OmniPage engine.  The version I have lets one choose between FineReader 11.0 and OmniPage 19.0.

Best regards,

Steve Jacobson

-----Original Message-----
From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of Aaron Cannon via BlindLaw
Sent: Saturday, September 02, 2017 3:56 PM
To: Andrew Webb <awebb2168 at gmail.com>
Cc: Aaron Cannon <cannona at fireantproductions.com>; Blind Law Mailing List <blindlaw at nfbnet.org>
Subject: Re: [blindlaw] Batch Recognition of OCR

It looks like K1000 uses the FineReader engine under the covers, so it should still be pretty good.

Aaron

--
This message was sent from a mobile device


> On Sep 2, 2017, at 15:50, Andrew Webb <awebb2168 at gmail.com> wrote:
> 
> How does Kurzweil 1000 stack up against these other programs? Is it 
> considered obsolete by this point?
> 
> -----Original Message-----
> From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of Aaron 
> Cannon via BlindLaw
> Sent: Friday, September 01, 2017 4:50 PM
> To: Blind Law Mailing List
> Cc: Aaron Cannon
> Subject: Re: [blindlaw] Batch Recognition of OCR
> 
> I believe the ABBYY FineReader Corporate (not Standard) version has this 
> capability. FineReader also tends to win in accuracy against OmniPage 
> in head-to-head tests.
> 
> Aaron
> 
> --
> This message was sent from a mobile device
> 
> 
>> On Sep 1, 2017, at 16:28, Singh, Nandini via BlindLaw <blindlaw at nfbnet.org> wrote:
>> 
>> I am not sure what program you have now, but I use OmniPage by Nuance, and
>> I can convert 10-30 documents from PDF to Word or text, depending on the
>> size, all in one go. I have tried to do more documents, but that really
>> slows things down. While the conversion is running in the background, I can
>> still check email, review other documents, etc.
>> 
>> -----Original Message-----
>> From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of Tai Tomasi via BlindLaw
>> Sent: Friday, September 1, 2017 5:21 PM
>> To: Blind Law Mailing List
>> Cc: Tai Tomasi
>> Subject: [blindlaw] Batch Recognition of OCR
>> 
>> Hello all. I am looking for a program that will monitor a given folder for
>> new PDF files and convert inaccessible PDF files to accessible PDF (PDF/a)
>> or Microsoft Word files. Does anyone know of a program that can do this type
>> of automated batch OCR conversion? Right now, I have to initiate the OCR
>> process with a command for each document and rename the new document to the
>> same filename as the original PDF with a .docx extension. This is not an
>> efficient use of my time. Thanks.
>> 
>> 
>> Ms. Tai Tomasi, J.D.
>> Pronouns: she/her/hers
>> Staff Attorney
>> 
>> 
>> 400 East Court Ave., Ste. 300
>> Des Moines, Iowa 50309
>> Tel: 515-278-2502; Toll Free: 1-800-779-2502
>> FAX: 515-278-0539; Relay 711
>> E-mail: ttomasi at driowa.org<mailto:ttomasi at driowa.org>
>> www.driowa.org
>> 
>> Our Mission:  To defend and promote the human and legal rights of Iowans with disabilities
>> 
>> _______________________________________________
>> BlindLaw mailing list
>> BlindLaw at nfbnet.org
>> http://nfbnet.org/mailman/listinfo/blindlaw_nfbnet.org
>> To unsubscribe, change your list options or get your account info for BlindLaw:
>> http://nfbnet.org/mailman/options/blindlaw_nfbnet.org/cannona%40fireantproductions.com
> 

_______________________________________________
BlindLaw mailing list
BlindLaw at nfbnet.org
http://nfbnet.org/mailman/listinfo/blindlaw_nfbnet.org
To unsubscribe, change your list options or get your account info for BlindLaw:
http://nfbnet.org/mailman/options/blindlaw_nfbnet.org/ttomasi%40driowa.org



