[blindlaw] Batch Recognition of OCR

Gerard Sadlier gerard.sadlier at gmail.com
Sun Sep 3 05:53:16 UTC 2017


Hi all

I'd like to set up OmniPage so that I could:
1. Paste, say, 100 PDFs into a folder;
2. Go away while OmniPage processes the documents; and
3. Return to find each of the 100 PDFs converted into a text file, Word
document, etc.

If this is possible, could someone tell me how?

Thanks

Ger

On 9/3/17, Aser Tolentino via BlindLaw <blindlaw at nfbnet.org> wrote:
> Hello,
>
> Unfortunately, it looks like OmniPage is no longer packaged with K1000 after
> 14.09. Batch scanning is explained in the manual on page 63. I've pasted the
> relevant sections below.
>
> Batch Scanning.
>
> Batch scanning lets you scan a number of pages at once without recognizing
> or reading them. Instead, you can store them as image files, which are like
> snapshots of the original document, then perform the recognition process
> later.
>
> Batch scanning saves time during the actual scanning process, as the system
> does not recognize each page as it is scanned. Since the recognition process
> is completely automated, Kurzweil 1000 can perform this step while the
> system is unattended.
>
> To perform a batch scan using menus:
>
> 1. Open the Settings menu and choose Scanning. In the tab page that appears,
> press TAB to go to the Mode option. Use the arrow keys to choose Image
> Scanning Only, then press ENTER.
>
> 2. Place your document on the scanner. Open the Scan menu and choose Start
> New Scan, or press the F9 key. Instead of scanning to documents that you can
> read, the system scans to image files.
>
> 3. When you have finished scanning to image files, you can perform any other
> tasks that you wish. You can change settings, read documents, and even leave
> the Kurzweil 1000 altogether. However, you cannot read the image files until
> the system has recognized them, as described in the next step.
>
> 4. Open the Settings menu and choose Scanning. On the Scanning tab page,
> press TAB to go to the Mode option. Use the arrow keys to choose Recognize
> Image Files, then press ENTER.
>
> 5. Open the Scan menu and choose Start New Scan, or press the F9 key. The
> system starts recognizing the image files in the Images folder, one after
> another. As each image file is recognized, it is deleted. Choose Start New
> Scan or press F9 again at any time to stop recognition.
>
> If you stop recognizing at any point, you should save the current file. You
> can later reopen the saved file, return to Recognize Image Files mode, and
> begin recognizing again from that point.
>
> KOCRUtil for Automatic File Recognition.
>
> If you have a multi-core processor on your machine, you can use KOCRUtil to
> recognize files and files in folders automatically and silently.
>
> If the OCR engine can keep the recognized data for the entire document and
> then convert it, it will attempt to unify its formatting decisions so that
> the final document is more consistent.
>
> Before using KOCRUtil, however, consider the tradeoffs. Corrections, for
> example, will not be applied to each page. You won't be able to edit or read
> the document as it is recognized. Bookmarks will not be captured for PDF
> files. And the resulting document won't be in KES format (though most of the
> choices will produce output that can be converted to KES by opening them
> within Kurzweil 1000).
>
> To run KOCRUtil:
>
> 1. Select a file, a list of files, or a folder in Windows Explorer.
>
> 2. Bring up the context menu, and select the appropriate menu item. For a
> folder, that menu item would be "Recognize Images with Kurzweil." For image
> files (TIFF, PDF, JPEG, or PNG), you can pick either "Recognize Images with
> Kurzweil Automatically" or "Recognize Images with Kurzweil Interactively".
>
> When you recognize images automatically, KOCRUtil.exe will run without
> bringing up a window. It will use current default settings to recognize the
> selected image files, or to recognize all of the image files found within
> the selected folder, and then exit. When it exits, you will hear a wave file
> "KOCRUtil.wav," if that file exists.
>
> If you selected a folder and then activated the "Recognize Images with
> Kurzweil" menu item, KOCRUtil would look for all image files within that
> folder (but not, note, within subfolders). These files would be organized
> into one or more groups. Files are in the same group if their file names are
> identical except for digits. So, for example, "Image001.tif", "Image2.tif",
> and "Image43.tif" are all in the same group, but "Imagea3" is not. Each
> group of image files is sorted by name, recognized together, and output into
> one resulting document.
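
A rough Python sketch of the grouping rule quoted above, in case it helps
when naming scans. This is only my reading of the manual text, not
KOCRUtil's actual code: the function names, the extension list, and the
plain alphabetical sort within a group are all assumptions.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed extensions for the TIFF, PDF, JPEG and PNG types the manual lists.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".pdf", ".jpg", ".jpeg", ".png"}


def group_key(name):
    """Return the file name with every digit removed."""
    return re.sub(r"\d", "", name)


def group_image_names(names):
    """Group image file names that are identical except for digits.

    Returns a list of groups; each group is a list of names sorted
    alphabetically (an assumption about how "sorted by name" works).
    """
    groups = defaultdict(list)
    for name in names:
        if Path(name).suffix.lower() in IMAGE_EXTENSIONS:
            groups[group_key(name)].append(name)
    return [sorted(members) for _, members in sorted(groups.items())]


if __name__ == "__main__":
    for group in group_image_names(
        ["Image43.tif", "Imagea3.tif", "Image001.tif", "Image2.tif"]
    ):
        print(group)
```

Under these assumptions the three "Image" files land in one group and
"Imagea3.tif" in a group of its own, so each would become a separate output
document.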
>
> Output file names are based on the name of the first image file in a group
> of image files, along with an extension that is appropriate for the output
> format. Depending upon settings, the output files can be in the same folder
> as the image files, or can be sent to a specified folder.
>
> The default is for KOCRUtil to use FineReader Engine with English as the
> only recognition language, creating an RTF file that will be placed in the
> same folder as the image file or files.
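
To make the naming rule above concrete, here is a small Python sketch of how
an output path could be derived. The function and parameter names are
hypothetical, and falling back to the source folder when no default folder
is given is my assumption; only the ".rtf" default and the "first image
name plus extension" rule come from the manual text.

```python
from pathlib import PureWindowsPath
from typing import Optional


def output_path(
    first_image: str,
    extension: str = ".rtf",
    use_source_folder: bool = True,
    default_folder: Optional[str] = None,
) -> PureWindowsPath:
    """Build the output path from the first image file in a group."""
    source = PureWindowsPath(first_image)
    if use_source_folder or not default_folder:
        # Output goes next to the image file (the documented default).
        folder = source.parent
    else:
        # Output goes to the user-chosen default destination folder.
        folder = PureWindowsPath(default_folder)
    return folder / (source.stem + extension)


if __name__ == "__main__":
    print(output_path(r"C:\Scans\Image001.tif"))
    print(output_path(r"C:\Scans\Image001.tif", ".txt", False, r"D:\Out"))
```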
>
> To exit KOCRUtil:
>
> Press Escape, or TAB to the Exit button and press ENTER.
>
> To change KOCRUtil settings:
>
> Either run KOCRUtil.exe without command line arguments, or use the
> "Recognize Images with Kurzweil Interactively" context menu. This will bring
> up KOCRUtil, which has a single dialog.
>
> The dialog controls are described below in tab order. Where applicable, the
> mnemonic follows.
>
> Image Files group has a text box, ALT+I, where you can specify one or more
> image files, separated by semicolons. There is also a Browse button which
> brings up a file open dialog so you can select the desired image files from
> your system.
>
> Output File group has a text box, ALT+O, and a Browse button. In the text
> box you specify the output file. Note that it can be blank, in which case
> the output file name is constructed using the first image file name. If no
> path is specified, the source folder will be used, or the default
> destination folder will be used, depending on that setting (see below). You
> can also click the Browse button to bring up a file save dialog in which you
> can specify the output file.
>
> Format list box, ALT+O, lets you choose the format of the output file. The
> list of possible formats changes depending on the recognition engine used.
>
> Note: As of October 2016, with a full install of K1000 V14.09 and above,
> the FineReader engine will be used.
>
> Details button, Alt+D, brings up the Format Details dialog in which you can
> change format settings. The dialog contains a Layout list (Alt+L) where you
> can opt to Retain Layout, Formatted Text, or Plain Text. Next is the Paper
> Size list (Alt+P); choose Automatic, A3, A4, A5, Letter, or Legal. The third
> list is labeled Pictures (Alt+C); choose Remove Pictures, Low Resolution
> (for Web), Medium Resolution (for screen), or High Resolution (for
> printing). Four check boxes follow the lists. You can opt to Keep Page
> Breaks (Alt+G), Keep Line Breaks (Alt+N), Keep Text Color (Alt+T), and Keep
> Headers and Footers (Alt+H). By default, Kurzweil 1000 keeps Formatted Text
> for the layout, uses Automatic paper size selection, Removes Pictures, Keeps
> Page Breaks, Text Color, and Headers and Footers, but does not Keep Line
> Breaks. These Format Details settings are retained for future sessions until
> you change them again.
>
> Recognition Engine list box, ALT+R. Choose the recognition engine,
> FineReader Engine or OmniPage Engine.
> Note: As of October 2016, the OmniPage Engine will no longer be available
> with a full install of Kurzweil 1000 V14.09 and above.
>
> Recognition Languages list view, ALT+L. Check one or more of the recognition
> languages. The list changes depending on the recognition engine.
> Note: As of October 2016, with a full install of K1000 V14.09 and above,
> the FineReader language list will be used.
>
> Start Recognition button, ALT+S. Use it to start recognition if everything
> else is set up properly.
>
> The next three controls are in a group box labeled Default Destination.
>
> Use Source Folder check box, ALT+U. If set, the folder of the image file
> will be used to specify the default destination folder (i.e., the folder
> used if none is specified explicitly along with the output file name).
>
> Unlabeled text box. This is disabled if Use Source Folder is checked.
> Otherwise, it allows you to specify a default destination folder.
>
> Browse button, which will bring up a dialog that allows you to select a
> default destination folder.
>
> Save Defaults button, ALT+V. Use it to save your current settings as default
> settings. Once you have done this, these are the settings that will be used
> when you choose to recognize a file or folder automatically.
>
> Status, ALT+S, is a read-only text box that tells you when recognition of a
> page is completed and will include recognition hints if you are using
> FineReader.
>
> -----Original Message-----
> From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of Steve
> Jacobson via BlindLaw
> Sent: Saturday, September 2, 2017 8:10 PM
> To: 'Blind Law Mailing List' <blindlaw at nfbnet.org>
> Cc: Steve Jacobson <steve.jacobson at visi.com>
> Subject: Re: [blindlaw] Batch Recognition of OCR
>
> Actually, Kurzweil 1000 gives one the choice of using the FineReader engine
> or the OmniPage engine. The version I have lets one choose between
> FineReader 11.0 and OmniPage 19.0.
>
> Best regards,
>
> Steve Jacobson
>
> -----Original Message-----
> From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of Aaron
> Cannon via BlindLaw
> Sent: Saturday, September 02, 2017 3:56 PM
> To: Andrew Webb <awebb2168 at gmail.com>
> Cc: Aaron Cannon <cannona at fireantproductions.com>; Blind Law Mailing List
> <blindlaw at nfbnet.org>
> Subject: Re: [blindlaw] Batch Recognition of OCR
>
> It looks like K1000 uses the FineReader engine under the covers, so it
> should still be pretty good.
>
> Aaron
>
> --
> This message was sent from a mobile device
>
>
>> On Sep 2, 2017, at 15:50, Andrew Webb <awebb2168 at gmail.com> wrote:
>>
>> How does Kurzweil 1000 stack up against these other programs? Is it
>> considered obsolete by this point?
>>
>> -----Original Message-----
>> From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of Aaron
>> Cannon via BlindLaw
>> Sent: Friday, September 01, 2017 4:50 PM
>> To: Blind Law Mailing List
>> Cc: Aaron Cannon
>> Subject: Re: [blindlaw] Batch Recognition of OCR
>>
>> I believe the ABBYY FineReader Corporate (not Standard) version has this
>> capability. FineReader also tends to win in accuracy against OmniPage
>> in head-to-head tests.
>>
>> Aaron
>>
>> --
>> This message was sent from a mobile device
>>
>>
>>> On Sep 1, 2017, at 16:28, Singh, Nandini via BlindLaw
>>> <blindlaw at nfbnet.org> wrote:
>>>
>>> I am not sure what program you have now, but I use OmniPage by Nuance, and
>>> I can convert 10-30 documents from PDF to Word or text, depending on their
>>> size, all in one go. I have tried to do more documents, but that really
>>> slows things down. While the conversion is running in the background, I can
>>> still check email, review other documents, etc.
>>>
>>> -----Original Message-----
>>> From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of Tai
>>> Tomasi via BlindLaw
>>> Sent: Friday, September 1, 2017 5:21 PM
>>> To: Blind Law Mailing List
>>> Cc: Tai Tomasi
>>> Subject: [blindlaw] Batch Recognition of OCR
>>>
>>> Hello all. I am looking for a program that will monitor a given folder for
>>> new PDF files and convert inaccessible PDF files to accessible PDF (PDF/A)
>>> or Microsoft Word files. Does anyone know of a program that can do this
>>> type of automated batch OCR conversion? Right now, I have to initiate the
>>> OCR process with a command for each document and rename the new document to
>>> the same filename as the original PDF with a .docx extension. This is not
>>> an efficient use of my time. Thanks.
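
For anyone wanting to script the folder monitoring and renaming described
above, here is a minimal Python sketch. It is not tied to any OCR product:
the `convert` argument is a placeholder you would have to supply yourself
(for example, a function that calls whatever command-line tool your OCR
package provides), and all names and the 30-second polling interval are
assumptions for illustration.

```python
import time
from pathlib import Path
from typing import Callable, List


def convert_new_pdfs(
    folder: Path,
    convert: Callable[[Path, Path], None],
    out_ext: str = ".docx",
) -> List[Path]:
    """Convert each PDF in `folder` that has no matching output file yet.

    The output keeps the PDF's base name and swaps the extension, so no
    manual renaming is needed. `convert(src, dst)` is supplied by the caller.
    """
    created = []
    for pdf in sorted(folder.glob("*.pdf")):
        target = pdf.with_suffix(out_ext)
        if not target.exists():
            convert(pdf, target)
            created.append(target)
    return created


def watch(folder: Path, convert: Callable[[Path, Path], None],
          interval_seconds: float = 30.0) -> None:
    """Poll `folder` forever, converting new PDFs as they appear."""
    while True:
        convert_new_pdfs(folder, convert)
        time.sleep(interval_seconds)
```

Polling is the simplest approach and is usually enough for a folder that
receives a few files at a time. A PDF is skipped once a file with the same
base name and the output extension exists, so the loop can be stopped and
restarted without redoing finished work.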
>>>
>>>
>>> Ms. Tai Tomasi, J.D.
>>> Pronouns: she/her/hers
>>> Staff Attorney
>>>
>>> 400 East Court Ave., Ste. 300
>>> Des Moines, Iowa 50309
>>> Tel: 515-278-2502; Toll Free: 1-800-779-2502
>>> FAX: 515-278-0539; Relay 711
>>> E-mail: ttomasi at driowa.org
>>> www.driowa.org
>>>
>>> Our Mission: To defend and promote the human and legal rights of Iowans
>>> with disabilities
>>>
>>> _______________________________________________
>>> BlindLaw mailing list
>>> BlindLaw at nfbnet.org
>>> http://nfbnet.org/mailman/listinfo/blindlaw_nfbnet.org
>>> To unsubscribe, change your list options or get your account info for
>>> BlindLaw:
>>> http://nfbnet.org/mailman/options/blindlaw_nfbnet.org/cannona%40fireantproductions.com



