[blindlaw] Batch Recognition of OCR

Aser Tolentino agtolentino at gmail.com
Sun Sep 3 21:35:43 UTC 2017


The process works the same for PDFs as it does for images. Put the PDFs you
want to OCR in a folder, highlight that folder, bring up the context menu,
and select "Recognize Images with Kurzweil." If you want more control over
what kind of file it produces, select the "Recognize Images with Kurzweil
Interactively" option instead. Mine defaults to RTF.
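
If you come back to a large folder and want to confirm nothing was skipped, a few lines of Python can list the PDFs that still have no output next to them. This is only a sketch and not part of Kurzweil. It assumes each PDF produces one output file with the same base name in the same folder, and that the extension is .rtf as on my machine. The function name find_unconverted is made up for illustration. Files that Kurzweil merges into a single output document will be reported as missing, so treat the list as a prompt to check rather than as proof of failure.

```python
from pathlib import Path


def find_unconverted(folder, output_ext=".rtf"):
    """Return the names of PDFs in `folder` with no same-named output file.

    Assumption: each PDF yields <base name><output_ext> in the same folder.
    """
    missing = []
    for pdf in sorted(Path(folder).iterdir()):
        if pdf.is_file() and pdf.suffix.lower() == ".pdf":
            # Look for e.g. report.rtf next to report.pdf.
            if not pdf.with_suffix(output_ext).exists():
                missing.append(pdf.name)
    return missing
```

Calling find_unconverted on the folder you processed returns a plain list of file names, which a screen reader can read straight from the console once printed.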

-----Original Message-----
From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of
sy.hoekstra--- via BlindLaw
Sent: Sunday, September 3, 2017 12:52 PM
To: 'Blind Law Mailing List' <blindlaw at nfbnet.org>
Cc: sy.hoekstra at gmail.com
Subject: Re: [blindlaw] Batch Recognition of OCR

Yeah, sorry, this is also what I would like to know.  I read batch scanning
before and thought we were talking about batch OCR conversion. It would seem
like batch OCR conversion was doable with current tech.  I just don't know
how.

-----Original Message-----
From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of Gerard
Sadlier via BlindLaw
Sent: Sunday, September 3, 2017 1:53 AM
To: Blind Law Mailing List <blindlaw at nfbnet.org>
Cc: Gerard Sadlier <gerard.sadlier at gmail.com>
Subject: Re: [blindlaw] Batch Recognition of OCR

Hi all

I'd like to set up Omnipage so that I could:
1. Paste say 100 pdfs into a folder;
2. Go away to allow Omnipage to process the documents; and
3. Return to find the 100 pdfs each processed into a text file or word
document etc.

If this is possible could someone tell me how?

Thanks

Ger

On 9/3/17, Aser Tolentino via BlindLaw <blindlaw at nfbnet.org> wrote:
> Hello,
>
> Unfortunately, it looks like OmniPage is no longer packaged with K1000 
> after 14.09. Batch scanning is explained in the manual on Page 63.
> I've pasted the relevant sections below.
>
> Batch Scanning.
>
> Batch scanning lets you scan a number of pages at once without 
> recognizing or reading them. Instead, you can store them as image 
> files, which are like snapshots of the original document, then perform 
> the recognition process later.
>
> Batch scanning saves time during the actual scanning process, as the 
> system does not recognize each page as it is scanned. Since the 
> recognition process is completely automated, Kurzweil 1000 can perform 
> this step while the system is unattended.
>
> To perform a batch scan using menus:
>
> 1. Open the Settings menu and choose Scanning. In the tab page that 
> appears, press TAB to go to the Mode option. Use the arrow keys to 
> choose Image Scanning Only, then press ENTER.
> 2. Place your document on the scanner. Open the Scan menu and choose 
> Start New Scan, or press the F9 key. Instead of scanning to documents 
> that you can read, the system scans to image files.
>
> 3. When you have finished scanning to image files, you can perform any 
> other tasks that you wish. You can change settings, read documents, 
> and even leave the Kurzweil 1000 altogether. However, you cannot read 
> the image files until the system has recognized them, as described in 
> the next step.
>
> 4. Open the Settings menu and choose Scanning. On the Scanning tab 
> page, press TAB to go to the Mode option. Use the arrow keys to choose 
> Recognize Image Files, then press ENTER.
>
> 5. Open the Scan menu and choose Start New Scan, or press the F9 key. 
> The system starts recognizing the image files in the Images folder, 
> one after another. As each image file is recognized, it is deleted.
> Choose Start New Scan or press F9 again at any time to stop recognition.
>
> If you stop recognizing at any point, you should save the current 
> file. You can later reopen the saved file, return to Recognize Image 
> File mode, and begin recognizing again from that point.
>
> KOCRUtil for Automatic File Recognition.
>
> If you have a multi-core processor on your machine, you can use 
> KOCRUtil to recognize files and files in folders automatically and 
> silently.
>
> If the OCR engine can keep the recognized data for the entire document 
> and then convert it, it will attempt to unify its formatting decisions 
> so that the final document is more consistent.
>
> Before using KOCRUtil, however, consider the tradeoffs. Corrections, 
> for example, will not be applied to each page. You won't be able to 
> edit or read the document as it is recognized. Bookmarks will not be 
> captured for PDF files. And the resulting document won't be in KES 
> format (though most of the choices will produce output that can be 
> converted to KES by opening them within Kurzweil 1000).
>
> To run KOCRUtil:
>
> 1. Select a file, a list of files, or a folder in Windows Explorer.
>
> 2. Bring up the context menu, and select the appropriate menu item. 
> For a folder, that menu item would be "Recognize Images with 
> Kurzweil." For image files (either TIFF, PDF, JPEG, or PNG), you can 
> pick either "Recognize Images with Kurzweil Automatically", or 
> "Recognize Images with Kurzweil Interactively".
>
> When you recognize images automatically, KOCRUtil.exe will run without 
> bringing up a window. It will use current default settings to 
> recognize the selected image files, or to recognize all of the image 
> files found within the selected folder, and then exit. When it exits, 
> you will hear a wave file "KOCRUtil.wav," if that file exists.
>
> If you selected a folder and then activated the "Recognize Images with 
> Kurzweil" menu item, KOCRUtil would look for all image files within 
> that folder (but not, note, within subfolders). These files would be 
> organized into one or more groups. Files are in the same group if their 
> file names are identical except for digits. So, for example, 
> "Image001.tif", "Image2.tif", and "Image43.tif" are all in the same 
> group, but "Imagea3" is not. Groups of image files are sorted by their 
> name, recognized together, and output into one resulting document.
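
The grouping rule is easy to misjudge, so here is a small Python sketch that predicts which files in a folder would end up in the same output document. It is my reading of the manual text above, not Kurzweil's actual code. The function names are made up, and the plain alphabetical sort inside each group is an assumption, since the manual does not say whether Kurzweil sorts "Image2" before or after "Image001". I added a .tif extension to the manual's "Imagea3" example so that all names are complete file names.

```python
import re
from collections import defaultdict


def group_key(filename):
    """The file name with every digit removed."""
    return re.sub(r"\d", "", filename)


def group_images(filenames):
    """Map each digit-free key to its member files, sorted by name."""
    groups = defaultdict(list)
    for name in filenames:
        groups[group_key(name)].append(name)
    # Sort the groups by key and the members by name (assumed ordering).
    return {key: sorted(members) for key, members in sorted(groups.items())}
```

If a folder holds unrelated PDFs such as "Brief1.pdf" and "Brief2.pdf", this rule would put them in one group, so it is worth checking file names before starting a large batch.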
>
> Output file names are based on the name of the first image file in a 
> group of image files, along with an extension that is appropriate for 
> the output format. Depending upon settings, the output files can be in 
> the same folder as the image files, or can be sent to a specified folder.
>
> The default is for KOCRUtil to use FineReader Engine with English as 
> the only recognition language, creating an RTF file that will be 
> placed in the same folder as the image file or files.
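
To predict where an output file will land, the naming rule can be sketched in Python as follows. This is an illustration under assumptions, not Kurzweil's code: I assume the output keeps the first file's base name unchanged, digits included, and the function name and parameters are hypothetical.

```python
from pathlib import PureWindowsPath


def default_output_path(group, extension=".rtf", destination=None):
    """Build the expected output path for one group of image files.

    group: full paths of the files in one group, already sorted by name.
    destination: a default destination folder, or None to use the
    source folder (the documented default).
    """
    first = PureWindowsPath(group[0])
    folder = PureWindowsPath(destination) if destination else first.parent
    # Assumed: the base name of the first file is kept as-is.
    return str(folder / (first.stem + extension))
```

With the defaults, a group starting with C:\Scans\Image001.tif would be expected at C:\Scans\Image001.rtf.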
>
> To exit KOCRUtil:
>
> Press Escape or TAB to the Exit button and press Enter.
>
> To change KOCRUtil settings:
>
> Either run KOCRUtil.exe without command line arguments, or use the 
> "Recognize Images with Kurzweil Interactively" context menu. This will 
> bring up KOCRUtil, which has a single dialog.
>
> The dialog controls are described below in tab order. Where 
> applicable, the mnemonic follows.
>
> Image Files group has a text box, ALT+I, where you can specify one or 
> more image files, separated by semicolons. There is also a Browse 
> button which brings up a file open dialog so you can select the 
> desired image files from your system.
>
> Output File group has a text box, ALT+O and a Browse button. In the 
> text box you specify the output file. Note that it can be blank, in 
> which case the output file name is constructed using the first image 
> file name. If no path is specified, the source folder will be used, or 
> the default destination folder will be used, depending on that setting 
> (see below). You can also click the Browse button to bring up a file 
> save dialog in which you can specify the output file.
>
> Format list box, ALT+O, lets you choose the format of the output file. 
> The list of possible formats changes depending on the recognition 
> engine used.
>
> Note: As of October 2016, with a full install of K1000 V14.09 and 
> above the FineReader will be used.
>
>
> Details button, Alt+D brings up the Format Details dialog in which you 
> can change format settings. The dialog contains: a Layout list (Alt+L) 
> where you can opt to Retain Layout, Formatted Text, or Plain Text.
> Next is the Paper Size list (Alt+P); choose Automatic, A3, A4, A5, 
> Letter, or Legal. The third list is labeled Pictures (Alt+C); choose 
> to Remove Pictures, Low Resolution (for Web), Medium Resolution (for 
> screen), High Resolution (for printing).
> Four check boxes follow the lists. You can opt to Keep Page Breaks 
> (Alt+G), Keep Line Breaks (Alt+N), Keep Text Color (Alt+T), and Keep 
> Headers and Footers (Alt+H). By default, Kurzweil 1000 keeps Formatted 
> Text for the layout, uses Automatic paper size selection, Removes 
> Pictures, Keeps Page Breaks, Text Color, and Headers and Footers, but 
> does not Keep Line Breaks.
> These Format Details settings are retained for future sessions until 
> you change them again.
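
For anyone who keeps notes on their setup, the Format Details defaults described above can be written down as a small Python record. This is only a way of documenting the settings. Kurzweil does not read it, and the class and field names are my own.

```python
from dataclasses import dataclass, asdict


@dataclass
class FormatDetails:
    """Defaults as listed in the manual excerpt above."""
    layout: str = "Formatted Text"     # or "Retain Layout", "Plain Text"
    paper_size: str = "Automatic"      # or "A3", "A4", "A5", "Letter", "Legal"
    pictures: str = "Remove Pictures"  # or one of the three resolution choices
    keep_page_breaks: bool = True
    keep_line_breaks: bool = False
    keep_text_color: bool = True
    keep_headers_and_footers: bool = True


def changes_from_default(settings):
    """Return only the fields that differ from the documented defaults."""
    defaults = asdict(FormatDetails())
    return {k: v for k, v in asdict(settings).items() if defaults[k] != v}
```

Calling changes_from_default on a record of your own choices gives a short list of what you changed, which is handy to have after a reinstall.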
>
> Recognition Engine list box, ALT+R. Choose the recognition engine, 
> FineReader Engine or OmniPage Engine.
> Note: As of October 2016, the OmniPage Engine will no longer be 
> available with a full install of Kurzweil 1000 V14.09 and above.
>
> Recognition Languages list view, ALT+L. Check one or more of the 
> recognition languages. The list changes depending on the recognition 
> engine.
> Note: As of October 2016, with a full install of K1000 V14.09 and 
> above the FineReader language list will be used.
>
>
> Start Recognition button, ALT+S. Use it to start recognition if 
> everything else is set up properly.
>
> The next three controls are in a group box labeled Default Destination.
>
> Use Source Folder check box, ALT+U. If set, the folder of the image 
> file will be used to specify the default destination folder (i.e., the 
> folder used if none is specified explicitly along with the output file 
> name).
>
> Unlabeled text box. This is disabled if Use Source Folder is checked.
> Otherwise, it allows you to specify a default destination folder.
> Browse button which will bring up a dialog that allows you to select a 
> default destination folder.
>
> Save Defaults button, ALT+V. Use it to save your current settings as 
> default settings. Once you have done this, these are the settings that 
> will be used when you choose to recognize a file or folder 
> automatically.
>
> Status, ALT+S is a read-only text box that tells you when recognition 
> of a page is completed and will include recognition hints if you are 
> using FineReader.
>
> -----Original Message-----
> From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of Steve 
> Jacobson via BlindLaw
> Sent: Saturday, September 2, 2017 8:10 PM
> To: 'Blind Law Mailing List' <blindlaw at nfbnet.org>
> Cc: Steve Jacobson <steve.jacobson at visi.com>
> Subject: Re: [blindlaw] Batch Recognition of OCR
>
> Actually, Kurzweil 1000 gives one the choice of using the FineReader 
> engine or the OmniPage engine.  The version I have lets one choose 
> between FineReader 11.0 and OmniPage 19.0.
>
> Best regards,
>
> Steve Jacobson
>
> -----Original Message-----
> From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of Aaron 
> Cannon via BlindLaw
> Sent: Saturday, September 02, 2017 3:56 PM
> To: Andrew Webb <awebb2168 at gmail.com>
> Cc: Aaron Cannon <cannona at fireantproductions.com>; Blind Law Mailing 
> List <blindlaw at nfbnet.org>
> Subject: Re: [blindlaw] Batch Recognition of OCR
>
> It looks like K1000 uses the FineReader Engine under the covers, so it 
> should still be pretty good.
>
> Aaron
>
> --
> This message was sent from a mobile device
>
>
>> On Sep 2, 2017, at 15:50, Andrew Webb <awebb2168 at gmail.com> wrote:
>>
>> How does Kurzweil 1000 stack up against these other programs? Is it 
>> considered obsolete by this point?
>>
>> -----Original Message-----
>> From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of 
>> Aaron Cannon via BlindLaw
>> Sent: Friday, September 01, 2017 4:50 PM
>> To: Blind Law Mailing List
>> Cc: Aaron Cannon
>> Subject: Re: [blindlaw] Batch Recognition of OCR
>>
>> I believe ABBYY FineReader Corporate (not Standard) version has this 
>> capability. FineReader also tends to win in accuracy against OmniPage 
>> in head-to-head tests.
>>
>> Aaron
>>
>> --
>> This message was sent from a mobile device
>>
>>
>>> On Sep 1, 2017, at 16:28, Singh, Nandini via BlindLaw <blindlaw at nfbnet.org> wrote:
>>>
>>> I am not sure what program you have now, but I use OmniPage by Nuance,
>>> and I can convert 10-30 documents from PDF to Word or text, depending on
>>> their size, all in one go. I have tried to do more documents, but that
>>> really slows things down. While the conversion is running in the
>>> background, I can still check email, review other documents, etc.
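
If larger runs slow things down, one workaround is to split a long file list into runs of 30 or fewer and feed them to the OCR program one run at a time. A minimal Python sketch of the splitting step is below. The function name and the default of 30 are my own choices, taken from the upper end of the range mentioned above.

```python
def batches(items, size=30):
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

For a list of 100 files this yields three runs of 30 and a final run of 10.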
>>>
>>> -----Original Message-----
>>> From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of Tai Tomasi via BlindLaw
>>> Sent: Friday, September 1, 2017 5:21 PM
>>> To: Blind Law Mailing List
>>> Cc: Tai Tomasi
>>> Subject: [blindlaw] Batch Recognition of OCR
>>>
>>> Hello all. I am looking for a program that will monitor a given folder
>>> for new PDF files and convert inaccessible PDF files to accessible PDF
>>> (PDF/A) or Microsoft Word files. Does anyone know of a program that can
>>> do this type of automated batch OCR conversion? Right now, I have to
>>> initiate the OCR process with a command for each document and rename the
>>> new document to the same filename as the original PDF with a .docx
>>> extension. This is not an efficient use of my time. Thanks.
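
The monitoring and renaming part of this request can be sketched in Python without tying it to any one OCR product. The sketch below checks a folder at intervals and, for each PDF with no matching .docx, calls a convert function that you supply. That convert function is hypothetical: it stands for whatever command your OCR program offers, and I am not assuming any particular command line for Kurzweil, OmniPage, or FineReader. The names scan_once and watch and the 30-second interval are also my own.

```python
import time
from pathlib import Path


def scan_once(folder, convert, out_ext=".docx"):
    """Convert every PDF in `folder` that has no output file yet.

    convert(src, dst) is a caller-supplied (hypothetical) function that
    runs the OCR tool on `src` and writes the result to `dst`.
    Returns the output paths requested on this pass.
    """
    produced = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        dst = pdf.with_suffix(out_ext)  # same name, new extension
        if not dst.exists():
            convert(pdf, dst)
            produced.append(dst)
    return produced


def watch(folder, convert, interval=30):
    """Poll `folder` forever, converting new PDFs as they appear."""
    while True:
        scan_once(folder, convert)
        time.sleep(interval)
```

Two caveats: a PDF that is still being copied into the folder may be picked up half-written, and if convert fails to write its output, the same file will be tried again on every pass.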
>>>
>>>
>>> Ms. Tai Tomasi, J.D.
>>> Pronouns: she/her/hers
>>> Staff Attorney
>>>
>>> 400 East Court Ave., Ste. 300
>>> Des Moines, Iowa 50309
>>> Tel: 515-278-2502; Toll Free: 1-800-779-2502
>>> FAX: 515-278-0539; Relay 711
>>> E-mail: ttomasi at driowa.org<mailto:ttomasi at driowa.org>
>>> www.driowa.org
>>>
>>> Our Mission:  To defend and promote the human and legal rights of
>>> Iowans with disabilities
>>>
>>> CONFIDENTIALITY NOTICE
>>>
>>> This e-mail and any attachments contain information from the law firm of
>>> Disability Rights Iowa and are intended solely for the use of the named
>>> recipient(s). This e-mail may contain privileged attorney-client
>>> communications or work product. Any dissemination by anyone other than
>>> an intended recipient is prohibited. If you are not a named recipient,
>>> you are prohibited from any further viewing of the e-mail or any
>>> attachments or from making any use of the e-mail or attachments. If you
>>> have received this e-mail in error, notify the sender immediately and
>>> delete the e-mail, any attachments, and all copies from any drives or
>>> storage media and destroy any printouts.
>>>
>>>
>>>
>>> _______________________________________________
>>> BlindLaw mailing list
>>> BlindLaw at nfbnet.org
>>> http://nfbnet.org/mailman/listinfo/blindlaw_nfbnet.org
>>> To unsubscribe, change your list options or get your account info for
>>> BlindLaw:
>>> http://nfbnet.org/mailman/options/blindlaw_nfbnet.org/cannona%40fireantproductions.com
>>





