[blindlaw] Batch Recognition of OCR

Aser Tolentino agtolentino at gmail.com
Sun Sep 3 05:11:35 UTC 2017


Hello,

Unfortunately, it looks like OmniPage is no longer packaged with K1000 as of
V14.09. Batch scanning is explained in the manual on page 63. I've pasted the
relevant sections below.

Batch Scanning.

Batch scanning lets you scan a number of pages at once without recognizing
or reading them. Instead, you can store them as image files, which are like
snapshots of the original document, then perform the recognition process
later.

Batch scanning saves time during the actual scanning process, as the system
does not recognize each page as it is scanned. Since the recognition process
is completely automated, Kurzweil 1000 can perform this step while the
system is unattended.

To perform a batch scan using menus: 

1. Open the Settings menu and choose Scanning. In the tab page that appears,
press TAB to go to the Mode option. Use the arrow keys to choose Image
Scanning Only, then press ENTER.

2. Place your document on the scanner. Open the Scan menu and choose Start
New Scan, or press the F9 key. Instead of scanning to documents that you can
read, the system scans to image files.

3. When you have finished scanning to image files, you can perform any other
tasks that you wish. You can change settings, read documents, and even leave
Kurzweil 1000 altogether. However, you cannot read the image files until
the system has recognized them, as described in the next step.

4. Open the Settings menu and choose Scanning. On the Scanning tab page,
press TAB to go to the Mode option. Use the arrow keys to choose Recognize
Image Files, then press ENTER.

5. Open the Scan menu and choose Start New Scan, or press the F9 key. The
system starts recognizing the image files in the Images folder, one after
another. As each image file is recognized, it is deleted. Choose Start New
Scan or press F9 again at any time to stop recognition.

If you stop recognizing at any point, you should save the current file. You
can later reopen the saved file, return to Recognize Image Files mode, and
begin recognizing again from that point.

KOCRUtil for Automatic File Recognition.

If you have a multi-core processor on your machine, you can use KOCRUtil to
recognize individual files, or all of the files in a folder, automatically
and silently.

If the OCR engine can keep the recognized data for the entire document and
then convert it, it will attempt to unify its formatting decisions so that
the final document is more consistent.

Before using KOCRUtil, however, consider the tradeoffs. Corrections, for
example, will not be applied to each page. You won't be able to edit or read
the document as it is recognized. Bookmarks will not be captured for PDF
files. And the resulting document won't be in KES format (though most of the
choices will produce output that can be converted to KES by opening them
within Kurzweil 1000).

To run KOCRUtil: 

1. Select a file, a list of files, or a folder in Windows Explorer.

2. Bring up the context menu, and select the appropriate menu item. For a
folder, that menu item would be "Recognize Images with Kurzweil." For image
files (TIFF, PDF, JPEG, or PNG), you can pick either "Recognize Images with
Kurzweil Automatically" or "Recognize Images with Kurzweil Interactively".

When you recognize images automatically, KOCRUtil.exe will run without
bringing up a window. It will use current default settings to recognize the
selected image files, or to recognize all of the image files found within
the selected folder, and then exit. When it exits, you will hear a wave file
"KOCRUtil.wav," if that file exists.

If you select a folder and then activate the "Recognize Images with
Kurzweil" menu item, KOCRUtil looks for all image files within that folder
(but not, note, within subfolders). These files are organized into one or
more groups. Files are in the same group if their file names are identical
except for digits. So, for example, "Image001.tif", "Image2.tif", and
"Image43.tif" are all in the same group, but "Imagea3" is not. Groups of
image files are sorted by their name, recognized together, and output into
one resulting document.
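
The grouping rule above can be sketched in a few lines of Python. This is
only an illustration of the rule as the manual describes it, not KOCRUtil's
actual code. The function names, the list of file extensions, the
case-insensitive comparison, and the plain alphabetical sort within a group
are all my assumptions.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed spellings of the TIFF, PDF, JPEG, and PNG extensions.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".pdf", ".jpg", ".jpeg", ".png"}


def group_key(filename: str) -> str:
    """Return the file name with every digit removed, lower-cased.

    Two names that differ only in their digits produce the same key.
    Lower-casing is an assumption (Windows file names ignore case).
    """
    return re.sub(r"\d", "", filename).lower()


def group_image_files(filenames):
    """Group image file names whose names are identical except for digits.

    Non-image files are skipped. Each group is sorted with a plain
    string sort, which is an assumption about "sorted by their name".
    """
    groups = defaultdict(list)
    for name in filenames:
        if Path(name).suffix.lower() in IMAGE_EXTENSIONS:
            groups[group_key(name)].append(name)
    return {key: sorted(names) for key, names in groups.items()}


if __name__ == "__main__":
    names = ["Image43.tif", "Image001.tif", "Imagea3.tif", "Image2.tif"]
    for key, members in group_image_files(names).items():
        print(key, members)
```

With the manual's example names, the three "Image" files land in one group
and "Imagea3.tif" lands in a group of its own.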

Output file names are based on the name of the first image file in a group
of image files, along with an extension that is appropriate for the output
format. Depending upon settings, the output files can be in the same folder
as the image files, or can be sent to a specified folder.

The default is for KOCRUtil to use FineReader Engine with English as the
only recognition language, creating an RTF file that will be placed in the
same folder as the image file or files.

To exit KOCRUtil:

Press Escape or TAB to the Exit button and press Enter.

To change KOCRUtil settings:

Either run KOCRUtil.exe without command line arguments, or use the
"Recognize Images with Kurzweil Interactively" context menu. This will bring
up KOCRUtil, which has a single dialog.

The dialog controls are described below in tab order. Where applicable, the
mnemonic follows.

Image Files group has a text box, ALT+I, where you can specify one or more
image files, separated by semicolons. There is also a Browse button which
brings up a file open dialog so you can select the desired image files from
your system.

Output File group has a text box, ALT+O and a Browse button. In the text box
you specify the output file. Note that it can be blank, in which case the
output file name is constructed using the first image file name. If no path
is specified, the source folder will be used, or the default destination
folder will be used, depending on that setting (see below). You can also
click the Browse button to bring up a file save dialog in which you can
specify the output file.
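
As I read it, the output file rules work roughly like the Python sketch
below. This is my interpretation of the manual text, not anything taken from
KOCRUtil itself. The function name, its parameters, and the ".rtf" default
extension (based on the RTF default mentioned above) are hypothetical.

```python
from pathlib import PureWindowsPath


def resolve_output_path(image_files, output_file="", extension=".rtf",
                        use_source_folder=True, default_destination=""):
    """Work out where the recognized document would be written.

    image_files: list of image paths; the first one drives the defaults.
    output_file: contents of the Output File text box (may be blank).
    use_source_folder / default_destination: the Default Destination group.
    All names here are illustrative assumptions.
    """
    first_image = PureWindowsPath(image_files[0])

    if output_file:
        output = PureWindowsPath(output_file)
    else:
        # Blank box: build the name from the first image file name.
        output = PureWindowsPath(first_image.stem + extension)

    # A path was given explicitly, so use it unchanged.
    if len(output.parts) > 1:
        return output

    # No path given: fall back to the source folder or the default folder.
    if use_source_folder or not default_destination:
        folder = first_image.parent
    else:
        folder = PureWindowsPath(default_destination)
    return folder / output


if __name__ == "__main__":
    print(resolve_output_path([r"C:\scans\Image001.tif"]))
```

PureWindowsPath is used so the sketch handles Windows-style paths the same
way on any operating system.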

Format list box, ALT+O, lets you choose the format of the output file. The
list of possible formats changes depending on the recognition engine used.

Note: As of October 2016, with a full install of K1000 V14.09 and above,
FineReader will be used.


Details button, Alt+D brings up the Format Details dialog in which you can
change format settings. The dialog contains: a Layout list (Alt+L) where you
can opt to Retain Layout, Formatted Text, or Plain Text. Next is the Paper
Size list (Alt+P); choose Automatic, A3, A4, A5, Letter, or Legal. The third
list is labeled Pictures (Alt+C); choose to Remove Pictures, Low Resolution
(for Web), Medium Resolution (for screen), High Resolution (for printing).
Four check boxes follow the lists. You can opt to Keep Page Breaks (Alt+G),
Keep Line Breaks (Alt+N), Keep Text Color (Alt+T), and Keep Headers and
Footers (Alt+H). By default, Kurzweil 1000 keeps Formatted Text for the
layout, uses Automatic paper size selection, Removes Pictures, Keeps Page
Breaks, Text Color, and Headers and Footers, but does not Keep Line Breaks.
These Format Details settings are retained for future sessions until you
change them again.
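
For quick reference, the Format Details options and their defaults can be
summarized as a small Python data class. This is only a summary of the
paragraph above. The class and field names are hypothetical and do not
correspond to any Kurzweil settings file or API.

```python
from dataclasses import dataclass

LAYOUTS = ("Retain Layout", "Formatted Text", "Plain Text")
PAPER_SIZES = ("Automatic", "A3", "A4", "A5", "Letter", "Legal")
PICTURES = ("Remove Pictures", "Low Resolution", "Medium Resolution",
            "High Resolution")


@dataclass
class FormatDetails:
    """Format Details dialog options, with the defaults the manual lists."""
    layout: str = "Formatted Text"
    paper_size: str = "Automatic"
    pictures: str = "Remove Pictures"
    keep_page_breaks: bool = True
    keep_line_breaks: bool = False
    keep_text_color: bool = True
    keep_headers_and_footers: bool = True

    def __post_init__(self):
        # Reject values that are not in the lists the dialog offers.
        if self.layout not in LAYOUTS:
            raise ValueError(f"unknown layout: {self.layout}")
        if self.paper_size not in PAPER_SIZES:
            raise ValueError(f"unknown paper size: {self.paper_size}")
        if self.pictures not in PICTURES:
            raise ValueError(f"unknown pictures option: {self.pictures}")


if __name__ == "__main__":
    print(FormatDetails())
```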

Recognition Engine list box, ALT+R. Choose the recognition engine,
FineReader Engine or OmniPage Engine.
Note: As of October 2016, the OmniPage Engine will no longer be available
with a full install of Kurzweil 1000 V14.09 and above.

Recognition Languages list view, ALT+L. Check one or more of the recognition
languages. The list changes depending on the recognition engine.
Note: As of October 2016, with a full install of K1000 V14.09 and above the
FineReader language list will be used.


Start Recognition button, ALT+S. Use it to start recognition if everything
else is set up properly.

The next three controls are in a group box labeled Default Destination.

Use Source Folder check box, ALT+U. If set, the folder of the image file
will be used to specify the default destination folder (i.e., the folder
used if none is specified explicitly along with the output file name).

Unlabeled text box. This is disabled if Use Source Folder is checked.
Otherwise, it allows you to specify a default destination folder.

Browse button, which brings up a dialog that allows you to select a
default destination folder.

Save Defaults button, ALT+V. Use it to save your current settings as default
settings. Once you have done this, these are the settings that will be used
when you choose to recognize a file or folder automatically.

Status, ALT+S is a read-only text box that tells you when recognition of a
page is completed and will include recognition hints if you are using
FineReader.
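
On the original question of watching a folder for new PDF files: if no
packaged tool fits, the monitoring and renaming part could be scripted. Below
is a minimal Python sketch, assuming a simple polling loop. The function
names are hypothetical, and the convert argument is a placeholder for
whatever OCR command you already run per document; the sketch does not
perform any OCR itself.

```python
import time
from pathlib import Path


def find_unconverted_pdfs(folder, output_extension=".docx"):
    """Return PDFs in folder that have no same-named output file yet."""
    folder = Path(folder)
    pending = []
    for pdf in sorted(folder.glob("*.pdf")):
        if not pdf.with_suffix(output_extension).exists():
            pending.append(pdf)
    return pending


def watch_folder(folder, convert, output_extension=".docx",
                 poll_seconds=30.0, max_polls=None):
    """Poll folder and call convert(source, target) for each new PDF.

    convert is supplied by the caller and must write the target file.
    The target keeps the PDF's file name with the new extension.
    max_polls=None polls forever; a number stops after that many passes.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        for pdf in find_unconverted_pdfs(folder, output_extension):
            convert(pdf, pdf.with_suffix(output_extension))
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(poll_seconds)
```

If a conversion fails and writes no output file, the PDF simply shows up
again on the next pass.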

-----Original Message-----
From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of Steve
Jacobson via BlindLaw
Sent: Saturday, September 2, 2017 8:10 PM
To: 'Blind Law Mailing List' <blindlaw at nfbnet.org>
Cc: Steve Jacobson <steve.jacobson at visi.com>
Subject: Re: [blindlaw] Batch Recognition of OCR

Actually, Kurzweil 1000 gives one the choice of using the FineReader engine
or the OmniPage engine. The version I have lets one choose between
FineReader 11.0 and OmniPage 19.0.

Best regards,

Steve Jacobson

-----Original Message-----
From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of Aaron
Cannon via BlindLaw
Sent: Saturday, September 02, 2017 3:56 PM
To: Andrew Webb <awebb2168 at gmail.com>
Cc: Aaron Cannon <cannona at fireantproductions.com>; Blind Law Mailing List
<blindlaw at nfbnet.org>
Subject: Re: [blindlaw] Batch Recognition of OCR

It looks like K1000 uses the FineReader Engine under the covers, so it
should still be pretty good.

Aaron

--
This message was sent from a mobile device


> On Sep 2, 2017, at 15:50, Andrew Webb <awebb2168 at gmail.com> wrote:
> 
> How does Kurzweil 1000 stack up against these other programs? Is it 
> considered obsolete by this point?
> 
> -----Original Message-----
> From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of Aaron 
> Cannon via BlindLaw
> Sent: Friday, September 01, 2017 4:50 PM
> To: Blind Law Mailing List
> Cc: Aaron Cannon
> Subject: Re: [blindlaw] Batch Recognition of OCR
> 
> I believe the ABBYY FineReader Corporate (not Standard) version has this
> capability. FineReader also tends to win in accuracy against OmniPage
> in head-to-head tests.
> 
> Aaron
> 
>> On Sep 1, 2017, at 16:28, Singh, Nandini via BlindLaw
>> <blindlaw at nfbnet.org> wrote:
>> 
>> I am not sure what program you have now, but I use OmniPage by Nuance,
>> and I can convert 10-30 documents from PDF to Word or text, depending
>> on the size, all in one go. I have tried to do more documents, but that
>> really slows things down. While the conversion is running in the
>> background, I can still check email, review other documents, etc.
>> 
>> -----Original Message-----
>> From: BlindLaw [mailto:blindlaw-bounces at nfbnet.org] On Behalf Of Tai
>> Tomasi via BlindLaw
>> Sent: Friday, September 1, 2017 5:21 PM
>> To: Blind Law Mailing List
>> Cc: Tai Tomasi
>> Subject: [blindlaw] Batch Recognition of OCR
>> 
>> Hello all. I am looking for a program that will monitor a given folder
>> for new PDF files and convert inaccessible PDF files to accessible PDF
>> (PDF/A) or Microsoft Word files. Does anyone know of a program that can
>> do this type of automated batch OCR conversion? Right now, I have to
>> initiate the OCR process with a command for each document and rename the
>> new document to the same filename as the original PDF with a .docx
>> extension. This is not an efficient use of my time. Thanks.
>> 
>> 
>> Ms. Tai Tomasi, J.D.
>> Pronouns: she/her/hers
>> Staff Attorney
>> 
>> 400 East Court Ave., Ste. 300
>> Des Moines, Iowa 50309
>> Tel: 515-278-2502; Toll Free: 1-800-779-2502
>> FAX: 515-278-0539; Relay 711
>> E-mail: ttomasi at driowa.org<mailto:ttomasi at driowa.org>
>> www.driowa.org
>> 
>> Our Mission: To defend and promote the human and legal rights of
>> Iowans with disabilities
>> 
>> CONFIDENTIALITY NOTICE
>> 
>> This e-mail and any attachments contain information from the law firm
>> of Disability Rights Iowa and are intended solely for the use of the
>> named recipient(s). This e-mail may contain privileged attorney-client
>> communications or work product. Any dissemination by anyone other than
>> an intended recipient is prohibited. If you are not a named recipient,
>> you are prohibited from any further viewing of the e-mail or any
>> attachments or from making any use of the e-mail or attachments. If you
>> have received this e-mail in error, notify the sender immediately and
>> delete the e-mail, any attachments, and all copies from any drives or
>> storage media and destroy any printouts.
>> 
>> 
>> 
>> _______________________________________________
>> BlindLaw mailing list
>> BlindLaw at nfbnet.org
>> http://nfbnet.org/mailman/listinfo/blindlaw_nfbnet.org
>> To unsubscribe, change your list options or get your account info for
>> BlindLaw:
>> http://nfbnet.org/mailman/options/blindlaw_nfbnet.org/cannona%40fireantproductions.com
