[Dtb-talk] Bookshare.org, internet archive and other automated DAISY production

Greg Kearney gkearney at gmail.com
Thu May 13 01:50:40 UTC 2010


I will point out here that it is possible to correct, after the fact, the issues of which I spoke; I have in fact done so for a few Bookshare titles. And to be fair, I have found some Bookshare titles which do have chapter navigation. Below is an outline of how I do this.

1. I take the Bookshare or other .xml file and process it in DAISY Pipeline to generate an RTF file. I set Pipeline to include page numbering.

2. I open the resulting RTF in OpenOffice on a Mac and apply a string of OpenOffice macros I wrote that correct various issues in the text and create the proper page numbers on each page. I make sure the frontmatter of the book has roman numeral numbering and that the bodymatter is numbered correctly. Using regular expression searching, I find the chapters and change their style to Heading 1, and so on for lower heading levels.

3. I export the OpenOffice file to a DTBook XML file using the odt2DAISY plugin.

4. I then use DAISY Pipeline again to generate a full text/full audio book using Apple Alex as the voice. We use the Apple voice because it has no distribution licensing requirements. I can also generate a DAISY text-only book if needed.
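The chapter-finding part of step 2 can be sketched roughly as follows. This is an illustration in Python, not the OpenOffice macros themselves, which are not shown in this message. The pattern, the function name find_chapter_lines, and the sample paragraphs are all assumptions; a real book would likely need its own pattern.

```python
import re

# Assumed pattern: a paragraph that starts with "Chapter" followed by an
# arabic or roman number, optionally followed by a title. Hypothetical;
# real books vary and may need a different expression.
CHAPTER_RE = re.compile(r"^\s*chapter\s+(\d+|[ivxlcdm]+)\b.*$", re.IGNORECASE)


def find_chapter_lines(paragraphs):
    """Return the indexes of paragraphs that look like chapter headings."""
    return [i for i, text in enumerate(paragraphs) if CHAPTER_RE.match(text)]


# Made-up sample text, one string per paragraph.
sample = [
    "Preface",
    "Chapter 1",
    "It was a dark night.",
    "CHAPTER II: The Storm",
    "The chapter ended.",
]

for index in find_chapter_lines(sample):
    # In a word processor, these are the paragraphs that would be
    # restyled as Heading 1.
    print(index, sample[index])
```

A pattern this loose can still produce false matches, so the candidates are worth reviewing before any styles are changed.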

The result of all of this is a properly numbered and navigable book. I can also do such things as put footnotes and the like into skippable elements. We use OpenOffice because we have found it gives us the best results, and it is cross-platform and free, so we can give the same program to all of our volunteer producers.
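
One way to check the numbering of a finished DTBook file is to look for page labels that occur more than once, such as a page 4 in the front matter and another page 4 in the body matter. The sketch below assumes the DTBook 2005 namespace and uses a made-up sample document; the function name duplicate_page_numbers is hypothetical and this is not part of DAISY Pipeline or odt2DAISY.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by DTBook 2005 documents (assumed for this sketch).
DTBOOK_NS = "{http://www.daisy.org/z3986/2005/dtbook/}"


def duplicate_page_numbers(dtbook_xml):
    """Return a sorted list of page labels that appear more than once."""
    root = ET.fromstring(dtbook_xml)
    labels = [(el.text or "").strip() for el in root.iter(DTBOOK_NS + "pagenum")]
    counts = Counter(labels)
    return sorted(label for label, n in counts.items() if n > 1)


# Made-up sample: the front matter and the body matter both contain a page "4".
SAMPLE = """<dtbook xmlns="http://www.daisy.org/z3986/2005/dtbook/" version="2005-3">
<book>
<frontmatter><pagenum page="normal" id="p1">4</pagenum></frontmatter>
<bodymatter><level1>
<pagenum page="normal" id="p2">4</pagenum>
<pagenum page="normal" id="p3">5</pagenum>
</level1></bodymatter>
</book>
</dtbook>"""

print(duplicate_page_numbers(SAMPLE))
```

If the front matter is renumbered with roman numerals, as in step 2, the two labels become distinct and the check reports nothing for them.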

Gregory Kearney | Manager Accessible Media
Association for the Blind of WA - Guide Dogs WA
PO Box 101, Victoria Park WA 6979 | 61 Kitchener Ave, Victoria Park WA 6100
Tel: 08 9311 8246 | Fax: 08 9361 8696 | www.guidedogswa.com.au
Tel: 307-224-4022 (North America)
Email: greg.kearney at guidedogswa.com.au
Email: gkearney at gmail.com

On 13/05/2010, at 8:43 AM, David Andrews wrote:

> They do not "markup in DAISY."  They scan the files and prepare them in Kurzweil, Word, or another program, submitting an RTF file to Bookshare.  They are required to put pages in; at one time they weren't.  I can't speak for BSO, so I don't know if they are going to up the requirements again.
> 
> Dave
> 
> At 06:54 PM 5/12/2010, you wrote:
>> Hello Dave and everyone,
>> 
>>    But are they teaching these people how to properly mark up books in
>> DAISY as Greg suggested?
>> 
>> Peter Donahue
>> 
>> ----- Original Message -----
>> From: "Andrews, David B B (DEED)" <David.B.Andrews at state.mn.us>
>> To: "Discussion of Digital Talking Books" <dtb-talk at nfbnet.org>
>> Sent: Wednesday, May 12, 2010 3:28 PM
>> Subject: Re: [Dtb-talk] Bookshare.org, internet archive and other automated
>> DAISY production
>> 
>> 
>> Peter:
>> 
>> All that sounds good, but the reality is that with a large number of
>> volunteers, both scanners and proofers, and a wide variety of tools,
>> consistent, reliable content is going to be difficult to achieve.
>> 
>> I think though that Bookshare.org is probably moving away from volunteers.
>> They are getting more and more from publishers directly, and I believe they
>> are paying people in India to do input, and staff is doing some too.  It is
>> my guess that ultimately volunteer-produced content will become a small part
>> of their overall operation.
>> 
>> Dave
>> 
>> 
>> 
>> -----Original Message-----
>> From: dtb-talk-bounces at nfbnet.org [mailto:dtb-talk-bounces at nfbnet.org] On
>> Behalf Of Peter Donahue
>> Sent: Wednesday, May 12, 2010 1:36 PM
>> To: Discussion of Digital Talking Books
>> Subject: Re: [Dtb-talk] Bookshare.org, internet archive and other automated
>> DAISY production
>> 
>> Hello everyone,
>> 
>>    It's obvious that bookshare.org and company need to teach their
>> volunteers the proper way to mark up DAISY books in addition to scanning
>> them. I will admit to being very disappointed after reading just how great
>> bookshare.org's DAISY books would be only to find that they were nothing but
>> a practical joke. What a great idea for a resolution.
>> 
>> Peter Donahue
>> 
>> ----- Original Message -----
>> From: "Burke, Dan (DSS)" <burke at mso.umt.edu>
>> To: "Discussion of Digital Talking Books" <dtb-talk at nfbnet.org>
>> Sent: Wednesday, May 12, 2010 10:40 AM
>> Subject: Re: [Dtb-talk] Bookshare.org,internet archive and other automated
>> DAISY production
>> 
>> 
>> Bookshare's limited use of navigation built on inclusion of headings and
>> matching pages is not just an automation issue -- it is also a matter
>> of the volunteer submissions.  It's pretty easy to scan a book, a good
>> bit trickier to know how to use the pagination features of MS Word to
>> make the prefatory pages one set of page numbers, and the remaining
>> pages normal page numbers.  And then the headings too ...  I doubt that
>> most volunteers know how to do such things.
>> 
>> That's why we end up with a couple of headings in the beginning of the
>> book and none for the chapters.  Bookshare could promote the use of such
>> improvements in raw scans before submission.  Some of the books I have
>> read from the -- I think publisher contributions -- have been
>> well-formed and highly navigable.
>> 
>> On the other hand, as I have had more and more experience with creating
>> and reading Daisy books with navigable headings and so forth, my
>> expectations have risen accordingly.  I wish Bookshare would at least do
>> more to promote increased inclusion of headings in submitted rich
>> text files - they aren't at all difficult to do.
>> 
>> Dan
>> 
>> Dan Burke
>> Assistant Director/Assistive Technology Coordinator
>> 
>> Disability Services for Students
>> The University of Montana
>> Emma B. Lommasson Center 154
>> Missoula, MT 59812
>> 
>> 406.24.4424
>> 406.243.5330 FAX
>> 
>> www.umt.edu/disability
>> 
>> 
>> -----Original Message-----
>> From: dtb-talk-bounces at nfbnet.org [mailto:dtb-talk-bounces at nfbnet.org]
>> On Behalf Of Jim Barbour
>> Sent: Tuesday, May 11, 2010 11:38 PM
>> To: Robert Jaquiss; Discussion of Digital Talking Books
>> Subject: Re: [Dtb-talk] Bookshare.org, internet archive and other
>> automated DAISY production
>> 
>> Hey Everyone,
>> 
>> The challenges Robert outlines below are the same challenges we, as
>> blind college students, have faced for decades.  I'd love to see these
>> issues tackled and solved.  However, even if they aren't solved, daisy
>> books are a huge step forward from cassette tapes.
>> 
>> Automating the daisy production process is a trade off.  On the plus
>> side, there will be more daisy books available, since the production
>> process is less limited by the number of volunteers available.  On the
>> minus side, the quality of the markup will be limited to what can be
>> reliably done using automation.
>> 
>> I personally think that automation wins and that we should not hold up
>> a workable solution while we search for a nearer perfect one.
>> 
>> Just my $.02
>> 
>> Jim Barbour
>> 
>> On Tue, May 11, 2010 at 10:53:50PM -0500, Robert Jaquiss wrote:
>> > Hello Greg:
>> >
>> >     It is my opinion that DAISY books should absolutely reflect the
>> > structure of the original printed book. If they don't how could a
>> > student deal with a teacher's instructions to turn to page XX. In
>> > situations where a citation is needed, a reader couldn't produce a
>> > professionally acceptable citation. If the DAISY book is to be used
>> > to produce braille, proper pagination is a must. Sections and
>> > chapters also must be preserved.
>> >
>> > Regards,
>> >
>> > Robert Jaquiss
>> >
>> > ----- Original Message ----- From: "Greg Kearney"
>> > <gkearney at gmail.com>
>> > To: "daisy group" <dmfc-ig at mail.daisy.org>; "Discussion of Digital
>> > Talking Books" <dtb-talk at nfbnet.org>
>> > Sent: Tuesday, May 11, 2010 8:41 PM
>> > Subject: [Dtb-talk] Bookshare.org,internet archive and other
>> > automated DAISY production
>> >
>> >
>> > >I have been thinking of late about the various attempts to
>> > >generate DAISY from fully automated systems such as seems to be
>> > >the case at Bookshare.org, Internet Archive and some other
>> > >sources.
>> > >
>> > >The issue I have with these books is that their DAISY structure
>> > >does not reflect the printed book. In most cases that I have seen
>> > >the book is a single long heading level 1 with perhaps page
>> > >numbering in place. In many cases these page numbers are again not
>> > >reflective of the printed book, for example I have found books
>> > >with duplicate page numbers. For example where there is a page
>> > >number 4 in the front matter and a page number 4 in the body
>> > >matter.
>> > >
>> > >More troubling however is the lack of navigation to chapters and
>> > >other subsections of the books. This is of particular concern in
>> > >non-fiction text.
>> > >
>> > >Am I just being overly picky here? What do you all think?
>> > >
>> > >Gregory Kearney | Manager Accessible Media
>> > >Association for the Blind of WA - Guide Dogs WA
>> > >PO Box 101, Victoria Park WA 6979 | 61 Kitchener Ave, Victoria
>> > >Park WA 6100
>> > >Tel: 08 9311 8246 | Fax: 08 9361 8696 | www.guidedogswa.com.au
>> > >Tel: 307-224-4022 (North America)
>> > >Email: greg.kearney at guidedogswa.com.au
>> > >Email: gkearney at gmail.com
>> 
> 
>                        David Andrews:  dandrews at visi.com
> Follow me on Twitter:  http://www.twitter.com/dandrews920
> 
> 
> _______________________________________________
> Dtb-talk mailing list
> Dtb-talk at nfbnet.org
> http://www.nfbnet.org/mailman/listinfo/dtb-talk_nfbnet.org
> To unsubscribe, change your list options or get your account info for Dtb-talk:
> http://www.nfbnet.org/mailman/options/dtb-talk_nfbnet.org/gkearney%40gmail.com




