[BlindMath] Emacspeak --- A Speech Odyssey
David Andrews
dandrews920 at comcast.net
Sun Aug 11 10:32:39 UTC 2024
EMACSPEAK The Complete Audio Desktop - Wednesday, July 31, 2024 at 4:25 PM
Emacspeak: A Speech Odyssey
1. Dedication: To My Guiding Eyes Aster, Hubbell and Tilden
<https://emacspeak.sourceforge.net/raman/aster-labrador/>Aster Labrador
(2/15/1987 to 12/05/1999)
<https://emacspeak.sourceforge.net/raman/hubbell-labrador/>Hubbell Labrador
(12/21/1997 to 4/11/2011)
<https://emacspeak.sourceforge.net/raman/tilden-labrador/>Tilden Labrador
(8/4/2009 to 9/3/2022)
2. Key Insights
*
<https://www.drdobbs.com/user-interface-a-means-to-an-end/184410453>User
interface is a means to an end.
* Open Source is essential for discovering new interaction paradigms.
* This is not mere idealism. Openness is a
key enabler for creating user journeys that were
not envisioned by a system's designers.
* <https://www.gnu.org/s/emacs/>Emacs and
<https://en.wikipedia.org/wiki/TeX>TeX are good
exemplars. They permit maximal freedom when seen
from the viewpoint of user extensibility and
creativity. TeX enabled
<https://emacspeak.blogspot.com/2022/12/aster-spoken-math-on-emacspeak-audio_21.html>Audio
System For Technical Readings (AsTeR); Emacs
enabled <https://emacspeak.sourceforge.net>Emacspeak.
* Rapid, reliable task completion is the most
important metric and trumps secondary items such
as eye-candy; the latter only leads to bloat, as evinced by the HTML Web.
* Having a well-identified problem when designing a system is paramount.
* Usability is important, but to matter, the
system needs to be useful first.
* Ease of use by itself is often marketing hype.
* Useful systems are fun to learn and give
back more than what you put in with respect to time and effort.
* A steep learning curve in and of itself is
not to be feared; it can be fun to learn and gets you farther faster.
* True empowerment: Ensure that the user grows continuously.
3. Emacspeak: The Complete Audio Desktop
* Emacspeak, started in September 1994, was
released as Open Source in
<https://tvraman.github.io/emacspeak//web/releases/release-3.0.html>April
1995.
* The goal was to create a system for daily
use that doubled as a research work-bench for
developing an auditory interface.
* Speech and auditory output would be treated as first-class citizens.
* The time felt right with respect to
building a system that enabled eyes-free access to the emerging Web.
*
<https://emacspeak.sourceforge.net/turning-twenty.html>Emacspeak
At Twenty was published in September 2014 and
traced the evolution of the project.
* Now, this article gives a bird's-eye
overview of the last 10 years by loosely
following the logical structure of the Turning Twenty paper.
* In the process, we identify the dreams that
have come to pass as well as the expectations
that have failed to materialize, both attributable
to developments in the larger Internet ecosystem.
* But never fear, though some of these may be
superficially disappointing, they likely herald
the nature of bigger and better things to come!
* As a proof-point, in 1994, I could not have
imagined the impact that the world of
Internet-centered computing and the accompanying
information revolution would have on the state of information access.
* Conversely, I boldly (and incorrectly)
predicted that the arrival of mobile devices and
mainstream speech interfaces would herald the
move to a Web of information where there would be
a clean separation between application back-ends
and various client-specific front-ends. See
<https://emacspeak.sourceforge.net/raman/publications/specialized-browsers/>Specialized
Browsers and
<https://youtube.com/watch?v=TZKwvBkS5cs>The Web,
The Way You Want. Distinguished Lecture Series, UW Oct 2007.
* The above still makes sense from the view
of scalable software architecture. However, the
rapid growth of the Web economy has also resulted
in an even faster race to the bottom where
applications continue to be built and re-built
every few years for the next best thing; welcome
to the "write once, debug everywhere" world all over again!
* Case in point: today we have smart phones,
smart watches and smart speakers, but each of
these requires targeted front-ends if one wishes
to bring the riches of the Internet to them.
* So the larger the Web gets, the fewer
devices it becomes available on: a classic downward spiral.
Share And Enjoy: The Best Is Yet To Come!
3.1. How To Read This Document
* I recommend reading the Turning Twenty paper to get a full overview.
* Then, read this paper a section at a time,
while referring back to the parallel section in
the Turning Twenty paper to understand how things have evolved.
* Make sure to skim or deep-dive into the references in both papers.
4. Using UNIX With Speech Output: 2024
* In 2024 UNIX equates mostly to various
Linux distributions, and from the Emacspeak
perspective, they are all made mostly equal.
* Variations do exist and running
bleeding-edge distributions can come with issues,
e.g., unstable versions of the underlying audio infrastructure.
* Yes, 30 years and counting, Linux Audio is
still a work in progress, though I hope Pipewire
will be the last of these tidal shifts.
* Linux is moving to Wayland; expect that transition to be choppy.
* Native applications are mostly gone bar the
shouting. In this context, where most users
access things through a mainstream Web browser,
Emacspeak users access everything through Emacs.
* The above, when done right, is hugely
empowering; when done badly, it's extremely
limiting. See later sections of this paper on the
continuing evolution of the Web.
5. Key Enabler: Emacs And Lisp Advice
* Advice in Emacs, as implemented in package advice, is rock-solid.
* There is a newer nadvice that is part of
Emacs that Emacspeak does not use.
* There are no plans to migrate to nadvice
since that is a lot of busy work in my view and
any such migration would be difficult to test for correctness.
* The classic advice package may be removed
from Emacs at some point in the future, but never
fear; it'll be bundled with Emacspeak if that
becomes necessary. This is a feature of Free
Software and is a great example of what that Freedom entails.
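For readers unfamiliar with the idea of advice, here is a minimal sketch of the concept in Python. It is only an analogy: Emacspeak itself uses Emacs Lisp advice, and every name below (speak, advise_after, next_line) is hypothetical and chosen for illustration. The point it shows is that existing behavior can be extended with spoken feedback without editing the original function.

```python
import functools

# Stand-in for a speech server: a real system would send text to TTS.
# This list and every function name here are hypothetical.
spoken = []


def speak(text):
    """Record text that would be spoken."""
    spoken.append(text)


def advise_after(func, after):
    """Return a wrapper that calls func, then passes its result to after."""
    @functools.wraps(func)
    def advised(*args, **kwargs):
        result = func(*args, **kwargs)
        after(result)  # the added behavior runs after the original
        return result
    return advised


def next_line(lines, index):
    """Original, speech-unaware command: return the line after index."""
    return lines[min(index + 1, len(lines) - 1)]


# Attach spoken feedback without modifying next_line's own source.
next_line = advise_after(next_line, speak)
```

Calling next_line(["alpha", "beta"], 0) still returns "beta" as before, and "beta" is additionally appended to spoken.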
6. Key Component: Text To Speech (TTS)
* Speech output, especially unencumbered
text-to-speech, is just as much a challenge as it was 30 years ago.
* In the bigger picture, early instances of
using TTS for voice assistants have driven the
industry toward natural-sounding voices.
* The above sounds attractive on the surface,
but a price we have paid is the loss of
fine-grained control over voice parameters,
emotion, stress and other supra-linguistic features.
* I believe these to be essential for
delivering good auditory interfaces and remain
optimistic that these will indeed arrive in a
future iteration of speech interaction.
* Things appear to be coming full circle:
Emacspeak started with the hardware Dectalk; now,
the
<https://github.com/dectalk/dectalk.git>Software
Dectalk is increasingly becoming the primary
choice on Linux; see this
<https://raw.githubusercontent.com/tvraman/emacspeak/master/servers/software-dtk/Readme.org>Readme
for setup instructions.
* Viavoice Outloud from Voxin is still
supported. However, you can no longer buy new
licenses. If you have already purchased a license, it'll continue to work.
* The Vocalizer voices that Voxin now sells do not work with Emacspeak.
* The other choice on Linux is ESpeak, which
will hopefully continue to be free, albeit of much lower quality.
* The future, as ever, is unpredictable and new
voices may well show up, especially those powered
by on-device Large Language Models (LLMs).
* On non-free platforms, there is usable TTS
on the Mac, now supported by the new SwiftMac server for Emacspeak.
7. Emacspeak And Software Development
* Magit as a Git porcelain is perhaps the
biggest leap forward with respect to software development.
* New completion frameworks such as company
and consult come a close second in enhancing productivity.
* Completion strategies such as fuzzy and
flex provide enhanced completion.
*
<https://emacspeak.blogspot.com/2018/06/effective-suggest-and-complete-in-eyes.html>Effective
Suggest And Complete In An Eyes-free Environment
explains the higher-level concept involved in defining such strategies.
* The ability to introspect code via eglot
turns Emacs into a powerful and meaningful IDE; I
say meaningful because this brings the best
features of an integrated development environment
while leaving behind the eye-candy that tends to bloat commercial IDEs.
* Packages like transient enable
discoverable, rapid keyboard access to complex nested-menu driven interfaces.
*
<https://emacspeak.blogspot.com/2023/09/emacs-ergonomics-dont-punish-your.html>Ergonomic
keybindings under X using
<https://github.com/alols/xcape>xcape to minimize
chording have been a significant win in the last two years.
* Jupyter is the generalization of IPython
notebooks to Julia, Python and R. The news here
isn't all good; IPython notebooks are
well-designed with respect to not getting locked
into any given implementation. However, in
practice, front-ends depend on Javascript in the browser.
* Consequently, Emacs packages for IPython
Notebooks, e.g., package ein, are no longer maintained.
* Developing in higher-level languages
continues to be very well supported in Emacspeak.
* The re-emergence of Common Lisp in the last
20 years, thanks to
<https://asdf.common-lisp.dev/asdf.html>asdf and
<https://www.quicklisp.org/>quicklisp as a
network-aware package manager and build tool has
once again made Lisp development using Emacs Slime a productive experience.
* In 2022, I updated
<https://emacspeak.blogspot.com/2022/12/aster-spoken-math-on-emacspeak-audio_21.html>Audio
System For Technical Readings (AsTeR), my PhD
project from 1993, to run under SBCL with a
freshly implemented Emacs front-end.
* So now I can listen to Math content just as
well as I could 30 years ago!
8. Emacspeak And Authoring Documents
* Package org is to authoring as magit is to
software development with respect to productivity gains.
* Org has existed since circa 2006 in my
Emacs setup; but it continues to give and give plentifully.
* Where I once authored technical papers in
LaTeX using auctex, used nxml for HTML, etc., I
now mostly write everything in org-mode and
export to the relevant target format.
* Integrating various search engines in Emacs
makes authoring content extremely productive.
* Integrated access to spell-checking
(flyspell), dictionaries, translation engines, and
other language tools combine for a powerful authoring work-bench.
* Extending org-mode with custom link types
enables smart note taking with hyperlinks to
relevant portions of an audio stream; see article
<https://emacspeak.blogspot.com/2022/10/learn-smarter-by-taking-rich-hypertext.html>Learn
Smarter By Taking Rich Hypertext Notes.
9. Emacspeak And The Web In 2024
* Package shr and eww arrived around 2014.
But in 2024, they can be said to have truly landed.
* 2014 also marked the explicit take-over of
the stewardship of the HTML Web by the browser
vendors from the W3C; I say explicit because the
W3C had already thrown in the towel in the preceding decade.
* This has led to a Web of content created
using the assembly language of divs, spans and
Javascript under the flag of HTML5; the result
is a tangled web of spaghetti that everyone loves to hate.
* In this context, see
<https://idlewords.com/talks/website_obesity.htm>Tag
Soup, Scripts And Obfuscation: How The Web Was
Broken for a good overview of HTML's obesity problem.
* For better or worse, the investment in XML
and display-independent content is now a complete
write-off, at least on the surface.
* So what next? Wait for the spaghetti monster
to show up for lunch? Humor aside, that monster
may well be called AI, though whether today's Web
gives that monster life, indigestion,
constipation, dysentery or hallucinations is a
story to be written in the coming years.
* I say "on the surface" above because the
welcome re-emergence of ATOM and RSS feeds is
perhaps a silent acknowledgement that bloated Web
pages are now unusable even for users who can see.
* Package elfeed has emerged as a powerful feed-manager for Emacs.
* Emacspeak implements RSS and ATOM support
using XSLT; those features now shine brighter
with mainstream news sites reviving their support for content feeds.
* Browsers like Mozilla now implement content
filters, a euphemism for scraping off visual
eye-candy and related cruft to reveal the
underlying content. These are now available as
plugins (see
<https://github.com/eafer/rdrview>RDRView for an
example). Emacspeak leverages this to make the Web more readable.
* Package url-template and
emacspeak-websearch continue to give in plenty,
though they do require continuous updating.
* Web APIs come and go, so that space is in a state of constant change.
* The state of web applications is perhaps
the most concerning from an Emacspeak
perspective, and I do not see that changing in
the short-term. There are no incentives for Web
providers to free their applications from the
tangled Web of spaghetti they have woven around themselves.
* But as with everything else in our
industry, it is precisely when something feels
completely entrenched that users rebel and
innovations emerge to move us to the next phase, so fingers crossed.
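To illustrate why content feeds are so much friendlier to an eyes-free interface than script-heavy pages, here is a minimal Python sketch that reduces an RSS feed to a list of headline and link pairs. It is not how Emacspeak does it (the text above notes that Emacspeak uses XSLT); the sample feed, its URLs and the function name feed_headlines are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny made-up RSS 2.0 document standing in for a real news feed.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>First story</title>
      <link>https://example.com/1</link>
    </item>
    <item>
      <title>Second story</title>
      <link>https://example.com/2</link>
    </item>
  </channel>
</rss>"""


def feed_headlines(rss_text):
    """Return (title, link) pairs for every item in an RSS document."""
    root = ET.fromstring(rss_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]
```

The structure of the feed carries the content directly, so no scraping of visual markup is needed before the headlines can be spoken.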
10. Audio Formatting: Generalizing Aural CSS
* Audio formatting with Aural CSS support is
stable, with new enhancements supporting more TTS engines.
* Support for parallel streams of TTS using
separate outputs to left/right channels is a big
win and enables more efficient interaction.
* Support for various Digital Signal
Processing (DSP) filters enables rich auditory
effects like binaural audio and spatial audio.
*
<https://emacspeak.blogspot.com/2015/12/soundscapes-on-emacspeak-audio-desktop.html>Soundscapes
implemented via package boodler makes for a
pleasant and relaxing auditory environment.
* Enabling virtual sound devices via Pipewire
for 5.1 and 7.1 spatial audio significantly enhances the auditory experience.
11. Conversational Gestures For The Audio Desktop
* Parallel streams of audio, combined with
more ergonomic keybindings, are the primary enhancement in this area.
* Parallel streams of speech, e.g., a
separate notification stream on the left or right
ear, help increase the bandwidth of communication.
* Notifications can thus be delivered without
having to stop the primary speech output.
12. Accessing Media Streams
* Emacspeak support for rich multimedia is now much more robust.
* Emacs package empv is a powerful tool for
locating, organizing and playing local and remote
media streams ranging from music and audio books to radio stations and podcasts.
* This makes media streams from a large
number of providers ranging from the BBC to
Youtube available via a consistent keyboard interface.
* This experience is augmented by a
collection of smart content locators on the
Emacspeak desktop, see the relevant blog article
titled
<https://emacspeak.blogspot.com/2024/03/updated-smart-media-selector-for-audio.html>smart
media selectors.
13. Electronic Books: Ubiquitous Access To Books
* Emacspeak modules for Epub and Bookshare
continue to provide good books integration.
* There are smart book locators analogous to
the locators for media content.
* Emacspeak speech-enables Calibre for
working with local electronic libraries.
14. Leveraging Computational Tools: From SQL And R To IPython Notebooks
* This area continues to provide a rich collection of packages.
* Newer highlights include sage interaction for symbolic computation.
* Emacspeak speech-enables packages gptel and
ellama for working with local and network LLMs.
15. Social Web: Mail, Messaging And Blogging
* This is a space that is definitely regressing.
* The previous decade was marked by open APIs
to many social Web platforms.
* Over time these first regressed with respect to privacy.
* Then they turned into walled gardens in their own right.
* Finally, the Web APIs, other than the kind
embedded in Javascript, have started disappearing.
* Looking back, the only social platform I
now use is Blogger for hosting my Emacspeak Blog;
it has a somewhat usable API, albeit guarded by a
difficult-to-use OAuth interface that requires
signing in via a mainstream browser.
* IMap continues to survive as an open email
protocol, though its days may well be numbered.
* The die is already cast with respect to
mere mortals being able to set up and host their
email; witness the complexity in setting up the
Emacspeak mailing list in 2023 vs 1993!
* This is an area that is likely to get worse
before it gets better, thanks to the spammers;
more's the pity, since Internet Email is perhaps
the single-most impactful technology with respect
to leveling the communications playing field.
* The disappearance of APIs mentioned above
also means that today the only usable chat
service on an open platform like Emacspeak is the
venerable Internet Relay Chat (IRC).
16. The RESTful Web: Web Wizards And URL Templates For Faster Access
* This area continues to thrive, either
because of or despite the best and worst
efforts of application providers on the Web.
* Twenty years on (this feature originally
landed in 2000) Emacspeak has a far richer
collection of filters, preprocessors and
post-processors that enable ever-more powerful
Web wizards. See the relevant
<https://tvraman.github.io/emacspeak/manual/URL-Templates.html>chapter
in the Emacspeak manual for the automatically updated list of URL Templates.
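The general idea behind a URL template can be sketched in a few lines of Python: a named pattern with slots that are filled from user-supplied answers, yielding a RESTful URL that can be fetched directly. This is a conceptual illustration only and does not reproduce the Emacspeak implementation; the template name, the example.com URL and the function expand are all hypothetical.

```python
from urllib.parse import quote_plus

# Hypothetical template table: a name mapped to a URL pattern with slots.
TEMPLATES = {
    "Example Search": "https://example.com/search?q={query}&lang={lang}",
}


def expand(name, **answers):
    """Fill the named template's slots with URL-encoded answers."""
    encoded = {slot: quote_plus(value) for slot, value in answers.items()}
    return TEMPLATES[name].format(**encoded)
```

For instance, expand("Example Search", query="audio desktop", lang="en") produces a URL whose query string is q=audio+desktop&lang=en; a wizard built on this would then fetch the result and pass it through filters before speaking it.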
17. Mashing It Up: Leveraging AI And The Web
* Developing solutions by combining various
API-based services on the Web has all but
disappeared, unless one is willing to commit
fully to the Javascript-powered Web hosted in a
Web browser, something I hope I never have to accept.
* So for now, I'll keep well away and count my blessings.
* The next chapter of the mash-up story may
well be based around Generative AI using LLMs. In
effect, LLMs trained on Web content define a
platform for generating content mash-ups. The
issue at present is that they are just as likely
to produce meaningless mush, something that may
get better as the field gets a handle on cleaning up Web content.
* Notice that we are now back to the
previously unsolved problem of cleaning up the
HTML Web; with LLMs, we'll just have an order of
magnitude more documents than the 2^W postulated
by
<https://emacspeak.sourceforge.net/raman/publications/beyond-web20-cacm-2009/>Beyond
Web 2.0, Communications Of The ACM, 2009.
18. The Final Word: Donald E Knuth (DEK)
* The best theory is inspired by practice.
The best practice is inspired by theory.
* The enjoyment of one's tools is an
essential ingredient of successful work.
* Easy things are often amusing and relaxing,
but their value soon fades. Greater pleasure,
deeper satisfaction, and higher wages are
associated with genuine accomplishments, with the
successful fulfillment of a challenging task.
*
<https://www.azquotes.com/author/8177-Donald_Knuth>Computer
Programming Is An Art.
The best example of the above is of course
<https://en.wikipedia.org/wiki/TeX>Knuth's TeX
work that was motivated by his own
dissatisfaction with the tools available to him
at the time for typesetting his magnum opus, The
Art Of Computer Programming (TAOCP). It is
something I've looked up to ever since my time as
a graduate student at Cornell.
The Emacspeak Speech Odyssey outlined in this
paper is, in some small measure, my own personal
experience of the sentiments he expresses.
T. V. Raman, San Jose, CA, August 1, 2024.
19. References
*
<https://www.drdobbs.com/user-interface-a-means-to-an-end/184410453>User
Interface is a means to an end, DDJ 1997.
* <https://www.gnu.org/s/emacs/>GNU Emacs
* <https://en.wikipedia.org/wiki/TeX>Knuth's TeX
*
<https://emacspeak.blogspot.com/2022/12/aster-spoken-math-on-emacspeak-audio_21.html>Audio
System For Technical Readings
*
<https://tvraman.github.io/emacspeak//web/releases/release-3.0.html>Announcing
Emacspeak: April 1995
*
<https://emacspeak.sourceforge.net/turning-twenty.html>Emacspeak At Twenty
*
<http://www.cs.washington.edu/htbin-post/mvis/mvis?ID=636>The
Web, The Way You Want. Distinguished Lecture Series, UW Oct 2007
*
<https://emacspeak.sourceforge.net/raman/publications/specialized-browsers/>Specialized
Browsers
*
<https://tvraman.github.io/emacspeak/web/01-gemini.ogg>An
Ode To Emacspeak: The Best Is Yet To Come
* <https://github.com/dectalk/dectalk.git>Software Dectalk on Github
*
<https://raw.githubusercontent.com/tvraman/emacspeak/master/servers/software-dtk/Readme.org>Dectalk
setup instructions
*
<https://emacspeak.blogspot.com/2018/06/effective-suggest-and-complete-in-eyes.html>Effective
Suggest And Complete In An Eyes-free Environment
* <https://asdf.common-lisp.dev/asdf.html>Common Lisp: asdf
* <https://www.quicklisp.org/>Common Lisp: Quicklisp
*
<https://emacspeak.blogspot.com/2015/12/soundscapes-on-emacspeak-audio-desktop.html>Soundscapes
on the Emacspeak Audio Desktop
* <https://en.wikipedia.org/wiki/REST>RESTful Web
*
<https://emacspeak.blogspot.com/2023/09/emacs-ergonomics-dont-punish-your.html>Ergonomic
keybindings
* <https://github.com/alols/xcape>Minimize chording with XCape
*
<https://emacspeak.blogspot.com/2022/10/learn-smarter-by-taking-rich-hypertext.html>Learn
Smarter By Taking Rich Hypertext Notes
*
<https://idlewords.com/talks/website_obesity.htm>Tag
Soup, Scripts And Obfuscation: How The Web Was Broken
* <https://github.com/eafer/rdrview>Readable Web Pages: RDRView
*
<https://emacspeak.blogspot.com/2024/03/updated-smart-media-selector-for-audio.html>smart
media selectors
*
<https://emacspeak.sourceforge.net/raman/publications/beyond-web20-cacm-2009/>Beyond
Web 2.0, Communications Of The ACM, 2009
*
<https://tvraman.github.io/emacspeak/manual/URL-Templates.html>Emacspeak
Manual: URL Templates
*
<http://emacspeak.blogspot.com/2007/07/emacspeak-and-beautiful-code.html>Beautiful
Code: An overview of the Emacspeak architecture, O'Reilly Media, 2007.
*
<https://www-cs-faculty.stanford.edu/~knuth/taocp.html>The
Art Of Computer Programming (TAOCP)
https://emacspeak.blogspot.com/2024/07/emacspeak-speech-odyssey.html