Forwarded from blinux-list.

Date: Mon, 15 Mar 1999 16:07:14 +1100 (AEDT)
From: Jason White <jasonw@xxxxxxxxxxxxxxxxxxxxxxxx>

> This kind of approach has been tried by DOS screen reader developers
> who have included macro programming in their products. Jaws for DOS
> and, as I understand it, IBM Screen Reader have followed such a
> strategy.

Right, but are these languages as rich and detailed as, say, Python or Tcl? Are they also object-oriented, allowing components of the interface to be instantiated and modified at will? I never intended to imply that my approach was new, only that it would be new to Linux. While SVLPRO and Speakup are certainly good solutions, I haven't seen plans to integrate a scripting language into any of them.

> It breaks down as application complexity increases: it becomes
> necessary to write special macros for each context within an
> application to read the desired information at an appropriate moment.
> If the visual presentation is changed (e.g. the application has been
> set to use non-default colours) then the pattern matching that relies
> on such uniformities ceases to operate correctly.

This strikes me as a limitation of the design of the screen reader itself, not of the approach. I used to use Vocal-Eyes often, and was frequently frustrated: I knew exactly what I wanted to do to speech-enable something, but I couldn't use the limited menus to accomplish it. With a well-designed API, an application should be easy to extend without worrying about color schemes. Or, if the possibility exists to change colors, it should be easy to code the various alternatives into the script so it can determine its mode of operation and create windows and audio widgets as needed.
> More importantly, this approach fails whenever the important
> distinctions needed for high quality braille or speech output are not
> apparent in the visual interface but do exist in the underlying data:
> for example the SCOPE and HEADERS attributes in HTML 4.0 tables,
> which clarify the relations between header and content cells; the
> labels of form fields, which, thanks to HTML 4.0, can be provided
> explicitly in the markup but which a graphical browser would not
> present visually; the structure inherent in a marked-up document,
> which might be presented differently on the screen depending on the
> configuration of the browser and the style sheet; the semantics
> reflected in TeX markup of mathematical content, which cannot easily
> be derived from a graphical presentation; etc.

Granted. Again, we touch on the argument that a single tool won't solve every problem. That argument is the main reason I have problems with Emacspeak, and it is why my approach won't meet everyone's needs either. A double-edged sword, but an unavoidable one. Emacspeak will still be great for web browsing, spreadsheets, and any task that requires Emacs. When my screen reader is available, I won't even work on an Emacs extension for it; I'll just tell users who want to run Emacs to use Emacspeak instead. But if someone wants to install a distribution, run certain applications without the overhead of Emacs, or is simply used to 'stupid' screen readers and doesn't find their output confusing, they can use what I develop.

> While very sophisticated and, as yet, to my knowledge, non-existent
> artificial intelligence techniques might be able to identify visual
> cues in a wide variety of circumstances and provide an efficient
> auditory or braille presentation, the costs in terms of research,
> computing resources, etc., of developing such a system would make it
> impractical.

Agreed.
> The better approach is to design accessibility into user interfaces
> themselves so that appropriate structural and semantic aspects of the
> content are made available, automatically and in parallel with the
> visual interface, wherefrom an effective and convenient braille or
> auditory representation can be easily constructed.

Again, I agree with your statement, up to a point. Yes, user interfaces should be modified, which is why I have my sights set on Berlin at the moment (http://www.berlin-consortium.org). But that isn't realistically possible in all cases. Where it isn't, or in situations where it isn't needed, an excellent alternative needs to exist.

> Interestingly, the UltraSonix approach initially started by trying to
> monitor the data sent to an X server and to derive the auditory
> representation therefrom. This strategy was abandoned after it was
> realised that low-level information concerning the graphical
> presentation was insufficient to provide the basis of a reliable
> auditory interface. By analogy, the same argument can be made in
> relation to the web, to software designed to handle spreadsheets,
> marked up documents, etc. It is simply more cost effective to develop
> means of exploiting the structural and semantic distinctions available
> in the internal representation than to try to infer them from a visual
> (ultimately a graphical) presentation, in which some of the
> information will inevitably be lost, and could only be inferred with
> great difficulty.

Right. As stated above, Emacspeak would be best for web browsing, spreadsheets, and the like. My approach is best for those who want to use shell applications, play games, and so on: things which don't have specific Emacspeak extensions. As an aside, I'm on spring break at the moment and will probably begin hacking on this today. I have some old C code which communicates with Emacspeak speech servers, but I'm a C++/OO freak, so I'll probably convert that code to classes later today, since I'll be writing this project in C++.
Anyhow, I hope to have the ability to communicate with speech servers, as well as to read data from the screen, in a very rough state by the end of this week, though I'm not releasing anything until I have a good, tested scripting API at my disposal, along with several test environments that work well. If anyone is interested in following my efforts or receiving pre-release code snapshots, please drop me a message. I'm not on resnet at the moment, though when I am I may set up a CVS server. It would be nice to pull in additional developers, though I may wait until I have a strong base of code available.