Forwarded from blinux-list.

Date: Mon, 15 Mar 1999 16:07:14 +1100 (AEDT)
From: Jason White <jasonw@xxxxxxxxxxxxxxxxxxxxxxxx>

> This kind of approach has been tried by DOS screen reader developers
> who have included macro programming in their products. Jaws for DOS
> and, as I understand it, IBM Screen Reader have followed such a
> strategy.

Right, but are these languages as rich and detailed as, say, Python or Tcl? Are they also object-oriented, allowing components of the interface to be instantiated and modified at will? I never intended to imply that my approach was new, only that it would be new to Linux. While SVLPRO and Speakup are certainly good solutions, I haven't seen plans to integrate a scripting language into any of them.

> It breaks down as application complexity increases: it becomes
> necessary to write special macros for each context within an
> application to read the desired information at an appropriate moment.
> If the visual presentation is changed (e.g. the application has been
> set to use non-default colours) then the pattern matching that relies
> on such uniformities ceases to operate correctly.

This strikes me as a limitation of the design of the screen reader itself, not of the approach. I used to use Vocal-Eyes often, and was frequently frustrated: I knew exactly what I wanted to do to speech-enable something, but I couldn't use the limited menus to accomplish it. With a well-designed API, an application should be easy to extend without worrying about color schemes. Or, if the possibility exists to change colors, it should be easy to code the various alternatives into the script so it can determine its mode of operation and create windows and audio widgets as needed.
> More importantly, this approach fails whenever the important
> distinctions needed for high quality braille or speech output are not
> apparent in the visual interface but do exist in the underlying data:
> for example the SCOPE and HEADERS attributes in HTML 4.0 tables,
> which clarify the relations between header and content cells; the
> labels of form fields, which, thanks to HTML 4.0, can be provided
> explicitly in the markup but which a graphical browser would not
> present visually; the structure inherent in a marked-up document,
> which might be presented differently on the screen depending on the
> configuration of the browser and the style sheet; the semantics
> reflected in TeX markup of mathematical content, which cannot easily
> be derived from a graphical presentation; etc.

Granted. Again, we touch on the argument that a single tool won't solve every problem. That argument is the main reason I have problems with Emacspeak, and it is why my approach won't meet everyone's needs either. A double-edged sword, but an unavoidable one. Emacspeak will still be great for web browsing, spreadsheets, and any task that requires Emacs. When my screen reader is available, I won't even work on an Emacs extension for it; I'll just tell users who want to run Emacs to use Emacspeak instead. But if someone wants to install a distribution, run certain applications without the overhead of Emacs, or is simply used to 'stupid' screen readers and doesn't find their output confusing, they can use what I develop.

> While very sophisticated and, as yet, to my knowledge, non-existent
> artificial intelligence techniques might be able to identify visual
> cues in a wide variety of circumstances and provide an efficient
> auditory or braille presentation, the costs in terms of research,
> computing resources, etc., of developing such a system would make it
> impractical.

Agreed.
> The better approach is to design accessibility into user interfaces
> themselves so that appropriate structural and semantic aspects of the
> content are made available, automatically and in parallel with the
> visual interface, wherefrom an effective and convenient braille or
> auditory representation can be easily constructed.

Again, I agree with your statement, up to a point. Yes, user interfaces should be modified, which is why I have my sights set on Berlin at the moment (http://www.berlin-consortium.org). But that isn't realistically possible in all cases. Where it isn't, or in situations where it isn't needed, an excellent alternative needs to exist.

> Interestingly, the UltraSonix approach initially started by trying to
> monitor the data sent to an X server and to derive the auditory
> representation therefrom. This strategy was abandoned after it was
> realised that low-level information concerning the graphical
> presentation was insufficient to provide the basis of a reliable
> auditory interface. By analogy, the same argument can be made in
> relation to the web, to software designed to handle spreadsheets,
> marked up documents, etc. It is simply more cost effective to develop
> means of exploiting the structural and semantic distinctions available
> in the internal representation than to try to infer them from a visual
> (ultimately a graphical) presentation, in which some of the
> information will inevitably be lost, and could only be inferred with
> great difficulty.

Right. As stated above, Emacspeak would be best for web browsing, spreadsheets, and the like. My approach is best for those who want to use shell applications, play games, and so on: things which don't have specific Emacspeak extensions. As an aside, I'm on spring break at the moment and will probably begin hacking on this today. I have some old C code which communicates with Emacspeak speech servers, but I'm a C++/OO freak, so I'll probably convert that code to classes later today, since I'll be writing this project in C++.
Anyhow, I hope to have the ability to communicate with speech servers, as well as to read data from the screen, in a very rough state by the end of this week, though I'm not releasing anything until I have a good, tested scripting API at my disposal, along with several test environments that work well. If anyone is interested in following my efforts or receiving pre-release code snapshots, please drop me a message. I'm not on resnet at the moment, though when I am I may set up a CVS server. It would be nice to pull in additional developers, though I may wait until I have a strong base of code available.