making YouTube (etc) videos more accessible?

The discussion of scanning and converting PDFs reminded me of a topic I've thought about a number of times.  Perhaps some folks here will be interested...

I watch a lot of conference presentations on YouTube (etc).   Typically, these have been edited into a collage, showing the speaker, a display screen, and perhaps some graphics for the event.  The display screen generally shows styled text, bullet points, charts and other graphic images, etc.

Although a blind person can listen to the audio track, they will miss all of the visual content.  So, I've wondered what the prospects might be for improving this situation.  For example, it seems like it should be possible for software to do the following (a rough sketch appears after the list):

- pull static images from the video stream
- recognize the region containing the display screen
- extract text and layout information
- convert this to HTML
- synchronize the HTML to the audio track

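For what it's worth, here is a minimal sketch of the first few steps in Python, assuming the video has already been downloaded locally (to a hypothetical "talk.mp4") and that OpenCV and Tesseract (via pytesseract) are installed.  It just samples frames, OCRs each whole frame, and writes timestamped HTML; finding the slide region, preserving layout, and syncing to the audio track are the hard parts it leaves open:

import html

import cv2                 # pip install opencv-python
import pytesseract         # pip install pytesseract (needs the tesseract binary)

VIDEO = "talk.mp4"         # hypothetical local copy of the presentation
SAMPLE_EVERY_SEC = 10      # grab one frame every 10 seconds of video

cap = cv2.VideoCapture(VIDEO)
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
step = int(fps * SAMPLE_EVERY_SEC)

sections = []
frame_no = 0
while True:
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_no)
    ok, frame = cap.read()
    if not ok:
        break
    # Crude version of the "extract text" step: OCR the whole frame.
    # A real tool would first locate the display screen region and try
    # to keep the layout (headings, bullets, tables) intact.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    text = pytesseract.image_to_string(gray).strip()
    if text:
        secs = frame_no / fps
        sections.append(
            "<h2>%d:%02d</h2>\n<pre>%s</pre>"
            % (secs // 60, secs % 60, html.escape(text))
        )
    frame_no += step
cap.release()

with open("slides.html", "w", encoding="utf-8") as out:
    out.write("<html><body>\n" + "\n".join(sections) + "\n</body></html>\n")
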
Or, in this age of LLMs and such, perhaps the software could analyze the visual content and be prepared to discuss it interactively.  Does anyone know of work in this area, or have thoughts about how such a facility "should" work?

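On the LLM angle, a vision-capable model can already produce a spoken-style description of a single extracted frame.  Here is one way that might look, using the OpenAI Python SDK purely as an example (any multimodal model would do; "frame.png" and the prompt are just placeholders):

import base64

from openai import OpenAI   # pip install openai

client = OpenAI()           # assumes OPENAI_API_KEY is set in the environment

# "frame.png" stands in for one of the frames pulled out of the video above.
with open("frame.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Describe this presentation slide for a blind listener, "
                     "including any text, bullet points, and charts."},
            {"type": "image_url",
             "image_url": {"url": "data:image/png;base64," + b64}},
        ],
    }],
)
print(resp.choices[0].message.content)
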
-r
