You would have the screen reader jabbering over the presenter, so I'm not sure that would take off. Audio description is usually slotted between dialogue in TV shows and movies, so I'm not sure this approach would work for the type of content you're describing.

----- Original Message -----
From: Rich Morin <rdm@xxxxxxxx>
To: Linux for blind general discussion <blinux-list@xxxxxxxxxx>
Date: Sun, 28 Jul 2024 10:37:27 -0700
Subject: making YouTube (etc) videos more accessible?

> The discussion of scanning and converting PDFs reminded me of a topic I've thought about a number of times. Perhaps some folks here will be interested...
>
> I watch a lot of conference presentations on YouTube (etc.). Typically, these have been edited into a collage showing the speaker, a display screen, and perhaps some graphics for the event. The display screen generally shows styled text, bullet points, charts, and other graphic images.
>
> Although a blind person can listen to the audio track, they will miss all of the visual content. So, I've wondered what the prospects might be for improving this situation. For example, it seems like it should be possible for software to:
>
> - pull static images from the video stream
> - recognize the region containing the display screen
> - extract text and layout information
> - convert this to HTML
> - synchronize the HTML to the audio track
>
> Or, in this age of LLMs and such, perhaps the software could analyze the visual content and be prepared to discuss it interactively. Might anyone know of any work in this area and/or have thoughts about how such a facility "should" work?
>
> -r
>
> To unsubscribe from this group and stop receiving emails from it, send an email to blinux-list+unsubscribe@xxxxxxxxxx.
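
For what it's worth, the last two steps Rich lists (extract text, synchronize to the audio track) could be sketched roughly as below. This is only an illustration, not an existing tool: the frame sampling and OCR stage (e.g. OpenCV plus Tesseract upstream) is assumed, and the canned frame text here stands in for real OCR output. Emitting WebVTT is one plausible way to do the synchronization, since players and some screen readers already understand timed cues.

```python
# Hypothetical sketch: collapse per-frame OCR results into distinct
# slides, then emit WebVTT cues so the slide text is timed against
# the audio track. All names here are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Slide:
    start: float  # seconds into the video where this slide first appears
    text: str     # OCR'd text of the slide region


def frames_to_slides(frames: List[Tuple[float, str]]) -> List[Slide]:
    """Merge consecutive frames with identical OCR text, so each
    distinct slide yields one entry with its first-seen timestamp."""
    slides: List[Slide] = []
    for ts, text in frames:
        if not slides or slides[-1].text != text:
            slides.append(Slide(ts, text))
    return slides


def to_webvtt(slides: List[Slide], video_end: float) -> str:
    """Render slides as WebVTT cues; each cue lasts until the next
    slide change (or the end of the video)."""
    def fmt(t: float) -> str:
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h):02d}:{int(m):02d}:{s:06.3f}"

    lines = ["WEBVTT", ""]
    for i, slide in enumerate(slides):
        end = slides[i + 1].start if i + 1 < len(slides) else video_end
        lines.append(f"{fmt(slide.start)} --> {fmt(end)}")
        lines.append(slide.text)
        lines.append("")
    return "\n".join(lines)


# In a real pipeline, `frames` would come from sampling the video
# (say, one frame per second) and OCRing the screen region; canned
# text here just shows the shape of the data.
frames = [(0.0, "Intro"), (1.0, "Intro"), (2.0, "Agenda"), (3.0, "Agenda")]
vtt = to_webvtt(frames_to_slides(frames), video_end=4.0)
print(vtt)
```

The same per-slide records could just as easily be rendered to timestamped HTML instead of WebVTT, if interactive browsing of the slide text is the goal.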