In this Web Wednesday post, we're highlighting a great tutorial from David Rousset.
David shows us how we can write one browser extension that works in every modern browser (except Safari).
He also shows how you can use that extension to connect to a number of Microsoft services, such as the Microsoft Cognitive Services Computer Vision API and the Bing Text to Speech API.
Finally, he shows how to add browser-specific features while still sharing most of the code.
I’ll explain how you can install this extension in the browsers that support the web extension model (i.e., Edge, Chrome, Firefox, Opera, Brave and Vivaldi), and provide some simple tips on how to keep a single code base for all of them, as well as how to debug in each browser.
Our Extension Link
Let’s build a proof of concept — an extension that uses artificial intelligence (AI) and computer vision to help the blind analyze images on a web page.
We’ll see that, with a few lines of code, we can create some powerful features in the browser. In my case, I’m concerned with accessibility on the web and I’ve already spent some time thinking about how to make a breakout game accessible using web audio and SVG, for instance.
Still, I’ve been looking for something that would help blind people in a more general way. I was recently inspired while listening to a great talk by Chris Heilmann in Lisbon: “Pixels and Hidden Meaning in Pixels.”
Indeed, using today’s AI algorithms in the cloud, together with text-to-speech technologies exposed in the browser through the Web Speech API or through a remote cloud service, we can very easily build an extension that analyzes web page images with missing or improperly filled alt text.
My little proof of concept simply extracts images from a web page (the one in the active tab) and displays the thumbnails in a list. When you click on one of the images, the extension queries the Computer Vision API to get some descriptive text for the image and then uses either the Web Speech API or the Bing Speech API to read it aloud to the visitor.
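To make that flow more concrete, here is a minimal sketch of its three data steps in TypeScript: picking the image URLs worth sending, building the request for the Computer Vision "describe" operation, and choosing the caption to speak. It is an illustration under assumptions, not the code of the actual extension. The `PageImage` type, the function names, the `maxCandidates=1` query string and the fallback sentence are all hypothetical, and the response shape is the subset I assume the service returns. Only the `Ocp-Apim-Subscription-Key` header name and the `{ "url": ... }` request body come from the public Cognitive Services conventions.

```typescript
// Hypothetical minimal shape of an image found on the page.
interface PageImage {
  src: string;
  alt?: string;
}

// Assumed subset of the JSON returned by the "describe" operation.
interface DescribeResponse {
  description?: {
    captions?: { text: string; confidence: number }[];
  };
}

// Keep only absolute http(s) image URLs and drop duplicates, since a remote
// service cannot fetch data: URIs or page-relative paths.
function collectImageUrls(images: PageImage[]): string[] {
  const urls: string[] = [];
  for (const image of images) {
    const isRemote = /^https?:\/\//i.test(image.src);
    if (isRemote && urls.indexOf(image.src) === -1) {
      urls.push(image.src);
    }
  }
  return urls;
}

// Build the URL and fetch() options for one describe call.
// `endpoint` is an assumption: pass the base URL of your own Computer Vision resource.
function buildDescribeRequest(endpoint: string, apiKey: string, imageUrl: string) {
  return {
    url: endpoint + "/describe?maxCandidates=1",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": apiKey,
      } as { [name: string]: string },
      body: JSON.stringify({ url: imageUrl }),
    },
  };
}

// Pick the caption with the highest confidence, or a fallback sentence.
function pickCaption(response: DescribeResponse): string {
  const captions = (response.description && response.description.captions) || [];
  let best: { text: string; confidence: number } | undefined;
  for (const caption of captions) {
    if (!best || caption.confidence > best.confidence) {
      best = caption;
    }
  }
  return best ? best.text : "No description available";
}
```

In a real extension, the input list would typically come from `document.images` in the active tab, the request would be sent with `fetch(request.url, request.init)`, and the resulting caption could be spoken in the browser with `speechSynthesis.speak(new SpeechSynthesisUtterance(caption))`. Keeping these three functions free of any browser object is a deliberate choice here: the same code can then be shared across browsers and tested outside of them.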