
This, a hundred times this. I can't count how many years in a row I experimented with various GUI toolkits, libraries, and concepts for an application like the rxi/lite editor [0]. I saw many good-looking implementations, but the first thing that caught my attention was always the lack of any support for screen-reading software, or problems displaying right-click context menus. I even remember a previous HN discussion where I pointed out a few projects that could provide some screen-reader support [1]. Sadly, I would say 95% of all the GUIs I know do not care about accessibility.

On the positive side, I am glad more and more hackers on HN focus on this matter. It's also worth noting that the big players on the market really do take it seriously; e.g. Flutter has a Semantics widget [2].

[0]: https://github.com/rxi/lite

[1]: https://news.ycombinator.com/item?id=26223380

[2]: https://api.flutter.dev/flutter/widgets/Semantics-class.html



If we want more accessible UIs, we need to demand better standard accessibility APIs from OS vendors like Microsoft and Apple.

Unity has no official support for screen readers or colorblind modes. Unreal has both. If a major game engine developer like Unity cannot be bothered to add support, can we really expect the little developer hacking away on some OpenGL side project to?

Most people working with OpenGL use GLFW (or sometimes SDL), and neither has any cross-platform API for supporting screen readers. Why? Because not only do Windows and macOS have different APIs, but different screen readers, braille displays, etc. have different APIs too.

But okay, let's look at the web. It's the most accessible platform of them all, right? What have Google/Apple/Mozilla done to help developers add screen reader support to WebGL and Canvas-based applications? Not a whole lot. You need to inject text into a hidden div, which also has performance implications, so you need to build a UI to toggle it on and off, or detect assistive technology the way Flutter Web does.

We should have better accessibility, but as long as we expect developers to jump through hours or days of hoops to get even something basic working, it's just not going to happen, and that is mostly the fault of extremely poor platform APIs.


> extremely poor platform APIs

As a former member of the Windows accessibility team at Microsoft, I'd appreciate your thoughts on what makes the platform accessibility APIs extremely poor. I have my own thoughts on what makes them difficult to implement, but I'd like to hear your perspective first.


I've dabbled in this space, and it would easily, easily be more code than all of Gio just to get basic accessibility integration on Windows. COM, OLE, a heavy bias towards automatically working with native UI elements, which are obviously not used here. It's just not a user-friendly API. Hell, that applies to anything that uses COM.


Could that be solved by having better Go bindings for COM/OLE? Is it a problem of too much boilerplate?


That's a big part of it. Another part is that most Go developers want to keep their Cgo dependencies as small as possible (preferably zero) for portability and performance reasons. It's also problematic that Go is not really OOP, and in particular, immediate mode programming implies that state is inferred from the pass made each frame; meshing this with the COM/OLE model is bananas. You basically need to redo everything the library is doing to draw, cache, and render the UI just to render the same state into COM. It's a ton of bookkeeping.

Worse still, while the Gio layout engine is pretty good at culling non-visible stuff, ultimately the framework doesn't know what will be visible until we're in GPU land, so a ton of bookkeeping would have to be added to test whether, for example, some text was clipped or things were overlapping.

In practice, the best way to model things would be to let the developer push fields full of ARIA-like information, as in the browser. But then you lose all the advantages of the immediate mode, the layout engine, etc.

It's a lose/lose situation, unfortunately.
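To make the idea concrete, here is a minimal sketch in Go of what pushing ARIA-like fields alongside immediate-mode drawing might look like. All of the types and names here are hypothetical; none of this is Gio's actual API.

```go
package main

import "fmt"

// Role classifies a widget for assistive technology, loosely
// mirroring ARIA roles. Hypothetical type, not part of Gio.
type Role int

const (
	RoleButton Role = iota
	RoleTextBox
	RoleLabel
)

// Semantics is the per-widget accessibility payload the developer
// would supply each frame, next to the draw calls.
type Semantics struct {
	Role    Role
	Label   string
	Value   string
	Focused bool
}

// Frame collects semantics as the UI is drawn, the same way draw ops
// are collected, and would hand the batch to a platform bridge after
// the frame is submitted.
type Frame struct {
	nodes []Semantics
}

// Annotate records one widget's semantics for the current frame.
func (f *Frame) Annotate(s Semantics) {
	f.nodes = append(f.nodes, s)
}

func main() {
	var f Frame
	// During layout/draw, each widget also reports its semantics.
	f.Annotate(Semantics{Role: RoleButton, Label: "Save"})
	f.Annotate(Semantics{Role: RoleTextBox, Label: "Filename", Value: "notes.txt"})
	fmt.Println(len(f.nodes))
}
```

The catch, as described above, is that this batch has to be rebuilt every frame, which is exactly the retained-state bookkeeping immediate mode tries to avoid.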

I think the real solution is somewhere in the middle: screen readers need to grow some OCR abilities, and libraries like Gio need to learn some better navigation tech, like supporting tab and keyboard navigation. Something like that.


Screen readers doing OCR is a big no-no. There is no possible way that will have the effect you want. Text-to-speech is the absolute bare minimum of what a screen reader actually does. For example, OCR would break horribly in the situation you describe, where some piece of text gets clipped.

Just IMO, it's unrealistic to expect an accessible GUI from a toolkit that stores no state, does none of that type of bookkeeping, has no object model, and only outputs pixels. The point of these assistive technologies is that the user can't work with that kind of visual data; they need more state presented to them. Sure, cutting all of that out and using only immediate mode makes everything easier to develop, but that's exactly the problem: everything else has been cut out, on purpose.


> For example that would break horribly with the situation you describe, where some piece of text gets clipped.

Forgive my ignorance, but what should happen here? Let's say I have a scrollable text pane with an entire novel in it.

I agree there should still be a mechanism to pass more accessibility data than OCR could extract. What is the bare minimum of information required? Pretend there is no UX model; arbitrary things could be presented (as in a video game).

In particular, with Gio there isn't necessarily a single set of widgets or a single UX. Gio has a small set of Material Design-compliant widgets, but it is mostly a library for composing immediate mode graphics. For example, I have several custom widgets: some only interact with the keyboard, some don't have any text at all (just graphics or animations), and one is akin to a 2D-scrollable, click-to-drag map, Google Maps style. I'm not really sure where I'd begin in making these accessible. How should something like Google Earth be made accessible, ideally?


Have you heard of the Screen Recognition feature in iOS 14? I haven't had a chance to use it myself yet. From what I've heard, it's impressive, but not a complete solution, at least not yet. Apple published a paper about it here:

https://machinelearning.apple.com/research/creating-accessib...


That's quite interesting, thank you; I hadn't heard of that. My only concern is that I hope app developers don't see it as an excuse to avoid using the native accessibility APIs and adding the necessary properties to their controls.


Qt supports accessibility on Windows without issues AFAIK; it's a couple thousand lines of code.


Qt is retained, not immediate mode.


Really, the major pain point is the lack of cross-platform APIs.


Such a cross-platform API would belong in a cross-platform toolkit; it really makes no sense for Windows to provide that. For example, Qt has a cross-platform accessibility API, and GTK is currently working on making theirs cross-platform. Maybe somebody should move this functionality out into a standalone library?


> Maybe somebody should move this functionality out into a standalone library?

I've been thinking about this for a while. Below is an edited excerpt from a message that I wrote to a colleague a few months ago, about how I think I should go about such a project:

---

I've been thinking about the problem of multiple programming languages, and even multiple programming styles within the same language. Do we implement a C library that would make hard-core Unix and Linux folks happy? A library in a subset of C++ that some developers of games and graphics would be comfortable with? A more modern style of C++ that would make some other developers happy? And where would that leave folks working in Java, or C# like the Unity crowd? Of course, different platforms have varying levels of support for different languages. And there are more languages coming into popularity or on the horizon, like Swift and Rust.

So I think what we really want is a cross-platform message format or protocol, with multiple implementations: multiple providers on the application/toolkit side, and multiple client libraries on the platform side. The client libraries could be written in each platform's native language (e.g. C++ for Windows, Java for Android, Swift for Apple platforms, or JavaScript for the web), and separately, provider libraries could be implemented in multiple languages. All major programming languages can work with binary data buffers. And since none of us want to work directly with those raw bytes, we could use an existing standard like Google Protocol Buffers, which already has implementations in several languages. Initially, for desktop and mobile platforms as well as web applications, this protocol would just be used internally between components in the same application process. But I can dream about platforms themselves adopting the protocol someday. There would have to be glue layers between cross-platform providers and platform-specific clients, but if we design this right, the glue could be kept pretty thin, with most of the complexity being kept on one side or the other, so we don't have a multiplication of effort for n toolkits or applications on m platforms.

I think the protocol should be push-based, rather than pull-based like Windows UI Automation and some other accessibility APIs. That is, the application/toolkit would push full information about objects in the accessibility tree when it first creates that tree, and incremental updates when objects are created or destroyed, when properties change, when text content changes, etc. That's probably the only thing that's going to work for the web platform, and I think it's a model that other platforms would do well to adopt. (It's probably too late for Windows, but I dream of replacing the current accessibility model on desktop Linux someday, after my non-compete with Microsoft expires.) The challenge with a push model is making it efficient; we don't want to re-push the whole contents of a large text box when the user types a single character, and when we do need to push all contents, we need to do so efficiently. And sometimes we really do need to push a lot of information. For example, for a text box, we need the screen coordinates of every character, plus all of the boundaries between words.
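A rough sketch, in Go with entirely hypothetical names, of what the wire types for such a push-based protocol might look like; in practice they would be generated from a Protocol Buffers schema rather than written by hand.

```go
package main

import "fmt"

// NodeID identifies an object in the accessibility tree.
type NodeID uint64

// Node is the full description of one object in the tree.
// A real schema would carry much more: roles, bounds, text runs
// with per-character coordinates, word boundaries, and so on.
type Node struct {
	ID       NodeID
	Role     string
	Name     string
	Children []NodeID
}

// TreeUpdate is what the provider pushes: new or changed nodes plus
// the IDs of removed nodes. The first push carries the whole tree;
// later pushes carry only increments, so typing one character does
// not resend an entire text box.
type TreeUpdate struct {
	Upserts []Node
	Removes []NodeID
	Focus   NodeID
}

// Apply merges an update into the client-side copy of the tree,
// which the platform bridge can then serve to pull-based APIs
// like UIA on request.
func Apply(tree map[NodeID]Node, u TreeUpdate) {
	for _, n := range u.Upserts {
		tree[n.ID] = n
	}
	for _, id := range u.Removes {
		delete(tree, id)
	}
}

func main() {
	tree := map[NodeID]Node{}
	// Initial push: the full tree.
	Apply(tree, TreeUpdate{Upserts: []Node{
		{ID: 1, Role: "window", Name: "Demo", Children: []NodeID{2}},
		{ID: 2, Role: "button", Name: "OK"},
	}})
	// Incremental push: one renamed node, nothing else resent.
	Apply(tree, TreeUpdate{Upserts: []Node{{ID: 2, Role: "button", Name: "Cancel"}}})
	fmt.Println(tree[2].Name, len(tree))
}
```

The client holding the merged tree is what lets a synchronous platform API be answered without blocking calls back into the provider.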

Luckily, a push-based accessibility architecture has been done before, internally in the Chromium browser engine. As you probably know, Chromium has a multi-process model, where web pages are rendered in sandboxed processes, which communicate with a master browser process that interfaces with the OS. The browser process is not allowed to do blocking IPC requests into the renderer processes, so it can't implement synchronous, pull-based accessibility APIs like UIA in the obvious way. So the Chromium team implemented a protocol where the renderer processes push their accessibility trees, and incremental updates to those trees, over to the browser process, which can then store the trees in memory and then provide information to UIA or other APIs on request. The pushed trees are comprehensive, including the information I mentioned about text. Chromium does this using its own binary protocol called Mojo, which is kind of like Protocol Buffers but strongly tied to Chromium. So, while I'll take design inspiration from Chromium, I probably won't take the actual protocol or code.

I also want my protocol to scale down to embedded platforms. Accessibility on devices running embedded software (i.e. not Windows or another general-purpose OS) is basically an unsolved problem; as far as I know, device makers have to implement their own custom self-voicing interfaces, if they do anything about the problem at all. But imagine a standard where a user can pull out their smartphone, connect to the specialized device over Bluetooth or WiFi, and get an accessible interface to the device on their phone. Yeah, I'm swinging for the fence with this project. To pull this off, I think the protocol would need to be not only push-based but streaming, allowing providers to send out accessibility information without having to build up and maintain much extra state in memory. This would also help with the immediate-mode GUIs that are used in some games and game development tools. If we design the protocol right, these immediate-mode toolkits should be able to push out accessibility information and events at the same time that they're making OpenGL (or similar) calls to draw the current frame, again without having to hold much extra state in memory, which is something that these toolkits try to avoid.


I've been thinking of something very similar to this and have been working on implementations. I currently have front-ends written in Java (Swing and SWT), Pascal (Delphi VCL, Delphi FMX, Lazarus LCL), and C++ (wxWidgets and GTK). I also have an HTML/JS prototype. I have backends written in Java, C#, Go, Crystal, and Nim. For the protocol I'm using JSON over HTTP. The more I work with this approach, the more advantages I see. I think you're on to something.


Based on the set of front-ends, it sounds like the goal of your project is a cross-language GUI toolkit wrapper. Is that correct? The goal of my project is a cross-platform and cross-language accessibility abstraction.


That seems like it could be interesting. It sounds like you would need part of the library to store and diff the tree state, and then send updates based on that? Immediate mode GUIs could use that part, while retained mode GUIs could skip it, maintain their own trees, and just speak the protocol, maybe even via the raw protobuf bindings or whatever you choose.
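That diffing layer could be sketched in Go roughly as follows; the types and function are hypothetical, just illustrating how an immediate-mode frontend that rebuilds its node list every frame could emit only the changes.

```go
package main

import "fmt"

// Node is a simplified accessibility node; comparable fields only,
// so two snapshots can be compared directly. Hypothetical type.
type Node struct {
	ID   uint64
	Name string
}

// Update is the incremental message produced from two frames.
type Update struct {
	Changed []Node
	Removed []uint64
}

// Diff compares the previous frame's nodes with the current frame's
// and reports which nodes are new or modified, and which disappeared.
func Diff(prev, curr map[uint64]Node) Update {
	var u Update
	for id, n := range curr {
		if old, ok := prev[id]; !ok || old != n {
			u.Changed = append(u.Changed, n)
		}
	}
	for id := range prev {
		if _, ok := curr[id]; !ok {
			u.Removed = append(u.Removed, id)
		}
	}
	return u
}

func main() {
	prev := map[uint64]Node{1: {1, "Save"}, 2: {2, "Quit"}}
	curr := map[uint64]Node{1: {1, "Save As"}, 3: {3, "Help"}}
	u := Diff(prev, curr)
	fmt.Println(len(u.Changed), len(u.Removed))
}
```

A retained-mode toolkit already knows what changed, so it could bypass this and construct the update messages directly.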

The Linux desktop environments could really use more people working on the accessibility stack, it's really outdated at the moment.


You can use D-Bus on macOS, Windows, and Linux.


> Unity has no official support for screen readers or colorblind modes. Unreal has both. If a major game engine developer like Unity cannot be bothered to add support, can we really expect the little developer hacking away on some OpenGL side project to?

Games are special because the means of interaction is often a core part of the experience. An accessibility mode for a game can be a huge amount of work. That's not to say some people don't do that huge amount of work (which is great), but games lacking accessibility is no excuse for using game-like development practices to write apps that could otherwise be made accessible relatively easily.


Apple's accessibility is excellent if you use their GUI library. Windows is good and improving.

I'm not positive what you want them to provide: some kind of "fake GUI"? I might not be imaginative enough, but I can't imagine how that would work.


In my opinion, accessibility should be implemented at the OS level. The OS can, for example, run OCR on the entire screen and turn bitmaps into selectable text, read text out loud, etc. This kind of functionality shouldn't be replicated on a per-app basis. The accessibility of browsers is usually considered very good, but browsers are almost OSes themselves, so why not lift this one more level up, into the OS?


> In my opinion, accessibility should be implemented at the OS level.

Which it is. Apple has been praised for the accessibility of both macOS and iOS for years if not decades, accessibility concerns have been baked into Cocoa forever. And accessibility is literally a top level category of the Settings app.

And I know that Windows has been improving by leaps and bounds.

> The OS can, for example, run OCR on the entire screen and turn bitmaps into selectable text, read text out loud, etc

That exists (look up VoiceOver Recognition). However, it cannot be reliable, and it will never be anywhere near as good as actual semantic annotation.

Image recognition has no way to understand that the physical UI layout may bear no relation to the logical structure, no way to differentiate between semantic and decorative content, and no way to tell from the pixels alone whether a button opens a menu or triggers an action.


That is not really going to work, at all. To have a good experience, a screen reader needs more contextual information than just the text.


OSes do provide accessibility APIs, and if you use one of the better-supported graphics toolkits, there are usually fairly easy ways to integrate with them in semantic ways, without needing to OCR and do fortune-telling.

Usually that means you need a deferred-mode semantic tree of the application for the accessibility UI to walk through, though.



