Queues are operational complexity. Given the (worst-case-ish) choice between "ar...

clhodapp · on April 23, 2019

Sure! But that is trading off user experience for technical simplicity (which you often do have to do at some point). However, the argument was that this system was better for user experience than a design that could accept requests in parallel, and that is the part I'm resisting, or at least not yet understanding. In reality, I'm sure the system was fine for the use cases they had, which is what I meant to concede with "I'm not saying you needed all that". I will say that the single-threaded, no-queue design already carries a big risk of request A blocking request B.

geofft · on April 23, 2019

My argument that this helps user experience is that, when a failure does happen, it's a lot easier in a simpler system than in a more complex one to figure out why, tell the user who experienced it what happened and get them unblocked, and fix it for future users. The intended case is that failures should not happen. So if you expect your mainframe to process requests well within the TCP/HTTP timeouts, and you can do something client-side to make the user expect more than a couple hundred ms of latency (e.g., use JS to pop up a "Please wait," or better yet, drive the API call from an XHR instead of a top-level navigation and then do an entirely client-side spinner), you may as well not introduce more places where things could fail.

If you do expect the time to process requests to be multiple minutes in some cases, then you absolutely need a queue and some API for polling a request object to see if it's done yet. But if you think that a request time over 30 seconds (including waiting for previous requests in flight) is a sign that something is broken, then IMO user experience is improved more by spending engineering effort on making the existing system more resilient than by building more distributed components that could themselves fail, or at least make it harder to figure out where things broke.
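
For concreteness, here is roughly what "a queue and some API for polling a request object" could look like. This is only a minimal in-memory sketch in Python, not anything from the system under discussion: `JobQueue`, `submit`, `poll`, and the status strings are all names made up for illustration, and a real deployment would need persistence, expiry of old jobs, and an HTTP layer in front.

```python
import queue
import threading
import uuid


class JobQueue:
    """Hypothetical sketch: one worker drains a FIFO queue; clients poll by job id."""

    def __init__(self, handler):
        self._handler = handler          # function that does the slow work
        self._jobs = {}                  # job_id -> {"status": ..., "result": ...}
        self._lock = threading.Lock()    # guards self._jobs
        self._pending = queue.Queue()    # FIFO of (job_id, payload)
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, payload):
        """Accept a request immediately and return an id the client can poll."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._jobs[job_id] = {"status": "queued", "result": None}
        self._pending.put((job_id, payload))
        return job_id

    def poll(self, job_id):
        """Return a snapshot of the job's current state (raises KeyError if unknown)."""
        with self._lock:
            return dict(self._jobs[job_id])

    def _run(self):
        # Requests are still processed strictly one at a time, in arrival order.
        while True:
            job_id, payload = self._pending.get()
            with self._lock:
                self._jobs[job_id]["status"] = "running"
            try:
                update = {"status": "done", "result": self._handler(payload)}
            except Exception as exc:
                # Record the failure so the client can be told what happened.
                update = {"status": "failed", "result": repr(exc)}
            with self._lock:
                self._jobs[job_id] = update
            self._pending.task_done()
```

In a web app, `submit` would presumably sit behind a POST that returns the job id right away, and `poll` behind a GET that the client calls until the status is `done` or `failed`. Note how much new state this adds (the job table, the worker thread, the id lifecycle) compared with just holding the connection open, which is the extra surface area for failure I'm talking about.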