Whether or not that's a fair characterization of their justification, that seems like a perfectly defensible justification to me. Most people out there are dealing with legacy code (as in "code I didn't write") and business logic they can't safely rewrite, and it's more important to deliver business value than to build an emotionally-satisfying architecture. And even so, why design for concurrency when the high-level task (loading a web page in a bounded amount of time) doesn't require it? Personally, I don't find a system that scales very well but doesn't need to scale emotionally satisfying anyway.
This is so true and is why I use "pure" functions where possible. Make it super obvious that a function can be replaced or reused safely and your future counterpart will praise your name to the code Gods.
Designing software for concurrent requests, as in using poll/epoll/kqueue/IOCP, implementing worker pools, etc., is a given, yes.
Designing infrastructure for concurrent requests is definitely not. I've worked on shared hosting systems with high concurrency requirements and it definitely was more complicated than just installing an Apache MPM—we had to think about balancing load across multiple servers, whether virtualizing bare-metal machines into multiple VMs was worthwhile (in our case it was for a very site-specific reason), how many workers to run on each VM, how much memory we should expect to use for OS caching vs. application memory, how to trade off concurrent access to the same page vs. concurrent access to different pages vs. cold start of entirely new pages, whether dependencies like auth backends or SQL databases could handle concurrency and how much we needed to cache those, etc. At the end of the day you have a finite number of CPUs, a finite network pipe, and a finite amount of RAM. You can throw more money at many of these problems (although often not a SQL database) but you generally have a finite amount of money too.
I would be surprised if most people had the infrastructure to handle significantly increased concurrency, even at the same throughput, as their current load. It's not a sensible thing to invest infrastructure budget into, most of the time.
(You can, of course, solve this by developing software to actively limit concurrency. That's not a given for exactly the reasons that developing for concurrency is a given, and it sounds like Lucidchart didn't have that software and determined that switching back to HTTP/1.1 was as good as writing that software.)
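The "software to actively limit concurrency" idea can be as small as a semaphore in front of the backend call. A minimal sketch (the limit of 4 and the `handle` body are illustrative assumptions, not Lucidchart's actual setup):

```python
import threading

# Admit at most MAX_IN_FLIGHT requests; excess callers queue on the
# semaphore instead of piling onto the backend all at once.
MAX_IN_FLIGHT = 4
gate = threading.BoundedSemaphore(MAX_IN_FLIGHT)

def handle(request):
    with gate:                  # blocks while 4 requests are already in flight
        return request.upper()  # stand-in for the real backend work

results = []
threads = [threading.Thread(target=lambda r=r: results.append(handle(r)))
           for r in ["a", "b", "c", "d", "e", "f"]]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))   # ['A', 'B', 'C', 'D', 'E', 'F']
```

All six requests complete, but never more than four touch the backend at once; the rest wait at the gate rather than timing out downstream.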
Sure, but in this article the problem was lack of software concurrency. This article really does boil down to "HTTP/2 exposed a fundamental flaw in our software".
Maybe I misread? My takeaway was that their frontend web server handled concurrency just fine and was happy to dispatch requests to the backend in parallel, but the backend couldn't keep up and the frontend returned timeouts. That's exactly what you get if you put a bunch of multithreaded web servers in front of a single-writer SQL database that needs to be hit on every request.
Yes, most such cases should be rearchitected to not go through a single choke point. But my claim is that this isn't automatic merely by developing for the web, and going through a CP database system is a pretty standard choice for good reason.
I have operated services where each web server had a single thread, because that provided a better user experience than more "optimal" configurations. It had certain bizarre scaling implications for single-core performance and I wouldn't have designed it that way in today's era, but there are times when this makes sense. (For example, when the web server only serves authenticated sessions, and each session requires a bound mainframe connection for requests, and parallelism is not only impermissible to the mainframe but would exceed the capacity available.)
I feel like the description we should be using for that class of machine is "former mainframe". (I'm sort of joking)
In seriousness, though, I'm both curious and a little bit skeptical of what user experience benefit that architecture would give over a server-side request queue and a single worker against the queue. That would allow you to pay the cost of networking for the next request while the mainframe is working. You could even separate the submission of jobs from collecting the result so that a disconnected client could resume waiting for a response. Anyway, I'm not saying you needed all that to have a well-functioning system, I'm just not convinced that a single threaded architecture is ever actually good for the user unless it gives a marked reduction in overhead.
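The submit-then-poll split I'm describing can be sketched in a few lines (a toy illustration: the `submit`/`worker` pair here is hypothetical and `payload.upper()` stands in for the bound mainframe call):

```python
import queue
import threading
import uuid

jobs = queue.Queue()           # pending requests, accepted as they arrive
results = {}                   # job_id -> result, polled by clients later
results_lock = threading.Lock()

def submit(payload):
    """Accept a request immediately; the single worker picks it up later."""
    job_id = str(uuid.uuid4())
    jobs.put((job_id, payload))
    return job_id

def worker():
    """One worker preserves the backend's one-at-a-time constraint."""
    while True:
        job_id, payload = jobs.get()
        outcome = payload.upper()    # stand-in for the mainframe round trip
        with results_lock:
            results[job_id] = outcome
        jobs.task_done()

threading.Thread(target=worker, daemon=True).start()

job = submit("hello")
jobs.join()                    # a real client would poll for results[job]
print(results[job])            # HELLO
```

The point is that submission and result collection are decoupled: a disconnected client can come back with its `job_id` and resume waiting, while the backend still only ever sees one request at a time.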
Queues are operational complexity. Given the (worst-case-ish) choice between "architecture without a queue that sometimes has HTTP-level timeouts" and "architecture with a queue that reliably renders a spinner and sometimes has human-task-level timeouts," I'd probably favor the former unless management etc. really want the spinner and I'm confident we have tooling to figure out why requests are getting stuck in the queue. Without that tooling, debugging the single-threaded architecture is much easier.
Sure! But that is trading off user experience for technical simplicity (which you do often have to do at some point). However: the argument was that this system was better for user experience than a design that could accept requests in parallel, which is what I'm resisting/not yet understanding. In reality, I'm sure that the system was fine for the use cases they had, which is what I meant to admit with "I'm not saying you needed all that". I will say that the single threaded no-queue design already carries a big risk of request A blocking request B.
My argument that this helps user experience is that, when a failure does happen, it's a lot easier to figure out why, tell the user that experienced it what happened and get them unblocked, and fix it for future users in a simpler system than a more complex one. The intended case is that failures should not happen, so if you're in the case where you expect your mainframe to process requests well within the TCP/HTTP timeouts and you can do something client-side to make the user expect more than a couple hundred ms of latency (e.g., use JS to pop up a "Please wait," or better yet, drive the API call from an XHR instead of a top-level navigation and then do an entirely client-side spinner), you may as well not introduce more places where things could fail.
If you do expect the time to process requests to be multiple minutes in some cases, then you absolutely need a queue and some API for polling a request object to see if it's done yet. If you think that a request time over 30 seconds (including waiting for previous requests in flight) is a sign that something is broken, IMO user experience is improved if you spend engineering effort on making those things more resilient than building more distributed components that could themselves fail or at least make it harder to figure out where things broke.
I have a lot of sympathy for their situation. I can imagine humming along just fine at 5 concurrent requests over 10 seconds, having busy days where you hit 10/10
But to suddenly hit 50 for 1 second, then nothing for 9 seconds, well, that’s a tough spot to be in.
There must be some hard-to-find sequencing happening there that they were not really exposed to before.
The reverse is also quite stellar, because in some scenarios if you can lower peak burst demand per node from N to (0.1 x N) you can often reduce allocated capacity by a factor greater than 0.1. (This is more likely the case when N exceeds SOMAXCONN, for example.)
Author here. Our application can handle concurrent requests just fine. The problem was actually partly that our application was trying to handle too many requests in parallel instead of queueing them, and partly that later requests were timing out because our load balancers were configured to expect clients to make a request, wait for the response, then send the next request, not send all the requests and then wait for all the responses (which means the last response takes longer to complete, measured from when the request was first sent).
This misconstrues the comment it replies to. The post author's comment says that their application was tuned for a certain level of concurrency, and that when the level of concurrency to the load balancers increased due to the HTTP/2 change, their load balancers increased the level of concurrency to the backend, causing issues.
This is an extremely common issue with Apache configurations, which often default to accepting hundreds of simultaneous requests without regard for memory capacity. If peak demand causes enough workers to spin up that Apache starts swapping, the entire server effectively goes offline.
Depending on the specific characteristics of the application, this could occur when load increases from 50 concurrent requests to 51 concurrent requests, or from 200 to 201, or from any integer A to B where A was fine but B causes the server to become unresponsive.
Saying that their A is 1 seems unnecessarily dour, given how common this problem has been over the past couple decades due to Apache's defaults alone.
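The cliff can be made concrete with back-of-envelope arithmetic (the RAM and per-worker figures below are assumptions for illustration, not measurements of any real deployment):

```python
# Illustrative: a 2 GB box running prefork-style workers that each
# hold ~50 MB resident.
ram_mb = 2048
per_worker_mb = 50

# Past this worker count, new workers push the box into swap and the
# whole server effectively goes offline.
safe_workers = ram_mb // per_worker_mb
print(safe_workers)   # 40
```

With a default worker cap in the hundreds, nothing stops Apache from spinning up worker 41 through 250 under burst load, which is exactly the A-to-B transition described above.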
The problem, of course, is that HTTP/2 behaves like having infinite connections, so the "more threads" on the server are almost always detrimental to performance.
"Less is more" is the mantra I have unsuccessfully tried to drill in. If your API (assuming a basic REST-like service) is running at 100% CPU utilization, you've likely over-provisioned it.
Possibly, just depends on what you are doing and where you are going from/to.
For example, 200 threads with 200 connections to a single service is insane and likely causing you to be slow already. Increasing that will negatively impact performance.
I'm guessing that the distinction being drawn has to do with whether the application can handle concurrent requests for the same resource from different sessions on different computers (yes?), vs concurrent requests for every resource at once from a single session (no?, even though this is a much smaller number of requests).
I don't want to get into it but I've dealt with issues like this and I don't think you are terrible or bad at your job. I'm seeing a lot of shade going your way. Sometimes a particular configuration works well until something changes. Just wanted to offer my two cents.
I think people are reacting badly because the title can create the impression that this is HTTP/2’s fault.
The actual post seems perfectly reasonable though (essentially "you might think you can just turn on HTTP/2 as a drop-in on your load balancer, but if your server code hasn't been written to rapidly handle the quick bursts of requests that let HTTP/2 provide faster overall loads to the client, then this can cause issues; you should test first and make sure your server systems are able to handle HTTP/2 request patterns.")
Somewhat understandable. I didn't get hung up on the title, and if anything the story is an object lesson in the need to familiarize yourself with the intricacies of inbound changes to your stack.
I appreciate when people share war stories; I like to think that wisdom is knowledge survived.
Most places I've worked, the timeouts have been calibrated for people on 1990s dial-up, and the 99th-percentile response time targets have been 2 orders of magnitude less.
I'm curious what platform your app is on, if you are willing to share?
A typical non-tuned Rails deployment, for instance, is gonna have queueing built in, with really not as much concurrency as one would want (enough to actually fully utilize the host; the opposite problem). So I'm guessing you aren't on Rails. :)
Curious what you are on, if you're willing to share, for how it effects concurrency defaults and options and affordances.
(I know full well that properly configuring/tuning this kind of concurrency is not trivial or one-size-fits all. And I am not at all surprised that http/2 changed the parameters disastrously, and appreciate your warning to pay attention to it. I think those who are thinking "it shouldn't matter" are under-experienced or misinformed.)
> I'm curious what platform your app is on, if you are willing to share?
Sure. We use the Scala Play framework (https://www.playframework.com/). And it does have some queueing built in, but we have tweaked it to meet certain application needs.
I assume your application is using a dedicated thread for each request?
Even then you would be handling more requests in parallel than the number of cores you have, but your concurrency would be limited by the cost of context switching and your memory capacity (having to allocate a sizable stack for each thread in most threading implementations).
Queueing is usually required for a stable multi-threaded server, but if you were doing async I/O you wouldn't need it. The extra memory overhead for each extra concurrent request (by means of lightweight coroutine stacks, callbacks or state machines) is not much different from the size it would take on the queue, and there is no preemptive context switching.
In most cases, you'll get the same behavior as having a queue here. Cooperative task-switching happens only on async I/O boundaries, so if you're processing a request that requires some CPU-heavy work, your application would just hog a core until it completes the request and then move to the next one.
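A minimal illustration of that core-hogging behavior, using Python's asyncio as a stand-in for any cooperative scheduler (the 0.1 s durations are arbitrary):

```python
import asyncio
import time

async def cpu_heavy(name, seconds):
    # No await in here: the cooperative scheduler cannot preempt this,
    # so the task holds the event loop until it finishes.
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        pass
    return name

async def main():
    start = time.monotonic()
    # Both "requests" are submitted concurrently, but with no await
    # points they run strictly one after the other, forming an
    # implicit FIFO queue.
    names = await asyncio.gather(cpu_heavy("a", 0.1), cpu_heavy("b", 0.1))
    return names, time.monotonic() - start

names, elapsed = asyncio.run(main())
print(names)   # ['a', 'b'], after roughly 0.2 s of fully serialized work
```

If the per-request work were genuinely I/O-bound, each `await` on the I/O would be a yield point and the two tasks would overlap; it's only CPU-heavy stretches that degrade into the queue-like behavior described above.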
Sizeable stacks don’t eat into memory unless your thread actually utilizes the full stack allocation. Otherwise, the memory is available for other uses, so you can spin up more threads than you’d think.
Do you think your local pizza joint could handle every order that they normally get between 4pm and 6pm if they got them all between 5pm and 5:30pm? At the same level of service, with no late orders, and with no additional equipment or labor?
They are not getting them at 5 but at 4. They should be able to handle all orders in the next two hours.
The problem is not making the pizzas in time but trying to get all the pizzas started at once when there is not enough table space to even roll out that much dough, and then trying to squeeze all the pizzas into the oven at once, whereby several of them got messed up.
At full-service restaurants, it is the maître d's duty to control the pace of orders so they arrive in a steady stream at the kitchen, instead of in batches of 20 tickets at once that could easily overwhelm the chefs.
The logic here is not dissimilar at all: if the backend has no ability to queue and prioritise the requests, then the same function needs to be done elsewhere to safeguard quality of service.
This is more like if a pizza joint that could seat 10 people at once moved into a new location that could seat 100 people but still kept the same wait and kitchen staff and is suddenly surprised that wait times have increased.
Well, with every pizza joint I’ve ordered from I can order a bunch of pizzas in just one phone call instead of having to make a separate phone call in serial for each pizza. And I certainly don’t have to wait for each pizza to be delivered before ordering another.
That's an unfair assessment. HTTP/2 fundamentally changes how requests are handled. With HTTP/1.1 there is a de facto connection pool inside the browser, and this throttling has been a feature of front-end development for 15+ years (from when Ajax became a thing), so this wasn't something on anybody's mind. HTTP/2 all of a sudden removes these constraints, and for Lucidchart it led to a number of unintended consequences. This is an important consideration because the mantra has been that HTTP/2 can simply be turned on and everything will simply work as before.
> With HTTP/1.1 there is a de facto connection pool inside the browser and this throttling has been a feature of front-end development for 15+ years
This is only true when you look at a single client. If you look at a larger number of clients accessing the service at the same time, you would expect similar numbers of concurrent requests on HTTP/2 as on HTTP/1.1. Clients send larger numbers of requests at the same time, but they are done sending requests earlier so there are requests from fewer clients being processed concurrently. It should average out.
If you have, say, a 1000 clients accessing your service in one minute, I doubt the number of requests/second would be very different between both protocol versions. It would only be an issue if the service was built with a small number of concurrent users in mind.
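The averaging argument in back-of-envelope form (illustrative numbers; the ~6 connections per origin is the common browser default for HTTP/1.1):

```python
# Same offered load under both protocols, very different burst shape.
clients_per_min = 1000
requests_per_client = 10

# Averaged over the minute, the request rate is identical either way.
avg_rps = clients_per_min * requests_per_client / 60
print(round(avg_rps, 1))   # 166.7

# HTTP/1.1: a browser opens ~6 connections per origin, so one page
# load presents at most ~6 requests at a time. HTTP/2: a single
# connection can present all 10 simultaneously.
h1_burst = min(6, requests_per_client)
h2_burst = requests_per_client
print(h1_burst, h2_burst)   # 6 10
```

Whether that per-client burst difference washes out across clients depends on how arrivals overlap, which is exactly the point the replies below dispute.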
You may be forgetting that load balancers have been working on a per request basis, and that no two requests are the same cost (despite what load balancer companies would have you believe).
Under HTTP/1.1 requests may have been hitting the LB and then being scattered across a dozen machines. Each of those machines was in a position to respond on their own time scale. Some requests would get back quickly, others slowly, but still actively being handled.
Under HTTP/2 with multiplexing, if the LB isn't set up to handle it (and they often aren't) they can be hitting the LB and _all_ ending up on a single machine, which is trying to process them while some of those requests might be requiring more significant processor resources, dragging the response rate for all the requests down simultaneously.
But it didn't, unless you're saying that Lucidchart made an incorrect analysis. Is that your argument?
>Clients send larger numbers of requests at the same time, but they are done sending requests earlier so there are requests from fewer clients being processed concurrently. It should average out.
Again, it didn't average out. And you assume it 'will average out' at your peril. Maybe it will, maybe it won't. Lucidchart engineers thought that too and it turns out that was wrong in a way that wasn't foreseen.
>It would only be an issue if the service was built with a small number of concurrent users in mind.
I doubt Lucidchart 'was built with a small number of concurrent users in mind'.
It literally says "we are aware that our application has underlying problems with large numbers of concurrent requests". How much clearer than that do you want it?
If I'm reading this article correctly, they're claiming their application couldn't handle the load of a single user loading their web page. They didn't talk about load spikes during certain times, so it certainly sounds like they just have an inadequate backend.
> all existing applications can be delivered without modification....
> The only observable differences will be improved performance and availability of new capabilities...
Lucidcharts may have an inadequate backend, but it wasn't a problem until they moved to HTTP/2, so those statements weren't true for them. For anyone else rolling out HTTP/2, that is worth bearing in mind.
There's also no indication on what kinds of requests were timing out, nor if it was possible to send fewer requests, or minimize static assets (if those were the problem).