The simplest load balancing I've done is to take the user ID modulo the number of servers, then point the request at that server.
This solves caching too since you are only ever receiving and caching user data on a single server. No cache communication required. You can enforce it on the server side for security as well.
Doesn't require a load balancer server - just an extra line of code.
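For concreteness, a minimal sketch of the idea in Python (the server list here is hypothetical, purely for illustration):

    # Hypothetical server list; in practice this would come from config
    # or service discovery.
    SERVERS = ["app-0.internal", "app-1.internal", "app-2.internal"]

    def server_for(user_id: int) -> str:
        # The same user always lands on the same server, so each server
        # only ever receives and caches its own slice of users.
        return SERVERS[user_id % len(SERVERS)]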
What happens when the number of servers changes? The cache hit rate would likely drop to zero until it warms up again, which is a good way to accidentally overload your systems.
Load balancing based on consistent hashing is the better way to implement this.
When the number of servers changes, you slowly ramp up from mod n to mod (n+1). Flip a biased coin for each user to decide whether to use n or n+1, and slowly crank the bias up to the (n+1) side.
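A sketch of that ramp; the deterministic per-user "coin" (derived from the user id) is my own embellishment so a given user doesn't bounce between the two layouts on every request:

    import hashlib

    def pick_bucket(user_id: int, n: int, bias: float) -> int:
        # bias ramps from 0.0 (everyone on mod n) to 1.0 (everyone on mod n+1).
        h = int(hashlib.sha256(str(user_id).encode()).hexdigest(), 16)
        coin = (h % 10_000) / 10_000.0
        modulus = n + 1 if coin < bias else n
        return user_id % modulus

Since each user's coin is fixed, cranking up the bias flips each user at most once, from the old layout to the new one.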
Consistent hashing is a slightly cleaner way to do it, but gives pretty much the same result as taking the user id modulo the number of servers. At least as I understand it, you consistently hash something (a user id, a request URL, etc.) into N buckets, where N is the number of servers, so changing N reshuffles all of the buckets anyway.
Short of something like Cassandra's ring topology, how would you use consistent hashing to add new servers and assign them requests?
You are missing a crucial piece needed to have consistent hashing: you also need to hash the names of the servers. With consistent hashing you hash both the request keys and the server names, then assign each request to the server with the closest hash (under the modulus). With this scheme, you only need to remap 1/n of the keys (where n is the number of servers).
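A toy version of that ring in Python (md5 and hashing each server name only once are simplifications; real implementations usually add many virtual nodes per server to even out the arcs):

    import bisect
    import hashlib

    def _h(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class HashRing:
        def __init__(self, servers):
            # Hash the server names too, and keep them sorted on the ring.
            self._ring = sorted((_h(s), s) for s in servers)
            self._hashes = [h for h, _ in self._ring]

        def server_for(self, key: str) -> str:
            # A key goes to the first server clockwise from its own hash.
            i = bisect.bisect(self._hashes, _h(key)) % len(self._ring)
            return self._ring[i][1]

    # Adding a fourth server only moves the keys that fall into its arc,
    # versus nearly all keys moving under plain modulo.
    old = HashRing(["s1", "s2", "s3"])
    new = HashRing(["s1", "s2", "s3", "s4"])
    moved = sum(old.server_for(f"user:{i}") != new.server_for(f"user:{i}")
                for i in range(10_000))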
You're kind of right. You can also use something like jump consistent hash [0] which only requires you to have a consistent ordering of the hosts where you're sending the information. We (Facebook) use something similar for our caches. It requires a linear array of hosts but you've already got that if you're load balancing.
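For reference, a Python transcription of the jump consistent hash algorithm from the Lamping & Veale paper (the magic constant is theirs):

    def jump_hash(key: int, num_buckets: int) -> int:
        # Maps a 64-bit key to a bucket in [0, num_buckets); growing
        # num_buckets by one moves only ~1/n of keys, all to the new bucket.
        b, j = -1, 0
        while j < num_buckets:
            b = j
            key = (key * 2862933555777941757 + 1) % (1 << 64)
            j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
        return b

The catch is what the parent notes: buckets are just indices into an ordered host list, so you can only ever add or remove hosts at the end of the array.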
Better consistent hashing means that existing servers don't have their caches invalidated, but the new servers that were just added start with empty caches anyway so are fielding all uncached requests. Hopefully the bottleneck is actually with some shared layer behind it (a database or something) otherwise I guess you'd need to come up with a more complex way to slowly distribute more traffic to the new nodes.
though it's a clever way to ignore the real problem, eventually time will force you to revisit it from first principles.
user_id % num_servers may work early on, while user activity and uptake are consistent,
but what happens when user activity becomes more complex: more users joining, some abandoning the platform, others using it more heavily? that complexity is not distributed homogeneously across the only property this scheme considers: the user id.
what if over time you gain more users, but the majority of the people who drop the platform happen to have user_id % num_servers == 2, 11, 13, or 17?
in that case you would have some servers working hard while others sit dormant.
what is the real distribution of the relation between activity and user id over time? asymptotic? something like the primes? gaussian? benford?
whichever one future data shows to be the best fit, most distributions show a strong trend toward eventually favoring certain values,
which i think implies that, to ensure an even distribution of work across servers, the problem requires something with greater dimensionality than a modulo over an immutable value that is assigned serially.
I can think of a few ways to rebalance things on the fly, but I would probably just hash some immutable values for the user, like id and name, together with a nonce. If a server gets overloaded, slowly move people off it by changing their nonce.
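Something like this sketch is what I mean (the field names and server count are illustrative):

    import hashlib

    NUM_SERVERS = 8  # illustrative

    def bucket_for(user_id: int, name: str, nonce: int) -> int:
        # Hash immutable user fields plus a per-user nonce; bumping a
        # user's nonce deterministically reassigns them to a new server.
        digest = hashlib.sha256(f"{user_id}:{name}:{nonce}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % NUM_SERVERS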
if you are introducing a nonce, why use a hash? with a nonce that mutates to reflect server performance, modulo on the user id works as is.
if you are introducing per-user monitoring, why use modulo? with scheduled per-user monitoring, moving users based on their ids works as is.
maybe i was unclear above, but i like the gp's simple solution.. partly because i have a personal affection for the modulo operator, but also because it only requires an operation that runs in a finite, predictable number of cycles and works as designed without any monitoring.
the above was intended to bring attention to shortcomings and probable failures in an otherwise elegant approach.
You could use the nonce alone to determine which server to use. But I didn't want to choose the server directly, just to have the ability to change the output of the hash for whatever reason.
I also did not want to use a non-random rebalancing mechanism, to avoid adversaries attacking that implementation. With a hash the output is deterministic, but unpredictable.
Arguably, that's sharding, not load balancing. If you want to get picky with terminology at least.
Anyway, I do have a point beyond being pedantic: load balancing proper offers two advantages that a fixed sharding scheme doesn't. #1: it doesn't need to identify a piece of data on the request to shard by. #2: it actively (though imperfectly) attempts to achieve similar utilization on every server.
Facebook chat used to use this scheme, but eventually had to move away from it because it made it difficult to add and remove servers to the fleet in response to load.
That's very simple consistent hashing. Consistent hashing is great when you want to trade even load for localized data.
In fact, we use consistent hashing when we accept requests, and two random choices when we deliver them to the apps. This works much better for _most_ of the apps we see. We're typically worried about cache data for a particular app. The app instances themselves, though, tend to be mostly stateless and disposable.
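A sketch of the "two random choices" side of that (the in-flight request counter is an assumption for illustration):

    import random

    def pick_instance(instances: list[str], inflight: dict[str, int]) -> str:
        # Sample two instances at random and send the request to the less
        # loaded one; no global view of load is needed, yet the maximum
        # load stays surprisingly well bounded.
        a, b = random.sample(instances, 2)
        return a if inflight[a] <= inflight[b] else b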
One problem with this is the "long tail" issue: in many applications, you have a highly active small minority of users, and any server assigned to enough of those users will be overloaded while other servers are underutilized. Since these are also (typically) your most excited / engaged users, this effectively penalizes user behaviors you'd rather encourage.
The other main problem is that it's not a consistent hash: if you grow the server pool, you typically need to reshard a lot of content.
(It's still useful in a pinch, but it helps to be aware of the tradeoffs.)
In the original comment, the user mentions that the modulo logic would not require a load balancer server.
So I would assume what they meant is that you don't need a high-throughput load balancer. But you still need some entity to do the modulo work, as well as health-check the servers so the modulo is calculated over active servers only.
This is how many horizontally scalable OLTP databases operate too (e.g. DynamoDB, Citus): picking a partition key, then deterministically routing work associated with that partition key to the proper, well, partition.