The simplest load balancing I've done is to take the user ID modulo the number of servers, then point the request at that server.
This solves caching too since you are only ever receiving and caching user data on a single server. No cache communication required. You can enforce it on the server side for security as well.
Doesn't require a load balancer server - just an extra line of code.
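For concreteness, a minimal sketch of the idea in Python (the server list here is hypothetical, purely for illustration):

    # Hypothetical server list; in practice this would come from config
    # or service discovery.
    SERVERS = ["app-0.internal", "app-1.internal", "app-2.internal"]

    def server_for(user_id: int) -> str:
        # The same user always lands on the same server, so each server
        # only ever receives and caches its own slice of users.
        return SERVERS[user_id % len(SERVERS)]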
What happens when the number of servers changes? The cache hit rate would likely drop to zero until it warms up again, which is a good way to accidentally overload your systems.
Load balancing based on consistent hashing is the better way to implement this.
When the number of servers changes, you slowly ramp up from mod n to mod (n+1). Flip a biased coin for each user to decide whether to use n or n+1, and slowly crank the bias up to the (n+1) side.
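A sketch of that ramp; the deterministic per-user "coin" (derived from the user id) is my own embellishment so a given user doesn't bounce between the two layouts on every request:

    import hashlib

    def pick_bucket(user_id: int, n: int, bias: float) -> int:
        # bias ramps from 0.0 (everyone on mod n) to 1.0 (everyone on mod n+1).
        h = int(hashlib.sha256(str(user_id).encode()).hexdigest(), 16)
        coin = (h % 10_000) / 10_000.0
        modulus = n + 1 if coin < bias else n
        return user_id % modulus

Since each user's coin is fixed, cranking up the bias flips each user at most once, from the old layout to the new one.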
Consistent hashing is a slightly cleaner way to do it, but gives pretty much the same result as taking the user id modulo the number of servers. At least as I understand it, you consistently hash something (a user id, a request URL, etc.) into N buckets, where N is the number of servers, so changing N reshuffles all of the buckets anyway.
Short of something like Cassandra's ring topology, how would you use consistent hashing to add new servers and assign them requests?
You are missing a crucial piece needed to have consistent hashing: you also need to hash the names of the servers. With consistent hashing you hash both the request keys and the server names, then assign each request to the server with the closest hash (under the modulus). With this scheme, you only need to remap 1/n of the keys (where n is the number of servers).
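A toy version of that ring in Python (md5 and hashing each server name only once are simplifications; real implementations usually add many virtual nodes per server to even out the arcs):

    import bisect
    import hashlib

    def _h(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class HashRing:
        def __init__(self, servers):
            # Hash the server names too, and keep them sorted on the ring.
            self._ring = sorted((_h(s), s) for s in servers)
            self._hashes = [h for h, _ in self._ring]

        def server_for(self, key: str) -> str:
            # A key goes to the first server clockwise from its own hash.
            i = bisect.bisect(self._hashes, _h(key)) % len(self._ring)
            return self._ring[i][1]

    # Adding a fourth server only moves the keys that fall into its arc,
    # versus nearly all keys moving under plain modulo.
    old = HashRing(["s1", "s2", "s3"])
    new = HashRing(["s1", "s2", "s3", "s4"])
    moved = sum(old.server_for(f"user:{i}") != new.server_for(f"user:{i}")
                for i in range(10_000))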
You're kind of right. You can also use something like jump consistent hash [0] which only requires you to have a consistent ordering of the hosts where you're sending the information. We (Facebook) use something similar for our caches. It requires a linear array of hosts but you've already got that if you're load balancing.
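For reference, a Python transcription of the jump consistent hash algorithm from the Lamping & Veale paper (the magic constant is theirs):

    def jump_hash(key: int, num_buckets: int) -> int:
        # Maps a 64-bit key to a bucket in [0, num_buckets); growing
        # num_buckets by one moves only ~1/n of keys, all to the new bucket.
        b, j = -1, 0
        while j < num_buckets:
            b = j
            key = (key * 2862933555777941757 + 1) % (1 << 64)
            j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
        return b

The catch is what the parent notes: buckets are just indices into an ordered host list, so you can only ever add or remove hosts at the end of the array.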
Better consistent hashing means that existing servers don't have their caches invalidated, but the new servers that were just added start with empty caches anyway so are fielding all uncached requests. Hopefully the bottleneck is actually with some shared layer behind it (a database or something) otherwise I guess you'd need to come up with a more complex way to slowly distribute more traffic to the new nodes.
though it's a clever way to ignore the real problem, eventually time will force you to revisit it from first principles.
user_id % num_servers may work early on, while user activity and uptake are consistent,
but what happens when user activity becomes more complex: more users joining, some abandoning the platform, others using it more heavily? that complexity is not distributed homogeneously across the only property this scheme considers: the user id.
what if over time you gain more users, but the majority of the people who drop the platform happen to have user_id % num_servers == 2, 11, 13, or 17?
in that case you would have some servers working hard while others sit dormant.
what is the real distribution of the relation between activity and user id over time? asymptotic? something like the primes? gaussian? benford?
whichever one future data shows to be the best fit, most distributions show a strong trend toward eventually favoring certain values,
which i think implies that, to ensure an even distribution of work across servers, the problem requires something with greater dimensionality than a modulo over an immutable value that is assigned serially.
I can think of a few ways to rebalance things on the fly, but I would probably just hash some immutable values for the user, like id and name, together with a nonce. If a server gets overloaded, slowly move people off it by changing their nonce.
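Something like this sketch is what I mean (the field names and server count are illustrative):

    import hashlib

    NUM_SERVERS = 8  # illustrative

    def bucket_for(user_id: int, name: str, nonce: int) -> int:
        # Hash immutable user fields plus a per-user nonce; bumping a
        # user's nonce deterministically reassigns them to a new server.
        digest = hashlib.sha256(f"{user_id}:{name}:{nonce}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % NUM_SERVERS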
if you are introducing a nonce, why use a hash? with a nonce that mutates to reflect server performance, modulo on the user id works as is.
if you are introducing per-user monitoring, why use modulo? with scheduled per-user monitoring, moving users based on their ids works as is.
maybe i was unclear above, but i like the gp's simple solution.. partly because i have a personal affection for the modulo operator, but also because it only requires an operation that runs in a finite, predictable number of cycles and works as designed without any monitoring.
the above was intended to bring attention to shortcomings and probable failures in an otherwise elegant approach.
You could use the nonce alone to determine which server to use. But I didn't want to choose the server directly, just to have the ability to change the output of the hash for whatever reason.
I also did not want to use a non-random rebalancing mechanism, to avoid adversaries attacking that implementation. With a hash the output is deterministic, but unpredictable.
Arguably, that's sharding, not load balancing. If you want to get picky with terminology at least.
Anyway, I do have a point beyond being pedantic: load balancing proper offers two advantages that a fixed sharding scheme doesn't. #1: it doesn't need to identify a piece of data on the request to shard by. #2: it actively (though imperfectly) attempts to achieve similar utilization on every server.
Facebook chat used to use this scheme, but eventually had to move away from it because it made it difficult to add and remove servers to the fleet in response to load.
That's very simple consistent hashing. Consistent hashing is great when you want to trade even load for localized data.
In fact, we use consistent hashing when we accept requests, and two random choices when we deliver them to the apps. This works much better for _most_ of the apps we see. We're typically worried about cache data for a particular app. The app instances themselves, though, tend to be mostly stateless and disposable.
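A sketch of the "two random choices" side of that (the in-flight request counter is an assumption for illustration):

    import random

    def pick_instance(instances: list[str], inflight: dict[str, int]) -> str:
        # Sample two instances at random and send the request to the less
        # loaded one; no global view of load is needed, yet the maximum
        # load stays surprisingly well bounded.
        a, b = random.sample(instances, 2)
        return a if inflight[a] <= inflight[b] else b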
One problem with this is the "long tail" issue: in many applications, you have a highly active small minority of users, and any server assigned to enough of those users will be overloaded while other servers are underutilized. Since these are also (typically) your most excited / engaged users, this effectively penalizes user behaviors you'd rather encourage.
The other main problem is that it's not a consistent hash: if you grow the server pool, you typically need to reshard a lot of content.
(It's still useful in a pinch, but it helps to be aware of the tradeoffs.)
In the original comment, the user mentions that the modulo logic would not require a load balancer server.
So I would assume what they meant is that you don't need a high-throughput load balancer. But you still need some entity to do the modulo work, as well as health-check the servers so the modulo is calculated over active servers only.
This is how many horizontally scalable OLTP databases operate too (e.g. DynamoDB, Citus): picking a partition key, then deterministically routing work associated with that partition key to the proper, well, partition.