As with many things experience plays a large role. I've now spent more than a year a probably more than a thousand hours on and around the topic of sort implementations. The way I think about this now is very different to the start. I have a background of investigating and wanting to understand compilers and how they interact with modern hardware. My mental model of how hardware works is usually my start-off point to try out ideas. More often than not my ideas don't work out and my mental model is wrong. That's ok, modern hardware is a giant bunch of shared mutable state that interacts in really non-trivial ways. So I guess yes I've developed an intuition, but I always try to prove it wrong with experiments. Sometimes I sit down with pen and paper and scribble ideas down, sometimes I start coding an implementation directly, sometimes I start with an experiment setup, sometimes I search for what other people have done in that space, and I've even tried asking Chat-GPT about some stuff but that has been mostly useless. Specifically with high-performance sort implementations it helps to think of them as a bunch of components that each have their own role. Giving a simple example of a quicksort based implementation, you have partitioning, small-sort, fallback and some kind of pattern analysis. Each component exists to fulfill a certain role, partitioning is the algorithmic core and should perform well for sizes above your small-sort threshold up to really large sizes. The small-sort is maybe the most perf critical part, quicksort will spend most of the calls, not necessarily most of the time, in the small-sort, classically insertion sort e.g. https://user-images.githubusercontent.com/6864584/200373619-.... The fallback is there to make sure that you have a worst case Big O(N * log(N)). It should take over if a certain recursion limit is reached based on the input size, eg. `2 * (len | 1).ilog2()`. And the pattern analysis, for example a simple streak analysis usually sits at the start and makes sure that fully ascending and descending inputs can be sorted in O(N - 1). Hope this answers your question, and thanks for asking :)