
So Rust's ipnsort is actually one of the better sorting implementations across a variety of input sizes: faster than AVX-512 sort on small arrays, slower on mid-sized arrays, and almost the same on large ones.

The AVX-512 sort seems like a questionable choice given that it requires specialized hardware and is not the best option under many real-world conditions.



I'd like to emphasize that ipnsort is primarily designed as the new Rust std library unstable sort, which means it optimizes for all these factors simultaneously:

- input type (including stuff that is not an integer or float)

- input size

- input pattern

- prediction state

- binary size

- compile times

- varying ISAs and Hardware

Binary size and compile times in particular mean I'm not chasing the performance crown for integers in some HPC setting. I could trivially pull out another 7% perf by ignoring binary size, compile times, and effects on types that are not integers: https://github.com/Voultapher/sort-research-rs/commit/d908fe....
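To make the "input type" factor concrete, here is a small sketch (not ipnsort's internals, just the std API it sits behind) showing that `sort_unstable` has to work for arbitrary `Ord` types and user-supplied comparators, not only machine integers:

```rust
fn main() {
    // Non-integer element type: &str, sorted lexicographically.
    let mut words = vec!["pear", "apple", "fig"];
    words.sort_unstable();
    assert_eq!(words, vec!["apple", "fig", "pear"]);

    // Floats are not Ord (NaN), so the caller supplies a comparator.
    let mut nums = vec![3.5f64, -1.0, 2.0];
    nums.sort_unstable_by(|a, b| a.partial_cmp(b).unwrap());
    assert_eq!(nums, vec![-1.0, 2.0, 3.5]);
}
```

Any specialization the implementation does has to be invisible through this generic interface.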


This is a point that I'm kind of fuzzy on: is there a specific requirement to not change the algorithm too much based on type/comparison? Like if the user calls it on a 1-byte type with default comparison, the best way to do this in a vacuum is a counting sort. Smaller generated code and everything. Both C/C++ and Rust developers seem unwilling to do this, but not having asked I don't know why exactly.
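For reference, the 1-byte fast path being described is just a counting sort; a minimal sketch (hypothetical helper name, not anything std actually exposes):

```rust
// Counting sort for u8: one histogram pass, one write-out pass.
// O(n + 256), no comparisons, tiny generated code.
fn counting_sort_u8(v: &mut [u8]) {
    let mut counts = [0usize; 256];
    for &b in v.iter() {
        counts[b as usize] += 1;
    }
    let mut i = 0;
    for (byte, &count) in counts.iter().enumerate() {
        for _ in 0..count {
            v[i] = byte as u8;
            i += 1;
        }
    }
}

fn main() {
    let mut data = vec![3u8, 255, 0, 17, 3, 128];
    counting_sort_u8(&mut data);
    assert_eq!(data, vec![0, 3, 3, 17, 128, 255]);
}
```

Detecting "1-byte type with the default comparison" at the generic call site is the hard part; the sort itself is trivial.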

On the other hand Julia just switched to radix sort much of the time in their latest version, so other languages can have different approaches.


As a datapoint, numpy also chooses between radix and timsort based on data type. From the docs [^0]:

> ‘stable’ automatically chooses the best stable sorting algorithm for the data type being sorted. It, along with ‘mergesort’ is currently mapped to timsort or radix sort depending on the data type. API forward compatibility currently limits the ability to select the implementation and it is hardwired for the different data types.
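The radix sort that numpy maps integer dtypes to is the classic LSD variant; a rough sketch in Rust (illustrative only, not numpy's actual C implementation):

```rust
// LSD radix sort for u32: four stable counting-sort passes,
// one per byte, least significant byte first. Because each pass
// is stable, the whole sort is stable.
fn radix_sort_u32(v: &mut Vec<u32>) {
    let mut buf = vec![0u32; v.len()];
    for shift in (0..32).step_by(8) {
        let mut counts = [0usize; 256];
        for &x in v.iter() {
            counts[((x >> shift) & 0xFF) as usize] += 1;
        }
        // Prefix sums turn counts into starting offsets per bucket.
        let mut total = 0;
        for c in counts.iter_mut() {
            let t = *c;
            *c = total;
            total += t;
        }
        for &x in v.iter() {
            let b = ((x >> shift) & 0xFF) as usize;
            buf[counts[b]] = x;
            counts[b] += 1;
        }
        std::mem::swap(v, &mut buf);
    }
}

fn main() {
    let mut data = vec![170u32, 45, 75, 90, 802, 24, 2, 66];
    radix_sort_u32(&mut data);
    assert_eq!(data, vec![2, 24, 45, 66, 75, 90, 170, 802]);
}
```

This is why it only applies to fixed-width integer-like dtypes: the algorithm needs the key's bytes, not a comparison function.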

[^0]: https://numpy.org/doc/stable/reference/generated/numpy.sort....


It's possible that AVX512 sort does better on newer architectures, no? Author's machine is a 3 generations old cpu.


IIRC, there were issues with it causing frequency throttling on Intel CPUs, whereas AMD's chips avoid that by "double pumping". It would be very interesting to compare and contrast there.

Seems like there'd be value in it for all the new ML hotness that's come about. The AMD 7950X seems like it hits the sweet spot for that.


Yeah, that would probably change the calculus. There were major frequency (heat) issues on older Intel CPUs when running AVX-512.


The vqsort README says it is a non-issue on that generation CPU (Skylake-X) based on their benchmarks: https://github.com/google/highway/tree/master/hwy/contrib/so... At least on their low clock speed server CPU (<=3GHz), downclocking was hard to measure compared to clock speed variability they got with std::sort.

The 10980XE used for the Windows benchmarks here normally boosts much higher, so bigger differences could be expected. The OP's author mentioned measuring clock speeds with perf and seeing some difference.


There's also the complication that AVX-512 was removed in 12xxx and later, and IIRC has a soft lock in 11xxx.


Only one way to find out -- benchmarks, innit?



