
So Rust's ipnsort is actually one of the better sorting implementations across a variety of input sizes: faster than AVX-512 sort on small arrays, slower on mid-sized arrays, and almost the same on large ones.

The AVX-512 sort seems like a questionable choice given that it requires specialized hardware and is not the best option under many real-world conditions.



I'd like to emphasize that ipnsort is primarily designed as the new Rust std library unstable sort, which means it optimizes for all these factors simultaneously:

- input type (including stuff that is not an integer or float)

- input size

- input pattern

- prediction state

- binary size

- compile times

- varying ISAs and Hardware

Binary size and compile times in particular mean I'm not chasing the performance crown for integers in some HPC setting. I could trivially pull out another 7% perf by ignoring binary size, compile times, and effects on types that are not integers: https://github.com/Voultapher/sort-research-rs/commit/d908fe....
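To make the "input type" factor concrete, here is a small sketch (not ipnsort's internals, just the std API it sits behind) showing that `sort_unstable` has to work for arbitrary `Ord` types and user-supplied comparators, not only machine integers:

```rust
fn main() {
    // Non-integer element type: &str, sorted lexicographically.
    let mut words = vec!["pear", "apple", "fig"];
    words.sort_unstable();
    assert_eq!(words, vec!["apple", "fig", "pear"]);

    // Floats are not Ord (NaN), so the caller supplies a comparator.
    let mut nums = vec![3.5f64, -1.0, 2.0];
    nums.sort_unstable_by(|a, b| a.partial_cmp(b).unwrap());
    assert_eq!(nums, vec![-1.0, 2.0, 3.5]);
}
```

Any specialization the implementation does has to be invisible through this generic interface.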


This is a point that I'm kind of fuzzy on: is there a specific requirement to not change the algorithm too much based on type/comparison? Like if the user calls it on a 1-byte type with default comparison, the best way to do this in a vacuum is a counting sort. Smaller generated code and everything. Both C/C++ and Rust developers seem unwilling to do this, but not having asked I don't know why exactly.
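For reference, the 1-byte fast path being described is just a counting sort; a minimal sketch (hypothetical helper name, not anything std actually exposes):

```rust
// Counting sort for u8: one histogram pass, one write-out pass.
// O(n + 256), no comparisons, tiny generated code.
fn counting_sort_u8(v: &mut [u8]) {
    let mut counts = [0usize; 256];
    for &b in v.iter() {
        counts[b as usize] += 1;
    }
    let mut i = 0;
    for (byte, &count) in counts.iter().enumerate() {
        for _ in 0..count {
            v[i] = byte as u8;
            i += 1;
        }
    }
}

fn main() {
    let mut data = vec![3u8, 255, 0, 17, 3, 128];
    counting_sort_u8(&mut data);
    assert_eq!(data, vec![0, 3, 3, 17, 128, 255]);
}
```

Detecting "1-byte type with the default comparison" at the generic call site is the hard part; the sort itself is trivial.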

On the other hand Julia just switched to radix sort much of the time in their latest version, so other languages can have different approaches.


As a datapoint, numpy also chooses between radix and timsort based on data type. From the docs [^0]:

> ‘stable’ automatically chooses the best stable sorting algorithm for the data type being sorted. It, along with ‘mergesort’ is currently mapped to timsort or radix sort depending on the data type. API forward compatibility currently limits the ability to select the implementation and it is hardwired for the different data types.
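The radix sort that numpy maps integer dtypes to is the classic LSD variant; a rough sketch in Rust (illustrative only, not numpy's actual C implementation):

```rust
// LSD radix sort for u32: four stable counting-sort passes,
// one per byte, least significant byte first. Because each pass
// is stable, the whole sort is stable.
fn radix_sort_u32(v: &mut Vec<u32>) {
    let mut buf = vec![0u32; v.len()];
    for shift in (0..32).step_by(8) {
        let mut counts = [0usize; 256];
        for &x in v.iter() {
            counts[((x >> shift) & 0xFF) as usize] += 1;
        }
        // Prefix sums turn counts into starting offsets per bucket.
        let mut total = 0;
        for c in counts.iter_mut() {
            let t = *c;
            *c = total;
            total += t;
        }
        for &x in v.iter() {
            let b = ((x >> shift) & 0xFF) as usize;
            buf[counts[b]] = x;
            counts[b] += 1;
        }
        std::mem::swap(v, &mut buf);
    }
}

fn main() {
    let mut data = vec![170u32, 45, 75, 90, 802, 24, 2, 66];
    radix_sort_u32(&mut data);
    assert_eq!(data, vec![2, 24, 45, 66, 75, 90, 170, 802]);
}
```

This is why it only applies to fixed-width integer-like dtypes: the algorithm needs the key's bytes, not a comparison function.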

[^0]: https://numpy.org/doc/stable/reference/generated/numpy.sort....


It's possible that AVX512 sort does better on newer architectures, no? Author's machine is a 3 generations old cpu.


IIRC, there were issues with it causing frequency throttling on Intel CPUs, whereas AMD's chips avoid that by "double pumping". It would be very interesting to compare and contrast there.

Seems like there'd be value in it for all the new ML hotness that's come about. The AMD 7950X seems like it hits the sweet spot for that.


Yeah, that would probably change the calculus. There were major frequency (heat) issues on older Intel CPUs when running AVX-512.


The vqsort README says it is a non-issue on that generation CPU (Skylake-X) based on their benchmarks: https://github.com/google/highway/tree/master/hwy/contrib/so... At least on their low clock speed server CPU (<=3GHz), downclocking was hard to measure compared to clock speed variability they got with std::sort.

The 10980XE used for the Windows benchmarks here normally boosts much higher, so bigger differences could be expected. The OP's author mentioned measuring clock speeds with perf and seeing some difference.


There's also the complication that AVX-512 was removed in 12xxx and later, and IIRC has a soft lock in 11xxx.


Only one way to find out -- benchmarks, innit?



