Faster Mac dev tools with custom allocators (eisel.me)
113 points by meisel on Nov 1, 2021 | hide | past | favorite | 33 comments


Pathfinder got enormous benefits (3x performance difference as I recall?) switching from Apple's default allocator to jemalloc. Apple's allocator has not kept pace with the competition, especially for multithreaded workloads.


I saw similar results in malloc()-heavy benchmarks. libc on macOS seems to have a fair number of performance traps; as another example, I found that timegm() on macOS is 100x slower than Linux, and 1000x slower than a reasonably optimized standalone algorithm: https://blog.reverberate.org/2020/05/12/optimizing-date-algo...


Yeah, related to timegm(), I made this to fix a performance issue: https://github.com/michaeleisel/JJLISO8601DateFormatter


What’s stopping Apple from switching its default allocator?


I can't speak to Apple's reasons specifically, but usually it comes down to:

1. Degenerate cases with new allocator designs, e.g. being better for some workloads and not others

2. Bug-for-bug compatibility - applications which break due to dependencies on undocumented behavior or the old allocator's memory structures.

3. Boundary conflicts - systems where an allocator change would mean allocation and free are hitting different implementations across module boundaries, as one module allocates memory for another to consume. Some systems and programming languages are more vulnerable to this sort of issue.


A fourth one would be debugging support. They have a bunch of stuff in there (perhaps dated these days) to auto scribble on malloc/free, allocate guard pages etc. They probably could/should look into integrating a more modern allocator (mimalloc, jemalloc) and falling back to the old one when those features are needed (assuming they can’t bring them forward).


MallocScribble is available as "opt.junk" in jemalloc [1]. As for guard pages, tcmalloc has TCMALLOC_PAGE_FENCE [2], and there is an issue [3] in jemalloc.

In any case, Apple and others have invested hugely in LLVM AddressSanitizer, so the Electric Fence-like malloc debugging features are considered more of a last resort these days.

[1]: http://jemalloc.net/jemalloc.3.html

[2]: https://chromium.googlesource.com/external/gperftools/+/gper...

[3]: https://github.com/jemalloc/jemalloc/issues/1664


When I worked at Apple they were the first resort, as they were usually cheaper to iterate with (no recompilation, lower overhead, etc.), but ASan has only gotten better and was the primary focus. I agree that jemalloc might be a good drop-in replacement, but it just might not be a priority for the libc maintainers. Lots of low-hanging fruit can be missed for very long periods of time because you have to pick what to focus on.

GWP-ASan from TCMalloc might be the better direction for runtime detection of memory corruption from malloc, as it gives a lot of ASan-like protection with an Electric Fence-like performance profile (and can be turned on/off at runtime).


Yeah, but ASan requires recompilation. GuardMalloc and MallocScribble apply to your whole process, and so even malloc()s called by system frameworks (ex AppKit) get checked. I've found several OS bugs this way over the years. Maybe internally Apple has ASan builds of the whole OS, but they don't distribute it externally.


Security-critical components like the kernel and WebKit have ASan variants, that’s for sure. But they also use custom allocators.


Valgrind works OK on macOS, right?


According to release notes, it supports 10.12 (with preliminary support for 10.13). That's several releases behind (10.14, 10.15, 11, 12).


Homebrew won't even install valgrind anymore.


it hasn't for years afaik


Apple has a whole set of heap debugging tools that they use internally and are quite useful even as a third party developer, and I’m sure they rely on having certain heap metadata around so they can reconstruct a memory graph after the fact. Take a look at heap(1) and leaks(1) to see what they offer.


I don't think 3. is an issue for Apple, because there is no static libSystem.


My understanding is that Apple really cares about memory footprint, since they ship hundreds of daemons on machines that often don’t have much memory to spare. So IMO their allocator prioritizes that over, say, throughput, which might annoy app developers but isn’t necessarily a bad compromise. (In places where Apple has different needs, e.g. WebKit, different allocators are used.)


There’s more to measure with allocators than how fast they can pump out memory. You’d also want to check memory overhead, heap fragmentation, and various security considerations.


I doubt there is any particularly compelling reason other than not having yet paid down that piece of technical debt.


Custom allocators have always been available for specialized needs; even the JDK ships with a number of options, each of which can be tuned further. I wrote a fast and safe memory allocator for the old MacOS that was briefly popular before Mac OS X appeared and obsoleted it. Having built such a beast (and written the tons of test apps you need to ensure it works under all conditions), I can say there is always room to optimize for needs that you can't serve in a generalized allocator. Like everything, you can't optimize for all cases and still be good enough for the average case.


I'd be really interested to see benchmarks of Apple's allocator versus others, both in memory consumption and performance. Sometimes, Apple's version of things is worse in almost all use cases, but I'll reserve judgment and wait to see numbers.


We use a modified (by the author) jemalloc in a heavily threaded server that runs on CentOS.

The reason was due to memory fragmentation over time and jemalloc reduced this to a level that we could live with.

We’ve tried other allocators over the years, but jemalloc - or our version of it, to be precise - is still the winner in memory usage and fragmentation.

Edit: performance-wise, gains can be had by doing your own memory pools on a per-thread basis and making use of stack objects rather than heap objects. Making larger allocations for your own object pools and managing those yourself reduces the calls to malloc, which also helps reduce memory fragmentation and keeps your vmsize from diverging from your rss.


Just to add… std::vector can lead to fragmentation in heavily threaded applications. We found std::deque solved that issue.


Has anyone given mimalloc, Microsoft's allocator (MIT license), a try?

It appears to benchmark better than even jemalloc and others.

https://github.com/microsoft/mimalloc#Performance


mimalloc is briefly mentioned in my article, and I was surprised that its mechanism for swapping itself in worked on the Mac. When I measured, I found its performance to be roughly the same as jemalloc's.


For even more bang, rebuild LLVM with your custom allocator, profile guidance, and link-time optimization. If you're planning to invoke the compiler a million times, you might as well get it peak optimized.


I'd love to read a detailed walkthrough about this!



Yeah, building Swift is not straightforward. There are several repos that need to be checked out and coordinated in lockstep. I usually start with the "swift/release/xxx" branches in llvm / cmark / swift, and call this to find what I am missing: https://github.com/apple/swift/blob/main/utils/build-script


Mimalloc improved Bun’s performance on macOS by 10%.


Are there any official benchmarks for Swift compilation times? I’m curious to see whether they’ve been going up or down with the latest releases.


Perhaps there's a sandbox escape hiding in the workaround they did to get swiftc to use their jemalloc build?


It’s a build variable in Xcode telling it which compiler to use. At that point you have arbitrary code execution anyways as you control the build process.



