I didn't understand why the implementation wouldn't just do an atomic increment, but I guess Obj-C semantics provide too much magic to permit such a simple approach. The actual code, in addition to [presumably] not being inlined, does not seem easy to optimize at the hardware level: https://github.com/apple/swift-corelibs-foundation/blob/main...
The short answer for why it can’t just be an increment is that the reference count is stored in a subset of the bits of the isa pointer, and when the count grows too large for those bits it has to overflow into a separate side table. So retain does a separate load and CAS (compare-and-swap) rather than a single atomic add, in order to implement this overflow behavior.
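To make the load + CAS shape concrete, here is a rough sketch in C11. It is not the actual runtime code: the bit layout (8 count bits at the top, one flag bit below them), the `object_t` struct, the single `side_count` field standing in for the side table, and every name here are assumptions made up for illustration. The real runtime uses a different layout and takes a lock around its side table.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real isa encoding:
 *   bits 56..63  inline reference count
 *   bit  55      "some of the count lives in the side table" flag
 *   bits 0..54   class pointer bits (left untouched by retain) */
#define RC_SHIFT 56
#define RC_ONE   (UINT64_C(1) << RC_SHIFT)
#define RC_MAX   UINT64_C(0xFF)   /* largest value the inline field can hold */
#define RC_HALF  UINT64_C(0x80)   /* amount moved out when the field is full */
#define HAS_SIDE (UINT64_C(1) << 55)

typedef struct {
    _Atomic uint64_t isa;
    /* Stand-in for a side table entry. A real side table is a separate,
     * lock-protected hash table keyed by object address. */
    _Atomic uint64_t side_count;
} object_t;

static void sketch_retain(object_t *obj) {
    uint64_t old_bits = atomic_load_explicit(&obj->isa, memory_order_relaxed);
    for (;;) {
        uint64_t new_bits;
        bool spill = (old_bits >> RC_SHIFT) == RC_MAX;
        if (spill) {
            /* Inline field is full: keep half inline, set the flag, and
             * push the other half to the side table after the CAS wins. */
            new_bits = (old_bits & ~(RC_MAX << RC_SHIFT))
                     | (RC_HALF << RC_SHIFT) | HAS_SIDE;
        } else {
            /* Common case: bump the inline field. A plain fetch_add would
             * do the same here, but could not detect the full field and
             * would carry out of the top bit. */
            new_bits = old_bits + RC_ONE;
        }
        /* On failure old_bits is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak_explicit(
                &obj->isa, &old_bits, new_bits,
                memory_order_relaxed, memory_order_relaxed)) {
            if (spill) {
                atomic_fetch_add_explicit(&obj->side_count, RC_HALF,
                                          memory_order_relaxed);
            }
            return;
        }
    }
}

/* Total count = inline field + whatever was spilled. */
static uint64_t sketch_total(object_t *obj) {
    uint64_t bits = atomic_load_explicit(&obj->isa, memory_order_relaxed);
    return (bits >> RC_SHIFT)
         + atomic_load_explicit(&obj->side_count, memory_order_relaxed);
}
```

Calling `sketch_retain` 300 times on a fresh object takes the spill branch once, on the 256th call, and leaves the low pointer bits alone. The sketch ignores release, deallocation, and the window between the CAS and the side-table update, which is part of why the real code is more involved.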
The native Swift retain (swift_retain above) seems to be somewhere inside this mess: https://github.com/apple/swift/blob/main/stdlib/public/runti...