I know I’m probably being naive about this, but is it stupid to ask if there’s a way to make multi-process work better on Linux, rather than “fixing” PG?
I feel like the thread vs process thing is one of those pendulums/fads that comes and goes. I’d hate to see PG go down a rabbit hole only to discover the OS could be modified to make things go better.
(I understand not all PG instances run on Linux, just using it as an example)
That'll likely be an even bigger task, and harder to get into mainline kernel.
Linux multi-process is already pretty efficient compared to Windows. However, multi-process is inherently less efficient than multi-thread because the kernel guarantees more safety checks and isolation between processes. I feel that lowering those guarantees might lead to more security issues, similar to how Hyper-Threading triggered a bunch of issues with Intel processors.
Right - yeah I was really just wondering if some of the safety predicates could be reduced when there is a relationship between processes, such as the mitigations against cache attacks. I think the cache misses caused by multi-process were one of the reasons given that it's slower than threading. But I don't understand why this is necessarily the case given that the shared memory and executable text ultimately refer to the same data. But I suppose this would need to work with processor affinity and other elements to prevent the cache being knocked around by non-PG processes, and I guess this is one place where it starts getting complicated.
That said, please understand that I'm just being curious - I really don't know what I'm talking about, I haven't built a Linux kernel or dabbled in Unix internals in like 20 years, but thanks for replying :) PostgreSQL is my favourite open source project and I'm spooked by the threading naysayers.
The TLB is basically keyed by (address space, virtual address % granularity), or needs to be flushed entirely when switching between different views of the address space (e.g. switching between processes). Unless your address space is exactly the same, you're at least going to duplicate TLB contents, leading to a lower hit rate.
This isn't really an OS issue, more a hardware one, although potential hardware improvements would likely have to be explicitly utilized by operating systems.
Note that the TLB issue is different from the data / instruction cache situation.
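To make the TLB point above a bit more concrete, here is a rough toy model in Python. It is only a sketch: the `ToyTLB` class, the LRU eviction policy, the 4096-byte page size and the capacity of 64 entries are all assumptions made up for illustration, and no real TLB is this simple. It only shows that keying entries by (address space, page) means two processes touching the same shared pages need twice the entries that two threads would.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page granularity for this toy model


class ToyTLB:
    """Hypothetical LRU cache keyed by (address-space id, virtual page number)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, asid, vaddr):
        key = (asid, vaddr // PAGE_SIZE)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.entries[key] = True
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used

    def hit_rate(self):
        return self.hits / (self.hits + self.misses)


def run(asids, pages=48, rounds=20, capacity=64):
    """Workers take turns touching the same `pages` shared pages."""
    tlb = ToyTLB(capacity)
    for _ in range(rounds):
        for asid in asids:
            for page in range(pages):
                tlb.access(asid, page * PAGE_SIZE)
    return tlb.hit_rate()


threads_rate = run(asids=[1, 1])    # two threads: one address space
processes_rate = run(asids=[1, 2])  # two processes: two address spaces
print(f"threads:   {threads_rate:.3f}")
print(f"processes: {processes_rate:.3f}")
```

With these made-up sizes the 48 shared pages fit in the toy TLB once but not twice, so the two-process run thrashes while the two-thread run almost always hits. The exact numbers are an artifact of the LRU policy and the sizes chosen here; only the direction of the effect is the point.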
> I feel like the thread vs process thing is one of those pendulums/fads that comes and goes.
In this context, threads can be understood as processes that share the same address space and, vice versa, processes as threads with separate address spaces.
One gives you isolation, the other convenience and performance. Either can be desirable.
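A quick way to see the shared vs. separate address space distinction, as a minimal sketch in Python. It assumes a POSIX system (since `os.fork()` is not available on Windows), and the `counter` dict and `bump` helper are just names made up for this example.

```python
import os
import threading

counter = {"value": 0}


def bump():
    counter["value"] += 1


# A thread runs in the same address space, so its write is visible here.
t = threading.Thread(target=bump)
t.start()
t.join()
after_thread = counter["value"]

# A forked child gets its own (copy-on-write) address space, so its write
# stays private to the child and the parent never sees it.
pid = os.fork()
if pid == 0:
    bump()
    os._exit(0)  # leave the child immediately, skipping normal cleanup
os.waitpid(pid, 0)
after_fork = counter["value"]

print(after_thread, after_fork)  # prints "1 1"
```

The thread's increment lands in the parent's memory (convenience), while the child's increment is lost to the parent (isolation), which is the trade-off described above.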