irogers · on June 27, 2024

Just to advertise: the perf tool has built-in flamegraph generation these days (well, leaning on D3.js). So `perf script report flamegraph` will convert a perf.data file into a flamegraph.html. Similarly, there is `perf script report gecko` to write out the Firefox Profiler's JSON format.

irogers · on Feb 20, 2024

Agreed, this is awesome. Sanitizers obviously fill some of this gap currently, but they aren't great with things like reference counting, which RAII makes a doddle. FWIW, here is an implementation of runtime RAII-style checking on top of leak sanitizer: https://perf.wiki.kernel.org/index.php/Reference_Count_Check... There's an interesting overlap with the cleanup attribute that is now appearing in the Linux kernel (by way of systemd): https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
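
For anyone who hasn't used the cleanup attribute mentioned above, here is a minimal sketch, assuming GCC or Clang (the attribute is a compiler extension, not standard C). The names `free_int`, `cleanup_free_int`, `sum_to` and `cleanup_calls` are hypothetical and only exist for this illustration; this is not the macro set the kernel or systemd actually uses.

```c
#include <stdlib.h>

/* Counts handler invocations so the behaviour can be observed.
 * Illustration only; real code would not need this. */
int cleanup_calls;

/* A cleanup handler receives a pointer to the annotated variable. */
void free_int(int **p)
{
    free(*p);           /* free(NULL) is a no-op, so a failed calloc is fine */
    *p = NULL;
    cleanup_calls++;
}

/* Hypothetical convenience macro for this sketch. */
#define cleanup_free_int __attribute__((cleanup(free_int)))

/* Sums 1..n into *out. Returns 0 on success, -1 on failure.
 * The allocation is released on every return path. */
int sum_to(int n, int *out)
{
    int *p cleanup_free_int = calloc(1, sizeof(int));

    if (!p)
        return -1;
    if (n < 0)
        return -1;      /* early return: free_int(&p) still runs */
    for (int i = 1; i <= n; i++)
        *p += i;
    *out = *p;
    return 0;           /* free_int(&p) runs as p goes out of scope */
}
```

The handler runs whenever `p` leaves scope, so early returns need no `goto out` style unwinding. As with `defer`, nothing forces you to add the annotation in the first place.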

thradams · on Feb 20, 2024

Cake implements defer as an extension, where ownership and defer work together. The flow analysis must be prepared for defer.

    int * owner p = calloc(1, sizeof(int));
    defer free(p);

However, with ownership checks, the code is already safe. This may also change the programmer's style, as generally, C code avoids returns in the middle of the code.

In this scenario, defer makes the code more declarative and saves some lines of code. It can be particularly useful when the compiler supports defer but not ownership.

One difference between defer and ownership checks, in terms of safety, is that the compiler will not prompt you to create the defer. But, with ownership checks, the compiler will require an owner object to hold the result of malloc, for instance. It cannot be ignored.

The same happens with C++ RAII. If you forget to free something in your destructor, or forget to write the destructor at all, the compiler will not complain.

With Cake's ownership checks, this cannot be ignored.

    struct X {
      FILE * owner file;
    };

    int main() {
      struct X x = {};
      //....
    } // error: x.file not freed

irogers · on Feb 15, 2024

Somewhat related, "Data-type profiling for perf": https://lwn.net/Articles/955709/

irogers · on Feb 10, 2024

For a collection of profilers you can also check out the Profilerpedia: https://profilerpedia.markhansen.co.nz/

irogers · on Nov 8, 2023

There are also GC techniques to make the pause shorter, for example, doing the work for the pause concurrently and then repeating it in the safepoint. The hope is that the concurrent work will turn the safepoint work into a simpler check that no work is necessary. Doubling the work may hurt GC throughput.
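
A toy, single-threaded sketch of that idea, under heavy assumptions: the "card table" and the names `dirty`, `process_dirty_cards`, `concurrent_pass` and `safepoint_pass` are invented for illustration and are not taken from any particular collector. A real collector would run the first pass on another thread while the application keeps mutating.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_CARDS 8

/* Toy card table: true means the card still needs processing. */
bool dirty[NUM_CARDS];

/* Scan every card and process the dirty ones.
 * Returns how many cards actually needed work. */
size_t process_dirty_cards(void)
{
    size_t processed = 0;

    for (size_t i = 0; i < NUM_CARDS; i++) {
        if (dirty[i]) {
            dirty[i] = false;   /* stand-in for the real GC work */
            processed++;
        }
    }
    return processed;
}

/* First pass: would run concurrently with the application. */
size_t concurrent_pass(void)
{
    return process_dirty_cards();
}

/* Second pass: the same scan repeated inside the pause. It only pays
 * for cards dirtied since the concurrent pass finished. */
size_t safepoint_pass(void)
{
    return process_dirty_cards();
}
```

If every card is dirty and the application dirties one more between the passes, the pause handles one card instead of eight, but each card is still scanned twice overall, which is the throughput cost.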

irogers · on Sept 25, 2023

This is awesome work, and Textual being able to target the terminal or the web (https://github.com/Textualize/textual-web) also gives hope that this can be more than a terminal app. I'm hoping that in the future features like this can be standard in Linux's perf tool. For example, Firefox Profiler support was recently added as a Google Summer of Code contribution: https://perf.wiki.kernel.org/index.php/Tutorial#Firefox_Prof...

irogers · on Feb 15, 2023

prodfiler clearly has a market. It would be interesting to see the approach as something standard in the kernel tree; perhaps it could be added to perf's synthesis, etc. There is already BPF-based profiling within perf to avoid file descriptor overheads. If engineering resources are the issue, then this could be a good GSoC project: https://wiki.linuxfoundation.org/gsoc/2023-gsoc-perf

javierhonduco · on Feb 15, 2023

This would be ideal. There's some great work by folks at Oracle in this space: SFrame (https://www.phoronix.com/news/GNU-Binutils-SFrame) née ctf_frame that I hope will be integrated in the kernel.

As this will take a few years, in the meantime I've developed a DWARF-based unwinder in BPF [0]. Some perf maintainers showed interest in this, so thanks for bringing up the GSoC project idea, it hadn't occurred to me!

[0]: https://news.ycombinator.com/item?id=33788794

tdullien · on Feb 16, 2023

Yeah. I really like the ideas proposed by Brendan Gregg -- essentially encouraging every HLL runtime to embed an eBPF-based unwinder in its own executable. The upshot of that would be "generic, in-production unwinding of native code and HLL code", similar to what prodfiler is doing, but inside the main kernel tree...

irogers · on Feb 15, 2023

AMD will have support in Zen4 and Linux 6.1 (which is LTS):

https://lore.kernel.org/lkml/Yz%2FcpNTSacRMh1FK@gmail.com/

Further, precise events are fixed in Linux 6.2:

https://lore.kernel.org/lkml/Y5eQeR2tpZ%2FBos49@gmail.com/

fooblaster · on Feb 16, 2023

What do precise events in perf help with?

irogers · on Feb 15, 2023

DWARF bytecode is a full VM. Do compiler writers test their DWARF output? (In my experience they don't, especially for architectures outside the big 2 or 3.) How does the kernel access the ELF file pages holding the DWARF information from within an NMI handler? You could mlock all your debug information when a program loads, but the memory overhead wouldn't be nice. It is hard enough getting a build ID.

The elephant in the room btw is LBR call stacks, but they aren't exposed in the kernel/BPF yet. Userland perf has them and they recently became available on AMD.

zznzz · on Feb 15, 2023

It is not required to unwind the user space stack in the NMI handler. It can be done later before returning to user space in a context that can handle faults.

irogers · on Feb 15, 2023

Allowing processes to sniff each other's stacks has some fairly obvious security issues.

zznzz · on Feb 15, 2023

I don’t understand your concern - what about this would involve one process sniffing another process’s memory? The kernel would still be doing the unwinding, just not in the NMI handler.

irogers · on Feb 15, 2023

Wouldn't all your kernel stacks then end up in whatever this handler is? Why not implement your approach and mail it to LKML :-)

zznzz · on Feb 15, 2023

Yes, this only works for user space stacks, but that is sufficient: with ORC, kernel stacks are solved (IMO), and it avoids all the issues you mentioned with trying to mlock the debuginfo of all processes. The NMI handler would still unwind the kernel stack.

> Why not implement your approach and mail it to LKML :-)

Because this would still be an in-kernel DWARF unwinder and I would expect an instant reject, and because I am lazy and/or don’t care enough about this problem or Linux to work on it. Even if people could be persuaded, I don’t have the interest or temperance to debate this with LKML.

irogers · on Feb 15, 2023

This could be a great Linux perf GSoC project. Projects and mentors are being sought: https://wiki.linuxfoundation.org/gsoc/2023-gsoc-perf