My experience with IWYU has been mixed. In general it's a success, but it had trouble identifying that some headers were only conditionally needed (e.g., only in a debug build or under a macro conditional). Those cases are easy to work around if you own the code, but can be annoying if it's in a third-party lib.
I've found that using IWYU Pragmas [1] for codebases you own and IWYU Mappings [2] for third-party libraries __almost__ entirely eliminates weird IWYU suggestions (there are a few annoyingly stupid suggestions from the tool I just ignore).
I've also recently been making libraries I write compatible with users that run IWYU by annotating all public headers with IWYU pragma comments that export symbols/transitive includes correctly, etc.
Even with a monorepo this isn't necessarily safe -- if you have a mix of x86 and arm servers, you'll need different headers included for intrinsics for example.
Conditionally-off blocks of code under #ifdefs are challenging for it, yes -- it runs a proper C++ compiler on your code and won't get to see code in those blocks without the right defines.
Don't blindly apply its suggestions: test them, skim to see what it got wrong, and sprinkle some "// IWYU pragma: keep" to help it out in corner cases. The tool is more like a linter; you don't follow everything your linter tells you, do you?
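For example, an include that is only exercised in debug builds can be pinned with a keep pragma (a minimal sketch; in a release build, with NDEBUG defined, IWYU sees no use of `<cassert>` and would otherwise suggest dropping it):

```cpp
#include <cassert>  // IWYU pragma: keep  (only used when NDEBUG is not defined)

// the assert below compiles away entirely under NDEBUG, so without the
// pragma IWYU's release-mode run would flag <cassert> as unused
int checked_add(int a, int b) {
    int sum = a + b;
    assert(b <= 0 || sum > a);  // cheap overflow sanity check, debug only
    return sum;
}
```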
The fix then is simple: run IWYU twice, first targeting x86 and then targeting arm, and merge the results of the two runs. Where they conflict, just don't remove anything.
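One way to do that merge mechanically (a sketch; the set-intersection rule implements "for conflicts, don't remove anything" -- an include is only removed if both runs flagged it as unused):

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// returns the includes that BOTH the x86 run and the arm run reported
// as unused; anything flagged by only one run is kept
std::vector<std::string> safe_removals(const std::set<std::string>& unused_on_x86,
                                       const std::set<std::string>& unused_on_arm) {
    std::vector<std::string> out;
    std::set_intersection(unused_on_x86.begin(), unused_on_x86.end(),
                          unused_on_arm.begin(), unused_on_arm.end(),
                          std::back_inserter(out));
    return out;
}
```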
FWIW: when I run this tool my experience tends to be that it adds more includes than it removes, because I guess I rely too much on transitive includes.
I've taken to adding a "--source" switch to my personal utilities. That way, "program --source" dumps source code to stdout, which can then be captured, modified, and used directly -- without needing to search for the source. (yes, I know this has nothing to do specifically with .h files)
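A sketch of what such a switch can look like (the function name and SOURCE_TEXT placeholder are invented; a build step would embed the real source, e.g. with `xxd -i` or C23's `#embed`):

```cpp
#include <cstdio>
#include <cstring>

// stands in for the program's own source, embedded at build time
static const char SOURCE_TEXT[] = "/* embedded source would go here */\n";

// returns true if "--source" was found and the source was dumped to `out`
bool maybe_dump_source(int argc, const char* const* argv, std::FILE* out) {
    for (int i = 1; i < argc; ++i) {
        if (std::strcmp(argv[i], "--source") == 0) {
            std::fputs(SOURCE_TEXT, out);
            return true;
        }
    }
    return false;  // no flag: proceed with normal program behaviour
}
```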
Kinda. I think a primary use case for modules is to help with out-of-control compile times.
But the specific problem of include-what-you-use will still be encountered if you include directly from C libraries like system headers or library dependencies.
This is not true. Compile times are usually much better with modules. They also don't inhibit parallelism, but perhaps you are referring to this paper (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p144...), which shows that, with compiler versions from 2019, it can indeed be slower to compile with modules if you have a large number of threads and the depth of the module dependency graph is large.
Do you have evidence that the situation has changed? Last I checked it still remains the case that modules inhibit parallelism and hence result in slower builds in most practical workloads. But of course if you have evidence to the contrary I'd be happy to see it.
I don't know of any newer benchmarks. However, I'm reading the results differently, I guess, because the results show that with 128 threads, modules become slower only when the DAG depth is higher than 29, and that's quite a large depth! It also looks like each source file used in the benchmark only imports other modules and declares 300 variables, but nothing else. Practical workloads will have more interesting stuff in the source files, so I would expect the relative impact of module loading to be smaller, leaving more work that can be done in parallel.
> with 128 threads, modules become slower only when the DAG depth is higher than 29
yeah, but the same graph never shows modules being faster... it only ever shows them being the same or slower. If I'm going to put in all that work, the result should be *faster*
> This is not true. Compile times are usually much better with modules.
What significantly improves compile-times is Pre-Compiled Headers (PCH), which most compilers have supported for decades.
The study you mention does not show data for them.
Having ported one >1 million LOC C++ app to use modules in two compilers, the compile time improvement of modules over PCH was not distinguishable from noise.
Modules have many advantages, like better encapsulation, etc.
The main thing people want from them seems to be better compile times, which is the one thing they don't deliver, at least compared to the PCH solutions that have existed for decades and are already supported by all build systems.
Compared to modules, PCHs are "zero-effort" and deliver performance instantaneously.
Off-topic, but is there a guide to best practices for portable pre-compiled headers out there somewhere? I'm under considerable pressure to add pre-compiled headers for Windows to my code, and it won't have any significant benefit for me unless I can also make it work on MacOS and Linux. So far my Googling has turned up little information for any platform other than Windows, and nothing that would suggest how to do it well for all three platforms. (Well, more to the point, Visual C++, clang, and g++.)
I'd just google "<your build system> pre-compiled headers" and see if there is a flag or option that you can enable.
You will definitely need quite a bit of fine tuning for apps over 500k LOC or so, but if your app is under that, and you are splitting code between .h and .cpp files appropriately, just flipping a flag might get you 80% there.
The speed-ups you see people get from PCHs are around 20-30% faster compile times. So they are more of a "nice to have" feature than something that will solve your compile-time problems.
If your app is structured in such a way that it takes 20 min to compile, this can cut it to 15 min at most, but that would probably still suck. If you want more, then you'd need to consider other solutions like distributed build caches (sccache, etc.).
My understanding is just the opposite, they will decrease compilation times as "included files" are processed just once. We can see them as a better version of precompiled headers (although they are more than that).
Yes, except that includes are usually not the performance bottleneck; it's the semantic analysis that consumes the bulk of the compile time.
Modules inhibit parallelism because modules are ordered along a DAG and must be compiled from the root of the DAG down to the leaves in order. So consider a traditional setup as follows:
A.cpp <- A.h <- B.h <- C.h <- D.h
B.cpp <- B.h <- C.h <- D.h
C.cpp <- C.h <- D.h
D.cpp <- D.h
All four of those cpp files can be built in parallel, even though you're right that all of the header files are being reparsed multiple times. My claim is that parsing header files is incredibly cheap, it's translating the .cpp files that's expensive because cpp files are where the bulk of the semantic analysis and type checking is performed.
With modules, the same compilation model looks like this:
A.mxx <- B.mxx <- C.mxx <- D.mxx
There's no longer header/source and there's no longer redundancy, but I can't build this in parallel anymore. I have to first build D.mxx, then C.mxx, then B.mxx then A.mxx in serial.
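A minimal sketch of such a chain with C++20 modules (names and contents invented; each unit would be its own file, shown together here, and each `import` forces the compiler to wait until the imported module's compiled interface exists):

```cpp
// d.mxx -- the root of the chain; imports nothing, so it builds first
export module d;
export int d_value() { return 4; }

// c.mxx -- cannot start compiling until d's compiled interface exists
export module c;
import d;
export int c_value() { return d_value() - 1; }

// b.mxx -- waits on c, which waited on d, and so on up to a.mxx
export module b;
import c;
export int b_value() { return c_value() - 1; }
```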
Parsing a single header file in isolation is cheap, but each header will include others, and templates mean many headers contain large amounts of code inline. For instance, just including <vector> results in the compiler having to look at almost 30kloc on my system.
No it doesn't; cl.exe is an inherently single-threaded application. Parallelism in VC++ is achieved by running multiple copies of cl.exe, with one serving as the primary instance and the rest as followers. The primary instance forwards individual translation units to the followers and waits for them to complete compilation; at the end the primary instance terminates and the linker is invoked.
That is a linker option, not a compiler option. Modules have no effect on linking one way or another as linking is fairly independent of the compilation process.
Then your comment is off-topic and creates confusion. My point was that modules inhibit the parallelism of the compilation process and hence compile times, not that they have any effect on link times.
Modules do not have any effect on the linker one way or another. They are independent of it.
We used this at a previous job of mine, back in ~2014.
It was part of a huge push where we spent months focusing primarily on code health and performance. The guy who did this part of the work said it needed a bit of manual intervention to really work, back then, but in the end it helped us eliminate a lot of includes and really speed up compilation.
The very measurable gains also led to new guidelines for how to write our code to try and maintain the speed we'd gained.
In my project I was lucky to have started early on consistent naming conventions so that I could tell from names alone what headers were needed. Then my script to reduce unnecessary headers was primarily concerned with finding an #include without any other references to its name in the file.
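A toy version of that check (a sketch; it flags a quote-include whose basename stem never appears elsewhere in the file, which only works under strict naming conventions like the ones described -- it is not a general solution):

```cpp
#include <sstream>
#include <string>
#include <vector>

// returns quote-includes (e.g. "foo.h") whose stem ("foo") is never
// referenced anywhere else in the source text
std::vector<std::string> suspicious_includes(const std::string& source) {
    std::vector<std::string> lines;
    std::istringstream in(source);
    for (std::string line; std::getline(in, line); ) lines.push_back(line);

    std::vector<std::string> result;
    for (const auto& line : lines) {
        auto pos = line.find("#include \"");
        if (pos == std::string::npos) continue;
        auto start = pos + 10;                       // skip `#include "`
        auto end = line.find('"', start);
        std::string header = line.substr(start, end - start);  // foo.h
        std::string stem = header.substr(0, header.rfind('.')); // foo
        bool referenced = false;
        for (const auto& other : lines) {            // scan the rest of the file
            if (&other == &line) continue;
            if (other.find(stem) != std::string::npos) { referenced = true; break; }
        }
        if (!referenced) result.push_back(header);
    }
    return result;
}
```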
Of course, that approach was only straightforward by imposing other constraints. It would not work for a random project.
One of the downsides of high language complexity is that “simple” concepts like this require a whole compiler to be able to handle every case.
Another downside of high language complexity is that we probably only care about unnecessary includes because there is such a cost to just referring to things. If module references were cheap, easy to cache, etc. then it wouldn’t matter if we have a few extra ones.
I found IWYU pretty annoying, due to its tendency to also include transitive includes. Some of those are implementation details and much more likely to be added or removed over time. But maybe the projects I worked on were using it wrong?
If the issue is with including transitive dependencies that are in your own codebase, then you should annotate the public interface header to the implementation details with IWYU Pragmas [1] that export the implementation (for example [2]).
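For instance (a sketch with invented file names), the public umbrella header claims ownership of the detail header it forwards:

```cpp
// widget.h -- the public interface header; users include this
#pragma once
#include "detail/widget_impl.h"  // IWYU pragma: export
```

With the export pragma, IWYU treats symbols from detail/widget_impl.h as provided by widget.h, so user code is told to include the public header rather than the implementation detail.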
If this is in third-party libraries, you can use IWYU Mappings [3] to map the "private" headers (usually the transitive include) to the public interface. An example that I use for the PEGTL library [4].
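Such a mapping lives in a .imp file passed with `-Xiwyu --mapping_file=...`; an entry mapping private headers matched by a regex (the `@` prefix) to the public umbrella header looks roughly like this (the pattern shown is illustrative):

```
[
  { include: ["@\"tao/pegtl/internal/.*\"", private, "\"tao/pegtl.hpp\"", public ] }
]
```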
I think it definitely can be a project thing. My experience with IWYU has been on very large codebases and I considered its ability to find transitive includes a blessing. The specific case where it shined for me was it made it much easier to identify the true impact of fileset changes on the larger codebase when it came to refactoring.
One thing I like about Python is that it is easy to determine where every token in a file came from (unless you use wildcard imports, but I suggest not doing that).
I wonder if there is any way to achieve that sort of thing with C/C++ and other languages.
I've thought about this too. C libraries usually have sensible prefixes though, e.g. curl_, and if not, a man page. So for C specifically I don't have this issue.
Ruby though... I really hate it. Overloading on numbers for example. Which module lets you do 5.some_verb? Where did you import it?
In Python and C you have imports local to the file, or included headers. In Ruby it doesn't matter, as long as it's been imported somewhere in the same runtime??
Absolutely bonkers.
PS: I am quite new to ruby so please enlighten me if I have it all wrong :)
I think that's a key trade-off in duck typing and reopening classes and objects. You get immense power to do "cool", powerful, and useful things like that, but at the cost of debug-ability.
It's been a while since I really used Ruby in anger but to discover where these things came from you need to just look up the documentation of your dependencies and their dependencies, and see where something got added to. For me most of it came from ActiveSupport, and I ended up using that library by itself in other projects than just Rails.
"you need to just look up the documentation of your dependencies and their dependencies"
Exactly, which seems directly opposed to the Ruby ethos of happy developers.
I do appreciate the magic of having stuff like 5.bytes or 8.days, but I don't see the reason for runtime-wide imports, or at least for not warning you that you're using modules required elsewhere.
On every #include line in C, I like to write a comment, listing every symbol used from that include file, in order of first use. I even wrote a Python script to help me re-generate the #include lines for such a C program.
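A small example of that convention (a sketch; the function is invented, and each include's comment lists its symbols in order of first use):

```cpp
#include <cassert>  /* assert */
#include <cstring>  /* strlen */

int name_length(const char* name) {
    assert(name != nullptr);                     // assert: first use
    return static_cast<int>(std::strlen(name));  // strlen: first use
}
```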
What I've always wanted is to write C++ code and have the minimal set of necessary includes needed to compile my code automatically added [edit: I should have said "managed"] by the IDE.
I've just Googled the same question. The answers seem to glorify the suffering of writing C++ and suggest that the inquirer would perhaps be better off with switching to Java... Sounds like a case of Stockholm syndrome to me.
Anyway, I'm a beginner in the C/C++ world, and the most convincing solution I've found to use in my personal project is the Single Compilation Unit approach (https://en.wikipedia.org/wiki/Single_Compilation_Unit). It is exemplified in the Handmade Hero github repository (which I'm afraid is available for paying users only).

Essentially, the whole program is divided into modules, each within its own single cpp file. The modules are then all included in the SCU, which is the only file passed to the compiler. There can be no circular dependencies between modules (as then there would be no order of including them in the SCU that would work). In HH's case, there seems to be an absolutely minimal number and volume of headers, and they only define data structures, never declare functions.
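A tiny sketch of the layout (file names invented; in a real project each module section would be its own .cpp file, pulled into the single unit with #include rather than compiled and linked separately):

```cpp
// ---- math_module.cpp (would live in its own file) ----
static int add(int a, int b) { return a + b; }

// ---- scu.cpp (the only file handed to the compiler) ----
// #include "math_module.cpp"   // modules are textually included, not linked
int multiply_by_two(int x) { return add(x, x); }
```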
I haven't used C++ in a while (sadly), but in my days I had stumbled upon deheader[0], which I don't see mentioned here. From what I remember, it was very simple and easy to use, and yielded useful results.
I rather like OCaml's way of doing things, which often releases one entirely from having to write module inclusion directives, since in general bindings are qualified by their module name, which becomes part of their namespace. So one would use `Array.map` in code, which is the `map` binding exported by the `Array` module, and the `Array` module is then included automatically, of course. This would be `array_map` in many languages to avoid conflicts, but modules in OCaml deliberately export short names on the expectation that bindings will be namespace-qualified with their module name.
It is possible to explicitly open this module, so that one can use `map` instead, but that's generally not wise.
I find having to write a long list of include directives at the top of a file quite annoying, and such a list also doesn't tell you in which module the bindings you encounter in the code below are defined. If I encounter, say, `Net.Tcp.open` in OCaml code, I know that this function is defined in `./net/tcp.ml`.
> This is alpha quality software -- at best (as of July 2018). It was originally written to work specifically in the Google source tree, and may make assumptions, or have gaps, that are immediately and embarrassingly evident in other types of code.
> While we work to get IWYU quality up, we will be stinting new features, and will prioritize reported bugs along with the many existing, known bugs. The best chance of getting a problem fixed is to submit a patch that fixes it (along with a test case that verifies the fix)!
semi related, but I'm coming back to C++ after a long hiatus (15 years). I realize this is probably a newb question...
The code base I'm working in is very large and I have a recurring problem where I see a term (class/variable/etc) being used in a cpp file, and want to know which header file contains the definition.
What's the quickest, easiest way to do this?
I've been using grep, but the size of the code base, combined with the large number of #includes in each cpp file, makes this inefficient.
I believe I can use ctags/vim, but I last used that circa 2000 and I'm curious to know what other static analysis solutions have cropped up since then.
Does IWYU address this scenario? I'm using clang as a compiler if that's at all relevant.
In most cases, what you are looking for is a language server like `clangd` (works for most compilers) [1].
You can find a Language Server Protocol implementation for your editor at [2] (I don't think it lists __all__ clients, but it should include the most popular ones).
EDIT: I realized that this is a vague answer, so let me clarify.
An LSP implementation (especially clangd) provides actions like `go-to definition` or `find references` that you would find in full-featured IDEs like CLion (which is also amazing BTW). Since you mentioned vim, I am guessing you use it and don't necessarily want to let go of the hand-crafted vimrc you have created. Adding an LSP plugin to Vim is incredibly easy and gives you these "IDE" features with customizable mappings.
Other responses, thanks for your input. Just want to clarify that I have tried VS and VSCode with limited success (sometimes search works, sometimes it doesn't, and my biggest gripe is an occasional lack of transparency into what's going on under the cover). I think any solution is going to require some investment on my part and LSP sounds like a good investment.
> The code base I'm working in is very large and I have a recurring problem where I see a term (class/variable/etc) being used in a cpp file, and want to know which header file contains the definition.
A good IDE will have a feature to let you locate the declaration and/or definition of any variable or type.
I've found that a lot of IDEs have that feature completely broken. Qt Creator, for example, is easily confused and comes with all kinds of Qt garbage^H^H^H^H^H^H^H^H baggage. CLion is a resource hog and often just hangs. Visual Studio is usually pretty good -- assuming you're using Windows. VS _Code_ is "okay" but I've found it's more of a headache to set up. I don't have experience with XCode since I've never used OSX for development.
I've found the most reliable way is to learn how to use `grep` and pair that with understanding where to search; the project source directory of course but also system headers and any libraries installed to non-system locations. That knowledge translates to usefulness in other workflows too.
Curious, I was under the impression that ctags offers a 'jump to definition' functionality, but little more. (i.e., 'find all references' isn't supported).
Is that correct? Do you use it for functionality beyond the 'jump to definition/jump back to previous context'?
That's correct, there's "jump to definition" but no "find all references". I've used cscope for a while for that, but never really got used to it (and it doesn't work that well with C++).
I really should get with the times and try LSP like others suggested.
Consider checking out cscope. With cscope you can also build a reverse index, which lets you find where things are called. It can be used with something like vim but also has a pretty nice TUI.
Please check out the Guidelines and the FAQ, [0][1] regarding how best to participate. HN is friendly to curiosity, but not to off-topic comments, which is why you've been downvoted. If you'd like to discuss how to become a programmer, either find a thread where that's being discussed, or submit an Ask HN thread, following the style of this thread [2].