Go to file
Volodymyr Sapsai f8fd069e44 Fixed file heading comments not matching the filename (issue #83). Patch by Ryan Pavlik.
Also made the length of the first line to be 80 characters where possible.
2012-11-25 22:09:37 +00:00
more_tests Fixed file heading comments not matching the filename (issue #83). Patch by Ryan Pavlik. 2012-11-25 22:09:37 +00:00
tests Fixed file heading comments not matching the filename (issue #83). Patch by Ryan Pavlik. 2012-11-25 22:09:37 +00:00
CMakeLists.txt Fixed build under MinGW (issue #71). 2012-07-15 08:54:54 +00:00
LICENSE.TXT Add a test to protect against syntax errors in main(). Also 2011-04-13 03:11:35 +00:00
Makefile Update to reflect changes in Clang. Fixed linking. 2012-08-11 17:49:24 +00:00
README.txt Update with instructions on how to build against llvm+clang 2.9 2011-06-17 19:45:08 +00:00
fix_includes.py fix_includes was not correctly merging in 'includes are all 2011-09-23 17:18:46 +00:00
fix_includes_test.py fix_includes was not correctly merging in 'includes are all 2011-09-23 17:18:46 +00:00
gcc.libc.imp Read private to public mappings from external file. Patch by Kim Gräsman. 2012-10-14 22:39:30 +00:00
gcc.stl.headers.imp Read private to public mappings from external file. Patch by Kim Gräsman. 2012-10-14 22:39:30 +00:00
gcc.symbols.imp Read private to public mappings from external file. Patch by Kim Gräsman. 2012-10-14 22:39:30 +00:00
google.imp Read private to public mappings from external file. Patch by Kim Gräsman. 2012-10-14 22:39:30 +00:00
iwyu.cc Fixed file heading comments not matching the filename (issue #83). Patch by Ryan Pavlik. 2012-11-25 22:09:37 +00:00
iwyu.gcc.imp Read private to public mappings from external file. Patch by Kim Gräsman. 2012-10-14 22:39:30 +00:00
iwyu_ast_util.cc Fixed file heading comments not matching the filename (issue #83). Patch by Ryan Pavlik. 2012-11-25 22:09:37 +00:00
iwyu_ast_util.h Fixed reporting destructor usage for objects created with 'new' (issue #65). 2012-08-19 22:11:16 +00:00
iwyu_cache.cc Fixed file heading comments not matching the filename (issue #83). Patch by Ryan Pavlik. 2012-11-25 22:09:37 +00:00
iwyu_cache.h A few small fixes to get the build working. 2011-05-04 20:52:23 +00:00
iwyu_driver.cc Update to reflect changes in Clang. Now DiagnosticOptions are intrusively reference-counted (r166508). 2012-10-25 21:01:41 +00:00
iwyu_driver.h Add a test to protect against syntax errors in main(). Also 2011-04-13 03:11:35 +00:00
iwyu_getopt.cc Added getopt_long for Windows (resolves issue #52). 2012-08-11 18:15:00 +00:00
iwyu_getopt.h Added getopt_long for Windows (resolves issue #52). 2012-08-11 18:15:00 +00:00
iwyu_globals.cc Fixed file heading comments not matching the filename (issue #83). Patch by Ryan Pavlik. 2012-11-25 22:09:37 +00:00
iwyu_globals.h Fixed file heading comments not matching the filename (issue #83). Patch by Ryan Pavlik. 2012-11-25 22:09:37 +00:00
iwyu_include_picker.cc Fixed file heading comments not matching the filename (issue #83). Patch by Ryan Pavlik. 2012-11-25 22:09:37 +00:00
iwyu_include_picker.h Enhanced search paths of mapping files. Patch by Kim Gräsman. 2012-11-25 21:26:06 +00:00
iwyu_lexer_utils.cc Fixed file heading comments not matching the filename (issue #83). Patch by Ryan Pavlik. 2012-11-25 22:09:37 +00:00
iwyu_lexer_utils.h Include-what-you-use fixit -- run iwyu on itself to fix up includes (part 2). 2011-05-04 18:32:59 +00:00
iwyu_location_util.cc Fixed file heading comments not matching the filename (issue #83). Patch by Ryan Pavlik. 2012-11-25 22:09:37 +00:00
iwyu_location_util.h Instead of include location use included filename location in 2012-06-12 20:40:02 +00:00
iwyu_output.cc Fixed file heading comments not matching the filename (issue #83). Patch by Ryan Pavlik. 2012-11-25 22:09:37 +00:00
iwyu_output.h Some refactoring to straighten out dependencies. 2011-12-01 02:30:18 +00:00
iwyu_path_util.cc Fixed file heading comments not matching the filename (issue #83). Patch by Ryan Pavlik. 2012-11-25 22:09:37 +00:00
iwyu_path_util.h Enhanced search paths of mapping files. Patch by Kim Gräsman. 2012-11-25 21:26:06 +00:00
iwyu_preprocessor.cc Fixed file heading comments not matching the filename (issue #83). Patch by Ryan Pavlik. 2012-11-25 22:09:37 +00:00
iwyu_preprocessor.h Update to reflect changes in Clang. Now InclusionDirective callback has FilenameRange argument. 2012-09-30 20:07:16 +00:00
iwyu_stl_util.h iwyu was egregiously wrong in how it handled template 2011-08-31 11:42:23 +00:00
iwyu_string_util.h Include-what-you-use fixes by running it on itself. 2011-05-04 18:29:59 +00:00
iwyu_test_util.py Read private to public mappings from external file. Patch by Kim Gräsman. 2012-10-14 22:39:30 +00:00
iwyu_verrs.cc Fixed file heading comments not matching the filename (issue #83). Patch by Ryan Pavlik. 2012-11-25 22:09:37 +00:00
iwyu_verrs.h Fixed file heading comments not matching the filename (issue #83). Patch by Ryan Pavlik. 2012-11-25 22:09:37 +00:00
port.h Fixed build under MinGW (issue #71). 2012-07-15 08:54:54 +00:00
run_iwyu_tests.py Don't ask me why, but the automated system I use for updating patches wants to undo Paul Holden's latest patches and then re-apply them. I'm sure there's a way to skip doing that, but I don't know what it is, so I'm going to let it do its thing. This is step 2 (the re-apply). 2011-08-01 20:58:24 +00:00
third_party.imp Read private to public mappings from external file. Patch by Kim Gräsman. 2012-10-14 22:39:30 +00:00

README.txt

====================
Include What You Use
====================

"Include what you use" means this: for every symbol (type, function,
variable, or macro) that you use in foo.cc (or foo.cpp), either foo.cc
or foo.h should #include a .h file that exports the declaration of
that symbol. (Similarly, for foo_test.cc, either foo_test.cc or foo.h
should do the #including.)  Obviously symbols defined in foo.cc itself
are excluded from this requirement.

This puts us in a state where every file includes the headers it needs
to declare the symbols that it uses.  When every file includes what it
uses, then it is possible to edit any file and remove unused headers,
without fear of accidentally breaking the upwards dependencies of that
file.  It also becomes easy to automatically track and update
dependencies in the source code.

======
CAVEAT
======

This is alpha quality software -- at best.  It was written to work
specifically in the Google source tree, and may make assumptions, or
have gaps, that are immediately and embarrassingly evident in other
types of code.  For instance, we only run this on C++ code, not C or
Objective C.  Even for Google code, the tool still makes a lot of
mistakes.

While we work to get IWYU quality up, we will be stinting new
features, and will prioritize reported bugs along with the many
existing, known bugs.  The best chance of getting a problem fixed is
to submit a patch that fixes it (along with a unittest case that
verifies the fix)!

============
How to Build
============

You will need the clang and llvm trees on your system, such as by
checking out their SVN trees:
   http://clang.llvm.org/get_started.html

Then download the include-what-you-use tarball and unpack it the
/path/to/llvm/tools/clang/tools directory:
   llvm/tools/clang/tools$ tar xfz include-what-you-use-<whatever>.tar.gz
Or, alternately, get the project directly from svn:
   llvm/tools/clang/tools$ svn co http://include-what-you-use.googlecode.com/svn/trunk/ include-what-you-use

Then cd into the include-what-you-use directory (under tools) and type
'make'.

Include what you use makes heavy use of clang internals, and will
occasionally break when clang is updated.  For best results, download
clang as of the same revision number of the last include-what-you-use
release.  You can find this revision number in comments at the top
of the include-what-you-use Makefile.

-- BUILDING AGAINST LLVM 2.9

These are the commands I ran, which worked for me:

1) Visit http://llvm.org/releases/download.html#2.9
2) Download the appropriate binaries ("Clang Binaries for Linux/x86_64")
3) Untar the download file, cd to its root directory
4) Run this command:
      g++ -D_DEBUG -D_GNU_SOURCE -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -fno-rtti -I include -L lib ~/devel/llvm/tools/clang/tools/include-what-you-use/*.cc -lclangFrontend -lclangSerialization -lclangDriver -lclangSema -lclangAnalysis -lclangAST -lclangParse -lclangLex -lclangBasic -lLLVMipo -lLLVMScalarOpts -lLLVMInstCombine -lLLVMTransformUtils -lLLVMipa -lLLVMAnalysis -lLLVMTarget -lLLVMMC -lLLVMCore -lLLVMSupport -lpthread -ldl -lm -o include-what-you-use

==========
How to Run
==========

The easiest way to run IWYU over your codebase is to run
   make -k CXX=/path/to/llvm/Debug+Asserts/bin/include-what-you-use
or
   make -k CXX=/path/to/llvm/Release/bin/include-what-you-use

(include-what-you-use always exits with an error code, so the build
system knows it didn't build a .o file.  Hence the need for -k.)

We also include, in this directory, a tool that automatically fixes up
your source files based on the iwyu recommendations.  This is also
alpha-quality software!  Here's how to use it (requires python):

   make -k CXX=/path/to/llvm/Debug+Asserts/bin/include-what-you-use > /tmp/iwyu.out
   python fix_includes.py --nosafe < /tmp/iwyu.out

If you don't like the way fix_includes.py munges your #include lines,
you can control its behavior via flags. fix_includes.py --help will
give a full list, but these are some common ones:
    * -b: Put blank lines between system and Google #includes
    * --nocomments: Don't add the 'why' comments next to #includes

WARNING: include-what-you-use only analyzes .cc (or .cpp) files built
by 'make', along with their corresponding .h files.  If your project
has a .h file with no corresponding .cc file, iwyu will ignore it.
include-what-you-use supports the AddGlobToReportIWYUViolationsFor()
function which can be used to indicate other files to analyze, but
it's not currently exposed to the user in any way.

============================
How to Correct IWYU Mistakes
============================

1) If fix_includes.py has removed an #include you actually need, add
   it back in with the comment '// IWYU pragma: keep' at the end of
   the #include line.  Note that the comment is case-sensitive.

2) If fix_includes has added an #include you don't need, just take it
   out.  We hope to come up with a more permanent way of fixing later.

3) If fix_includes has wrongly added or removed a forward-declare,
   just fix it up manually.

4) If fix_includes has suggested a private header file (such as
   <bits/stl_vector.h>) instead of the proper public header file
   (<vector>), you can fix this by inserting this comment near the
   top of the private file (assuming you can write to it):
      // IWYU pragma: private, include "the/public/file.h"
   The full list of 'iwyu pragma' comments are at the top of
   iwyu_preprocessor.h.

=================
Running the Tests
=================

To run the iwyu tests, run
   python run_iwyu_tests.py

It runs one test for each .cc file in the tests/ directory.  (We have
additional tests in more_tests/, but have not yet gotten the testing
framework set up for those tests.)  The output can be a bit hard to
read, but if a test fails, the reason why will be listed after the
'ERROR:root:Test failed for xxx' line.

If fixing a bug in clang, please add a test to the test suite!  You
can create a file called whatever.cc (*not* .cpp), and whatever.h, and
whatever-<extension>.h.  You may be able to get away without adding
any .h files, and just #including "direct.h" -- see, for instance,
tests/remove_fwd_decl_when_including.cc.

When fixing fix_includes.py, add a test case to fix_includes_test.py
and run
   python fix_includes_test.py

=========
Debugging
=========

It's possible to run include-what-you-use in gdb, to debug that way.
Another useful tool -- especially in combination with gdb -- is to get
the verbose include-what-you-use output.  See iwyu_output.h for a
description of the verbose levels.  Level 7 is very verbose -- it
dumps basically the entire AST as it's being traversed, along with
iwyu decisions made as it goes -- but very useful for that:
   env IWYU_VERBOSE=7 make -k CXX=/path/to/llvm/Debug+Asserts/bin/include-what-you-use 2>&1 > /tmp/iwyu.verbose

===============
Developer Notes
===============

The codebase is strewn with TODOs of known problems, and also language
constructs that aren't adequately tested yet.  So there's plenty to
do!  Here's a brief guide through the codebase:

* iwyu.cpp: the main file, it includes the logic for deciding when a
  symbol has been 'used', and whether it's a full use (definition
  required) or forward-declare use (only a declaration required).  It
  also inclues the logic for following uses through template
  instantiations.

* iwyu_output.cpp: the file that translates from 'uses' into iwyu
  violations.  This has the logic for deciding if a use is covered by
  an existing #include (or is a built-in).  It also, as the name
  suggests, prints the iwyu output.

* iwyu_preprocessor.cpp: handles the preprocessor directives, the
  #includes and #ifdefs, to construct the existing include-tree.  This
  is obviously essential for include-what-you-use analysis.  This file
  also handles the iwyu pragma-comments.

* iwyu_include_picker.cpp: this finds canonical #includes, handling
  hard-coded private->public mappings (like bits/stl_vector.h ->
  vector) and symbols with multiple possible #includes (like NULL).

* iwyu_cache.cpp: holds the cache of instantiated templates (may hold
  other cached info later).  This is data that is expensive to compute
  and may be used more than once.

* iwyu_globals.cpp: holds various global variables.  We used to think
  globals were bad, until we saw how much having this file simplified
  the code...

* iwyu_*_util(s).h and .cpp: utility functions of various types.  The
  most interesting, perhaps, is iwyu_ast_util.h, which has routines
  that make it easier to navigate and analyze the clang AST.  There
  are also some STL helpers, string helpers, filesystem helpers, etc.

* fix_includes.py: the helper script that edits a file based on the
  iwyu recommendations.

=====================
Why IWYU is Difficult
=====================

This last section is informational, for folks who are wondering why
include-what-you-use requires so much code and yet still has so many
errors.

Include-what-you-use has the most problems with templates and
macros. If your code doesn't use either, iwyu will probably do
great. And, you're probably not actually programming in C++...

USE VERSUS FORWARD DECLARE

Include-what-you-use has to be able to tell when a symbol is being
used in a way that you can forward-declare it. Otherwise, if you wrote

   vector<MyClass*> foo;

iwyu would tell you to #include "myclass.h", when perhaps the whole
reason you're using a pointer here is to avoid the need for that
#include.

In the above case, it's pretty easy for iwyu to tell that we can
safely forward-declare MyClass. But now consider

   vector<MyClass> foo;        // requires full definition of MyClass
   scoped_ptr<MyClass> foo;    // forward-declaring MyClass is ok

To distinguish these, clang has to instantiate the vector and
scoped_ptr template classes, including analyzing all member variables
and the bodies of the constructor and destructor (and recursively for
superclasses).

But that's not enough: when instantiating the templates, we need to
keep track of which symbols come from template arguments and which
don't. For instance, suppose you call MyFunc<MyClass>(), where MyFunc
looks like this:

   template<typename T> void MyFunc() {
      T* t;
      MyClass myclass;
      ...
   }

In this case, the caller of MyFunc is not using the full type of
MyClass, because the template parameter is only used as a pointer. On
the other hand, the file that defines MyFunc is using the full type
information for MyClass. The end result is that the caller can
forward-declare MyClass, but the file defining MyFunc has to #include
"myclass.h".

HANDLING TEMPLATE ARGUMENTS

Even figuring out what types are 'used' with a template can be
difficult. Consider the following two declarations:

   vector<MyClass> v;
   hash_set<MyClass> h;

These both have default template arguments, so are parsed like

   vector<MyClass, alloc<MyClass> > v;
   hash_set<MyClass, hash<MyClass>, equal_to<MyClass>, alloc<MyClass> > h;

What symbols should we say are used? If we say alloc<MyClass> is used
when you declare a vector, then every file that #includes <vector>
will also need to #include <memory>.

So it's tempting to just ignore default template arguments. But that's
not right either. What if hash<MyClass> is defined in some local
myhash.h file (as hash<string> often is)? Then we want to make sure iwyu
says to #include "myhash.h" when you create the hash_set
(otherwise the code won't compile). That requires paying attention to
the default template argument. Figuring out how to handle default
template arguments can get very complex.

Even normal template arguments can be confusing. Consider this
templated function:

template<typename A, typename B, typename C> void MyFunc(A (*fn)(B,C)) { ... }

and you call MyFunc(FunctionReturningAFunctionPointer()). What types
are being used where, in this case?

WHO IS RESPONSIBLE FOR DEPENDENT TEMPLATE TYPES?

If you say vector<MyClass> v;, it's clear that you, and not vector.h,
are responsible for the use of MyClass, even though all the functions
that use MyClass are defined in vector.h. (OK, technically, these
functions are not "defined" in a particular location, they're
instantiated from template methods written in vector.h, but for us it
works out the same.)

When you say hash_map<MyClass, int> h;, you are likewise responsible
for MyClass (and int), but are you responsible for pair<MyClass, int>?
That is the type that hash_map uses to store your entries internally,
and it depends on one of your template arguments, but even so it
shouldn't be your responsibility -- it's an implementation detail of
hash_map. Of course, if you say hash_map<pair<int, int>, int>, then
you are responsible for the use of pair. Distinguishing these two
cases from each other, and from the vector case, can be difficult.

Now suppose there's a template function like this:

   template<typename T> void MyFunc(T t) {
      strcat(t, 'a');
      strchr(t, 'a');
      cerr << t;
   }

If you call MyFunc(some_char_star), which of these symbols are you
responsible for, and which is the author of MyFunc responsible for:
strcat, strchr, operator<<(ostream&, T)?

strcat is a normal function, and the author of MyFunc is responsible
for its use. This is an easy case.

In C++, strchr is a templatized function (different impls for char*
and const char*). Which version is called depends on the template
argument. So, naively, we'd conclude that the caller is responsible
for the use of strchr. However, that's ridiculous; we don't want every
caller of MyFunc to have to #include <string.h> just to call
MyFunc. We have special code that (usually) handles this kind of case.

operator<< is also a templatized function, but it's one that may be
defined in lots of different files. It would be ridiculous in its own
way if MyFunc was responsible for #including every file that defines
operator<<(ostream&, T) for all T. So, unlike the two cases above, the
caller is the one responsible for the use of operator<<, and will have
to #include the file that defines it. It's counter-intuitive, perhaps,
but the alternatives are all worse.

As you can imagine, distinguishing all these cases is extremely
difficult. To get it exactly right would require re-implementing C++'s
(byzantine) lookup rules, which we have not yet tackled.

TEMPLATE TEMPLATE TYPES

Let's say you have a function

   template<template<typename U> T> void MyFunc() {
     T<string> t;
   }

and you call MyFunc<hash_set>(). Who is responsible for the 'use' of
hash<string>, and thus needs to #include "myhash.h"? I think
it has to be the caller, even if the caller never uses the string type
in its file at all. This is rather counter-intuitive. Luckily, it's
also rather rare.

TYPEDEFS

Suppose you #include a file "foo.h" that has typedef hash_map<Foo,
Bar> MyMap;. And you have this code:

   for (MyMap::iterator it = ...)

Who, if anyone, is using the symbol hash_map<Foo, Bar>::iterator? If
we say you, as the author of the for-loop, are the user, then you must
#include <hash_map>, which undoubtedly goes against the goal of the
typedef (you shouldn't even have to know you're using a hash_map). So
we want to say the author of the typedef is responsible for the
use. But how could the author of the typedef know that you were going
to use MyMap::iterator? It can't predict that. That means it has to be
responsible for every possible use of the typedef type. This can be
complicated to figure out. It requires instantiating all methods of
the underlying type, some of which might not even be legal C++ (if,
say, the class uses SFINAE).

Worse, when the language auto-derives template types, it loses typedef
information. Suppose you wrote this:

   MyMap m;
   find(m.begin(), m.end(), some_foo);

The compiler sees this as syntactic sugar for find<hash_map<Foo, Bar,
hash<Foo>, equal_to<Foo>, alloc<Foo> >(m.begin(), m.end(), some_foo);

Not only is the template argument hash_map instead of MyMap, it
includes all the default template arguments, with no indication
they're default arguments. All the tricks we used above to
intelligently ignore default template arguments are worthless here. We
have to jump through lots of hoops so this code doesn't require you to
#include not only <hash_map>, but <alloc> and <utility> as well.

MACROS

It's no surprise macros cause a huge problem for include-what-you-use.
Basically, all the problems of templates also apply to macros, but
worse: we can analyze an uninstantiated template, but we can't analyze
an uninstantiated macro -- the macro likely doesn't even parse cleanly
in isolation. As a result, we have very few tools to distinguish when
the author of a macro is responsible for a symbol used in a macro, and
when the caller of the macro is responsible.

PRIVATE INCLUDES

Suppose you write vector<int> v;. You are using vector, and thus have
to #include <vector>. Even this seemingly easy case is difficult,
because vector isn't actually defined in <vector>; it's defined in
<bits/stl_vector.h>. The C++ standard library has hundreds of private
files that users are not supposed to #include directly. Third party
libraries have hundreds more. There's no general way to distinguish
private from public headers; we have to manually construct the proper
mapping.

In the future, we hope to provide a way for users to annotate if a
file is public or private, either a comment or a #pragma. For now, we
hard-code it in the iwyu tool.

The mappings themselves can be ambiguous. For instance, NULL is
provided by many files, including stddef.h, stdlib.h, and more. If you
use NULL, what #include file should iwyu suggest? We have rules to try
to minimize the number of #includes you have to add; it can get rather
involved.

UNPARSED CODE

Conditional #includes are a problem for iwyu when the condition is
false:

#if _MSC_VER
#include <foo>
#endif

If we're not running under windows (and iwyu does not currently run
under windows), we have no way of telling if foo is a necessary
#include or not.

PLACING NEW INCLUDES AND FORWARD-DECLARES

Figuring out where to insert new #includes and forward-declares is a
complex problem of its own (one that is the responsibility of
fix_includes.py). In general, we want to put new #includes with
existing #includes. But the existing #includes may be broken up into
sections, either because of conditional #includes (with #ifdefs), or
macros (such as #define __GNU_SOURCE), or for other reasons. Some
forward-declares may need to come early in the file, and some may
prefer to come later (after we're in an appropriate namespace, for
instance).

fix_includes.py tries its best to give pleasant-looking output, while
being conservative about putting code in a place where it might not
compile. It uses heuristics to do this, which are not yet perfect.