A long-standing problem with IWYU has been the preprocessor
metaprogramming techniques used in Boost.
IWYU is fine with cycles in the include graph, but did not accept cycles
in the corresponding mapping graph (a potential rationale might have
been to let people know if they explicitly added contradictory mappings,
but we really don't know what drove this decision).
Looking at the algorithm expanding the transitive closure of the mapping
graph, it expands all nodes in depth-first order. Cycle detection is
necessary to protect the algorithm from infinite recursion in the
presence of a cycle. But there should be no difference in the transitive
closure between a graph with cycles and one without. DFS will still
visit all nodes, it will just avoid visiting the same subtree twice.
Once the expansion is done, mapping lookup is an inherently
non-recursive operation (made cheaper by performing the recursive
expansion up-front). See the GetCandidateHeadersForFilepath family of
functions and their respective callers.
Two conclusions:
* Cycles are already detected by the transitive closure algorithm
* No code _after_ the transitive closure is affected by any cycles
Therefore, update the transitive closure expansion to ignore cycles
instead of failing on them. Add diagnostic logging at level 8 so we can
look into its decision-making if necessary.
Fixes issue #424
* Do all this logging on level 8, whether logging is on for file or not
* Always say "Add dynamic mappings..."
* Don't log quoted from->to, that's already logged in AddMapping
* State the reason dynamic mapping is added
* Include file paths where available
This should make it easier to troubleshoot strange effects from dynamic
mappings triggered by IWYU itself.
This is useful to generically remap files from one directory to another.
The following mapping would for instance map all files under the "foo/"
directory to "third_party/foo/", for instance, mapping "foo/bar.h" to
"third_party/foo/bar.h".
[ { include: ['@"(foo/.*)"', private, '"third_party/\1"', public ] } ]
This is useful for large project where different third_party libraries
are compiled together. In Chromium for instance, include paths would
look like:
-Ithird_party -Ithird_party/v8/include
This allows code in V8 to include its own headers directly. Other
third_parties would include them via "third_party/v8/include/...".
Because files are accessible from two different paths, IWYU can't know
which should be used. Using a mapping file, adding or removing the
"third_party/v8/include" prefix where needed, resolves this problem.
Only LLVM regexes need explicit anchoring; std::regex_match already has
full match semantics.
This saves us thousands of temporaries when using std::regex-based
dialects and actually improves processing time of
$ time include-what-you-use -Xiwyu --regex=ecmascript -Xiwyu --mapping_file=qt5_11.imp tests/981/t.cc
...
real 0m5,517s
user 0m5,504s
sys 0m0,013s
if only very slightly (~10%).
When building regexes from @-prefixed mappings, we wrapped the
expression in ^(...)$ to force llvm::Regex::match to match only the
whole string.
The parentheses were probably added in there by accident, we have no use
for a capture group in this scenario, and the anchors are fine without
grouping.
As reported in issue #981, using std::regex in IWYU has caused a
tremendous performance regression for large mapping files containing
regex mappings.
$ cat t.cc
#include <string>
# with llvm::Regex
$ time include-what-you-use -Xiwyu --mapping_file=qt5_11.imp t.cc
...
real 0m0,529s
user 0m0,509s
sys 0m0,020s
# with std::regex
$ time include-what-you-use -Xiwyu --mapping_file=qt5_11.imp t.cc
...
real 0m29,870s
user 0m29,717s
sys 0m0,012s
qt5_11.imp contains 2300+ regex mappings, and <string> has a bunch of
includes, so this is a good testbed for regular expression engines, but
over 50x slower is not the result we were hoping for.
The reason we switched to std::regex was to get support for negative
lookaround (llvm::Regex does not have it), but exotic regexes in
mappings are pretty rare, and this is a significant performance hit.
Introduce a --regex option to select regex dialect, with documented
tradeoffs. Put the default back to LLVM's fast implementation.
This fixes issue #981.
struct stat members are implemented through macros and other
stuff, which is defined in that headers, so map the header to the
same headers that struct stat is mapped.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
siginfo_t members are implemented through macros and other stuff,
which is defined in that header, so map the header to the same
headers that siginfo_t is mapped.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
SYS_xxx macros, which are implemented in glibc in
<bits/syscall.h>, are intended to be included from
<sys/syscall.h>. See syscall(2), amd also see the following:
$ grepc -k 'SYS_[a-z]\w+' /usr/include | head
/usr/include/x86_64-linux-gnu/bits/syscall.h:35:# define SYS_accept __NR_accept
/usr/include/x86_64-linux-gnu/bits/syscall.h:39:# define SYS_accept4 __NR_accept4
/usr/include/x86_64-linux-gnu/bits/syscall.h:43:# define SYS_access __NR_access
/usr/include/x86_64-linux-gnu/bits/syscall.h:47:# define SYS_acct __NR_acct
/usr/include/x86_64-linux-gnu/bits/syscall.h:51:# define SYS_acl_get __NR_acl_get
/usr/include/x86_64-linux-gnu/bits/syscall.h:55:# define SYS_acl_set __NR_acl_set
/usr/include/x86_64-linux-gnu/bits/syscall.h:59:# define SYS_add_key __NR_add_key
/usr/include/x86_64-linux-gnu/bits/syscall.h:63:# define SYS_adjtimex __NR_adjtimex
/usr/include/x86_64-linux-gnu/bits/syscall.h:67:# define SYS_afs_syscall __NR_afs_syscall
/usr/include/x86_64-linux-gnu/bits/syscall.h:71:# define SYS_alarm __NR_alarm
$ grep -rn bits/syscall.h /usr/include
/usr/include/x86_64-linux-gnu/bits/syscall.h:5:# error "Never use <bits/syscall.h> directly; include <sys/syscall.h> instead."
/usr/include/x86_64-linux-gnu/sys/syscall.h:27: programs expect the traditional form SYS_*. <bits/syscall.h>
/usr/include/x86_64-linux-gnu/sys/syscall.h:29:#include <bits/syscall.h>
I removed duplicate mappings, and also removed the mapping for
<syscall.h>, which is an undocumented feature of glibc. Portable
programs should stick to <sys/syscall.h> (and unportable ones
too, since there's no gain apart from those 4 bytes).
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
I couldn't find an documentation about where it should be defined
(POSIX doesn't define it), so my best guess was to find the
definitions experimentally. Combining my findings in the glibc
repo with the findings in my own Debian system, and ignoring
kernel headers and unrelated headers such as X11's, this is my
best guess.
See:
alx@asus5775:/usr/include$ grepc -k MAXHOSTNAMELEN
./X11/Xtrans/Xtranssock.c:225:#define MAXHOSTNAMELEN 255
./asm-generic/param.h:17:#define MAXHOSTNAMELEN 64 /* max length of hostname */
./protocols/timed.h:44:#define MAXHOSTNAMELEN 64
./x86_64-linux-gnu/ruby-3.0.0/rb_mjit_min_header-3.0.4.h:24134:#define MAXHOSTNAMELEN 64
./x86_64-linux-gnu/sys/param.h:54:# define MAXHOSTNAMELEN HOST_NAME_MAX
alx@asus5775:~/src/gnu/glibc$ grepc -kx '\.h$' MAXHOSTNAMELEN
./inet/protocols/timed.h:44:#define MAXHOSTNAMELEN 64
./misc/sys/param.h:54:# define MAXHOSTNAMELEN HOST_NAME_MAX
./sunrpc/rpc/types.h:102:#define MAXHOSTNAMELEN 64
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
Update incidental use of htons(3) to use a private name to avoid influence of
the default mappings on test output.
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
IWYU currently warns for this example:
$ cat test.c
#include <inttypes.h>
#include <stdio.h>
int main(void)
{
uint64_t x = UINT64_C(1234);
printf("%" PRIu64 "\n", x);
return 0;
}
$ include-what-you-use test.c
test.c should add these lines:
#include <stdint.h> // for UINT64_C, uint64_t
test.c should remove these lines:
The full include-list for test.c:
#include <inttypes.h> // for PRIu64
#include <stdint.h> // for UINT64_C, uint64_t
#include <stdio.h> // for printf
---
However, the C standard states that "the header <inttypes.h> includes
the header <stdint.h>". So, this program shouldn't need to include
<stdint.h>.
Some mappings were previously added for individual symbols from
<stdint.h> (e.g., commit 24f9fdf0af ("Add more mappings for
[u]intptr_t"), but it's more accurate to export everything. Get rid of
the symbol mappings and add an include mapping.
llvm::Regex does not support negative lookaround, so there was no way to do
negative assertions such as:
"@\"((?!.*/internal/).*)-inl.h\""
as part of mapping rules.
Switch to std::regex (which defaults to ECMAScript regex dialect) to allow these
constructs.