Commit Graph

186 Commits

Author SHA1 Message Date
Kim Gräsman 3511ccdedd Silently break cycles in mappings
A long-standing problem with IWYU has been the preprocessor
metaprogramming techniques used in Boost.

IWYU is fine with cycles in the include graph, but did not accept cycles
in the corresponding mapping graph (a potential rationale might have
been to let people know if they explicitly added contradictory mappings,
but we really don't know what drove this decision).

Looking at the algorithm expanding the transitive closure of the mapping
graph, it expands all nodes in depth-first order. Cycle detection is
necessary to protect the algorithm from infinite recursion in the
presence of a cycle. But there should be no difference in the transitive
closure between a graph with cycles and one without. DFS will still
visit all nodes, it will just avoid visiting the same subtree twice.

Once the expansion is done, mapping lookup is an inherently
non-recursive operation (made cheaper by performing the recursive
expansion up-front). See the GetCandidateHeadersForFilepath family of
functions and their respective callers.

Two conclusions:

* Cycles are already detected by the transitive closure algorithm
* No code _after_ the transitive closure is affected by any cycles

Therefore, update the transitive closure expansion to ignore cycles
instead of failing on them. Add diagnostic logging at level 8 so we can
look into its decision-making if necessary.

Fixes issue #424
2022-12-16 22:23:36 +01:00
Kim Gräsman 6d5414fcf3 Add consistent logging for dynamic mappings
* Do all this logging on level 8, whether logging is on for file or not
* Always say "Add dynamic mappings..."
* Don't log quoted from->to, that's already logged in AddMapping
* State the reason dynamic mapping is added
* Include file paths where available

This should make it easier to troubleshoot strange effects from dynamic
mappings triggered by IWYU itself.
2022-12-04 16:18:13 +01:00
Jean-Philippe Gravel 6fb66575fb Add support for group/backreferences regex replacement in mapping files.
This is useful to generically remap files from one directory to another.
The following mapping would for instance map all files under the "foo/"
directory to "third_party/foo/", for instance, mapping "foo/bar.h" to
"third_party/foo/bar.h".

[ { include: ['@"(foo/.*)"', private, '"third_party/\1"', public ] } ]

This is useful for large project where different third_party libraries
are compiled together. In Chromium for instance, include paths would
look like:
 -Ithird_party -Ithird_party/v8/include

This allows code in V8 to include its own headers directly. Other
third_parties would include them via "third_party/v8/include/...".
Because files are accessible from two different paths, IWYU can't know
which should be used. Using a mapping file, adding or removing the
"third_party/v8/include" prefix where needed, resolves this problem.
2022-10-09 20:40:18 +02:00
Kim Gräsman f81ef2388e Move ^$ anchoring into RegexMatch
Only LLVM regexes need explicit anchoring; std::regex_match already has
full match semantics.

This saves us thousands of temporaries when using std::regex-based
dialects and actually improves processing time of

  $ time include-what-you-use -Xiwyu --regex=ecmascript -Xiwyu --mapping_file=qt5_11.imp tests/981/t.cc
  ...
  real    0m5,517s
  user    0m5,504s
  sys     0m0,013s

if only very slightly (~10%).
2022-09-03 15:04:27 +02:00
Kim Gräsman 2f561c1035 Remove unused capture group
When building regexes from @-prefixed mappings, we wrapped the
expression in ^(...)$ to force llvm::Regex::match to match only the
whole string.

The parentheses were probably added in there by accident, we have no use
for a capture group in this scenario, and the anchors are fine without
grouping.
2022-09-03 15:04:27 +02:00
Kim Gräsman cf53880822 Add --regex option
As reported in issue #981, using std::regex in IWYU has caused a
tremendous performance regression for large mapping files containing
regex mappings.

  $ cat t.cc
  #include <string>

  # with llvm::Regex
  $ time include-what-you-use -Xiwyu --mapping_file=qt5_11.imp t.cc
  ...
  real 0m0,529s
  user 0m0,509s
  sys  0m0,020s

  # with std::regex
  $ time include-what-you-use -Xiwyu --mapping_file=qt5_11.imp t.cc
  ...
  real 0m29,870s
  user 0m29,717s
  sys  0m0,012s

qt5_11.imp contains 2300+ regex mappings, and <string> has a bunch of
includes, so this is a good testbed for regular expression engines, but
over 50x slower is not the result we were hoping for.

The reason we switched to std::regex was to get support for negative
lookaround (llvm::Regex does not have it), but exotic regexes in
mappings are pretty rare, and this is a significant performance hit.

Introduce a --regex option to select regex dialect, with documented
tradeoffs. Put the default back to LLVM's fast implementation.

This fixes issue #981.
2022-09-03 15:04:27 +02:00
Kim Gräsman 0a714265c0 Add regex module
Move regular expression matching into a function in its own module.

No functional change.
2022-09-03 15:04:27 +02:00
Alejandro Colomar 6f2b277e6d Add mappings for itimerspec
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2022-09-03 12:37:20 +02:00
Alejandro Colomar 16df68dbb7 Add mappings for <bits/struct_stat.h>
struct stat members are implemented through macros and other
stuff, which is defined in that headers, so map the header to the
same headers that struct stat is mapped.

Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2022-09-03 12:37:20 +02:00
Alejandro Colomar 3bb4e2dc8d Add mappings for getopt(3) and opt{arg,err,ind,opt}
Glibc also provides them in <getopt.h>, but let's recommend the
portable POSIX header only.

Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2022-09-03 12:37:20 +02:00
Alejandro Colomar 57044c7e43 Add mappings for <bits/types/siginfo_t.h>
siginfo_t members are implemented through macros and other stuff,
which is defined in that header, so map the header to the same
headers that siginfo_t is mapped.

Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2022-09-03 12:37:20 +02:00
Alejandro Colomar 58f925f063 Add mappings for <bits/mman-shared.h>
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2022-09-03 12:37:20 +02:00
Boleyn Su 6f772242cf Add mapping of bits/utility.h for gcc.stl.headers 2022-09-01 19:50:08 +02:00
Alejandro Colomar 75623bce3b Fix mapping for <bits/syscall.h>
SYS_xxx macros, which are implemented in glibc in
<bits/syscall.h>, are intended to be included from
<sys/syscall.h>.  See syscall(2), amd also see the following:

$ grepc -k 'SYS_[a-z]\w+' /usr/include | head
/usr/include/x86_64-linux-gnu/bits/syscall.h:35:# define SYS_accept __NR_accept
/usr/include/x86_64-linux-gnu/bits/syscall.h:39:# define SYS_accept4 __NR_accept4
/usr/include/x86_64-linux-gnu/bits/syscall.h:43:# define SYS_access __NR_access
/usr/include/x86_64-linux-gnu/bits/syscall.h:47:# define SYS_acct __NR_acct
/usr/include/x86_64-linux-gnu/bits/syscall.h:51:# define SYS_acl_get __NR_acl_get
/usr/include/x86_64-linux-gnu/bits/syscall.h:55:# define SYS_acl_set __NR_acl_set
/usr/include/x86_64-linux-gnu/bits/syscall.h:59:# define SYS_add_key __NR_add_key
/usr/include/x86_64-linux-gnu/bits/syscall.h:63:# define SYS_adjtimex __NR_adjtimex
/usr/include/x86_64-linux-gnu/bits/syscall.h:67:# define SYS_afs_syscall __NR_afs_syscall
/usr/include/x86_64-linux-gnu/bits/syscall.h:71:# define SYS_alarm __NR_alarm
$ grep -rn bits/syscall.h /usr/include
/usr/include/x86_64-linux-gnu/bits/syscall.h:5:# error "Never use <bits/syscall.h> directly; include <sys/syscall.h> instead."
/usr/include/x86_64-linux-gnu/sys/syscall.h:27:   programs expect the traditional form SYS_*.  <bits/syscall.h>
/usr/include/x86_64-linux-gnu/sys/syscall.h:29:#include <bits/syscall.h>

I removed duplicate mappings, and also removed the mapping for
<syscall.h>, which is an undocumented feature of glibc.  Portable
programs should stick to <sys/syscall.h> (and unportable ones
too, since there's no gain apart from those 4 bytes).

Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2022-09-01 19:44:29 +02:00
Alejandro Colomar 983fe5a8fb Add another mapping for NULL
POSIX defines NULL in <unistd.h> too.

Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2022-08-01 20:10:27 +02:00
Alejandro Colomar 979002bf8e Add mappings for SIGABRT
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2022-07-09 21:59:46 +02:00
Alejandro Colomar fdca99fcd2 Add mappings for 'struct iovec'
As recently documented in iovec(3type).

Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2022-07-09 21:59:46 +02:00
Alejandro Colomar ec625ea9db Add mappings for 'MAP_POPULATE'
It's not in POSIX, so I found it experimentally.

$ grepc -k MAP_POPULATE
./asm-generic/mman-common.h:26:#define MAP_POPULATE		0x008000	/* populate (prefault) pagetables */
./x86_64-linux-gnu/bits/mman-map-flags-generic.h:34:# define MAP_POPULATE	0x08000		/* Populate (prefault) pagetables.  */

$ grep -r '<bits/mman-map-flags-generic.h>'
x86_64-linux-gnu/bits/mman.h:#include <bits/mman-map-flags-generic.h>
x86_64-linux-gnu/bits/mman-map-flags-generic.h:# error "Never use <bits/mman-map-flags-generic.h> directly; include <sys/mman.h> instead."

So glibc defines it in <sys/mman.h>.

$ grep -r '<asm-generic/mman-common.h>'
asm-generic/mman.h:#include <asm-generic/mman-common.h>
$ grep -r '<asm-generic/mman.h>'
x86_64-linux-gnu/asm/mman.h:#include <asm-generic/mman.h>
$ grep -r '<asm/mman.h>'
linux/mman.h:#include <asm/mman.h>

And the kernel in <linux/mman.h>.

Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2022-07-09 21:59:46 +02:00
Alejandro Colomar a7d6088546 Add mappings for 'MAXHOSTNAMELEN'
I couldn't find an documentation about where it should be defined
(POSIX doesn't define it), so my best guess was to find the
definitions experimentally.  Combining my findings in the glibc
repo with the findings in my own Debian system, and ignoring
kernel headers and unrelated headers such as X11's, this is my
best guess.

See:

alx@asus5775:/usr/include$ grepc -k MAXHOSTNAMELEN
./X11/Xtrans/Xtranssock.c:225:#define MAXHOSTNAMELEN 255
./asm-generic/param.h:17:#define MAXHOSTNAMELEN	64	/* max length of hostname */
./protocols/timed.h:44:#define MAXHOSTNAMELEN	64
./x86_64-linux-gnu/ruby-3.0.0/rb_mjit_min_header-3.0.4.h:24134:#define MAXHOSTNAMELEN 64
./x86_64-linux-gnu/sys/param.h:54:# define MAXHOSTNAMELEN	HOST_NAME_MAX

alx@asus5775:~/src/gnu/glibc$ grepc -kx '\.h$' MAXHOSTNAMELEN
./inet/protocols/timed.h:44:#define MAXHOSTNAMELEN	64
./misc/sys/param.h:54:# define MAXHOSTNAMELEN	HOST_NAME_MAX
./sunrpc/rpc/types.h:102:#define        MAXHOSTNAMELEN  64

Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2022-07-09 21:59:46 +02:00
Alejandro Colomar 21d4c7d2df Add mappings for 'struct tm'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2022-07-09 21:59:46 +02:00
Alejandro Colomar fb7f8557a1 Add mappings for byteorder(3) functions
Update incidental use of htons(3) to use a private name to avoid influence of
the default mappings on test output.

Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2022-05-21 10:49:29 +02:00
fanquake d3e7eb40b4 Add additional termios.h mappings
```cpp
include <bits/termios-c_lflag.h>  // for ECHO
include <bits/termios-struct.h>   // for termios
include <bits/termios-tcflow.h>   // for TCSANOW

void SetStdinEcho(bool enable)
{
    struct termios tty;
    tcgetattr(STDIN_FILENO, &tty);
    if (!enable) {
        tty.c_lflag &= ~ECHO;
    } else {
        tty.c_lflag |= ECHO;
    }
    (void)tcsetattr(STDIN_FILENO, TCSANOW, &tty);
}
```
2022-04-21 21:13:12 +02:00
Alejandro Colomar a0da664a05 Add mappings for the macro 'SIGCHLD'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-12-05 10:44:25 +01:00
Alejandro Colomar 3dcdbf1520 Add mappings for the macro 'MAP_STACK'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-12-05 10:44:25 +01:00
Alejandro Colomar 635c287c4e Add mappings for (struct) 'stat'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-12-05 10:44:25 +01:00
Omar Sandoval 44480a2de0 Add export from <stdint.h> to <inttypes.h>
IWYU currently warns for this example:

  $ cat test.c
  #include <inttypes.h>
  #include <stdio.h>

  int main(void)
  {
          uint64_t x = UINT64_C(1234);
          printf("%" PRIu64 "\n", x);
          return 0;
  }
  $ include-what-you-use test.c

  test.c should add these lines:
  #include <stdint.h>    // for UINT64_C, uint64_t

  test.c should remove these lines:

  The full include-list for test.c:
  #include <inttypes.h>  // for PRIu64
  #include <stdint.h>    // for UINT64_C, uint64_t
  #include <stdio.h>     // for printf
  ---

However, the C standard states that "the header <inttypes.h> includes
the header <stdint.h>". So, this program shouldn't need to include
<stdint.h>.

Some mappings were previously added for individual symbols from
<stdint.h> (e.g., commit 24f9fdf0af ("Add more mappings for
[u]intptr_t"), but it's more accurate to export everything. Get rid of
the symbol mappings and add an include mapping.
2021-12-04 20:25:50 +01:00
Kim Gräsman a5d8408c5f Use std::regex instead of llvm::Regex
llvm::Regex does not support negative lookaround, so there was no way to do
negative assertions such as:

    "@\"((?!.*/internal/).*)-inl.h\""

as part of mapping rules.

Switch to std::regex (which defaults to ECMAScript regex dialect) to allow these
constructs.
2021-08-04 09:53:25 +02:00
Alejandro Colomar 9e0d33d828 Add more mappings for 'va_list'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar b6eb039187 Add mappings for 'uintmax_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar 4c187d46aa Reorder (alphabetically) mappings for 'uid_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar de49a862c8 Add more mappings for (struct) 'timeval'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar dc2acabc58 Add more mappings for (struct) 'timespec'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar 029d045f1a Add mappings for 'timer_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar 2499d91c45 Add more mappings for 'time_t'
Use <time.h> as the primary header for portability to non-POSIX systems.

Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar 96b7299340 Reorder (alphabetically) mappings for 'suseconds_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar e89c3c1638 Add more mappings for 'ssize_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar 289e0d3856 Fix mappings for 'socklen_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar bcdd338d4d Add mappings for 'struct sockaddr'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar d50530f319 Add more mappings for 'size_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar c19ae5ef07 Add mappings for 'union sigval'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar e109386c09 Fix mappings for 'sigset_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar d83f42cd46 Add mappings for 'siginfo_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar 291e1c7ae3 Add mappings for 'struct sigevent'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar 3a2b81eb94 Add mappings for 'regoff_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar 12e6971de4 Add mappings for 'regmatch_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar 17ee1d337d Add mappings for 'regex_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar 9bdc2bb869 Add more mappings for 'pid_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar 2c4a2ac7b1 Add more mappings for 'off_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar 5a683775e9 Add more mappings for 'mode_t'
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00
Alejandro Colomar d860e3cd5c Add mappings for [l]ldiv_t
Signed-off-by: Alejandro Colomar <alx.manpages@gmail.com>
2021-07-11 09:03:34 +02:00