This avoid double-anchoring, for example if users put their own anchors
in mappings. It also reduces clutter in the RegexMatch/RegexReplace
functions now that both need to do the same anchoring.
This is useful to generically remap files from one directory to another.
The following mapping would for instance map all files under the "foo/"
directory to "third_party/foo/", for instance, mapping "foo/bar.h" to
"third_party/foo/bar.h".
[ { include: ['@"(foo/.*)"', private, '"third_party/\1"', public ] } ]
This is useful for large project where different third_party libraries
are compiled together. In Chromium for instance, include paths would
look like:
-Ithird_party -Ithird_party/v8/include
This allows code in V8 to include its own headers directly. Other
third_parties would include them via "third_party/v8/include/...".
Because files are accessible from two different paths, IWYU can't know
which should be used. Using a mapping file, adding or removing the
"third_party/v8/include" prefix where needed, resolves this problem.
Only LLVM regexes need explicit anchoring; std::regex_match already has
full match semantics.
This saves us thousands of temporaries when using std::regex-based
dialects and actually improves processing time of
$ time include-what-you-use -Xiwyu --regex=ecmascript -Xiwyu --mapping_file=qt5_11.imp tests/981/t.cc
...
real 0m5,517s
user 0m5,504s
sys 0m0,013s
if only very slightly (~10%).
As reported in issue #981, using std::regex in IWYU has caused a
tremendous performance regression for large mapping files containing
regex mappings.
$ cat t.cc
#include <string>
# with llvm::Regex
$ time include-what-you-use -Xiwyu --mapping_file=qt5_11.imp t.cc
...
real 0m0,529s
user 0m0,509s
sys 0m0,020s
# with std::regex
$ time include-what-you-use -Xiwyu --mapping_file=qt5_11.imp t.cc
...
real 0m29,870s
user 0m29,717s
sys 0m0,012s
qt5_11.imp contains 2300+ regex mappings, and <string> has a bunch of
includes, so this is a good testbed for regular expression engines, but
over 50x slower is not the result we were hoping for.
The reason we switched to std::regex was to get support for negative
lookaround (llvm::Regex does not have it), but exotic regexes in
mappings are pretty rare, and this is a significant performance hit.
Introduce a --regex option to select regex dialect, with documented
tradeoffs. Put the default back to LLVM's fast implementation.
This fixes issue #981.