Junio C Hamano wrote:
> Thomas Rast <trast@xxxxxxxxxxxxxxx> writes:
>
> > * The word regex matches anything that is !isspace().
> >
> > * The word regex does not match '\n'. (This case is not very harmful,
> >   but we used to silently cut off at the '\n' which may go against
> >   user expectations.)
>
> How expensive to run this check twice, every time word_regex finds a
> match?

It runs the first bullet point for every non-match, and the second
bullet point for every match. So it looks at every input character
exactly once.

> As this is about making sure that we got a sane regex from the user (or a
> builtin pattern), I wonder if we can make it not depend on the payload we
> are matching the regex against. Then before using a word_regex that we
> have not checked, we check if that regex is sane, mark it checked, and do
> not have to do the check over and over again.

Algorithmically it should be easy once you have the finite state
automaton corresponding to the regex: just verify that for every
possible non-terminal state, there is a transition for every !isspace()
character to a state other than "fail to match" or "match the empty
string".

In the implementation, it might be doable if we switch to compat/regex
on all platforms, since we then have ready access to all internal
structures regcomp() creates, including the DFA.

I'll think about at least using compat/regex for a static check of all
*builtin* patterns, which would be superior to the brute force approach
in 4/4.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
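
For illustration, the payload-dependent check described in the reply above (everything the regex skips must be whitespace, and no match may contain '\n') could be sketched roughly as follows. This is not the code from the patch series: the function name check_word_regex() and the WORD_RE_* return codes are made up for this example, the buffer is assumed to be NUL-terminated, and an empty match is assumed to mean "no more words".

```c
#include <ctype.h>
#include <regex.h>
#include <string.h>

/* Hypothetical result codes, invented for this sketch. */
enum {
	WORD_RE_OK = 0,
	WORD_RE_SKIPS_NONSPACE = -1,	/* regex skipped a !isspace() char */
	WORD_RE_SPANS_NEWLINE = -2	/* a match contained '\n' */
};

/*
 * Walk a NUL-terminated buffer with an already compiled regex.
 * The gap before each match is checked against the first bullet,
 * the match itself against the second, so each input character
 * is inspected by exactly one of the two loops.
 */
static int check_word_regex(const regex_t *re, const char *buf)
{
	const char *p = buf;

	while (*p) {
		regmatch_t m;
		/* Assumption: later searches do not start at a line start. */
		int flags = (p == buf) ? 0 : REG_NOTBOL;
		/* Assumption: an empty match counts as "no more words". */
		int found = !regexec(re, p, 1, &m, flags) && m.rm_eo > m.rm_so;
		const char *gap_end = found ? p + m.rm_so : p + strlen(p);
		const char *word_end = found ? p + m.rm_eo : gap_end;

		/* Non-match: only whitespace may be skipped. */
		for (; p < gap_end; p++)
			if (!isspace((unsigned char)*p))
				return WORD_RE_SKIPS_NONSPACE;

		/* Match: must not extend across a newline. */
		for (; p < word_end; p++)
			if (*p == '\n')
				return WORD_RE_SPANS_NEWLINE;
	}
	return WORD_RE_OK;
}
```

With a pattern such as "[a-z]+" compiled with REG_EXTENDED, the input "foo 42 bar" would trip the first check, because "42" is neither matched nor whitespace. The static DFA-based check discussed above would need access to regcomp() internals and is not attempted here.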