René Scharfe <l.s.r@xxxxxx> writes:

>> In the face of unreliable segfaults we need to reverse our strategy,
>> I think.  Searching for something not in the buffer (e.g. "1") and
>> considering matches and segfaults as confirmation that the bug is
>> still present should avoid any false positives.  Right?
>
> And that's not it either. *sigh*
>
> If the 4097th byte is NUL or LF then we can only hope its access
> triggers a segfault -- there is no other way to distinguish the
> result from a legitimate match when limiting with REG_STARTEND.  So
> we have to accept this flakiness.
>
> We can check the value of that byte with [^0] and interpret a
> match as failure, but that adds negation and makes the test more
> complex.
>
> ^0*$ would falsely match if the 4097th byte (and possibly later
> ones) is 0.  We need to make sure we check for end-of-line after
> the 4096th byte, not later.

I do not have a strong objection against "^0{64}{64}$", but let me
make sure that I understand your rationale correctly.

We assume that we do not have an unmapped guard page to reliably
trap us.  So "^0{64}{64}$" will report a false success when that
4097th byte is either NUL or LF.  There is a 2/256 probability of
that happening, but we accept it.

A pattern "0$" will give us a false success in the same case, but
the reason it is worse is that, in addition, we get a false success
if that second page begins with "0\0" or "0\n".  The chance of that
happening is an additional 2/256 * 1/256.  Worse yet, the page
could start with "00\0" or "00\n", which adds another
2/256 * 1/65536.  We could have yet another "0" at the beginning of
that second page, which only increases the chance of us getting a
false success.

We consider that 2/256 * (1/256 + 1/256^2 + 1/256^3 + ...) is too
big an additional chance of getting a false success to be
acceptable, on top of the 2/256 chance of false success that we
have to and are willing to accept.
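The case analysis above can be sketched as a tiny model (purely
illustrative, not meant for the test suite; the helper names are
mine): the first page is 4096 '0' bytes, and `tail` stands for
whatever happens to follow it in memory.

```python
def full_false_success(tail: bytes) -> bool:
    # "^0{64}{64}$" falsely matches only if the byte right after the
    # page ends the line (NUL terminates the string, LF the line).
    return tail[0] in (0x00, 0x0A)

def dollar_false_success(tail: bytes) -> bool:
    # "0$" falsely matches if the run of '0's continues through the
    # tail until a NUL or LF is reached.
    for b in tail:
        if b in (0x00, 0x0A):
            return True
        if b != ord("0"):
            return False
    return False

# Exhaustively enumerate all two-byte tails, i.e. truncate the
# geometric series after its second term.
full = dollar = 0
for hi in range(256):
    for lo in range(256):
        tail = bytes((hi, lo))
        full += full_false_success(tail)
        dollar += dollar_false_success(tail)

print(full, dollar)  # 512 514 -- "0$" picks up the "0\0"/"0\n" cases
```

Out of 65536 two-byte tails, both patterns falsely match for the 512
tails whose first byte is NUL or LF; "0$" additionally matches the
two tails "0\0" and "0\n", which is exactly the 2/256 * 1/256 term
above.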
Now I am not good at doing math on the keyboard and text terminal,
but

    X = 1/256 + 1/256^2 + ...
    256X = 1 + X
    X = 1/255

so that additional rate of false success is 2/256 * 1/255.  So we
are saying that we accept a ~1/128 false-success rate, but an
additional ~1/32640 is unacceptable.

I do not know if I buy that argument, but I do think that the
additional false-success rate, even if it is minuscule, is
unnecessary.  So as long as everybody's regexp library is happy
with "^0{64}{64}$", let's use that.

Thanks.