On 2012-10-28 13:01, Jeff King wrote: > On Sat, Oct 27, 2012 at 08:37:24PM +0200, Peter Oberndorfer wrote: > >> It seems "git diff-tree -Ganything <tree>" crashes[1] with a null >> pointer dereference >> when run on a commit that adds a file (pdf) with a textconv filter. >> >> It can be reproduced with vanilla git by having a commit on top that >> adds a file with a textconv filter and executing git diff-tree >> -Ganything HEAD >> But running git log -Ganything still works without a crash. >> This problem seems to exist since the feature was first added in f506b8e8b5. > Thanks for a thorough bug report. I didn't reproduce the crash, but I > can see how it happens (it happens with diff-tree because we will reuse > the working tree file in that instance; it could also happen if you > turned on textconv caching). > >> While testing I also noticed the -S and -G act on the original file >> instead of the textconv munged data. >> Is this intentional or caused by accessing the wrong data? > Both, perhaps. :) Hi, thanks for your patch for this! > > -G operates on the munged data; you can see it feed the munged data to > xdiff in diff_grep. But the optimization for handling added and removed > files accidentally fed the wrong pointer. Fixing that is a no-brainer, > since the optimization is inconsistent with the regular code path. > > -S, however, predates the invention of textconv and has never used it. > It is a little less clear that textconv is the right thing here, because > it is not about grepping the diff, but about counting occurrences of the > string inside the file content. I tend to think that doing so on the > textconv'd data would be what people generally want, but it is a > behavior change. > >> Wild guess: should we really access p->one->data and not mf1.ptr? > Precisely. Thanks for your wild guess; it made finding the bug very > easy. :) > >> Is there some more information i should provide? > The patch below should fix it. I added tests, but please try your > real-world test case on it to double-check. I tested your patch, but now it crashes for another reason :-) i have a file with exactly 12288(0x3000) bytes in the repository. When the file is loaded, the data is placed luckily so the data end falls at a page boundary. Later diff_grep() calls regexec() which calls strlen() on the loaded buffer and ends up reading beyond the actual data into the next page which is not allocated and causes a pagefault. Or it could possibly (randomly) match the regex on data that is not actually part of a file... Different memory allocation rules on Windows probably also have some influence here. My guess is that diff_filespec->data is not supposed to be zero terminated and we should not invoke strlen() on a not zero terminated data. But this should be decided by somebody who knows the rules. Greetings Peter -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html