On Fri, Dec 07 2018, Jonathan Nieder wrote: > Hi, > > Konstantin Ryabitsev wrote: > >> Every now and again I come across a patch sent to LKML without a leading >> "diff a/foo b/foo" -- usually produced by quilt. E.g.: >> >> https://lore.kernel.org/lkml/20181125185004.151077005@xxxxxxxxxxxxx/ >> >> I am guessing quilt does not bother including the leading "diff a/foo >> b/foo" because it's redundant with the next two lines, however this >> remains a valid patch recognized by git-am. >> >> If you pipe that patch via git-patch-id, it produces nothing, but if I >> put in the leading "diff", like so: >> >> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c >> >> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e". > > Interesting. As Ævar mentioned, the relevant code is > > /* Ignore commit comments */ > if (!patchlen && !starts_with(line, "diff ")) > continue; > > which is trying to handle a case where a line that is special to the > parser appears before the diff begins. > > The patch-id appears to only care about the diff text, so it should be > able to handle this. So if we have a better heuristic for where the > diff starts, it would be good to use it. No, the patch-id doesn't just care about the diff, it cares about the context before the diff too. See this patch: $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. diff --git x/refs/files-backend.c y/refs/files-backend.c index 9183875dad..dd8abe9185 100644 --- x/refs/files-backend.c +++ y/refs/files-backend.c @@ -180,7 +180,8 @@ static void files_reflog_path(struct files_ref_store *refs, break; case REF_TYPE_OTHER_PSEUDOREF: case REF_TYPE_MAIN_PSEUDOREF: - return files_reflog_path_other_worktrees(refs, sb, refname); + files_reflog_path_other_worktrees(refs, sb, refname); + break; case REF_TYPE_NORMAL: strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname); break; Observe that the diff --git line matters, we hash it: $ git diff-tree -p HEAD~.. | git patch-id 5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000 $ git diff-tree --src-prefix=a/ --dst-prefix=b/ -p HEAD~.. | git patch-id --stable 5870d115b7e2a9a936ab8fdc254932234413c710 0000000000000000000000000000000000000000 $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | git patch-id --stable 4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000 The thing it doesn't care about is the "index" between the "diff" and patch: $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id --stable 4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000 We also care about the +++ and --- lines: $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/^(\+\+\+|---).*/$1/g' | git patch-id 56985c2c38cce6079de2690082e1770a8e81214c 0000000000000000000000000000000000000000 Then we normalize the @@ line, e.g.: $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | git patch-id 4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000 $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | grep -v ^index | perl -pe 's/\d+/123/g' | git patch-id 4cd136f2b98760150f700ac6a5b126389d6d05a7 0000000000000000000000000000000000000000 There's other caveats (see the code, e.g. "strip space") but to a first approximation a patch id is a hash of something that looks like this: diff --git x/refs/files-backend.c y/refs/files-backend.c --- x/refs/files-backend.c +++ y/refs/files-backend.c @@ -123,123 +123,123 @@ static void files_reflog_path(struct files_ref_store *refs, break; case REF_TYPE_OTHER_PSEUDOREF: case REF_TYPE_MAIN_PSEUDOREF: - return files_reflog_path_other_worktrees(refs, sb, refname); + files_reflog_path_other_worktrees(refs, sb, refname); + break; case REF_TYPE_NORMAL: strbuf_addf(sb, "%s/logs/%s", refs->gitcommondir, refname); break; Which means that accepting a patch like this as input would actually give you a different patch-id than if it had the proper header. So it seems most sensible to me if this is going to be supported that we go a bit beyond the call of duty and fake up the start of it, namely: --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c To be: diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c It'll make the state machine a bit more complex, but IMO it would suck more if we generate a different hash depending on the tool generating the diff. OTOH the "diff --git" line was never there, and it *does* matter, so should we be faking it up? Maybe not, bah! > "git apply" uses apply.c::find_header, which is more permissive. > Maybe it would be possible to unify these somehow. (I haven't looked > closely enough to tell how painful that would be.) > > Thanks and hope that helps, > Jonathan