On Thu, Mar 31, 2016 at 6:47 AM, Jeff King <peff@xxxxxxxx> wrote: > On Wed, Mar 30, 2016 at 12:31:30PM -0700, Jacob Keller wrote: > >> So far I've only found a single location that ends up looking worse >> within the Linux kernel. Diffs of some Kbuild settings result in >> something like >> >> before: >> >> If unsure, say Y. >> + >> +config RMI4_I2C >> + tristate "RMI4 I2C Support" >> + depends on RMI4_CORE && I2C >> + help >> + Say Y here if you want to support RMI4 devices connected to an I2C >> + bus. >> + >> + If unsure, say Y. >> >> after: >> >> required for all RMI4 device support. >> >> + If unsure, say Y. >> + >> +config RMI4_I2C >> + tristate "RMI4 I2C Support" >> + depends on RMI4_CORE && I2C >> + help >> + Say Y here if you want to support RMI4 devices connected to an I2C >> + bus. >> + >> If unsure, say Y. >> >> So in this particular instance which has multiple blank lines and is a >> similar issue as with Stefan's note above, this is where the heuristic >> falls apart. At least for C code this is basically vanishingly small >> compared to the number of comment header fix ups. >> >> I think it may be that Stefan's suggestions above may be on the right >> track to resolve that too. > > This is a tricky one. There _aren't_ actually multiple blank lines in > the ambiguous area, because this particular example comes at the very > end of the file. Try: > > git show 8d99758dee3 drivers/input/rmi4/Kconfig > > which adds a block in the middle of the file. It looks good both before > and after running through the script. Now look at this example: > > git show fdf51604f10 drivers/input/rmi4/Kconfig > > which looks like: > > diff --git a/drivers/input/rmi4/Kconfig b/drivers/input/rmi4/Kconfig > index 5ea60e3..cc3f7c5 100644 > --- a/drivers/input/rmi4/Kconfig > +++ b/drivers/input/rmi4/Kconfig > @@ -8,3 +8,12 @@ config RMI4_CORE > required for all RMI4 device support. > > If unsure, say Y. > + > +config RMI4_I2C > + tristate "RMI4 I2C Support" > + depends on RMI4_CORE && I2C > + help > + Say Y here if you want to support RMI4 devices connected to an I2C > + bus. > + > + If unsure, say Y. > > > Note that there is no trailing context, as we're adding at the end of > the file. So the ambiguous portion consists of only two lines: an empty > line, and "If unsure...". And we bump the latter to the top, per the > heuristic (it's the exact opposite of every other case, where the blank > line is a true delimiter). > > As a human, I think the indentation here is the real syntactic clue. But > getting into indentation heuristics is probably insane. > > The script could probably make this work by disabling itself if the hunk > is at the end of the diffed file (i.e., we don't see more context lines > afterward). That covers any case like this where newline _is_ a > delimiter, but we just have some internal newlines, too. It wouldn't > cover the case where we had internal newlines but used some other > paragraph delimiter, but based on the results so far, that seems rather > rare. > > Something like this: > > --- foo.pl.orig 2016-03-31 09:44:44.281232230 -0400 > +++ foo.pl 2016-03-31 09:44:34.901232632 -0400 > @@ -24,13 +24,15 @@ > push @hunk, $_; > $state = STATE_IN_CHUNK; > } else { > - flush_hunk(); > + print @hunk; > + @hunk = (); > $state = STATE_NONE; > print; > } > } > } > -flush_hunk(); > +print @hunk; > +@hunk = (); > > sub flush_hunk { > my $context_len = 0; > > -Peff I started attempting to implement this heuristic within xdiff, but I am at a loss as to how xdiff actually works. I suspect this would go in xdi_change_compact or after it, but I really don't understand how xdiff represents the diffs at all... Thanks, Jake -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html