Re: [PATCH 0/5] pretty format %(trailers): improve machine readability

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Dec 05 2020, Anders Waldenborg wrote:

> Ævar Arnfjörð Bjarmason writes:
>
>> I started writing this on top of "master", but then saw the
>> outstanding series of other miscellaneous fixes to this
>> facility[1]. This is on top of that topic & rebased on master.
>>
>> Anders, any plans to re-roll yours? Otherwise the conflicts I'd have
>> on mine are easy to fix, so I can also submit it as a stand-alone.
>
> Yes, I have plans to do that. But have yet to carve out the required
> time from my copious spare time to actually do it.
>
> So please don't hold your breath waiting for me to do that.

Thanks. I sent a v2 of mine yesterday as
https://lore.kernel.org/git/20201206002449.31452-1-avarab@xxxxxxxxx/

As noted there the merge conflict with yours is trivial, so hopefully it
won't cause you much hassle if you re-roll while it's outstanding.

>> This series comes out of a discussion at work today (well, yesterday
>> at this point) where someone wanted to parse %(trailers) output. As
>> noted in 3/5 doing this is rather tedious now if you're trying to
>> unambiguously grap trailers as a stream of key-value pairs.
>>
>> So this series adds a "key_value_separator" and "keyonly" parameters,
>> and fixes a few bugs I saw along the way.
>
> Interesting. When adding "valueonly" I never consider it being used
> without "key". The trick you are doing with separate keyonly and
> valueonly is quite clever.
>
> I've only been doing machine parsing for explicit keys, things like:
> "%cn%x00%x00%an%x00%x00%(trailers:key=Reviewed-By,valueonly,unfold,separator=%x00)%x00%x00%(trailers:key=Backport-Reviewed-By,valueonly,unfold,separator=%x00)"
> (double-NUL to separate field, single-NUL to separate values within field).
>
> But I can't help wonder that if the goal just is to have a nice machine
> parsable format maybe it would be easier (both for user and
> implementation) to have a separate placeholder for "machine readable
> trailers" which by default emits in a format suitable for machine
> parsing. Something like a new "%(ztrailers)" (but with a better name)
> which simply emits a sequence of "<KEY> NUL <VAL> NUL" for each trailer

I think it's a bit tricky to make something general in the middle of all
the custom format printf-likes in the pretty format. E.g. some users
might want to use \0 as a delimiter for key-values, others \0\0
etc. because they used \0, or the other way around.

Maybe if there's a reason to extend the optimization it could be smarter
about detecting that you only wanted some fixed-string separator and
nothing else custom?

B.t.w. I tried just deleting the optimization for testing and it slowed
down by around 8% on linux.git according to an extended
p4205-log-pretty-formats.sh.

Looking at the code I wonder if there aren't other lower hanging
optimizations, e.g. it seems we call find_separator() on multiple passes
instead of saving it away, e.g. in the format_trailers_from_commit()
entry point if there's any custom options such as "unfold".

I also wonder if memory allocation is a bottleneck in the "git log"
path, but didn't have time to refactor & test it. For each commit the
walking machinery eventually calls the trailer.c code, which allocates &
free()'s internal structures that could be re-used for parsing the next
commit.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux