On Sat, Dec 05 2020, Anders Waldenborg wrote: > Ævar Arnfjörð Bjarmason writes: > >> I started writing this on top of "master", but then saw the >> outstanding series of other miscellaneous fixes to this >> facility[1]. This is on top of that topic & rebased on master. >> >> Anders, any plans to re-roll yours? Otherwise the conflicts I'd have >> on mine are easy to fix, so I can also submit it as a stand-alone. > > Yes, I have plans to do that. But have yet to carve out the required > time from my copious spare time to actually do it. > > So please don't hold your breath waiting for me to do that. Thanks. I sent a v2 of mine yesterday as https://lore.kernel.org/git/20201206002449.31452-1-avarab@xxxxxxxxx/ As noted there the merge conflict with yours is trivial, so hopefully it won't cause you much hassle if you re-roll while it's outstanding. >> This series comes out of a discussion at work today (well, yesterday >> at this point) where someone wanted to parse %(trailers) output. As >> noted in 3/5 doing this is rather tedious now if you're trying to >> unambiguously grap trailers as a stream of key-value pairs. >> >> So this series adds a "key_value_separator" and "keyonly" parameters, >> and fixes a few bugs I saw along the way. > > Interesting. When adding "valueonly" I never consider it being used > without "key". The trick you are doing with separate keyonly and > valueonly is quite clever. > > I've only been doing machine parsing for explicit keys, things like: > "%cn%x00%x00%an%x00%x00%(trailers:key=Reviewed-By,valueonly,unfold,separator=%x00)%x00%x00%(trailers:key=Backport-Reviewed-By,valueonly,unfold,separator=%x00)" > (double-NUL to separate field, single-NUL to separate values within field). > > But I can't help wonder that if the goal just is to have a nice machine > parsable format maybe it would be easier (both for user and > implementation) to have a separate placeholder for "machine readable > trailers" which by default emits in a format suitable for machine > parsing. Something like a new "%(ztrailers)" (but with a better name) > which simply emits a sequence of "<KEY> NUL <VAL> NUL" for each trailer I think it's a bit tricky to make something general in the middle of all the custom format printf-likes in the pretty format. E.g. some users might want to use \0 as a delimiter for key-values, others \0\0 etc. because they used \0, or the other way around. Maybe if there's a reason to extend the optimization it could be smarter about detecting that you only wanted some fixed-string separator and nothing else custom? B.t.w. I tried just deleting the optimization for testing and it slowed down by around 8% on linux.git according to an extended p4205-log-pretty-formats.sh. Looking at the code I wonder if there aren't other lower hanging optimizations, e.g. it seems we call find_separator() on multiple passes instead of saving it away, e.g. in the format_trailers_from_commit() entry point if there's any custom options such as "unfold". I also wonder if memory allocation is a bottleneck in the "git log" path, but didn't have time to refactor & test it. For each commit the walking machinery eventually calls the trailer.c code, which allocates & free()'s internal structures that could be re-used for parsing the next commit.