On 5/23/22 5:33 PM, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
>> + ls | test-tool hexdump | grep "63 5f c3 a9"
>
> A few comments:
>
> * Not folding output lines at arbitrary places, as "od", "hd",
>   etc. do, is a good design decision made by "hexdump" here.
>   Depending on where in the pathname the 4-byte sequence appears,
>   tools from other people may split the sequence across output
>   lines, making grep ineffective. But our hexdump would work fine
>   here.
>
> * For the narrow purpose of the tests in this script, output that
>   is a single long line produced by hexdump might be sufficient,
>   but I wonder if it would make the tool more useful if we at least
>   placed the hexified output for each input line on separate output
>   lines.

Yeah, having tools arbitrarily wrap every 16 (or whatever) bytes, and
prefix each output line with an offset, makes them difficult to use
when looking for specific patterns that might span a boundary.
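
To make the folding point concrete, here is a minimal C sketch of a dumper that never folds. `hex_oneline()` and `hex_contains()` are hypothetical names, not the helper from this series; the second one merely stands in for what grep does on the dump. Because every byte becomes exactly "xx ", a byte sequence has the same textual form wherever it lands in the input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch, not git's helper: render LEN bytes as "xx "
 * triples on a single line, never folding. OUT needs 3 * LEN + 1 bytes.
 */
static void hex_oneline(const unsigned char *buf, size_t len,
			char *out, size_t out_size)
{
	size_t i;

	assert(out_size >= 3 * len + 1);
	for (i = 0; i < len; i++)
		snprintf(out + 3 * i, 4, "%02x ", (unsigned)buf[i]);
	out[3 * len] = '\0';
}

/*
 * Stand-in for "grep PATTERN" on the dump. Every byte occupies exactly
 * three characters, so a pattern like "63 5f c3 a9" can only match on
 * byte boundaries, wherever the sequence sits in the input.
 */
static int hex_contains(const char *dump, const char *pattern)
{
	return strstr(dump, pattern) != NULL;
}
```

With 14 bytes in front of it, the sequence occupies offsets 14..17, so a dumper wrapping at 16 bytes per line would print "63 5f" at the end of one line and "c3 a9" at the start of the next, and a line-oriented grep would miss it. The single-line form still matches.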

I could see having a command line option to emit a '\n' in addition
to, or in place of, each LF in the input. I suppose it depends on the
type of data we are dumping. (That also gets into issues about CRLFs,
however.)

I'm using hexdump for Unicode text here, so it could make sense. But
if I were using it to dump .git/index it wouldn't.
So having the default be one very long line is a good start.
We can teach it more later.
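
One way the optional per-LF break could look, again as a hypothetical C sketch rather than the real helper: `hexdump_stream()` and its `split_at_lf` parameter are illustrative names only. The default keeps the single long line; with the flag set, the output line also ends after each `0a` byte.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical streaming dumper (not git's helper).
 * split_at_lf == 0: every byte as "xx " on one long line.
 * split_at_lf != 0: additionally end the output line after each 0x0a,
 * so a CRLF stays together as "0d 0a " at the end of a line.
 * Returns 0 on success, -1 on a read or write error.
 */
static int hexdump_stream(FILE *in, FILE *out, int split_at_lf)
{
	int c;
	int pending = 0; /* nonzero if the current output line is unterminated */

	assert(in != NULL && out != NULL);
	while ((c = fgetc(in)) != EOF) {
		if (fprintf(out, "%02x ", (unsigned)c) < 0)
			return -1;
		pending = 1;
		if (split_at_lf && c == '\n') {
			if (fputc('\n', out) == EOF)
				return -1;
			pending = 0;
		}
	}
	if (ferror(in))
		return -1;
	if (pending && fputc('\n', out) == EOF)
		return -1;
	return 0;
}
```

Breaking after the LF instead of replacing it keeps the dump lossless, so the same mode is harmless on binary input such as .git/index, where a `0a` byte is not a line terminator; it just produces oddly placed breaks.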

> * The purist in us may find it a bit disturbing that the exit status
>   of test-tool is hidden by the pipe. I do not care too deeply about
>   it, as it is very unlikely that we care about a segfault after
>   hexdump successfully shows the substring the downstream grep is
>   looking for, but it does make us feel dirty.

Given the simplicity of the current version of the helper, I'm not
really worried about such problems. I suppose that we could do the
usual trick of writing the hex dump to a file and grepping it, but
I'm not sure it's worth the bother right now.

> A devil's advocate suggestion is to go to the completely opposite
> end of the spectrum. If we are willing to limit the tool's
> utility to the tests done in this script file, it might be a good
> idea to combine the latter two elements in the pipeline, i.e.
>
>     ls | test-tool hexgrep 63 5f c3 a9
>
> that exits with 0 when the output from "ls" has the 4-byte sequence,
> exits with 1 when it does not, and exits with 139 when it segfaults ;-)
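
For the narrow case suggested here, an exact byte sequence given as hex arguments, the exit-status semantics are easy to sketch. This is a hypothetical C sketch (`hexgrep_main()` and its helpers are not part of git), deliberately limited to an exact match with no wildcards, alternation, or regex, so it sidesteps the harder questions raised in this thread. It returns 0 on a match, 1 on no match, and 2 for a bad pattern, following grep's exit-status convention; a test-tool wrapper would pass stdin and its arguments, and the helper's own status is then what the test sees.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PAT 64 /* arbitrary limit for this sketch */

/* Parse args like "63" "5f" into bytes; returns count, or -1 if invalid. */
static int parse_hex_args(int nargs, const char **args,
			  unsigned char *pat, int max)
{
	int i;

	if (nargs > max)
		return -1;
	for (i = 0; i < nargs; i++) {
		char *end;
		unsigned long v = strtoul(args[i], &end, 16);

		if (end == args[i] || *end != '\0' || v > 0xff)
			return -1;
		pat[i] = (unsigned char)v;
	}
	return nargs;
}

/* Slide a PATLEN-byte window over IN: 0 = found, 1 = not found, 2 = read error. */
static int hexgrep_stream(FILE *in, const unsigned char *pat, size_t patlen)
{
	unsigned char win[MAX_PAT];
	size_t have = 0;
	int c;

	assert(patlen >= 1 && patlen <= MAX_PAT);
	while ((c = fgetc(in)) != EOF) {
		if (have == patlen) { /* drop the oldest byte */
			memmove(win, win + 1, patlen - 1);
			have--;
		}
		win[have++] = (unsigned char)c;
		if (have == patlen && !memcmp(win, pat, patlen))
			return 0;
	}
	return ferror(in) ? 2 : 1;
}

static int hexgrep_main(FILE *in, int nargs, const char **args)
{
	unsigned char pat[MAX_PAT];
	int n = parse_hex_args(nargs, args, pat, MAX_PAT);

	if (n < 1)
		return 2; /* empty or malformed pattern: usage error */
	return hexgrep_stream(in, pat, (size_t)n);
}
```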

I was a little afraid to suggest a hex version of grep. That would
be an interesting project to work on, but it has lots of hard problems
in it and is too much to tack on to this series. Johannes raises some
interesting questions in a later response in this thread that suggest
this could be a seriously non-trivial task. So again, I'd like
to not attempt this.

Thanks
Jeff