Re: [PATCH v7 27/30] t/lib-unicode-nfc-nfd: helper prereqs for testing unicode nfc/nfd

On 5/23/22 5:33 PM, Junio C Hamano wrote:
"Jeff Hostetler via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

+	ls | test-tool hexdump | grep "63 5f c3 a9"

A few comments:

  * Not folding output lines at arbitrary places, as "od", "hd",
    etc. do, is a good design decision made by "hexdump" here.
    Depending on where in the pathname the 4-byte sequence appears,
    tools from other people may split the sequence across output
    lines, making grep ineffective.  But our hexdump would work fine
    here.

  * For the narrow purpose of the tests in this script, output that
    is a single long line produced by hexdump might be sufficient,
    but I wonder if it makes the tool more useful if we at least
    placed the hexified output for each line on separate output
    lines.


Yeah, having tools arbitrarily wrap every 16 or whatever bytes
(and including offset line prefixes) makes it difficult to use
when looking for specific patterns that might span a boundary.
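
To illustrate the boundary problem, here is a minimal sketch that uses plain "od" rather than the test-tool helper. The function names (sample, wrapped, one_line) are made up for this example, and one_line only imitates the single-long-line output described above.

```shell
# 14 filler bytes, then "c_é" (63 5f c3 a9): the sequence straddles
# the 16-byte boundary at which od starts a new output line.
sample () {
	printf 'aaaaaaaaaaaaaac_\303\251\n'
}

# Wrapped dump: od emits 16 bytes per output line.
wrapped () {
	od -An -tx1 -v
}

# Unwrapped dump: the unquoted expansion collapses od's line breaks
# and padding into single spaces, giving one long line.
one_line () {
	hex=$(od -An -tx1 -v) &&
	echo $hex
}

if sample | wrapped | grep "63 5f c3 a9" >/dev/null
then
	echo "wrapped: match"
else
	echo "wrapped: no match"
fi

if sample | one_line | grep "63 5f c3 a9" >/dev/null
then
	echo "one line: match"
else
	echo "one line: no match"
fi
```

With the wrapped dump, "63 5f" ends one output line and "c3 a9" starts the next, so a line-oriented grep cannot see the sequence.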

I could see having a command line option to emit a '\n' (in addition
to or in place of) each LF in the input.  I suppose it depends on the
type of data we are dumping. (That also gets into issues about CRLFs,
however.)

I'm using hexdump for unicode text here, so it could make sense.  But
if I were using it to dump .git/index it wouldn't.

So having the default be one very long line is a good start.
We can teach it more later.
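
As a rough sketch of the "one output line per input line" idea, the function below post-processes "od" output with awk. The name hex_per_line is hypothetical, and this is not an option of the actual helper. It starts a new output line after every LF (0a) byte, so a CRLF input would show "0d 0a" at the end of each line.

```shell
# Hypothetical helper: hex-dump stdin, one output line per input line.
hex_per_line () {
	od -An -tx1 -v |
	awk '
		{
			# Re-flow od output byte by byte, ignoring its own line breaks.
			for (i = 1; i <= NF; i++) {
				line = (line == "" ? $i : line " " $i)
				if ($i == "0a") {
					print line
					line = ""
				}
			}
		}
		# Flush a final line that did not end in LF.
		END { if (line != "") print line }
	'
}

printf 'c_\303\251\nabc\n' | hex_per_line
```

This only makes sense for text-like input. For binary data such as .git/index, the 0a bytes are not line terminators and the breaks would be arbitrary again.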


  * Purist in us may find it a bit disturbing that exit status from
    test-tool is hidden by the pipe.  I do not care too deeply about
    it, as it is very unlikely that we care about segfault after
    hexdump successfully shows the substring the downstream grep is
    looking for, but it does make us feel dirty.

Given the simplicity of the current version of the helper, I'm not
really worried about such problems.  I suppose that we could do the
usual trick of writing the hex dump to a file and grepping it, but
I'm not sure it's worth the bother right now.
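
For reference, the "write to a file, then grep" trick could look roughly like the sketch below. dump_hex is a stand-in built on "od" so that the example runs without test-tool, and the file names are invented for this example. In the real test the first two steps would be "ls" and "test-tool hexdump".

```shell
# Stand-in for the hexdump helper: one long line of hex bytes.
dump_hex () {
	hex=$(od -An -tx1 -v) &&
	echo $hex
}

# Every step is chained with &&, so a failing or crashing dump step
# fails the whole check instead of being hidden on the left of a pipe.
check_for_sequence () {
	printf 'c_\303\251\n' >names.raw &&
	dump_hex <names.raw >names.hex &&
	grep "63 5f c3 a9" names.hex >/dev/null
}

if check_for_sequence
then
	echo "sequence found"
else
	echo "sequence missing or dump failed"
fi
```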


A devil's advocate suggestion is to go in the completely opposite
side of the spectrum.  Perhaps if we are willing to limit the tool's
utility to the tests done in this script file, it might be a good
idea to combine the latter two elements in the pipeline, i.e.

	ls | test-tool hexgrep 63 5f c3 a9

that exits with 0 when the output from "ls" has the 4-byte sequence,
exits with 1 when it does not, and exits with 139 when it segfaults ;-)
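
To make the suggested exit-code behaviour concrete, here is a small shell stand-in. No "hexgrep" subcommand exists in test-tool, so the function below is purely hypothetical. It assumes the arguments are two-digit lowercase hex bytes and covers only the 0 and 1 exit codes.

```shell
# Hypothetical stand-in for the proposed "test-tool hexgrep":
# returns 0 if stdin contains the given byte sequence, 1 otherwise.
hexgrep () {
	needle="$*" &&
	haystack=$(od -An -tx1 -v) &&
	# Collapse od's line breaks and padding into single spaces.
	haystack=$(echo $haystack) &&
	# Surrounding spaces force matches to start and end on byte boundaries.
	case " $haystack " in
	*" $needle "*)
		return 0
		;;
	*)
		return 1
		;;
	esac
}

if printf 'abc_\303\251\n' | hexgrep 63 5f c3 a9
then
	echo "found"
else
	echo "not found"
fi
```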


I was a little afraid to suggest a hex version of grep.  That would
be an interesting project to work on, but it has lots of hard problems in
it and is too much to tack on to this series.  Johannes raises some
interesting questions in a later response in this thread that suggest
that this could be a seriously non-trivial task.  So again, I'd like
to not attempt this.

Thanks
Jeff


