Re: [PATCH 5/5] diff: handle NULs in get_string_hash()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 19, 2017 at 1:31 PM, Jeff King <peff@xxxxxxxx> wrote:
> For computing moved lines, we feed the characters of each
> line into a hash. When we've been asked to ignore
> whitespace, then we pick each character using next_byte(),
> which returns -1 on end-of-string, which it determines using
> the start/end pointers we feed it.
>
> However our check of its return value treats "0" the same as
> "-1", meaning we'd quit if the string has an embedded NUL.

I agree. The code looks correct.

> This is unlikely to ever come up in practice since our line
> boundaries generally come from calling strlen() in the first
> place.

get_string_hash is called from
 prepare_entry, which in turn is called from
  add_lines_to_move_detection or mark_color_as_moved
   diff_flush_patch_all_file_pairs

that constructs the arguments in
diff_flush_patch
 run_diff
  run_diff_cmd
   builtin_diff (part "/* Crazy xdl interfaces.. */")
    xdi_diff_outf( fn_out_consume as arg!)
     xdi_diff
      xdl_diff
       xdl_call_hunk_func
        -> fn_out_consume(cb, line, len)

xdl_call_hunk_func however uses pointer arithmetic instead
of strlen. So I think this sentence is not a good idea to put in
the commit message.

It may not occur in practice, due to binary files detection using
NUL as a signal, but conceptually our move-colored(!) diffs
should be compatible with NULs with this patch now.

> But it was a bit surprising to me as a reader of the
> next_byte() code. And it's possible that we may one day feed
> this function with more exotic input, which otherwise works
> with arbitrary ptr/len pairs.

Good point.

>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
> I noticed that we make an extra copy of each line here, just to feed it
> to memihash! I guess "-w" is not a critical-performance code path, but
> this could be fixed if we could do memhash() incrementally (e.g., by
> putting the FNV state into a struct and letting callers "add" to it
> incrementally). Maybe an interesting #leftoverbits, though I'd want to
> see timing tests that show it's worth doing.
>

I agree.

Thanks,
Stefan



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux