Re: git rev-list %an, %ae, %at bug in v1.7.10.1 and beyond

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Peff,

Thanks for taking the time to respond! I appreciate it.

I went and spoke to the tester who made the original commit, and it
sounds like what he did was open .git/config with a text editor and
update the user.name directly. He then saved the file. His operating
system's code page was UTF-16LE, so I believe the file was saved in
that format. He then did the commit.

That sort of behavior is pretty far outside the scope of anything git
can realistically be expected to handle (and I told the tester that as
well). That said, I'm interested that it "suddenly" broke, and also
that cat-file still appears to not think there's any problem.

I piped the output through xxd, for cat-file and rev-list. The
cat-file output is the same in both versions of git, and looks like
this:
auri:qa-resources.git bturner$ git cat-file -p
5c1ccdec5f84aa149a4978f729fdda70769f942f | xxd
0000000: 7472 6565 2064 6431 3733 6362 3730 6261  tree dd173cb70ba
0000010: 6161 6330 3762 6466 3430 3566 3465 3364  aac07bdf405f4e3d
0000020: 6231 3130 6537 6661 6664 3138 300a 7061  b110e7fafd180.pa
0000030: 7265 6e74 2030 3263 3738 6263 3339 6163  rent 02c78bc39ac
0000040: 3631 3932 3632 3362 6630 3830 6533 6532  6192623bf080e3e2
0000050: 6163 3839 3261 3466 3537 3634 630a 6175  ac892a4f5764c.au
0000060: 7468 6f72 2061 203c 6661 726d 6173 4061  thor a <farmas@a
0000070: 746c 6173 7369 616e 2e63 6f6d 3e20 3133  tlassian.com> 13
0000080: 3237 3837 3632 3232 202b 3131 3030 0a63  27876222 +1100.c
0000090: 6f6d 6d69 7474 6572 2061 203c 6661 726d  ommitter a <farm
00000a0: 6173 4061 746c 6173 7369 616e 2e63 6f6d  as@xxxxxxxxxxxxx
00000b0: 3e20 3133 3237 3837 3632 3232 202b 3131  > 1327876222 +11
00000c0: 3030 0a0a 636f 6d6d 6974 2077 6974 6820  00..commit with
00000d0: 756e 6963 6f64 6520 6e61 6d65 0a         unicode name.

In 1.7.9.5, rev-list looks like this:
auri:qa-resources.git bturner$ git rev-list
--format="%H|%h|%P|%p|%an|%ae|%at%n%B%n@@object_end@@" -1
5c1ccdec5f84aa149a4978f729fdda70769f942f | xxd
0000000: 636f 6d6d 6974 2035 6331 6363 6465 6335  commit 5c1ccdec5
0000010: 6638 3461 6131 3439 6134 3937 3866 3732  f84aa149a4978f72
0000020: 3966 6464 6137 3037 3639 6639 3432 660a  9fdda70769f942f.
0000030: 3563 3163 6364 6563 3566 3834 6161 3134  5c1ccdec5f84aa14
0000040: 3961 3439 3738 6637 3239 6664 6461 3730  9a4978f729fdda70
0000050: 3736 3966 3934 3266 7c35 6331 6363 6465  769f942f|5c1ccde
0000060: 7c30 3263 3738 6263 3339 6163 3631 3932  |02c78bc39ac6192
0000070: 3632 3362 6630 3830 6533 6532 6163 3839  623bf080e3e2ac89
0000080: 3261 3466 3537 3634 637c 3032 6337 3862  2a4f5764c|02c78b
0000090: 637c 617c 6661 726d 6173 4061 746c 6173  c|a|farmas@atlas
00000a0: 7369 616e 2e63 6f6d 7c31 3332 3738 3736  sian.com|1327876
00000b0: 3232 320a 636f 6d6d 6974 2077 6974 6820  222.commit with
00000c0: 756e 6963 6f64 6520 6e61 6d65 0a0a 4040  unicode name..@@
00000d0: 6f62 6a65 6374 5f65 6e64 4040 0a         object_end@@.

In 1.7.10.2, it looks like this:
auri:qa-resources.git bturner$ git rev-list
--format="%H|%h|%P|%p|%an|%ae|%at%n%B%n@@object_end@@" -1
5c1ccdec5f84aa149a4978f729fdda70769f942f | xxd
0000000: 636f 6d6d 6974 2035 6331 6363 6465 6335  commit 5c1ccdec5
0000010: 6638 3461 6131 3439 6134 3937 3866 3732  f84aa149a4978f72
0000020: 3966 6464 6137 3037 3639 6639 3432 660a  9fdda70769f942f.
0000030: 3563 3163 6364 6563 3566 3834 6161 3134  5c1ccdec5f84aa14
0000040: 3961 3439 3738 6637 3239 6664 6461 3730  9a4978f729fdda70
0000050: 3736 3966 3934 3266 7c35 6331 6363 6465  769f942f|5c1ccde
0000060: 7c30 3263 3738 6263 3339 6163 3631 3932  |02c78bc39ac6192
0000070: 3632 3362 6630 3830 6533 6532 6163 3839  623bf080e3e2ac89
0000080: 3261 3466 3537 3634 637c 3032 6337 3862  2a4f5764c|02c78b
0000090: 637c 7c7c 0a63 6f6d 6d69 7420 7769 7468  c|||.commit with
00000a0: 2075 6e69 636f 6465 206e 616d 650a 0a40   unicode name..@
00000b0: 406f 626a 6563 745f 656e 6440 400a       @object_end@@.

None of the output indicates a NUL is present. I'm wondering if the
old code that rev-list was running (and maybe the code cat-file
appears to still be running) might be ignoring leading NULs until it
finds the first character, or something similar. If that were the
case, had the tester used a name like "ab" (or anything more than one
character), perhaps the commit would have caused unexpected behavior
all along.

I don't know how to dump the binary for the commit object itself; that
may give us better information on what the author segment actually
contains. Unfortunately, the repository has been packed, so there is
no loose object for the commit anymore.

If there's any other information I can provide, please let me know.

Best regards,
Bryan Turner

On 22 May 2012 14:35, Jeff King <peff@xxxxxxxx> wrote:
> On Tue, May 22, 2012 at 12:32:21AM -0400, Jeff King wrote:
>
>> I'm not too surprised this is broken (in fact, I'm surprised it ever
>> really worked). UTF-16, especially representing pure ascii characters,
>> will have embedded NULs. Most of the code assumes that things like names
>> and emails are NUL-terminated and ascii-compatible (so ascii, or some
>> ascii-superset encoding like utf8, iso8859-1, etc). You can store a
>> commit message (and name) in utf-16 if you tell git that you are doing
>> so, but we should be re-encoding it before handling it.
>
> Actually, I take that back. I think storing in utf-16 would probably
> fail. We need to use ascii to even read the headers to get to the
> encoding header to realize that it is in utf-16. So I believe we really
> do only support ascii-superset encodings.
>
> -Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]