Re: [PATCH] Fix in Git.pm cat_blob crashes on large files (resubmit with reviewed-by)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks for all the input and patience.

On Fri, Feb 22, 2013 at 10:34 AM, Jeff King <peff@xxxxxxxx> wrote:
> On Fri, Feb 22, 2013 at 09:30:57AM -0800, Joshua Clayton wrote:
>
>> Read and write each 1024 byte buffer, rather than trying to buffer
>> the entire content of the file.
>
> OK. Did you ever repeat your timing with a larger symmetric buffer? That
> should probably be a separate patch on top, but it might be worth doing
> while we are thinking about it.
>
>> Previous code would crash on all files > 2 Gib, when the offset variable
>> became negative (perhaps below the level of perl), resulting in a crash.
>
> I'm still slightly dubious of this, just because it doesn't match my
> knowledge of perl (which is admittedly imperfect). I'm curious how you
> diagnosed it?

I first had the memory exhaustion problem running my git repo on a 32 vm.
After bumping the memory from 512 to 4 GiB, and that failing to fix it
I moved to my workstation with 16 GiB
...reproduced
After the initial crash, I added

print $size, " ", $bytesToRead, " ", $bytesRead, "\n";

right before the read command, and it does indeed crash right after
the $bytesRead variable crosses LONG_MAX
...
2567089913 1024 2147482624
2567089913 1024 2147483648
2567089913 1024 2147484672
Offset outside string at /usr/share/perl5/Git.pm line 901, <GEN36> line 2604.

Note that $bytesRead is still positive.
I know very little perl, but that symptom seems pretty clear

>
>> On a 32 bit system, or a system with low memory it might crash before
>> reaching 2 GiB due to memory exhaustion.
>>
>> Signed-off-by: Joshua Clayton <stillcompiling@xxxxxxxxx>
>> Reviewed-by: Jeff King <peff@xxxxxxxx>
>
> The commit message is a good place to mention any side effects, and why
> they are not a problem. Something like:
>
>   The previous code buffered the whole blob before writing, so any error
>   reading from cat-file would result in zero bytes being written to the
>   output stream.  After this change, the output may be left in a
>   partially written state (or even fully written, if we fail when
>   parsing the final newline from cat-file). However, it's not reasonable
>   for callers to expect anything about the state of the output when we
>   return an error (after all, even with full buffering, we might fail
>   during the writing process).  So any caller which cares about this is
>   broken already, and we do not have to worry about them.
>
>> ---
>>  perl/Git.pm |   12 +++++-------
>>  1 file changed, 5 insertions(+), 7 deletions(-)
>
> The patch itself looks fine to me.
>
> -Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]