Re: [PATCH v2 0/7] Allow clean/smudge filters to handle huge files in the LLP64 data model

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2021-10-28 at 20:50:30, Johannes Schindelin via GitGitGadget wrote:
> This patch series came in via the Git for Windows fork
> [https://github.com/git-for-windows/git/pull/3487], and I intend to merge it
> before v2.34.0-rc0, therefore I appreciate every careful review you gentle
> people can spare.
> 
> The x86_64 variant of Windows uses the LLP64 data model, where the long data
> type is 32-bit. This is very different from the LP64 data model used e.g. by
> x86_64 Linux, where unsigned long is 64-bit.
> 
> Most notably, this means that sizeof(unsigned long) != sizeof(size_t) in
> general.
> 
> However, since Git was born in the Linux ecosystem, where that inequality
> does not hold true, it is understandable that unsigned long is used in many
> code locations where size_t should have been used. As a consequence, quite a
> few things are broken e.g. on Windows, when it comes to 4GB file contents or
> larger.
> 
> Using Git LFS [https://git-lfs.github.io/] trying to work around such issues
> is one such a broken scenario. You cannot git checkout, say, 5GB files. Huge
> files will be truncated to whatever the file size is modulo 4GB (in the case
> of a 5GB file, it would be truncated to 1GB).
> 
> This patch series primarily fixes the Git LFS scenario, by allowing clean
> filters to accept 5GB files, and by allowing smudge filters to produce 5GB
> files.
> 
> The much larger project to teach Git to use size_t instead of unsigned long
> in all the appropriate places is hardly scratched by this patch series.
> 
> Side note: The fix for the clean filter included in this series does not
> actually affect Git LFS! The reason is that Git LFS marks its filter as
> required, and therefore Git streams the file contents to Git LFS via a file
> descriptor (which is unaffected by LLP64). A "clean" filter that is not
> marked as required, however, lets Git take the code path that is fixed by
> this patch series.

This series seems sane to me.  I'm delighted we can fix this issue with
so little code, since it's a been a very inconvenient problem for a lot
of Windows users.

I might suggest when we make the giant transition project in the future
that we use size_t for things that are going to be in memory and off_t
or, if necessary, intmax_t for general object sizes so it's clear which
one we want.  However, that has no effect on this series since this
intentionally has a limited scope.
-- 
brian m. carlson (he/him or they/them)
Toronto, Ontario, CA

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux