On 2021-10-28 at 20:50:30, Johannes Schindelin via GitGitGadget wrote: > This patch series came in via the Git for Windows fork > [https://github.com/git-for-windows/git/pull/3487], and I intend to merge it > before v2.34.0-rc0, therefore I appreciate every careful review you gentle > people can spare. > > The x86_64 variant of Windows uses the LLP64 data model, where the long data > type is 32-bit. This is very different from the LP64 data model used e.g. by > x86_64 Linux, where unsigned long is 64-bit. > > Most notably, this means that sizeof(unsigned long) != sizeof(size_t) in > general. > > However, since Git was born in the Linux ecosystem, where that inequality > does not hold true, it is understandable that unsigned long is used in many > code locations where size_t should have been used. As a consequence, quite a > few things are broken e.g. on Windows, when it comes to 4GB file contents or > larger. > > Using Git LFS [https://git-lfs.github.io/] trying to work around such issues > is one such a broken scenario. You cannot git checkout, say, 5GB files. Huge > files will be truncated to whatever the file size is modulo 4GB (in the case > of a 5GB file, it would be truncated to 1GB). > > This patch series primarily fixes the Git LFS scenario, by allowing clean > filters to accept 5GB files, and by allowing smudge filters to produce 5GB > files. > > The much larger project to teach Git to use size_t instead of unsigned long > in all the appropriate places is hardly scratched by this patch series. > > Side note: The fix for the clean filter included in this series does not > actually affect Git LFS! The reason is that Git LFS marks its filter as > required, and therefore Git streams the file contents to Git LFS via a file > descriptor (which is unaffected by LLP64). A "clean" filter that is not > marked as required, however, lets Git take the code path that is fixed by > this patch series. This series seems sane to me. I'm delighted we can fix this issue with so little code, since it's a been a very inconvenient problem for a lot of Windows users. I might suggest when we make the giant transition project in the future that we use size_t for things that are going to be in memory and off_t or, if necessary, intmax_t for general object sizes so it's clear which one we want. However, that has no effect on this series since this intentionally has a limited scope. -- brian m. carlson (he/him or they/them) Toronto, Ontario, CA
Attachment:
signature.asc
Description: PGP signature