Re: [PATCH v3 5/8] odb: teach read_blob_entry to use size_t

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Junio,

On Fri, 29 Oct 2021, Junio C Hamano wrote:

> "Matt Cooper via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
> > -void *read_blob_entry(const struct cache_entry *ce, unsigned long *size)
> > +void *read_blob_entry(const struct cache_entry *ce, size_t *size)
> >  {
> >  	enum object_type type;
> > -	void *blob_data = read_object_file(&ce->oid, &type, size);
> > +	unsigned long ul;
> > +	void *blob_data = read_object_file(&ce->oid, &type, &ul);
> >
> > +	*size = ul;
>
> It is a bit curious place to draw the line; we want to make sure
> that blob_entry can hold huge data, but in this step we do not mind
> read_object_file() is not capable of going full 64-bit?

Indeed that is the case. The consideration here is: how large of a patch
series do I want to take this late in the cycle?

Here is the call tree (for full details, look no further than
https://github.com/git-for-windows/git/pull/3487#issuecomment-950727616):

    read_object_file()
        repo_read_object_file()
            read_object_file_extended()
                read_object()
                    oid_object_info_extended()
                        do_oid_object_info_extended()
                            loose_object_info()
                                parse_loose_header_extended()
                            packed_object_info()
                                cache_or_unpack_entry()
                                    unpack_entry()
                                        unpack_object_header()
                                           unpack_object_header_buffer()
                                get_size_from_delta()
                                    get_delta_hdr_size()

All three leaves have code that needs to be adjusted to use `size_t`
instead of `unsigned long`, and all of the other functions in that call
tree need to be adjusted for that. Some of the callers do not even pass an
`unsigned long` pointer around, but instead a pointer to `struct
object_info` (which, you guessed it, also has an `unsigned long` that
should have been a `size_t` from the beginning).

This is too big a change I am willing to work on, let alone accept, this
late in the cycle.

Sure, it would fix the scenario where clean/smudge filters are not even
involved (read: `git add`ing a 5GB file _directly_). But the potential for
bugs to hide!

> I guess I'll see soon enough why by reading later steps.  I can see
> that for the purpose of making write_entry() aware of the size_t,
> this is necessary at the minimum.

I fear that you won't see more about this in the later steps ;-)

Ciao,
Dscho




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux