Shawn Pearce <spearce@xxxxxxxxxxx> writes:

> Junio C Hamano <junkio@xxxxxxx> wrote:
>> Shawn Pearce <spearce@xxxxxxxxxxx> writes:
>>
>> > And I'm half-way done with the 64 bit mmap sliding window.  You,
>> > Junio and I are all hacking on the same tiny bit of core code for
>> > different reasons. :-)
>>
>> Which makes me quite nervous, actually.
>
> What order do you want the patches in then?
>
> I'm willing to go before or after Nico's offset changes, and before
> or after your 64 bit index changes.
>
> Either way two of us are going to have to redo our work on top of
> the others.  I'm finding that I'm basically rewriting the sliding
> window code onto your 64 bit version - there's no easy merge here.
> And Nico's got the same problem, he's in unpack_delta_entry too.

While the 64-bit offset exercise was a good way for me to see where
the <pack_base+offset> assumption lies in the current code, and while
it would have been the right way to do things in an ideal world, I
realize that it may not be the best way to do things in the real
world, for a few reasons:

 - Always doing u64 arithmetic would be expensive, especially on
   32-bit architectures, and doing it conditionally adds quite a bit
   of extra complexity.

 - Not everybody has large file offsets, so asking to mmap 8MB at a
   file offset of 6GB may not be possible.  Even on systems with
   large file offsets, a truly huge single file is more cumbersome to
   manage than a handful of files each under the 32-bit offset limit
   (e.g. using a DVD-RW to back up your packs).

 - Mozilla may not be the largest project in the world, but it
   certainly is larger than the majority of our target audience, and
   even its entire history, 450MB, comfortably fits in 32-bit space.

So while I merged the 64-bit offset change into the "next" branch, I
am quite doubtful that it was a good change.  As long as we devise an
easy way to keep each packfile under a manageable size, the current
32-bit arrangement would be more beneficial and practical than doing
everything internally in 64-bit.  Larger projects, if needed, can use
multiple packs, and even a project that _could_ pack everything under
450MB would definitely want to use multiple packs anyway (i.e. one or
more huge historical packs and one active pack) to avoid repacking
and incremental update costs.

Your "mmap parts into windows" update is of far more practical value
than the idx with 64-bit offsets.  So I am very much inclined to
revert my 64-bit offset change from "next".

Before declaring it a failure, in order to keep the window open for
the future, we would need to make sure that we can change things as
needed to do everything we want with 32-bit offset packs.  The
changes I can think of that we would need are in the following areas:

 - "git repack" needs to be taught to keep each resulting pack under
   the 32-bit offset limit (a rough sketch follows this list).  I
   think this is a given, and we would need some updates to the
   command anyway because it needs to be taught not to touch "huge
   historical packs".

 - "git send-pack" and "git fetch-pack" are Ok -- the receiving end
   explodes the pack that comes over the wire into loose objects
   using unpack-objects, and there is no offset involved in that
   process.  Note that pack-objects may need to keep the offset in
   the stream in u64 if we were to do the offset encoding of the base
   object, but that is pretty much an independent issue.

 - "git fetch-pack -k" has a problem.  The daemon or sending side
   runs rev-list piped to pack-objects, and there is no limit on the
   size of the pack generated in the pipeline.  Worse yet, the
   protocol does not allow more than one pack to be sent, so we would
   need a protocol update to say "the data you asked for would result
   in a pack that is more than 4GB, so I'll send more than one pack
   -- here is one, here is another, that's all folks".  The receiving
   end needs to be taught to handle this.
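To make the first item concrete, something along these lines is what
I have in mind -- a minimal sketch only; finish_pack() and
start_new_pack() are made-up names standing in for whatever the real
pack writer would do (write the trailer and the .idx, open the next
packfile), not the actual pack-objects interface:

	#include <stdint.h>
	#include <stdio.h>

	/* largest offset a 32-bit idx entry can record */
	#define MAX_PACK_OFFSET 0xffffffffULL

	static uint64_t pack_offset;	/* bytes written to the current pack */

	/* stand-ins for closing the current pack and opening the next one */
	static void finish_pack(void)
	{
		printf("closing pack at %llu bytes\n",
		       (unsigned long long)pack_offset);
	}
	static void start_new_pack(void)
	{
		printf("starting a new pack\n");
	}

	static void write_one_object(uint64_t deflated_size)
	{
		/*
		 * Would this object end past the last 32-bit offset?
		 * Split first, so every object still starts (and ends)
		 * at an offset the idx can represent.
		 */
		if (pack_offset + deflated_size > MAX_PACK_OFFSET) {
			finish_pack();
			start_new_pack();
			pack_offset = 0;
		}
		/* ... deflate and write the object here ... */
		pack_offset += deflated_size;
	}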
Note: I am not proposing to do any of the above right now, unless and
until some real project actually needs them.  We just need to be sure
that when the need arises there is a way out.

The only case that can be problematic is when a single object is
larger than 4GB deflated; at that point we would hit the very hard
wall of the 32-bit offset.  But handling such a huge object would
have other problems [*1*] anyway, so I think it is Ok to declare that
there is a hard limit on the deflated object size, at least for now.

[*1*] For example, we tend to do everything in core, but doing diffs
or anything else on such a huge blob should probably be done by
swapping in only parts of it at a time.  Also, we give the whole
deflated size to zlib as avail_in, which is "uInt", so on
architectures where int is shorter than 64 bits we lose (my 64-bit
index patch does not deal with this).  This, however, is something
your "mmap parts into windows" update would solve if it were updated
to deal with internal 64-bit offsets.
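To illustrate that last point, here is roughly the shape I imagine
for such a windowed reader.  The names here (struct pack_window,
use_pack_window, WINDOW_SIZE) are ones I made up for the sketch, not
your actual code, and error handling and clamping at end-of-file are
omitted:

	#include <stdint.h>
	#include <stddef.h>
	#include <sys/mman.h>

	struct pack_window {
		unsigned char *base;	/* current mapping, NULL if none */
		uint64_t offset;	/* pack offset that base maps to */
		size_t len;		/* size of the mapping */
	};

	#define WINDOW_SIZE (8 * 1024 * 1024)	/* e.g. 8MB per window */

	/*
	 * Return a pointer to the byte at pack offset 'ofs', sliding
	 * the window if the current mapping does not cover it.
	 * '*avail' reports how many contiguous bytes can be read from
	 * the returned pointer before the window runs out.
	 */
	static unsigned char *use_pack_window(int pack_fd,
					      struct pack_window *w,
					      uint64_t ofs, size_t *avail)
	{
		if (!w->base || ofs < w->offset ||
		    ofs >= w->offset + w->len) {
			/* align the start so mmap gets a page-aligned offset */
			uint64_t start = ofs & ~((uint64_t)WINDOW_SIZE - 1);

			if (w->base)
				munmap(w->base, w->len);
			w->base = mmap(NULL, WINDOW_SIZE, PROT_READ,
				       MAP_PRIVATE, pack_fd, (off_t)start);
			w->offset = start;
			w->len = WINDOW_SIZE;
		}
		*avail = w->len - (size_t)(ofs - w->offset);
		return w->base + (ofs - w->offset);
	}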
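And with such a reader the avail_in problem goes away naturally,
because we never hand zlib more than one window's worth of input at a
time.  Again a sketch only, reusing the hypothetical use_pack_window()
above, and assuming 'outlen' is the full inflated size taken from the
object header (a full fix would chunk the output the same way):

	#include <string.h>
	#include <zlib.h>

	static int inflate_from_pack(int pack_fd, struct pack_window *w,
				     uint64_t ofs,
				     unsigned char *out, size_t outlen)
	{
		z_stream s;
		int st;

		memset(&s, 0, sizeof(s));
		inflateInit(&s);
		s.next_out = out;
		s.avail_out = outlen;
		do {
			size_t avail;

			s.next_in = use_pack_window(pack_fd, w, ofs, &avail);
			/* at most WINDOW_SIZE, so it always fits in uInt */
			s.avail_in = avail;
			st = inflate(&s, Z_NO_FLUSH);
			/* skip past however much zlib actually consumed */
			ofs += avail - s.avail_in;
		} while (st == Z_OK);
		inflateEnd(&s);
		return st == Z_STREAM_END ? 0 : -1;
	}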