> I couldn't help noticing that the interface to the packs data is > a bit complex: > > unsigned char *use_pack(struct packed_git *p, > struct pack_window **window, > unsigned long offset, > unsigned int *left); > void unuse_pack(struct pack_window **w); > > Or am I missing something very obvious, and something like this > is just not feasible for some reasons? The use counter. Every time someone asks for a pointer into the pack they need to lock that window into memory to prevent us from garbage collecting it by unmapping it to make room for another window that the application needs.
I think the counters can be kept in struct packed_git somewhere. Given mmap granularity, and the fact that not all of the pack is used in normal case (and the granularity help us in the worst case) the memory used up by the page counters shouldn't be too much.
> I was almost about to move your code into unpack_object_header_gently, > but ... The header isn't that big, is it? It is variable in the pack, > but the implementation of the parser is at the moment restricted by > the type we use for object size (unsigned long for the particular > platform). For example: All true. However what happens when the header spans two windows? Lets say I have the first 4 MiB mapped and the next 4 MiB mapped in a different window; these are not necessarily at the same locations within memory. Now if an object header is split over these two then some bytes are at the end of the first window and the rest are at the start of the next window.
Assuming these are adjacent windows, we can just increment counters on the all touched pages (at least the two together) and return the pointer into the lowest page. Otherwise - time for garbage collection (why produce the garbage at all, btw?) and remap.
I can't just say "make sure we have at least X bytes available before starting to decode the header, as to do that in this case we'd have to unmap BOTH windows and remap a new one which keeps that very small header fully contiguous in memory. That's thrashing the VM page tables for really no benefit.
You can't mmap less than a page, can you? So it's actually never a small portion, but at least 4k on x86.
> (BTW, current unpack_object_header_gently does not use it's len > argument to check if there actually is enough data to hold at least > minimal header. Is the size of mapped data checked for correctness > somewhere before?) Yes. Somewhere. I think we make sure there's at least 20 bytes in the pack remaining before we start to decode a header. We must have at least 20 as that's the trailing SHA1 checksum of the entire pack. :-)
Hmm :) - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html