On Mon, 14 Jan 2008, Junio C Hamano wrote: > > If we are using different types anyway, we might want to start > using time_t (a worse alternative is ulong which we use for > timestamps everywhere else, which we probably want to convert to > time_t as well). Careful. There are two issues, one trivial one and one important one: (a) trivially, right now, the code depends on the fact that the in-memory structure is actually smaller than the on-disk one, to avoid having to estimate the size of the allocation for the in-memory array. That was a matter of gettign a quickly working and efficient patch (we do *not* want to allocate those initial "struct cache_entry" entries one by one, we want to allocate one big block!) This should be pretty easy to fix up, by just taking the sizes and number of entries (which we do know) into account of the initial allocation. However, it's made a bit more interesting by the differing alignment of the "name" part (and the fact that we align each individual on-disk and in-memory structure). (b) More importantly, the on-disk structures DO NOT CONTAIN the whole stat information! The classic example of this is "ce_size": it's 32-bit, but it works even if you have a file that is larger than 32 bits in size! It just means that from a stat comparison standpoint, we only compare the low 32 bits! This means that if you make "ce_size" be a "loff_t", for example, you still need to then *compare* it in just an "unsigned int", because the upper bits aren't zero - they are "nonexistent". that (b) is important, and is why some of the code changed from - if (ce->ce_ino != htonl(st->st_ino)) + if (ce->ce_ino != (unsigned int) st->st_ino) ie note how this didn't just remove the "htonl()", it replaced it by a "truncate to 'unsigned int'"! So the fact that the types aren't necessarily the "native" types is actually *important*. > Is there still a reason to insist that ce_flags should be a > single field that is multi-purposed for storing stage, namelen > and other flags? Wouldn't the code become even simpler and > safer if we separated them into individual fields? For example, > a piece like this: No reason for that part, except I wanted to make this particular initial patch be as minimal as possible. > I somehow had this impression that it was a huge deal to you > that we do not have to read and populate each cache entry when > reading from the existing index file, and thought that was the > reason why we mmap and access the fields in network byte order. > If that was my misconception, then I agree this is a good change > to make everything else easier to write and much less error > prone. I was a bit worried about it, but I did make sure that the allocation is done as one single allocation, and I did time it. Doing a git update-index --refresh seems to be identical before and after, so the costs of conversion are either very small or are possibly counteracted by the fact that we then can avoid the byte-order conversion of individual words less at run-time. Linus - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html