On 05/31, Thomas Rast wrote: > As far as 174fea4 goes: I think you could split it into two separate > commits, one for the code movement (i.e., refactoring of a large chunk > of read_index_from into read_index_v2_from), and a second one for the > introduction of the separate header struct. Other than that it looks > good. Thanks for your review. I rebased the commits now, 174fea4 is now 5eb9c21 and 946888d. There were also two commits to the python code, which I moved previous of the refactoring commits. (210d1ff and b99fdf8) > The rest would look like this: > > > cache.h | 32 ++++- > > read-cache.c | 315 ++++++++++++++++++++++++++++++++++++++++++++++--- > > 5 files changed, 334 insertions(+), 29 deletions(-) > > The review follows, though note that as usual I have snipped all hunks > where I didn't have anything to add: > > > @@ -108,6 +108,13 @@ struct cache_header_v2 { > > unsigned int hdr_entries; > > }; > > > > +struct cache_header_v5 { > > + unsigned int hdr_ndir; > > + unsigned int hdr_nfile; > > + unsigned int hdr_fblockoffset; > > + unsigned int hdr_nextension; > > +}; > > + > > #define INDEX_FORMAT_LB 2 > > #define INDEX_FORMAT_UB 4 > > Somewhat confusingly, the non-update to INDEX_FORMAT_UB is correct since > that is only used by git-update-index to verify that a user request for > the *write* format is valid, and we don't currently support writing v5. I left them as they are, since they aren't used in the read code, so I thought this would be fine. > > @@ -132,11 +139,27 @@ struct cache_entry { > > unsigned int ce_size; > > unsigned int ce_flags; > > unsigned char sha1[20]; > > + int ce_stat_crc; > > This should probably be an unsigned int for correctness. The docs for > zlib actually say it has type 'uLong', whatever that might mean. Don't > turn it into an unsigned long though, that would be 64 bits at least on > x86_64 Linux. I digged a bit in the git code, uint32_t seems to be the right way to do it. (as in csum-file.h) I converted it to that for now. > > + unsigned int ce_namelen; > > You are using this to hack around the issue that we dropped the name > length out of the flags to save bits. Perhaps it would be a nice > cleanup to keep it in the struct "in the open", instead of having the > odd ce_namelen() wrapper that amounts to a maybe-strlen. I guess that may better fit in a refactoring patch? I put it in e73adcf. > > +struct directory_entry { > > + unsigned short de_flags; > > + unsigned int de_foffset; > > + unsigned int de_cr; > > + unsigned int de_ncr; > > + unsigned int de_nsubtrees; > > + unsigned int de_nfiles; > > + unsigned int de_nentries; > > + unsigned char sha1[20]; > > + struct directory_entry *next; > > + unsigned int de_pathlen; > > + char pathname[FLEX_ARRAY]; > > +}; > > Why are the flags out of order compared to the on-disk layout? > > Also, if you make a linked list it would be more natural (to me) to put > the 'next' member at the top. I suspect the precedent in 'cache_entry' > may be from times where that was the ondisk layout, and thus any > not-on-disk entries had to be at the end. Perhaps Junio can clarify > this. They were in order at the beginning, before I moved the flags to the end in the ondisk format, and forgot to move them in the struct. It's in order now. I also moved the next member to the top. I originally put it at the end, because it was at the end for the cache_entry too. > > -extern void read_index_v2_from(struct index_state *, struct stat, void *mmap, int); > > +extern void read_index_v2(struct index_state *, struct stat, void *mmap, int); > > +extern void read_index_v5(struct index_state *, struct stat, void *mmap, int); > > You should make up your mind in the refactoring patch ;-) I thing read_index_v2 makes more sense, since from sounds like we expect a pathname. > > diff --git a/read-cache.c b/read-cache.c > > index 750fbfa..fc8033a 100644 > > --- a/read-cache.c > > +++ b/read-cache.c > > @@ -397,8 +397,9 @@ int df_name_compare(const char *name1, int len1, int mode1, > > > > int cache_name_compare(const char *name1, int flags1, const char *name2, int flags2) > > { > > - int len1 = flags1 & CE_NAMEMASK; > > - int len2 = flags2 & CE_NAMEMASK; > > + /* TODO: This possibly can be replaced with something faster */ > > + int len1 = strlen(name1); > > + int len2 = strlen(name2); > > int len = len1 < len2 ? len1 : len2; > > int cmp; > > Now that you have the ce_namelen entry, shouldn't you use that here? > > If it is not filled in correctly by the other readers, you'd have to > patch them, preferably in an earlier patch where you introduce this > field (only -- not doing any v5 work yet). Yes, there would be more patching needed, since the function uses the flags too. The patch is not yet in the repository. > > +static int verify_hdr_v5(void *mmap) > > +{ > > + uint32_t crc; > > + uint32_t* filecrc; > > + unsigned int header_size_v5; > > + struct cache_version_header *hdr; > > + struct cache_header_v5 *hdr_v5; > > + > > + hdr = mmap; > > + hdr_v5 = mmap + sizeof(*hdr); > > + /* Size of the header + the size of the extensionoffsets */ > > + header_size_v5 = sizeof(*hdr_v5) + hdr_v5->hdr_nextension * 4; > > + /* Initialize crc */ > > + crc = crc32(0, (Bytef*)hdr, sizeof(*hdr)); > > + crc = crc32(crc, (Bytef*)hdr_v5, header_size_v5); > > Can't you crc32() this block in one go? Yes, I just thought it's clearer this way, calculating the crc for the signature + version and the rest of the header_v5. I changed it for now to one block. > > + filecrc = mmap + sizeof(*hdr) + header_size_v5; > > + if (crc != ntohl(*filecrc)) > > + return error("bad index file header crc signature"); > > + return 0; > > +} > > + > > static int read_index_extension(struct index_state *istate, > > const char *ext, void *data, unsigned long sz) > > { > > @@ -1315,23 +1361,71 @@ static struct cache_entry *cache_entry_from_ondisk(struct ondisk_cache_entry *on > > { > > struct cache_entry *ce = xmalloc(cache_entry_size(len)); > > > > - ce->ce_ctime.sec = ntoh_l(ondisk->ctime.sec); > > - ce->ce_mtime.sec = ntoh_l(ondisk->mtime.sec); > > + ce->ce_ctime.sec = ntoh_l(ondisk->ctime.sec); > > + ce->ce_mtime.sec = ntoh_l(ondisk->mtime.sec); > > ce->ce_ctime.nsec = ntoh_l(ondisk->ctime.nsec); > > ce->ce_mtime.nsec = ntoh_l(ondisk->mtime.nsec); > > - ce->ce_dev = ntoh_l(ondisk->dev); > > - ce->ce_ino = ntoh_l(ondisk->ino); > > - ce->ce_mode = ntoh_l(ondisk->mode); > > - ce->ce_uid = ntoh_l(ondisk->uid); > > - ce->ce_gid = ntoh_l(ondisk->gid); > > - ce->ce_size = ntoh_l(ondisk->size); > > - ce->ce_flags = flags; > > + ce->ce_dev = ntoh_l(ondisk->dev); > > + ce->ce_ino = ntoh_l(ondisk->ino); > > + ce->ce_mode = ntoh_l(ondisk->mode); > > + ce->ce_uid = ntoh_l(ondisk->uid); > > + ce->ce_gid = ntoh_l(ondisk->gid); > > + ce->ce_size = ntoh_l(ondisk->size); > > + ce->ce_flags = flags; > > + ce->ce_namelen = len; > > AFAICS all but the last one only change the alignment of the > assignments. Only the last addition is a true change, related to the > addition of the ce_namelen field. It would be far easier on the > reviewers if you first did the alignment cleanup (it can then be checked > to be a no-change patch simply with --ignore-space-change), and then the > ce_namelen, before any v5 work. Ok, this is changed in e73adcf. The alignment is done in bbc8928. Hope understood the alignment cleanup right. > > +static struct directory_entry *read_directories_v5(unsigned int dir_offset, > > + unsigned int ndir, > > + void *mmap, > > + int mmap_size) > > +{ > > + int i; > > + uint32_t crc; > > + uint32_t* filecrc; > > + struct directory_entry *entries = NULL; > > + struct directory_entry *current = NULL; > > + > > + for (i = 0; i < ndir; i++) { > > + struct ondisk_directory_entry *disk_de; > > + struct directory_entry *de; > > + unsigned int data_len; > > + unsigned int len; > > + char *name; > > + > > + name = (char *)mmap + dir_offset; > > + len = strlen(name); > > + disk_de = (struct ondisk_directory_entry *) > > + ((char *)mmap + dir_offset + len + 1); > > + de = directory_entry_from_ondisk(disk_de, name, len); > > + > > + if (entries == NULL) { > > + entries = de; > > + current = de; > > + } else { > > + current->next = de; > > + current = current->next; > > + current->next = NULL; > > + } > > + > > + /* Length of pathname + nul byte for termination + size of > > + * members of ondisk_directory_entry. (Just using the size > > + * of the stuct doesn't work, because there may be padding > > + * bytes for the struct) > > + */ > > Style nit: > > > + data_len = len + 1 > > + + sizeof(disk_de->flags) > > + + sizeof(disk_de->foffset) > > + + sizeof(disk_de->cr) > > + + sizeof(disk_de->ncr) > > + + sizeof(disk_de->nsubtrees) > > + + sizeof(disk_de->nfiles) > > + + sizeof(disk_de->nentries) > > + + sizeof(disk_de->sha1); > > How does this differ from > > data_len = len + 1 + sizeof(struct ondisk_directory_entry); > > ? There is only a single 16-bit entry in struct ondisk_directory_entry for which the compiler adds padding so the size of the struct is a multiple of 32-bit. And since I'm not sure every compiler does that, I had to do it this way. (And adding a -2 magic number doesn't seem cleaner to me eiter) > > + > > + crc = crc32(0, (Bytef*)(mmap + dir_offset), data_len); > > + filecrc = mmap + dir_offset + data_len; > > + if (crc != ntoh_l(*filecrc)) > > + goto unmap; > > The CRC checking code in general is a bit noisy on my eyes, increasing > the risk that I would miss a mistake in one of the calls. I wonder if > you could refactor it as something like > > static int check_crc32_chunk(void *data, size_t len, unsigned int expected_crc); > > Then again the pointer math would still be on the caller side. Sigh. Yes, that makes sense, I changed it. > > + > > + dir_offset += data_len + 4; /* crc code */ > > + } > > + > > + return entries; > > +unmap: > > + munmap(mmap, mmap_size); > > + errno = EINVAL; > > + die("directory crc doesn't match for '%s'", current->pathname); > > +} > > I can see there's precedent for this in read_index_v2 (as of your patch; > it used to be in read_index_from); but what's the point of setting errno > immediately before not using it and exiting? I just copy and pasted it from there, but thinking about it id doesn't make sense. > That's all for today. Thanks for reading. Thanks a lot for reviewing my code. -- Thomas -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html