[PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, May 22, 2015 at 03:02:45PM +0000, steve.norman@xxxxxxxxxxxxxxxxxx wrote:

> On Friday, May 22, 2015 @ 11:06 AM Duy Nguyen did write:
> 
> > Strange. Maybe there is something else... Anyway some numbers from me.
> > This is nfs3 hosted by Raspberry Pi, accessed over wireless. I just
> > run index-pack on git.git pack instead of full clone.
> > 
> >  - v1.8.4.1 34s
> >  - v1.8.4.2 519s (oh.. wireless..)
> >  - v1.8.4.2 without has_sha1_file() in index-pack.c 33s
> >  - v1.8.4.2 + Jeff's mtime patch 36s
> 
> Just ran the test again and it was 12 seconds.   Too many versions of git available on the 
> machine and I think I might have run the wrong one:

Thanks for re-checking. Here's a series that fixes the rough edges of
the patch I sent earlier. I'd appreciate it if you can re-confirm that
it improves things for you.

  [1/3]: stat_validity: handle non-regular files
  [2/3]: cache.h: move stat_validity definition up
  [3/3]: prepare_packed_git: use stat_validity to avoid re-reading packs

I'm adding Michael to the cc, as I'm abusing the stat_validity code
which we worked on in 2013.

My big concern here is that using stat_validity with a directory is
racy. It works for a regular file like packed-refs because that file is
replaced atomically.  We fill the validity using fstat() on the same
descriptor we read the data from, and nobody modifies an already-written
file. So we know it matches what we read.

For a directory, I don't think that atomicity guarantee is there.
Somebody can modify the direct while we're reading. For the most part, I
think that is OK. We fstat() before reading any entries, so our fstat
data will then become out of date, and we'll re-read next time. It would
be a problem if opendir() looked at the entries at all (e.g., if it
called getdents() under Linux before our first readdir() call, then our
fstat is happening after the first read, and wouldn't match what we
read). I don't have any reason to believe that any libc does that, but
it is making an assumption about how opendir() is implemented.

The other problem is that I'm not sure stat data is enough to notice
when a directory changes. Certainly the mtime should change, but if you
have only one-second resolution on your mtimes, we can be fooled. For a
regular file that is replaced atomically, the inode will change (and
probably the size, too). But I don't know if that is the case for a
directory. Can writing an entry go undetected between two stat() calls
(or something even more pathological, like deleting one entry and
writing another one)?

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]