Re: git archive generates tar with malformed pax extended attribute

On Sun, Jun 02, 2019 at 06:58:48PM +0200, René Scharfe wrote:

> > That sounds about right. It's basically every version of every tree that
> > has a symlink. Did it make a noticeable difference in timing? Indexing
> > the whole kernel history is already a horribly slow process. :)
> 
> Right, I didn't notice a difference -- no patience for watching that
> thing to the end.  But here are some numbers for v2.21.0 vs. master with
> the patch:
> 
> Benchmark #1: git fsck
>   Time (mean ± σ):     307.775 s ±  9.054 s    [User: 307.173 s, System: 0.448 s]
>   Range (min … max):   294.052 s … 322.931 s    10 runs
> 
> Benchmark #2: ~/src/git/git fsck
>   Time (mean ± σ):     319.754 s ±  2.255 s    [User: 318.927 s, System: 0.671 s]
>   Range (min … max):   316.376 s … 323.747 s    10 runs
> 
> Summary
>   'git fsck' ran
>     1.04 ± 0.03 times faster than '~/src/git/git fsck'

I guess that's about what I'd expect. The bulk of the time in most repos
will go to fscking the actual blobs, I'd think. But hitting each tree
twice really is noticeable.
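
As a rough sanity check on the summary line, here is a small sketch that
recomputes the ratio from the two means and standard deviations quoted
above. It assumes the "±" on the ratio comes from ordinary first-order
error propagation for a quotient of two means; that is my assumption, not
something stated in this thread, and the function name is made up for
illustration.

```python
import math

def ratio_with_error(mean_a, sd_a, mean_b, sd_b):
    """Return (mean_b / mean_a, propagated uncertainty of that ratio).

    Assumes first-order error propagation for a quotient of two
    independent means (an assumption, see the text above).
    """
    ratio = mean_b / mean_a
    rel_err = math.sqrt((sd_a / mean_a) ** 2 + (sd_b / mean_b) ** 2)
    return ratio, ratio * rel_err

# Numbers copied from the quoted benchmark output.
V2_21_MEAN, V2_21_SD = 307.775, 9.054     # 'git fsck' (v2.21.0)
PATCHED_MEAN, PATCHED_SD = 319.754, 2.255  # '~/src/git/git fsck' (master + patch)

ratio, err = ratio_with_error(V2_21_MEAN, V2_21_SD, PATCHED_MEAN, PATCHED_SD)
slowdown_pct = (ratio - 1.0) * 100.0

print(f"{ratio:.2f} +/- {err:.2f} ({slowdown_pct:.1f}% slower)")
```

Under that assumption the numbers agree with the quoted summary: the
patched build takes roughly 4% longer on this repository.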

> Seeing only a single CPU core being stressed for that long is a bit sad.
> Checking individual objects should be relatively easy to parallelize,
> shouldn't it?

Yes. The fsck code is pretty old, and uses a very simple way of walking
over all of the packs. index-pack (which backs verify-pack these days)
is much smarter, and runs in parallel. It still takes a lock when doing
the actual fsck checks, but most of the time goes to the zlib inflation
and delta reconstruction.
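
To illustrate the shape of that (not the actual index-pack code, which is
C and considerably more involved), here is a minimal Python sketch of the
pattern: workers inflate objects in parallel, and only the per-object
check runs under a lock. The names check_object, verify_one and
verify_all are hypothetical, and the "check" is a trivial stand-in for
the real fsck checks. Python's zlib module releases the GIL while
decompressing, so the threads can overlap on the inflation step.

```python
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor

check_lock = threading.Lock()
problems = []  # names of objects that failed the (stand-in) check

def check_object(payload: bytes) -> bool:
    # Hypothetical stand-in for fsck's per-object checks.
    return len(payload) > 0

def verify_one(item):
    name, compressed = item
    # The expensive part (inflation) happens outside the lock.
    data = zlib.decompress(compressed)
    # Only the check itself is serialized.
    with check_lock:
        if not check_object(data):
            problems.append(name)
    return len(data)

def verify_all(objects, workers=4):
    # Returns the total number of inflated bytes across all objects.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(verify_one, objects))

# Made-up sample data: eight 1000-byte objects and one empty one.
objects = [(f"obj{i}", zlib.compress(b"x" * 1000)) for i in range(8)]
objects.append(("empty", zlib.compress(b"")))
total_bytes = verify_all(objects)
```

As long as the locked section is cheap compared to inflation, the lock
does not cost much of the available parallelism.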

There's some discussion in:

  https://public-inbox.org/git/20180816210657.GA9291@xxxxxxxxxxxxxxxxxxxxx/

and even some patches elsewhere in the thread here:

  https://public-inbox.org/git/20180902075528.GC18787@xxxxxxxxxxxxxxxxxxxxx/

and here:

  https://public-inbox.org/git/20180902085503.GA25391@xxxxxxxxxxxxxxxxxxxxx/

I think the big show-stopper there is how ugly it is to run the pack
verification in a separate process (and I suspect it is not just ugly
from a code point of view, but actively breaks index-pack because it
then relies on the set of objects seen during the first phase to do its
connectivity check).

So there would probably need to be some lib-ification work on index-pack
first, so that we could call it (at least in verification mode) multiple
times from inside fsck.

-Peff



