Re: [External Mail] Re: Partial-clone cause big performance impact on server

Another thing I need to point out is that the current partial clone also has a big performance impact on disks, especially SMR disks.
Some SMR disks have really bad random-write performance (around 100kb/s).
We found that it typically takes us 50 minutes to download the whole Android project with partial clone (two and a half hours without partial clone).
However, it takes us 5 hours to download it with partial clone on those SMR disks.

That is also because the large number of wants turns the download into random writes rather than sequential writes.
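
To make "large number of wants" concrete, here is a rough sketch in Python (not part of Git) of how the want lines in a GIT_TRACE_PACKET log could be counted. It assumes the trace was written to a file (setting GIT_TRACE_PACKET to an absolute path does that) and that each sent want shows up as a line containing "packet: <program>> want <object id>". The exact line prefix differs between Git versions, so only that tail is matched. The function name count_wants is made up for this example.

```python
import re
from collections import Counter

# Assumed shape of a sent want in the trace: "... packet: <program>> want <hex oid> ..."
# Only the tail is matched because the timestamp/file prefix varies by Git version.
WANT_RE = re.compile(r"packet:\s+(\S+)> want ([0-9a-f]{40,64})\b")


def count_wants(trace_lines):
    """Return (total want lines, unique object IDs, want lines per program)."""
    per_program = Counter()
    unique_oids = set()
    total = 0
    for line in trace_lines:
        match = WANT_RE.search(line)
        if match is None:
            continue  # not a sent "want" line (e.g. "have", or a received packet)
        total += 1
        per_program[match.group(1)] += 1
        unique_oids.add(match.group(2))
    return total, len(unique_oids), per_program
```

Passing an open trace file to count_wants() gives the totals directly. A count in the tens of thousands for a single checkout would match what we describe in the quoted steps below.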

> > >     3. With GIT_TRACE_PACKET=1, we found that on big repositories (200K+ refs, 6M+ objects) Git sends 40k wants.
> > >     4. We then traced our server (which is Gerrit with JGit). We found the server is counting objects. Then we checked those 40k objects; most of them are blobs rather than commits (which means they're not in the bitmap).
> > >     5. We believe that's the root cause of our problem. Git sends too many "want SHA1" lines for objects that are not in the bitmap, which causes the server to count objects frequently, which in turn slows down the server.
> > >
> > > What we want is to download only the things we need to check out a specific commit. But if one commit contains so many objects (like ours, 40k+), counting takes more time than downloading.
> > > Is it possible to let Git send only a "commit want" rather than all the object SHA1s one by one?
> >
> > On a technical level, it may be possible - at the point in the Git
> > code where the batch prefetch occurs, I'm not sure if we have the
> > commit, but we could plumb the commit information there. (We have the
> > tree, but this doesn't help us here because as far as I know, the tree
> > won't be in the bitmap so the server would need to count objects
> > anyway, resulting in the same problem.)
> >
> > However, sending only commits as wants would mean that we would be
> > fetching more blobs than needed. For example, if we were to clone
> > (with checkout) and then checkout HEAD^, sending a "commit want" for the
> > latter checkout would result in all blobs referenced by the commit's
> > tree being fetched and not only the blobs that are different.
>
> It seems your solution requires changes on both the server side and the client side.
> Why don't we just add another filter that lets partial clone always send
> commit-level wants?
> If we check out HEAD~1, the client can then send "want HEAD~1 HEAD~2".
>
> > One idea that we (at $DAYJOB) had is to supply a commit hint so that
> > the server can first use bitmaps to narrow down the objects that need
> > to be checked. I had a preliminary patch for that [1] but as of now,
> > no one has continued pursuing that idea.
> >
> > [1]
> > https://lore.kernel.org/git/20201215200207.1083655-1-jonathantanmy@google.com/
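
For step 4 in the quoted text (checking whether the wanted objects are blobs or commits), a minimal Python sketch of one way to tally object types is below. It relies on `git cat-file --batch-check`, which prints "<oid> <type> <size>" per object, or "<oid> missing" when the object is not present. The helper names are hypothetical. It is meant to be run against a repository that already has the objects (for example a full mirror), since looking up missing objects in a partial clone may itself trigger fetches.

```python
import subprocess
from collections import Counter


def tally_object_types(batch_check_lines):
    """Count the second field ("blob", "commit", "tree", "tag" or "missing")
    of each `git cat-file --batch-check` output line."""
    counts = Counter()
    for line in batch_check_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or unexpected lines
        counts[fields[1]] += 1
    return counts


def object_types(oids, repo="."):
    """Feed object IDs to `git cat-file --batch-check` in `repo` and tally the types."""
    result = subprocess.run(
        ["git", "-C", repo, "cat-file", "--batch-check"],
        input="\n".join(oids) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return tally_object_types(result.stdout.splitlines())
```

If most of the wanted IDs come back as "blob", that is consistent with the observation that the bitmap cannot be used for them.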