Re: GSoC 2009 Prospective student

On Mon, 23 Feb 2009, Shawn O. Pearce wrote:
> Jakub Narebski <jnareb@xxxxxxxxx> wrote:
>> Nicolas Pitre <nico@xxxxxxx> writes:
>>> On Sun, 22 Feb 2009, Miklos Vajna wrote: 
>>>> 
>>>> http://thread.gmane.org/gmane.comp.version-control.git/55254/focus=55298
>>>> 
>>>> Especially Shawn's message, which can be a base for your proposal, if
>>>> you want to work in this.
>>> 
>>> I don't particularly agree with Shawn's proposal.  Reliance on a stable 
>>> sorting on the server side is too fragile, restrictive and cumbersome.
> 
> We already rely on a stable sort in the tree format. [...]

By 'sorting order' I (and Nicolas) mean here the ordering of objects and
deltas in the pack file, i.e. whether we get _exactly_ the same (byte
for byte) packfile for the same want/have exchange (your proposal), or
even for the same arguments to git-pack-objects (which is a necessary,
although I think not sufficient, condition).
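To make "the same want/have exchange" concrete: any byte-for-byte comparison, or any cache of generated packs, first needs an order-insensitive key for the request. A minimal Python sketch, where `exchange_key` is a hypothetical helper and not part of git. Equal keys are only the necessary half of the condition; git-pack-objects would still have to produce identical output for identical keys.

```python
import hashlib


def exchange_key(wants, haves):
    """Order-insensitive key for a want/have exchange (hypothetical helper)."""
    h = hashlib.sha1()
    # Sort and de-duplicate so the order in which the client listed objects
    # does not matter; tag each line so the same id given as a want and as
    # a have produces different keys.
    for oid in sorted(set(wants)):
        h.update(b"want " + oid.encode("ascii") + b"\n")
    for oid in sorted(set(haves)):
        h.update(b"have " + oid.encode("ascii") + b"\n")
    return h.hexdigest()


# Same exchange, different listing order -> same key.
k1 = exchange_key(["b" * 40, "c" * 40], ["a" * 40])
k2 = exchange_key(["c" * 40, "b" * 40], ["a" * 40])
print(k1 == k2)  # True
```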

[...]
>> I think it is possible for dumb protocols (using commit walkers) and
>> for (deprecated) rsync.
> 
> Yes, it is possible for the commit walkers to implement a restart,
> as they are actually beginning at the current root and walking back
> in history.  Resuming a large file like a pack is easy to do on HTTP
> if the remote server supports byte range serving.  It's also easy
> to validate on the client that the pack wasn't repacked during the
> idle period (between initial fetch and restart), just validate the
> SHA-1 footer.  If the pack was repacked and came up with the same
> name you'll have a mismatch on the footer.  Discard and try again.

Can we assume that packfiles are named correctly (i.e. that the name of
the packfile matches its SHA-1 footer)?
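Whatever the answer about names, the footer check itself does not depend on the file name: a pack ends with a 20-byte SHA-1 over all preceding bytes, so once a resumed download completes the client can verify it directly. A sketch in Python, assuming the downloaded pack is available as a seekable binary file; `pack_trailer_ok` is a hypothetical name, not a git function.

```python
import hashlib
import io

TRAILER_LEN = 20  # SHA-1 digest size in bytes


def pack_trailer_ok(f, chunk_size=1 << 16):
    """Return True if the last 20 bytes of the pack in the seekable binary
    file `f` equal the SHA-1 of all bytes before them."""
    size = f.seek(0, io.SEEK_END)
    # 12-byte header ("PACK", version, object count) plus the trailer.
    if size < 12 + TRAILER_LEN:
        return False
    f.seek(0)
    if f.read(4) != b"PACK":
        return False
    f.seek(0)
    h = hashlib.sha1()
    remaining = size - TRAILER_LEN
    while remaining > 0:
        chunk = f.read(min(chunk_size, remaining))
        if not chunk:  # file shrank underneath us
            return False
        h.update(chunk)
        remaining -= len(chunk)
    # The file position is now exactly at the start of the trailer.
    return h.digest() == f.read(TRAILER_LEN)
```

Usage would be `with open(path, "rb") as f: ok = pack_trailer_ok(f)`; on `False` the client discards what it has and tries again, as described above.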

>
> And if you want to save bandwidth, always grab the last 20 bytes
> of the file before getting any other parts, save it somewhere,
> and revalidate that last 20 before resuming.  If it's changed,
> you should discard what you have and start over from the beginning.
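The tail-first variant could be sketched as follows, assuming a client that fetches the last 20 bytes with an HTTP suffix range request before downloading anything else, and fetches them again before resuming. `plan_resume` is a hypothetical helper that returns the `Range` header value to send next, or `None` to discard and start over.

```python
from typing import Optional

TRAILER_LEN = 20  # SHA-1 footer size

# HTTP suffix byte range: asks for the last TRAILER_LEN bytes of the file.
TAIL_RANGE = f"bytes=-{TRAILER_LEN}"


def plan_resume(local_len: int, saved_tail: bytes, remote_tail: bytes) -> Optional[str]:
    """Decide how to continue an interrupted pack download (hypothetical helper).

    saved_tail  -- last 20 bytes fetched before the first byte of the pack
    remote_tail -- last 20 bytes fetched again right before resuming
    Returns a Range header value, or None meaning "discard and start over".
    """
    if len(saved_tail) != TRAILER_LEN or saved_tail != remote_tail:
        # Unknown or different footer: assume the pack was rewritten.
        return None
    # Same footer: ask only for the bytes we do not have yet.
    return f"bytes={local_len}-"
```

Fetching the footer first costs one extra 20-byte request, but it avoids downloading the rest of a pack that has already been replaced.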

Therefore I think that a restartable clone for "dumb" (commit walker)
protocols is an easy GSoC project, while a restartable clone for "smart"
(packfile-generating) protocols is at least of medium difficulty, and
might be harder.

>>> I think restartable clone is a really bad suggestion for SOC students.  
>>> After all we want successful SOC projects, not ones that even core git 
>>> developers did not yet find a good solution for.
>>> 
>>> IMHO of course.
>> 
>> But I agree that within current limits (as far as I know there is no
>> way to ask for an arbitrary SHA-1; you can only ask for refs, for
>> security reasons) it would be difficult to very difficult to add
>> restartable clone support to native (smart) protocols.
>> 
>> If not for this limitation it would, I think, be possible to do a kind
>> of fsck, checking which commits in the packfile are complete (i.e. have
>> all their objects), and based on that ask for a subset of objects.  This
>> would require support only from the client... alas, this is not possible.
> 
> I think the current "must want advertised ref" restriction is
> too strict.  If you make the server check the reachability of the
> wanted object, (assuming it can be resolved to a commit) then you
> can pick up in the middle of history.  We already (to some extent)
> support that with the deepen thing in a shallow clone.  Sure, it
> may cause more server load when clients ask for this partial fetch.

Hmmm... I forgot about shallow clone.
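Shawn's relaxed check amounts to a reachability walk on the server. A minimal sketch over a hypothetical in-memory parent map (real git would walk its object database instead); the walk may touch a large part of history, which is where the extra server load would come from.

```python
from collections import deque


def is_reachable(want, tips, parents):
    """True if commit `want` is one of the advertised `tips` or an ancestor of one.

    parents -- dict mapping a commit id to the list of its parent ids
               (a hypothetical in-memory stand-in for the object database)
    """
    seen = set(tips)
    queue = deque(tips)
    while queue:
        commit = queue.popleft()
        if commit == want:
            return True
        for parent in parents.get(commit, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


# Linear history c1 <- c2 <- c3, with only c3 advertised (e.g. as a branch tip).
history = {"c3": ["c2"], "c2": ["c1"], "c1": []}
print(is_reachable("c1", ["c3"], history))     # True
print(is_reachable("other", ["c3"], history))  # False
```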


Still, we can have the following situation:

  *---*---o---.---.---. ....  .---o---*---*   <-- some ref

      ^                               ^
      |                               |
      a                               b

where '*' means that we have the commit and all its objects fully in the
packfile (i.e. if they are deltas, their delta bases are in the packfile),
'o' means incomplete, for example a commit with some blobs missing, and
'.' means a missing commit object.
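The '*' / 'o' / '.' classification above is what the client-side "kind of fsck" would compute. A toy Python sketch, assuming hypothetical inputs: the set of object ids stored in the partial pack, a map from each deltified object to its base, and for each commit the non-commit objects (trees, blobs) it needs. None of these names come from git.

```python
def recoverable(in_pack, delta_base):
    """Objects whose content can be rebuilt from the partial pack alone.

    in_pack    -- set of object ids stored in the (truncated) pack
    delta_base -- dict mapping a deltified object id to its base object id
    """
    result = set()
    for obj in in_pack:
        cur, chain = obj, set()
        # Follow the delta chain while the current link is a delta we hold.
        while cur in in_pack and cur in delta_base and cur not in chain:
            chain.add(cur)
            cur = delta_base[cur]
        # Good only if the chain ends in a full (non-delta) object we hold.
        if cur in in_pack and cur not in delta_base:
            result.add(obj)
    return result


def classify(commits, needs, present):
    """Mark each commit '*' (complete), 'o' (incomplete) or '.' (missing)."""
    marks = []
    for c in commits:
        if c not in present:
            marks.append(".")
        elif all(o in present for o in needs.get(c, ())):
            marks.append("*")
        else:
            marks.append("o")
    return "".join(marks)


in_pack = {"c1", "t1", "b1", "c2", "t2", "b2"}
present = recoverable(in_pack, {"b2": "bX"})  # b2 is a delta against missing bX
needs = {"c1": ["t1", "b1"], "c2": ["t2", "b2"], "c3": ["t3"]}
print(classify(["c1", "c2", "c3"], needs, present))  # *o.
```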

Because git deals with contiguous ranges, on restart of the clone we can
tell the server that we have 'a' and that we want 'b'.  But without
further extensions to the git protocols, where we could say that we have
some objects (to exclude them) without implying anything about the
objects they require, everything between 'a' and 'b' would be sent
again, including the 'o' and '*' commits we already have near 'b'.  Such
an extension was, if I remember correctly, implemented in some floating
'lazy clone' patch (well, lazy loading of blobs patch)...
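A toy illustration of the problem with a plain have/want exchange, modelled on the picture above with the '....' part shortened: the server sends what is reachable from the wants but not from the haves (in the spirit of `git rev-list <wants> --not <haves>`), so the 'o' and '*' commits near 'b' are sent again. The commit names and the linear graph are made up for illustration.

```python
def reachable(tips, parents):
    """All commits reachable from `tips` (tips included)."""
    seen, stack = set(), list(tips)
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, ()))
    return seen


def commits_to_send(wants, haves, parents):
    """Commits reachable from the wants but not from the haves."""
    return reachable(wants, parents) - reachable(haves, parents)


# c1..c8 follow the picture: * * o . . o * *, with a = c2 and b = c7.
parents = {f"c{i}": [f"c{i - 1}"] for i in range(2, 9)}
parents["c1"] = []
print(sorted(commits_to_send(["c7"], ["c2"], parents)))
# ['c3', 'c4', 'c5', 'c6', 'c7']: c7 (complete) and c3, c6 (partial) are re-sent
```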

[...]
> So, IMHO, the restriction that a commit must be advertised, and not
> merely reachable, is overly strict and doesn't buy us a whole lot.
>  
>> I think that unless 'restartable clone' is limited to commit walkers
>> (HTTP protocol etc.) it should be moved out of the "New to Git?"
>> section to a higher difficulty.  I guess that mirror-sync, formerly
>> GitTorrent, could be easier to implement.
> 
> Maybe.  But a simple stable sort on the objects makes it easier,
> perhaps within reach of "new to git".

As Nico said, in the presence of threaded packing the ordering of
_objects_ in the _packfile_ might not be deterministic.
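Sorting does fix the part of the problem that comes from threads finishing in arbitrary order, as this toy Python sketch shows (the record format is made up, not git's pack format). It would not by itself pin down which delta base each thread picks, which is where threaded packing may still differ between runs.

```python
def serialize(objects):
    """Serialize (oid, payload) pairs in a fixed order.

    Sorting by object id (unique, hence a total order) makes the output
    independent of the order in which worker threads delivered the objects.
    The record format here is made up for illustration.
    """
    out = bytearray()
    for oid, payload in sorted(objects):
        out += oid.encode("ascii") + b" %d\n" % len(payload) + payload
    return bytes(out)


objs = [("b" * 40, b"two"), ("a" * 40, b"one"), ("c" * 40, b"three")]
# Any delivery order gives the same bytes.
print(serialize(objs) == serialize(list(reversed(objs))))  # True
```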

> 
> That ideas page is a wiki for a reason.  If folks feel differently
> from me, please edit it to improve things!  :-)

I'll try to add the 'pack file cache for git-daemon' proposal to the
GSoC2009Ideas page... but I cannot be a mentor (or even co-mentor) for
this idea.
-- 
Jakub Narebski
Poland