Re: Keeping unreachable objects in a separate pack instead of loose?

Jeff King <peff@peff.net> writes:
> > Then, the creation of unreferenced objects from successive 'git add' 
> > shouldn't create that many objects in the first place.  They currently 
> > never get the chance to be packed to start with.
> 
> I don't think these objects are necessarily from successive "git add"s.
> That is one source, but they may also come from reflogs expiring. I
> guess in that case that they would typically be in an older pack,
> though.
...
> That is satisfyingly simple, but the storage requirement is quite bad.
> The unreachable objects are very much in the minority, and an 
> occasional duplication there is not a big deal; duplicating all of the 
> reachable objects would double the object directory's size.
...
(I don't think this is a valid generalization for servers)

I am sorry to be coming a bit late into this discussion, but I think
there is an even worse use case, one which does not seem to have been
mentioned yet and which can cause much larger loose object explosions:
the "server upload rejected" case.  For example, think of a client
pushing a change from the wrong repository to a server.  Since there
will be no history in common, the client will push the entire
repository, and if for some reason the push gets rejected by the server
(perhaps by a pre-receive hook, or by a Gerrit server which says "way
too many new changes..."), the pack file may stay abandoned on the
server.  When gc runs: boom, the entire history of that other project
explodes into loose objects but does not get pruned, since the pack
file may be fairly new!
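To make the scenario concrete, here is roughly how it can be reproduced
(the paths are made up, and this assumes a git where a rejected push
still leaves its incoming pack behind rather than cleaning it up):

  # Server side: a tiny bare repo whose pre-receive hook rejects
  # everything, the way a Gerrit policy check might.
  git init --bare /srv/tiny.git
  printf '#!/bin/sh\necho "way too many new changes..." >&2\nexit 1\n' \
      > /srv/tiny.git/hooks/pre-receive
  chmod +x /srv/tiny.git/hooks/pre-receive

  # Client side: push a completely unrelated, huge history at it.
  cd /path/to/huge-clone
  git push /srv/tiny.git HEAD:refs/heads/oops   # rejected by the hook

  # The objects were already transferred and indexed as a pack before
  # the hook ran, so the pack sits in the server repo, unreferenced:
  git -C /srv/tiny.git count-objects -v         # note in-pack / packs

  # When gc runs, its repack -A loosens the unreachable objects from
  # that fresh pack, and prune will not remove them because they are
  # newer than gc.pruneExpire (two weeks by default):
  git -C /srv/tiny.git gc
  git -C /srv/tiny.git count-objects -v         # loose count explodes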

I believe this has happened to us several times fairly recently.  We
have a tiny project which some people keep confusing with the kernel,
and they push a change destined for the kernel to it.  Gerrit rejects
it, and their massive packfile (larger than the entire project) stays
around.  If gc runs, it almost becomes a DoS for us: the sheer number
of loose object files makes the system crawl when accessing that repo,
even on an SSD.  We have been talking about moving to NFS soon (with
packfiles git should still perform fairly well on NFS), but this
explosion really scares me.

It seems like the current design is a DoS just waiting to happen for
servers.  While I would love to eliminate the races discussed in this
thread, I think I agree with Ted that the first fix should simply be to
never explode packed objects into loose objects for the sake of pruning
(if certain objects simply don't do well in pack files and the local gc
policy says they should be loose, go ahead and expand them, but that
should be unrelated to pruning).  People can DoS a server with unused
packfiles too, but that will rarely have the same impact that loose
objects have.
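
To make the distinction concrete, here is roughly what gc does today
next to the only workaround I know of (just an illustration with stock
commands and default expiry values, not a proposal for the final
interface):

  # Roughly what "git gc" does today: repack reachable objects, then
  # write the still-fresh unreachable ones out as loose files so that
  # a later prune can age them out.
  git repack -A -d -l --unpack-unreachable=2.weeks.ago
  git prune --expire=2.weeks.ago

  # The only way to avoid the loose-object explosion today is to give
  # up the expiry grace period entirely, which makes the races
  # discussed in this thread even worse:
  git gc --prune=now   # gc then passes -a to repack instead of -A, so
                       # unreachable objects are dropped outright and
                       # nothing is written out loose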

-Martin


-- 
Employee of Qualcomm Innovation Center, Inc. which is a member 
of Code Aurora Forum
