Re: [POC PATCH 5/5] sha1_file: make the pack machinery thread-safe

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 10, 2012 at 7:29 PM, Thomas Rast <trast@xxxxxxxxxxx> wrote:
> Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:
>
>> On Fri, Dec 9, 2011 at 3:39 PM, Thomas Rast <trast@xxxxxxxxxxxxxxx> wrote:
>>> More precisely speaking, this pushes the locking down from
>>> read_object() into bits of the pack machinery that cannot (yet) run in
>>> parallel.
>>>
>>> There are several hacks here:
>>>
>>> a) prepare_packed_git() must be called before any parallel accesses
>>>   happen.  It now unconditionally opens and maps all index files.
>>>
>>> b) similarly, prepare_replace_object() must be called before any
>>>   parallel read_sha1_file() happens
>>>
>>> This simplification lets us avoid locking outright to guard the index
>>> accesses; locking is then mainly required for open_packed_git(),
>>> [un]use_pack(), and such.
>>>
>>> The ultimate goal would of course be to let at least _some_ pack
>>> accesses happen without any locking whatsoever.  But grep already
>>> benefits from it with a nice speed boost on non-worktree greps.
>>
>> (I'm running into multithread pack access problem in rev-list..)
>>
>> Why not put the global pointer "struct packed_git *packed_git" to
>> "struct pack_context" and avoid locking entirely? Resource usage is
>> like we run <n> different processes, I think, which is not too bad. We
>> may want to share a few static pack_* variables such as
>> pack_open_fds.. to avoid hitting system limits too fast.
>
> I was hesitating to do that because I think it's not the best solution
> yet.  At least for 64bit systems, I thought of doing some or all of:
>
> * opening/mapping the pack indexes immediately to avoid locking there
>  (perhaps the POC already does this, I haven't looked again).  If you
>  have many packs this isn't cheap because the index must be verified.

Sharing mmapped pack indexes makes sense. We do full mmap there so it
eats address space in 32 bit systems (but still not a lot, linux-2.6
pack index is about 60MB).

I tried but could not find the index verifying code (i.e. recalculate
sha-1 and match with stored one) anywhere, so I suppose opening packs
and indexes is cheap.

> * mapping small packs immediately

We need to partition file handle space to avoid running out of file handles.

> * mapping "the" big pack immediately (many repos will have a huge pack
>  from the initial clone)

We have sliding pack windows exactly for that: accessing >4GB packs in
32bit systems. So address space should not be an issue here.

> Put another way, my current concern is that on 64bit systems it's
> incredibly easy to share (who cares about a few GBs of mmap()?), whereas
> on 32bit systems it probably matters much more, but there we also suffer
> more from not sharing.

Having said all that, lock-free pack access does not work for me yet.
I keep get crashes deep in cache_or_unpack_entry :(
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]