2010/5/16 Simon Kitching <simon.kitching@xxxxxxxxxx>:
> On Sat, 2010-05-15 at 22:58 +0800, Ryan Wang wrote:
>> Hi,
>>
>> Recently I was reading some code about filesystems, and I found
>> the usage of struct blockgroup_lock confusing. I have 3 questions
>> about it; please give me some guidance on them.
>>
>> http://lxr.linux.no/#linux+v2.6.32/include/linux/blockgroup_lock.h
>>
>> 1) What is the relationship between NR_CPUS and NR_BG_LOCKS?
>> 2) How should the usage of struct blockgroup_lock be understood?
>> 3) What is the relationship between block_group and NR_BG_LOCKS, e.g.
>> in bgl_lock_ptr?
>>
>> static inline spinlock_t *
>> bgl_lock_ptr(struct blockgroup_lock *bgl, unsigned int block_group)
>> {
>>         return &bgl->locks[(block_group) & (NR_BG_LOCKS-1)].lock;
>> }
>
> I'm no expert, but your question sounded interesting so I took a look.
>
> From my reading of the code, the BG_LOCKS stuff is a "heuristic" to
> improve performance on multi-CPU systems.
>
> An ext2/ext3 filesystem keeps information about the free (unused) blocks
> it contains. When a thread calls into the filesystem to allocate a new
> block for a file (or allocate an inode, or release an existing used block
> back onto the freelist), it must modify that freelist info. And to avoid
> race conditions, it must take a lock before modifying the structure.
>
> Presumably, on a system running multiple threads this operation can be
> contended, i.e. throughput is harmed by threads waiting to obtain locks.
> So (as I read it), ext2 (and ext3) split their freelist information into
> N independent pieces ("groups").
>
> Ideally, there would then be a separate lock for each group, so that N
> threads could allocate blocks from the filesystem in parallel, where N is
> the number of groups.
>
> But locks do take up memory, and there is no point in having N locks
> when the number of threads actually doing filesystem ops in parallel is
> much less than N. So, as I understand it, this header is effectively a
> "heuristic" that attempts to allocate a reasonable number of locks, so
> that performance is not hurt by too few locks but memory is not wasted
> on too many. A modulo operation then maps a "group" number onto a lock;
> the "NR_BG_LOCKS must be power of two" comment is there so that a
> bit-and can be used as the modulo operation.
>
> The header defines:
>
> static inline spinlock_t *
> bgl_lock_ptr(struct blockgroup_lock *bgl, unsigned int block_group)
> {
>         return &bgl->locks[(block_group) & (NR_BG_LOCKS-1)].lock;
> }
>
> So when NR_CPUS=2, NR_BG_LOCKS is 8. Calling bgl_lock_ptr for group #22
> (freelist fragment #22) will then return lock #6 (22 % 8), while
> operations on group #23 use lock #7. As they use different locks, the
> freespace-update operations can occur in parallel.
>
> When NR_CPUS=4096, 128 locks are allocated; more memory is therefore
> needed to hold the locks, but presumably on a large system the reduced
> lock contention is worth the memory use.
>
> Of course this assumes that each thread chooses a different freelist
> "group" - but even if they chose randomly, performance would be better
> than with just one lock.
>
> IMO, using NR_CPUS (a compile-time constant) isn't a very elegant way to
> do this; perhaps some kind of "on demand" system where the number of
> locks increases depending on the actual amount of contention occurring
> would be nicer. But of course that takes overhead to measure and adapt
> to. For a start, NR_CPUS doesn't indicate how many CPUs are actually
> online in the system. But I presume that in practice the hard-wired
> approach works well enough; currently anybody with a large system
> probably recompiles their own kernel with a suitable value for NR_CPUS.
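
That explanation of bgl_lock_ptr clears things up for me. Just to convince
myself about the group-to-lock mapping, I tried a quick standalone sketch
in userspace C. This only illustrates the bit-and trick, with NR_BG_LOCKS
hard-coded to 8 to mirror your NR_CPUS=2 example - it is not the real
kernel header:

/* Userspace illustration of the group -> lock mapping discussed above.
 * NR_BG_LOCKS is hard-coded to 8 to mirror the NR_CPUS=2 case; this is
 * just a sketch, not the real blockgroup_lock.h. */
#include <stdio.h>

#define NR_BG_LOCKS 8                   /* must be a power of two */

static unsigned int bgl_lock_index(unsigned int block_group)
{
        /* bit-and acts as modulo because NR_BG_LOCKS is a power of two */
        return block_group & (NR_BG_LOCKS - 1);
}

int main(void)
{
        unsigned int groups[] = { 22, 23, 30 };
        unsigned int i;

        for (i = 0; i < sizeof(groups) / sizeof(groups[0]); i++)
                printf("block group %u -> lock %u\n",
                       groups[i], bgl_lock_index(groups[i]));
        return 0;
}

This prints 22 -> 6 and 23 -> 7, matching your example, and it also shows
group 30 folding back onto lock 6 - so two groups can still end up
contending on the same spinlock once there are many more groups than locks.
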
> It would be interesting to know how many "groups" ext2 divides its
> freespace into. Does it also choose a value based on the size of the
> server the filesystem was created on?
>
> Warning: I could be wrong about any of the above; this is just my
> understanding from a brief look at the code.
>
> Cheers
> Simon

Thanks!
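
One follow-up on your question about how many "groups" ext2 uses: as far
as I can tell it depends on the filesystem rather than on the server. The
number of block groups is fixed at mkfs time because (if I understand the
on-disk layout correctly) each group's block bitmap has to fit into a
single block, so a group can cover at most 8 * block_size blocks. A rough
back-of-the-envelope sketch (standalone userspace C; the 100 GiB
filesystem and 4 KiB block size are just made-up example numbers):

/* Rough estimate of the ext2 block-group count, assuming each group's
 * block bitmap must fit into a single block, i.e. at most
 * 8 * block_size blocks per group.  The sizes below are example
 * values, not anything the kernel defines. */
#include <stdio.h>

int main(void)
{
        unsigned long long fs_bytes   = 100ULL * 1024 * 1024 * 1024; /* 100 GiB */
        unsigned long long block_size = 4096;                        /* bytes */

        unsigned long long total_blocks     = fs_bytes / block_size;
        unsigned long long blocks_per_group = 8 * block_size; /* bits per bitmap block */
        unsigned long long groups =
                (total_blocks + blocks_per_group - 1) / blocks_per_group;

        printf("%llu blocks, %llu blocks/group -> %llu block groups\n",
               total_blocks, blocks_per_group, groups);
        return 0;
}

With those example numbers that works out to 26214400 blocks and 32768
blocks per group, i.e. 800 block groups (one group per 128 MiB at a 4 KiB
block size). So the group count grows with the filesystem size, while
NR_BG_LOCKS stays a small compile-time constant - which is presumably
exactly why bgl_lock_ptr has to fold many groups onto a few locks with
that bit-and.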