2010/5/16 Simon Kitching <simon.kitching@xxxxxxxxxx>:
> On Sat, 2010-05-15 at 22:58 +0800, Ryan Wang wrote:
>> Hi,
>>
>> Recently I was reading some code about filesystems, and I found
>> the usage of struct blockgroup_lock confusing. I have 3 questions
>> about it; please give me some guidance on them.
>>
>> http://lxr.linux.no/#linux+v2.6.32/include/linux/blockgroup_lock.h
>>
>> 1) What is the relationship between NR_CPUS and NR_BG_LOCKS?
>> 2) How should the usage of struct blockgroup_lock be understood?
>> 3) What is the relationship between block_group and NR_BG_LOCKS, e.g.
>> in bgl_lock_ptr?
>>
>> static inline spinlock_t *
>> bgl_lock_ptr(struct blockgroup_lock *bgl, unsigned int block_group)
>> {
>>         return &bgl->locks[(block_group) & (NR_BG_LOCKS-1)].lock;
>> }
>
> I'm no expert, but your question sounded interesting so I took a look.
>
> From my reading of the code, the BG_LOCKS stuff is a "heuristic" to
> improve performance on multi-CPU systems.
>
> An ext2/ext3 filesystem keeps information about the free (unused) blocks
> it contains. When a thread calls into the filesystem to allocate a new
> block for a file (or allocate an inode, or release an existing used block
> back onto the freelist), it must modify that freelist info. And to avoid
> race conditions, it must take a lock before modifying the structure.
>
> Presumably, on a system running multiple threads this operation can be
> contended, i.e. throughput is harmed by threads waiting to obtain locks.
> So (as I read it), ext2 (and ext3) split their freelist information into
> N independent pieces ("groups").
>
> Ideally, there would then be a separate lock for each group, so that N
> threads could allocate blocks from the filesystem in parallel, where N is
> the number of groups.
>
> But locks do take up memory, and there is no point in having N locks
> when the number of threads actually doing filesystem ops in parallel is
> much less than N. So, as I understand it, this header is effectively a
> "heuristic" that attempts to allocate a reasonable number of locks, so
> that performance is not hurt by too few locks but memory is not wasted
> on too many. A modulo operation then maps a "group" number onto a lock;
> the "NR_BG_LOCKS must be power of two" comment is there so that a
> bit-and can be used as the modulo operation.
>
> The header defines:
>
> static inline spinlock_t *
> bgl_lock_ptr(struct blockgroup_lock *bgl, unsigned int block_group)
> {
>         return &bgl->locks[(block_group) & (NR_BG_LOCKS-1)].lock;
> }
>
> So when NR_CPUS=2, NR_BG_LOCKS is 8. Calling bgl_lock_ptr for group #22
> (freelist fragment #22) will then return lock #6 (22 % 8), while
> operations on group #23 use lock #7. As they use different locks, the
> freespace-update operations can occur in parallel.
>
> When NR_CPUS=4096, 128 locks are allocated; more memory is therefore
> needed to hold the locks, but presumably on a large system the reduced
> lock contention is worth the memory use.
>
> Of course this assumes that each thread chooses a different freelist
> "group" - but even if they chose randomly, performance would be better
> than with just one lock.
>
> IMO, using NR_CPUS (a compile-time constant) isn't a very elegant way to
> do this; perhaps some kind of "on demand" system where the number of
> locks increases depending on the actual amount of contention occurring
> would be nicer. But of course that takes overhead to measure and adapt
> to. For a start, NR_CPUS doesn't indicate how many CPUs are actually
> online in the system. But I presume that in practice the hard-wired
> approach works well enough; currently anybody with a large system
> probably recompiles their own kernel with a suitable value for NR_CPUS.
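
That explanation of bgl_lock_ptr clears things up for me. Just to convince
myself about the group-to-lock mapping, I tried a quick standalone sketch
in userspace C. This only illustrates the bit-and trick, with NR_BG_LOCKS
hard-coded to 8 to mirror your NR_CPUS=2 example - it is not the real
kernel header:

/* Userspace illustration of the group -> lock mapping discussed above.
 * NR_BG_LOCKS is hard-coded to 8 to mirror the NR_CPUS=2 case; this is
 * just a sketch, not the real blockgroup_lock.h. */
#include <stdio.h>

#define NR_BG_LOCKS 8                   /* must be a power of two */

static unsigned int bgl_lock_index(unsigned int block_group)
{
        /* bit-and acts as modulo because NR_BG_LOCKS is a power of two */
        return block_group & (NR_BG_LOCKS - 1);
}

int main(void)
{
        unsigned int groups[] = { 22, 23, 30 };
        unsigned int i;

        for (i = 0; i < sizeof(groups) / sizeof(groups[0]); i++)
                printf("block group %u -> lock %u\n",
                       groups[i], bgl_lock_index(groups[i]));
        return 0;
}

This prints 22 -> 6 and 23 -> 7, matching your example, and it also shows
group 30 folding back onto lock 6 - so two groups can still end up
contending on the same spinlock once there are many more groups than locks.
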
> It would be interesting to know how many "groups" ext2 divides its
> freespace into. Does it also choose a value based on the size of the
> server the filesystem was created on?
>
> Warning: I could be wrong about any of the above; this is just my
> understanding from a brief look at the code.
>
> Cheers
> Simon

Thanks!
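
One follow-up on your question about how many "groups" ext2 uses: as far
as I can tell it depends on the filesystem rather than on the server. The
number of block groups is fixed at mkfs time because (if I understand the
on-disk layout correctly) each group's block bitmap has to fit into a
single block, so a group can cover at most 8 * block_size blocks. A rough
back-of-the-envelope sketch (standalone userspace C; the 100 GiB
filesystem and 4 KiB block size are just made-up example numbers):

/* Rough estimate of the ext2 block-group count, assuming each group's
 * block bitmap must fit into a single block, i.e. at most
 * 8 * block_size blocks per group.  The sizes below are example
 * values, not anything the kernel defines. */
#include <stdio.h>

int main(void)
{
        unsigned long long fs_bytes   = 100ULL * 1024 * 1024 * 1024; /* 100 GiB */
        unsigned long long block_size = 4096;                        /* bytes */

        unsigned long long total_blocks     = fs_bytes / block_size;
        unsigned long long blocks_per_group = 8 * block_size; /* bits per bitmap block */
        unsigned long long groups =
                (total_blocks + blocks_per_group - 1) / blocks_per_group;

        printf("%llu blocks, %llu blocks/group -> %llu block groups\n",
               total_blocks, blocks_per_group, groups);
        return 0;
}

With those example numbers that works out to 26214400 blocks and 32768
blocks per group, i.e. 800 block groups (one group per 128 MiB at a 4 KiB
block size). So the group count grows with the filesystem size, while
NR_BG_LOCKS stays a small compile-time constant - which is presumably
exactly why bgl_lock_ptr has to fold many groups onto a few locks with
that bit-and.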