On Sat, 09 May 2015 21:39:22 +0200, Andreas Rohner wrote:
> On 2015-05-09 06:09, Ryusuke Konishi wrote:
>> On Sun, 3 May 2015 12:05:18 +0200, Andreas Rohner wrote:
<snip>
>>> +	lock = bgl_lock_ptr(&sui->nlive_blks_cache_bgl, (unsigned int)group);
>>> +	spin_lock(lock);
>>> +	node->values[segnum & ((1 << NILFS_SUFILE_CACHE_NODE_SHIFT) - 1)] += 1;
>>> +	sui->nlive_blks_cache_dirty = 1;
>>> +	spin_unlock(lock);
>>> +	rcu_read_unlock();
>>> +
>>> +	return 0;
>>> +}
>>
>> Consider using cmpxchg() or atomic_inc(), and using
>> NILFS_SUFILE_CACHE_NODE_MASK to mask segnum.

Sorry, I meant NILFS_SUFILE_CACHE_NODE_COUNT as below.

>> The following is an example in the case of using cmpxchg():
>>
>>	__u32 old, new, *valuep;
>>	...
>>	valuep = &node->values[segnum & (NILFS_SUFILE_CACHE_NODE_COUNT - 1)];
>>	do {
>>		old = ACCESS_ONCE(*valuep);
>>		new = old + 1;
>>	} while (cmpxchg(valuep, old, new) != old);
>>
>>	sui->nlive_blks_cache_dirty = 1;
>>
>>	rcu_read_unlock();
>>	return 0;
>> }
>>
>> The current atomic_xxxx() macros are actually defined in the same way
>> to reduce overheads in an SMP environment.
>>
>> Using atomic_xxxx() is preferable, but formally it requires
>> initialization with "atomic_set(&counter, 0)" or "ATOMIC_INIT(0)" for
>> every element.  I don't know whether initialization with the memset()
>> function is allowed for atomic_t type variables.
>
> Ok, but I would also have to use cmpxchg() in
> nilfs_sufile_flush_cache_node(), if the value needs to be set to 0.
>
> Currently the cache is only flushed during segment construction, so
> there should be no concurrent calls to nilfs_sufile_dec_nlive_blks().
> But I thought it would be best to design a thread-safe flush function.

You don't have to use cmpxchg() if you just want to set the value to 0,
since a plain store operation is already atomic (see the implementation
of atomic_set).  Do you mean an atomic subtraction is needed in
nilfs_sufile_flush_cache_node()?  If so, using cmpxchg() is right.
But note that, under this premise, nilfs_sufile_shrink_cache() may free
counters with a non-zero value.  At present, this is not a problem
because nilfs_sufile_shrink_cache() is called within
nilfs_segctor_do_construct().  And even if we add a call site in the
destructor of sufile, ignoring counters with a non-zero value is not
critical.

<snip>

>>> +void nilfs_sufile_shrink_cache(struct inode *sufile)
>>> +{
>>> +	struct nilfs_sufile_info *sui = NILFS_SUI(sufile);
>>> +	struct nilfs_sufile_cache_node *node;
>>> +	struct radix_tree_iter iter;
>>> +	void **slot;
>>> +
>>> +	/* prevent flush form running at the same time */
>>
>> "flush from" ?
>
> Yes, that is a typo.  It should be "flush from".
>
>>> +	down_read(&NILFS_MDT(sufile)->mi_sem);
>>
>> This protection with mi_sem seems to be needless because the current
>> implementation of nilfs_sufile_shrink_cache() doesn't touch buffers
>> of sufile.  The delete operation is protected by a spinlock and the
>> counter operations are protected with RCU.  What do these
>> down_read()/up_read() calls protect?
>
> It is intended to protect against a concurrently running
> nilfs_sufile_flush_cache() function.  This should not happen the way
> the functions are called currently, but I wanted to make them thread
> safe.
>
> In nilfs_sufile_flush_cache() I use references to nodes outside of
> the spinlock, which I am not allowed to do if they can be deallocated
> at any moment.

Ok, that's reasonable.  Thanks for the detailed explanation.

Regards,
Ryusuke Konishi

> I cannot hold the spinlock for the entire flush, because
> nilfs_sufile_flush_cache_node() needs to be able to sleep.  I cannot
> use a mutex instead of a spinlock, because this would lead to a
> potential deadlock:
>
> nilfs_sufile_alloc_cache_node():
>	1. bmap->b_sem
>	2. sui->nlive_blks_cache_lock
>
> nilfs_sufile_flush_cache_node():
>	1. sui->nlive_blks_cache_lock
>	2. bmap->b_sem
>
> So I decided to "abuse" mi_sem for this purpose, since I already need
> to hold mi_sem in nilfs_sufile_flush_cache().

--
To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html