Thanks for the review, Honza. > On Wed 28-05-14 09:55:10, Niu Yawei wrote: >> Remove dqptr_sem to make quota code scalable: Remove the dqptr_sem, >> accessing inode->i_dquot now protected by dquot_srcu, and changing >> inode->i_dquot is now serialized by dq_data_lock. > The patch is mostly fine. Just some minor comments below. > > Honza > >> Signed-off-by: Lai Siyao <lai.siyao@xxxxxxxxx> >> Signed-off-by: Niu Yawei <yawei.niu@xxxxxxxxx> >> --- >> fs/quota/dquot.c | 105 +++++++++++++++++++------------------------------ >> fs/super.c | 1 - >> include/linux/quota.h | 1 - >> 3 files changed, 41 insertions(+), 66 deletions(-) >> >> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c >> index dc6f711..b86c88b 100644 >> --- a/fs/quota/dquot.c >> +++ b/fs/quota/dquot.c >> @@ -96,13 +96,15 @@ >> * Note that some things (eg. sb pointer, type, id) doesn't change during >> * the life of the dquot structure and so needn't to be protected by a lock >> * >> - * Any operation working on dquots via inode pointers must hold dqptr_sem. If >> - * operation is just reading pointers from inode (or not using them at all) the >> - * read lock is enough. If pointers are altered function must hold write lock. >> + * Operation accessing dquots via inode pointers are protected by dquot_srcu. >> + * Operation of reading pointer needs srcu_read_lock(&dquot_srcu), and >> + * synchronize_srcu(&dquot_srcu) is called before clear pointers to avoid > This is not actually precise. It should be: > and synchronize_srcu(&dquot_srcu) is called after clearing pointers from > inode and before dropping dquot references to avoid use of dquots after > they are freed. > > Now that we have the rule spelled out exactly, I think we should update > what remove_inode_dquot_ref() does. It should do something like: > > if (list_empty(&dquot->dq_free)) { > spin_lock(&dq_list_lock); > /* > * The inode still has reference to dquot so it can't be in the > * free list > */ > list_add(&dquot->dq_free, tofree_head); > spin_unlock(&dq_list_lock); > } else { > /* > * Dquot is already in a list to put so we won't drop the last > * reference here. > */ > dqput(dquot); > } > > Although in practice this should be mostly the same as the current code > this makes it more obvious we keep one reference to each dquot from inodes > until after we call synchronize_srcu(). And you can make this change as a > separate patch before the dqptr_sem removal. I don't quite follow this: in which condition the dq_free is not empty? I think it could be that dquot has been put in tofree_head before, and it was found by dqget() and become inuse again, right? Then won't this race with drop_dquot_ref() -> put_dquot_list()? Actually, it looks to me that the old version of remove_inode_dquot_ref() has the same race. Did I miss anyting? My another concern is: in dqcache_shrink_scan(), we scan free_dquots list without holding the dq_list_lock, won't this race with dqget()/dqput()? >> + * use after free. dq_data_lock is used to serialize the pointer setting and >> + * clearing operations. >> * Special care needs to be taken about S_NOQUOTA inode flag (marking that >> * inode is a quota file). Functions adding pointers from inode to dquots have >> - * to check this flag under dqptr_sem and then (if S_NOQUOTA is not set) they >> - * have to do all pointer modifications before dropping dqptr_sem. This makes >> + * to check this flag under dq_data_lock and then (if S_NOQUOTA is not set) they >> + * have to do all pointer modifications before dropping dq_data_lock. This makes >> * sure they cannot race with quotaon which first sets S_NOQUOTA flag and >> * then drops all pointers to dquots from an inode. >> * > ... >> @@ -1485,12 +1473,13 @@ static void __dquot_drop(struct inode *inode) >> int cnt; >> struct dquot *put[MAXQUOTAS]; >> >> - down_write(&sb_dqopt(inode->i_sb)->dqptr_sem); >> + spin_lock(&dq_data_lock); >> for (cnt = 0; cnt < MAXQUOTAS; cnt++) { >> put[cnt] = inode->i_dquot[cnt]; >> inode->i_dquot[cnt] = NULL; >> } >> - up_write(&sb_dqopt(inode->i_sb)->dqptr_sem); >> + spin_unlock(&dq_data_lock); >> + synchronize_srcu(&dquot_srcu); >> dqput_all(put); >> } > You don't have to call sychronize_srcu() here. There can be no other > users of the inode when __dquot_drop() is called. So noone should be using > inode dquot pointers as well. Probably we should document this assumption > before dquot_drop(). > I'm fine to remove this and add comments before this fucntion, but I'm wondering that if it's safer to call an additional synchronize_srcu() here? (In case of someone use this function for other purpose in the future.) >> @@ -1868,12 +1847,12 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to) >> warn_from_inodes[cnt].w_type = QUOTA_NL_NOWARN; >> warn_from_space[cnt].w_type = QUOTA_NL_NOWARN; >> } >> - down_write(&sb_dqopt(inode->i_sb)->dqptr_sem); >> + >> + spin_lock(&dq_data_lock); >> if (IS_NOQUOTA(inode)) { /* File without quota accounting? */ >> - up_write(&sb_dqopt(inode->i_sb)->dqptr_sem); >> + spin_unlock(&dq_data_lock); >> return 0; >> } >> - spin_lock(&dq_data_lock); >> cur_space = inode_get_bytes(inode); >> rsv_space = inode_get_rsv_space(inode); >> space = cur_space + rsv_space; >> @@ -1927,7 +1906,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to) >> inode->i_dquot[cnt] = transfer_to[cnt]; >> } >> spin_unlock(&dq_data_lock); >> - up_write(&sb_dqopt(inode->i_sb)->dqptr_sem); >> >> mark_all_dquot_dirty(transfer_from); >> mark_all_dquot_dirty(transfer_to); >> @@ -1941,7 +1919,6 @@ int __dquot_transfer(struct inode *inode, struct dquot **transfer_to) >> return 0; >> over_quota: >> spin_unlock(&dq_data_lock); >> - up_write(&sb_dqopt(inode->i_sb)->dqptr_sem); >> flush_warnings(warn_to); >> return ret; > Hum, you are missing srcu protection in __dquot_transfer()... Now we are > holding extra dquot references here so we are fine but it really deserves a > comment somewhere in the header before the function. Yes, we are holding reference. I'll add comments to explain it. Thanks. > > Honza -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html