Hello Baokun! On Fri 16-06-23 16:56:08, Baokun Li wrote: > We ran into a problem that dqput() and dquot_mark_dquot_dirty() may race > like the function graph below, causing a released dquot to be added to the > dqi_dirty_list, and this leads to that dquot being released again in > dquot_writeback_dquots(), making two identical quotas in free_dquots. > > cpu1 cpu2 > _________________|_________________ > wb_do_writeback CHOWN(1) > ... > ext4_da_update_reserve_space > dquot_claim_block > ... > dquot_mark_dquot_dirty // try to dirty old quota > test_bit(DQ_ACTIVE_B, &dquot->dq_flags) // still ACTIVE > if (test_bit(DQ_MOD_B, &dquot->dq_flags)) > // test no dirty, wait dq_list_lock > ... > dquot_transfer > __dquot_transfer > dqput_all(transfer_from) // rls old dquot > dqput // last dqput > dquot_release > clear_bit(DQ_ACTIVE_B, &dquot->dq_flags) > atomic_dec(&dquot->dq_count) > put_dquot_last(dquot) > list_add_tail(&dquot->dq_free, &free_dquots) > // first add the dquot to free_dquots > if (!test_and_set_bit(DQ_MOD_B, &dquot->dq_flags)) > add dqi_dirty_list // add freed dquot to dirty_list > P3: > ksys_sync > ... > dquot_writeback_dquots > WARN_ON(!test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) > dqgrab(dquot) > WARN_ON_ONCE(!atomic_read(&dquot->dq_count)) > WARN_ON_ONCE(!test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) > dqput(dquot) > put_dquot_last(dquot) > list_add_tail(&dquot->dq_free, &free_dquots) > // Double add the dquot to free_dquots > > This causes a list_del corruption when removing the entry from free_dquots, > and even trying to free the dquot twice in dqcache_shrink_scan triggers a > use-after-free. > > A warning may also be triggered by a race like the function diagram below: > > cpu1 cpu2 cpu3 > ________________|_______________|________________ > wb_do_writeback CHOWN(1) QUOTASYNC(1) > ... ... > ext4_da_update_reserve_space > ... __dquot_transfer > dqput // last dqput > dquot_release > dquot_is_busy > if (test_bit(DQ_MOD_B, &dquot->dq_flags)) > // not dirty and still active > dquot_mark_dquot_dirty > if (!test_and_set_bit(DQ_MOD_B, &dquot->dq_flags)) > add dqi_dirty_list > clear_bit(DQ_ACTIVE_B, &dquot->dq_flags) > dquot_writeback_dquots > WARN_ON(!test_bit(DQ_ACTIVE_B)) > > To solve this problem, it is similar to the way dqget() avoids racing with > dquot_release(). First set the DQ_MOD_B flag, then execute wait_on_dquot(), > after this we know that either dquot_release() is already finished or it > will be canceled due to DQ_MOD_B flag test, at this point if the quota is > DQ_ACTIVE_B, then we can safely add the dquot to the dqi_dirty_list, > otherwise clear the DQ_MOD_B flag and exit directly. > > Fixes: 4580b30ea887 ("quota: Do not dirty bad dquots") > Signed-off-by: Baokun Li <libaokun1@xxxxxxxxxx> > --- > > Hello Honza, > > This problem can also be solved by modifying the reference count mechanism, > where dquots hold a reference count after they are allocated until they are > destroyed, i.e. the dquots in the free_dquots list have dq_count == 1. This > allows us to reduce the reference count as soon as we enter the dqput(), > and then add the dquot to the dqi_dirty_list only when dq_count > 1. This > also prevents the dquot in the dqi_dirty_list from not having the > DQ_ACTIVE_B flag, but this is a more impactful modification, so we chose to > refer to dqget() to avoid racing with dquot_release(). If you prefer this > solution by modifying the dq_count mechanism, I would be happy to send > another version of the patch. The way this *should* work is that dquot_mark_dquot_dirty() using dquot references from the inode should be protected by dquot_srcu. quota_off code takes care to call synchronize_srcu(&dquot_srcu) to not drop dquot references while they are used by other users. But you are right dquot_transfer() breaks this assumption. Most callers are fine since they are also protected by inode->i_lock but still I'd prefer to fix dquot_transfer() to follow the guarantees dquot_srcu should provide. Now calling synchronize_srcu() directly from dquot_transfer() is too expensive (and mostly unnecessary) so what I would rather suggest is to create another dquot list (use dq_free list_head inside struct dquot for it) and add dquot whose last reference should be dropped there. We'd then queue work item which would call synchronize_srcu() and after that perform the final cleanup of all the dquots on the list. Now this also needs some modifications to dqget() and to quotaoff code to handle various races with the new dqput() code so if you feel it is too complex for your taste, I can implement this myself. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR