Re: deadlocks with fs quotas

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Jun 27, 2009 at 11:31 AM, Jan Kara<jack@xxxxxxx> wrote:
>  Hello,
>
> On Fri 26-06-09 18:05:15, Jiaying Zhang wrote:
>> I am looking at the fs quota code while debugging a deadlock problem
>> on ext2 file system. I found there is a potential deadlock between quotaon
>> and quotaoff (or quotasync). Basically, all of quotactl operations need to
>> be protected by dqonoff_mutex. vfs_quota_off and vfs_quota_sync also call
>> sb->s_op->quota_write that needs to grab the i_mutex of the quota file.
>> But in vfs_quota_on_inode (called from quotaon operation), the current
>> code tries to grab  the i_mutex of the quota file first before getting
>> quonoff_mutex.
>>
>> Here is a simple test to show the problem:
>> $ while true; do quotaon /dev/hda >&/dev/null; usleep $RANDOM; done &
>> $ while true; do quotaoff /dev/hda >&/dev/null; usleep $RANDOM; done &
>>
>> After running for a while, the two processes get deadlocked.
>> Below is my proposed change to fix the problem:
>  Thanks for the analysis and the patch. You are right, it is a bug and your
> fix seems to be correct. Only, we should also use I_MUTEX_QUOTA for
> acquiring i_mutex to be consistent with other places (and silence lockdep).
>
>> Index: git-linux/fs/quota/dquot.c
>> ===================================================================
>> --- git-linux.orig/fs/quota/dquot.c     2009-05-20 18:05:55.000000000 -0700
>> +++ git-linux/fs/quota/dquot.c  2009-06-26 17:57:04.000000000 -0700
>> @@ -2042,8 +2042,8 @@
>>                  * changes */
>>                 invalidate_bdev(sb->s_bdev);
>>         }
>> -       mutex_lock(&inode->i_mutex);
>>         mutex_lock(&dqopt->dqonoff_mutex);
>> +       mutex_lock(&inode->i_mutex);
>>         if (sb_has_quota_loaded(sb, type)) {
>>                 error = -EBUSY;
>>                 goto out_lock;
>> @@ -2094,7 +2094,6 @@
>>         dqopt->files[type] = NULL;
>>         iput(inode);
>>  out_lock:
>> -       mutex_unlock(&dqopt->dqonoff_mutex);
>>         if (oldflags != -1) {
>>                 down_write(&dqopt->dqptr_sem);
>>                 /* Set the flags back (in the case of accidental quotaon()
>> @@ -2104,6 +2103,7 @@
>>                 up_write(&dqopt->dqptr_sem);
>>         }
>>         mutex_unlock(&inode->i_mutex);
>> +       mutex_unlock(&dqopt->dqonoff_mutex);
>>  out_fmt:
>>         put_quota_format(fmt);
>>
>>
>> Also while debugging the problem, I found the following code path:
>>
>> shrink_icache_memory -> prune_icache (grab iprune_mutex) -> dispose_list ->
>> clear_inode -> dquot_drop -> dqput -> dquot_release ->
>> dqopt->ops[dquot->dq_type]->write_file_info ->
>> sb->s_op->quota_write (i.e., ext2_quota_write for ext2)
>  Yes, I'm aware of this.
>
>> AFAICT, it seems very deadlock prone that the quota system tries to write
>> the quota file while clearing an inode. The ext2_quota_write calls
>> ext2_get_block that may block at alloc_page that in turn may try to call
>> shrink_icache_memory when the system is low in memory. But the
>> iprune_mutex is already hold by the process so the system will get
>> into deadlock here. I haven't got a test case that shows the deadlock but
>> want to raise this issue here first to see if I have missed anything.
>  I guess, you mean page allocation from sb_bread() or similar functions...
> The point is that all these functions should perform allocations with GFP_FS
> flag cleared and thus we can never reenter the inode pruning (or any other
> filesystem) code.

I see. We are using GFP_NOFS so the checking for (gfp_mask & __GFP_FS)
in shrinkg_icache_memory fails and it stops there. Thanks for pointing this out.

Jiaying

>
>                                                                        Honza
>
> PS: I'm going on vacation for a week, I'll merge your patch when I return.
> --
> Jan Kara <jack@xxxxxxx>
> SUSE Labs, CR
>
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux