Re: [RFC][PATCH] Re: [BUG] ext4: cannot unfreeze a filesystem due to a deadlock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 04/25/2011 09:28 AM, Toshiyuki Okajima wrote:
Hi.

On Sat, 23 Apr 2011 00:10:25 +0200
Jan Kara<jack@xxxxxxx>  wrote:
On Fri 22-04-11 15:58:39, Toshiyuki Okajima wrote:
I have confirmed that the following patch works fine while my or
Mizuma-san's reproducer is running. Therefore,
  we can block to write the data, which is mmapped to a file, into a disk
by a page-fault while fsfreezing.

I think this patch fixes the following two problems:
- A deadlock occurs between ext4_da_writepages() (called from
writeback_inodes_wb) and thaw_super(). (reported by Mizuma-san)
- We can also write the data, which is mmapped to a file,
   into a disk while fsfreezing (ext3/ext4).
                                        (reported by me)

Please examine this patch.
   Thanks for the patch. The ext3 part is not as easy as this. You cannot
really get i_alloc_sem in ext3_page_mkwrite() because mmap_sem is already
held by page fault code and i_alloc_sem should be acquired before it (yes I
know, ext4 already has this bug which should be fixed when I get to it).
Also you'll find that performance of random writers via mmap (which is
relatively common) is going to be rather bad with this patch (because the
file will be heavily fragmented). We have to be more clever which is
exactly why it's taking me so long with my patch :) But tests are already
running so if everything goes fine, I should have patches to submit next
week.
OK, I'll wait your patch. :)


The ext4 part looks correct. I'd just also like to have some comments about
how freeze handling is done because it's kind of subtle.

How about this?


We can have a race here too - since we are only checking if the F.S is in a frozen state or not at _that_ point. We are _not_ preventing a F.S freeze from happening _after_ this point. So here is what can happen:

Key:
(tx: time at xth unit)

Scenario:

Task 1: mmapped write - (case: page mapped to disk and is in page cache)
t1) ext4_page_mkwrite()
t2)   vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
t3) ---- Preempted ----


Task 2: Freeze Task
t4) freezes the super block...
...(continues)....
tn) the page cache is clean and the F.S is frozen. Freeze has completed execution.

Task1: mmapped write - (case: page mapped to disk and is in page cache)
tn+1)ext4_page_mkwrite() returns 0. The write to the mmapped page continues with writing to a page in the page cache when the F.S is frozen! So after the vfs_check_frozen() we _are_ susceptible to "dirtying the page cache when F.S is frozen"

In this case we are not protected by a transaction. Are we?

Warm Regards,
Surbhi.





Thanks,
Toshiyuki Okajima

----------------------------------------------------------------------------------------------------
Subject: [PATCH] ext4: prevent the mmapped page flushing to disk while fsfreezing

Signed-off-by: Toshiyuki Okajima<toshi.okajima@xxxxxxxxxxxxxx>
---
  fs/ext4/inode.c |   10 +++++++++-
  1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f2fa5e8..411b177 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5812,7 +5812,7 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
  	}
  	ret = 0;
  	if (PageMappedToDisk(page))
-		goto out_unlock;
+		goto out_frozen;

  	if (page->index == size>>  PAGE_CACHE_SHIFT)
  		len = size&  ~PAGE_CACHE_MASK;
@@ -5830,6 +5830,14 @@ int ext4_page_mkwrite(struct vm_area_struct *vma, struct vm_fault *vmf)
  		if (!walk_page_buffers(NULL, page_buffers(page), 0, len, NULL,
  					ext4_bh_unmapped)) {
  			unlock_page(page);
+out_frozen:
+			/*
+			 * We must wait here while the filesystem is being
+			 * frozen otherwise a flushing thread can write this
+			 * page to the disk (we can update the filesystem even
+			 * if it is frozen).
+			 */
+			vfs_check_frozen(inode->i_sb, SB_FREEZE_WRITE);
  			goto out_unlock;
  		}
  	}

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux