Re: [PATCH] ext4: handle NULL p_ext in ext4_ext_next_allocated_block()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 10 Oct 2011 08:28:23 -0700, Curt Wohlgemuth <curtw@xxxxxxxxxx> wrote:
> Hi Lukas:
> 
> On Mon, Oct 10, 2011 at 12:19 AM, Lukas Czerner <lczerner@xxxxxxxxxx> wrote:
> > On Sat, 8 Oct 2011, Curt Wohlgemuth wrote:
> >
> >> In ext4_ext_next_allocated_block(), the path[depth] might
> >> have a p_ext that is NULL -- see ext4_ext_binsearch().  In
> >> such a case, dereferencing it will crash the machine.
> >>
> >> This patch checks for p_ext == NULL in
> >> ext4_ext_next_allocated_block() before dereferencinging it.
> >>
> >> Tested using a hand-crafted an inode with eh_entries == 0 in
> >> an extent block, verified that running FIEMAP on it crashes
> >> without this patch, works fine with it.
> >
> > Hi Curt,
> >
> > It seems to me that even that the patch fixes the NULL dereference, it
> > is not a complete solution. Is it possible that in "normal" case p_ext
> > would be NULL ? I think that this is only possible in extent split/add
> > case (as noted in ext4_ext_binsearch()) which should be atomic to the
> > other operations (locked i_mutex?).
> 
> Yes, unfortunately, it is possible in "normal" cases for p_ext to be NULL.
> 
> We've seen this problem during what appears to be a race between an
> inode growth (or truncate?) and another task doing a FIEMAP ioctl.
> The real problem is that FIEMAP handing in ext4 is just, well, buggy?
Wow, IMHO it not just buggy, it is obviously incorrect, IMHO it is more
fair just return -ENOTSUPP, at least it is much safer.
Yes calling FIEMAP and truncate/write in parallel is stupid, but not
prohibited. 
> 
> ext4_ext_walk_space() will get the i_data_sem, construct the path
> array, then release the semaphore.  But then it does a bazillion
> accesses on the extent/header/index pointers in the path array, with
> no protection against truncate, growth, or any other changes.  As far
> as I can tell, this is the only use of a path array retrieved from
> ext4_ext_find_extent() that isn't completely covered by i_data_sem.
In that case i_data sem protects us from nothing. Path collected can
simply disappear under us. And in fact i don't understand the
reason why we drop i_data_sem too soon. Are any reason to do that?
Seems like following patch should fix the issue.
>From 12cd56ccd86c4d132f186034a9c11b0a2441a19f Mon Sep 17 00:00:00 2001
From: Dmitry Monakhov <dmonakhov@xxxxxxxxxx>
Date: Tue, 11 Oct 2011 10:44:31 +0400
Subject: [PATCH] ext4: do not drop i_data_sem too soon

Path returned from ext4_find_extent is valid only while we hold
i_data_sem, so we can drop it only after we nolonger need it.

Signed-off-by: Dmitry Monakhov <dmonakhov@xxxxxxxxxx>
---
 fs/ext4/extents.c |   20 +++++++++++---------
 1 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index da4583f..c716a1f 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1847,23 +1847,32 @@ static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
 		num = last - block;
 		/* find extent for this block */
 		down_read(&EXT4_I(inode)->i_data_sem);
+		if (ext_depth(inode) != depth) {
+			/* depth was changed. we have to realloc path */
+			kfree(path);
+			path = NULL;
+		}
+
 		path = ext4_ext_find_extent(inode, block, path);
-		up_read(&EXT4_I(inode)->i_data_sem);
 		if (IS_ERR(path)) {
 			err = PTR_ERR(path);
+			up_read(&EXT4_I(inode)->i_data_sem);
 			path = NULL;
 			break;
 		}
 
 		depth = ext_depth(inode);
 		if (unlikely(path[depth].p_hdr == NULL)) {
+			up_read(&EXT4_I(inode)->i_data_sem);
 			EXT4_ERROR_INODE(inode, "path[%d].p_hdr == NULL", depth);
 			err = -EIO;
 			break;
 		}
+
 		ex = path[depth].p_ext;
 		next = ext4_ext_next_allocated_block(path);
-
+		up_read(&EXT4_I(inode)->i_data_sem);
+		ext4_ext_drop_refs(path);
 		exists = 0;
 		if (!ex) {
 			/* there is no extent yet, so try to allocate
@@ -1915,7 +1924,6 @@ static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
 			break;
 		}
 		err = func(inode, next, &cbex, ex, cbdata);
-		ext4_ext_drop_refs(path);
 
 		if (err < 0)
 			break;
@@ -1927,12 +1935,6 @@ static int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
 			break;
 		}
 
-		if (ext_depth(inode) != depth) {
-			/* depth was changed. we have to realloc path */
-			kfree(path);
-			path = NULL;
-		}
-
 		block = cbex.ec_block + cbex.ec_len;
 	}
 
-- 
1.7.1


> 
> > However this seems like an inode corruption so we should probably be
> > more verbose about it and print an appropriate EXT4_ERROR_INODE() or
> > even better check for the corrupted tree in the ext4_ext_find_extent()
> > (instead in ext4_ext_map_blocks()), however this will need to distinguish
> > between the normal and the tree modification case.
> 
> What we've observed many times is a crash during a FIEMAP call to
> ext4_ext_next_allocated_block(), which appears to me to be during a
> race with another thread that's splitting the extent tree.  This
> causes the machine to go down with the inode in a bad state.  But of
> course, fsck won't detect and fix this, so when the machine comes back
> up, and a FIEMAP call is done on this same inode -- without any other
> threads -- it'll crash again.  Hence a nasty crash loop.
> 
> So you're right, in that this isn't the "real solution."  But devising
> a safe, non-racy design for FIEMAP is not so simple, unless of course
> you want to just hold the i_data_sem during the entire loop body of
> ext4_ext_walk_space(), which would be pretty ugly.  Hence the
> "band-aid" approach in my patch, which at least seems correct, if not
> thorough.
> 
> Thanks,
> Curt
> 
> >
> > Thanks!
> > -Lukas
> >
> >>
> >> Signed-off-by: Curt Wohlgemuth <curtw@xxxxxxxxxx>
> >> ---
> >>  fs/ext4/extents.c |    4 +++-
> >>  1 files changed, 3 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> >> index 57cf568..063a5b8 100644
> >> --- a/fs/ext4/extents.c
> >> +++ b/fs/ext4/extents.c
> >> @@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
> >>       while (depth >= 0) {
> >>               if (depth == path->p_depth) {
> >>                       /* leaf */
> >> -                     if (path[depth].p_ext !=
> >> +                     /* p_ext can be NULL */
> >> +                     if (path[depth].p_ext &&
> >> +                             path[depth].p_ext !=
> >>                                       EXT_LAST_EXTENT(path[depth].p_hdr))
> >>                         return le32_to_cpu(path[depth].p_ext[1].ee_block);
> >>               } else {
> >>
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux