Hi Lukas: On Mon, Oct 10, 2011 at 12:19 AM, Lukas Czerner <lczerner@xxxxxxxxxx> wrote: > On Sat, 8 Oct 2011, Curt Wohlgemuth wrote: > >> In ext4_ext_next_allocated_block(), the path[depth] might >> have a p_ext that is NULL -- see ext4_ext_binsearch(). In >> such a case, dereferencing it will crash the machine. >> >> This patch checks for p_ext == NULL in >> ext4_ext_next_allocated_block() before dereferencinging it. >> >> Tested using a hand-crafted an inode with eh_entries == 0 in >> an extent block, verified that running FIEMAP on it crashes >> without this patch, works fine with it. > > Hi Curt, > > It seems to me that even that the patch fixes the NULL dereference, it > is not a complete solution. Is it possible that in "normal" case p_ext > would be NULL ? I think that this is only possible in extent split/add > case (as noted in ext4_ext_binsearch()) which should be atomic to the > other operations (locked i_mutex?). Yes, unfortunately, it is possible in "normal" cases for p_ext to be NULL. We've seen this problem during what appears to be a race between an inode growth (or truncate?) and another task doing a FIEMAP ioctl. The real problem is that FIEMAP handing in ext4 is just, well, buggy? ext4_ext_walk_space() will get the i_data_sem, construct the path array, then release the semaphore. But then it does a bazillion accesses on the extent/header/index pointers in the path array, with no protection against truncate, growth, or any other changes. As far as I can tell, this is the only use of a path array retrieved from ext4_ext_find_extent() that isn't completely covered by i_data_sem. > However this seems like an inode corruption so we should probably be > more verbose about it and print an appropriate EXT4_ERROR_INODE() or > even better check for the corrupted tree in the ext4_ext_find_extent() > (instead in ext4_ext_map_blocks()), however this will need to distinguish > between the normal and the tree modification case. What we've observed many times is a crash during a FIEMAP call to ext4_ext_next_allocated_block(), which appears to me to be during a race with another thread that's splitting the extent tree. This causes the machine to go down with the inode in a bad state. But of course, fsck won't detect and fix this, so when the machine comes back up, and a FIEMAP call is done on this same inode -- without any other threads -- it'll crash again. Hence a nasty crash loop. So you're right, in that this isn't the "real solution." But devising a safe, non-racy design for FIEMAP is not so simple, unless of course you want to just hold the i_data_sem during the entire loop body of ext4_ext_walk_space(), which would be pretty ugly. Hence the "band-aid" approach in my patch, which at least seems correct, if not thorough. Thanks, Curt > > Thanks! > -Lukas > >> >> Signed-off-by: Curt Wohlgemuth <curtw@xxxxxxxxxx> >> --- >> fs/ext4/extents.c | 4 +++- >> 1 files changed, 3 insertions(+), 1 deletions(-) >> >> diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c >> index 57cf568..063a5b8 100644 >> --- a/fs/ext4/extents.c >> +++ b/fs/ext4/extents.c >> @@ -1395,7 +1395,9 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path) >> while (depth >= 0) { >> if (depth == path->p_depth) { >> /* leaf */ >> - if (path[depth].p_ext != >> + /* p_ext can be NULL */ >> + if (path[depth].p_ext && >> + path[depth].p_ext != >> EXT_LAST_EXTENT(path[depth].p_hdr)) >> return le32_to_cpu(path[depth].p_ext[1].ee_block); >> } else { >> > -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html