Re: More ext4 acl/xattr corruption - 4th occurence now

Theodore Tso <tytso@xxxxxxx> · Fri, 15 May 2009 06:11:46 -0400

On Fri, May 15, 2009 at 10:25:08AM +0530, Aneesh Kumar K.V wrote:
> > commit 039ed7a483fdcb2dbbc29f00cd0d74c101ab14c5
> > Author: Theodore Ts'o <tytso@xxxxxxx>
> > Date:   Thu May 14 17:09:37 2009 -0400
> > 
> >     ext4: Fix race in ext4_inode_info.i_cached_extent
> >     
> >     If one CPU is reading from a file while another CPU is writing to the
> >     same file different locations, there is nothing protecting the
> >     i_cached_extent structure from being used and updated at the same
> >     time.  This could potentially cause the wrong location on disk to be
> >     read or written to, including potentially causing the corruption of
> >     the block group descriptors and/or inode table.
> 
> 
> It should be multiple readers. We don't allow read/write or multiple
> writers via ext4_ext_get_blocks. &EXT4_I(inode)->i_data_sem is supposed
> to protect read/write and multiple writers. What it allowed was
> multiple readers(get_block call with create = 0). And readers did cache
> the extent information which it read from the disk. So the fix is
> correct, but we need to update the commit message.

Yes, you're right.  The i_data_sem would have protected us from the
worst of these problems.  The commit message isn't totally wrong,
though, since even a writer will call ext4_ext_get_blocks() with
create == 0.  But we should describe the failure mode much more
accurately, and say that it is two callers of ext4_ext_get_blocks()
with create=0 racing against one another.  It's possible, perhaps even
likely, that one of those callers came from a call of
ext4_get_blocks() with create=1 --- given that the people who were
reporting this were doing so when doing backup jobs.

However, one could imagine multiple mysql (or DB2) threads writing
into random parts of a database, and if two CPU's try to bmap already
allocated (and initialized) blocks out of the database file, so that
ext4_ext_get_blocks() is always being called with create=0, you could
very easily end up with a situation where blocks would be written to
the wrong location on disk.  <<shiver>>

We should make sure that all of the enterprise distributions that are
shipping ext4 as a "technology preview" get a copy of this patch as a
fixing a high-priority bug, since a database workload should trip this
pretty easily.

							- Ted
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html