Re: [RFC] ext4_bmap() may return blocks outside filesystem

Ric Wheeler <rwheeler@xxxxxxxxxx> · Thu, 05 Feb 2009 10:39:59 -0500

Greg Freemyer wrote:
On Thu, Feb 5, 2009 at 8:49 AM, Theodore Tso <tytso@xxxxxxx> wrote:

On Thu, Feb 05, 2009 at 01:03:23PM +0100, Thiemo Nagel wrote:

But there also are cases which are not handled gracefully by bmap() callers.

I've attached a conceptual patch against 2.6.29-rc2 which fixes one case
in which invalid block numbers are returned (there might be more) by
adding sanity checks to ext4_ext_find_extent(), but before I start
looking for further occurrences, I'd like to ask whether you think my
approach is reasonable.

Yes, it's reasonable; the right thing is not just to jump out to
errout, though, but to call ext4_error() first.  The filesystem is
clearly corrupted, so we want to mark it as needing to be fsck'ed, and
if the filesystem is marked "remount readonly" or "panic" on
filesystem errors, we want the right thing to happen.  We should
also log the device name, inode number and logical block number that
was requested, so that someone who is looking in the console logs can
see what was going on at the time.

As an unrelated patch, we might also want to put a check in
fs/ext4/inode.c's ext4_get_branch(), so we can equivalently detect
bogus direct/indirect blocks and flag them with the appropriate
errors.
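
The kind of check discussed above can be sketched in user-space C as
follows. This is only an illustration under stated assumptions, not the
patch from this thread: struct fs_geometry, block_range_valid() and
check_mapping() are hypothetical names, and the fprintf() call merely
stands in for the ext4_error() call that kernel code would make. It
shows a range test that cannot overflow and an error report carrying
the device name, inode number and logical block number.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified view of the superblock fields the check needs. */
struct fs_geometry {
    uint64_t first_data_block; /* lowest valid block number */
    uint64_t blocks_count;     /* total number of blocks in the filesystem */
};

/* Return true if [start, start + len) lies entirely inside the filesystem. */
static bool block_range_valid(const struct fs_geometry *g,
                              uint64_t start, uint64_t len)
{
    if (len == 0)
        return false;
    if (start < g->first_data_block || start >= g->blocks_count)
        return false;
    /* Compare against the remaining space so start + len cannot overflow. */
    return len <= g->blocks_count - start;
}

/*
 * Validate one logical-to-physical mapping. On failure, log enough context
 * to identify the damaged inode and return -1. In the kernel this is where
 * the filesystem would be flagged as corrupted before bailing out.
 */
static int check_mapping(const char *dev, unsigned long ino, uint64_t lblk,
                         const struct fs_geometry *g,
                         uint64_t pblk, uint64_t len)
{
    if (block_range_valid(g, pblk, len))
        return 0;

    fprintf(stderr,
            "%s: inode %lu: logical block %" PRIu64
            " maps to invalid physical range %" PRIu64 "+%" PRIu64 "\n",
            dev, ino, lblk, pblk, len);
    return -1;
}
```

A caller would run check_mapping() on every extent or indirect block
pointer before handing the physical block number back to bmap() users,
and treat a nonzero return as an I/O error.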

                                               - Ted

This is just a rant, and I doubt anyone can do anything about it, but
it is still worth reading imho.

<rant>
This brings up a concern I have with the proposed Thin Provisioning
updates to the SCSI and ATA specs.

As I'm sure most know, both are looking at supporting the concept of
mapped / unmapped sectors being tracked not only in the filesystem but
also in the storage device.

[SSDs are one use case, and storage arrays are the other.  Many
storage arrays already support thin provisioning but not via the new
"discard" functionality in the Linux kernel.]

My big concern is that neither is proposing a way for a tool like fsck
to query the storage device to verify that the filesystem's view of what
is mapped vs unmapped agrees with the storage device's view.

I think that from a file system point of view (including tools like
fsck), that is a feature, not a bug. The features should be, if done
right, invisible to us and this should be irrelevant to fsck ...

For both sets of proposed spec updates there are circumstances where
the storage device spec allows garbage to be returned for non-mapped
sectors.  Thus in the situation of a corrupt filesystem, it is very
possible that some of the sectors that the filesystem is relying on
are actually unmapped and potentially garbage.

Lacking any knowledge of which specific sectors the underlying storage
system treats as reliable vs. unreliable, I can imagine the
filesystem corruption will go from a correctable situation to a
"restore from backups" situation.

I disagree - any written data, specifically all meta-data, will have the 
correct data returned on read. All unmapped data is also by definition 
un-allocated at the fs layer (for fsck as well) and we should not be 
reading it back if the tools work correctly.

The solution in my mind is that both specs add a way for diagnostic
tools to query the status of a sector to see if it is mapped vs
unmapped, etc.
</rant>.
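
For illustration only, the cross-check that such a query would enable
might look like the sketch below. It assumes, hypothetically, that a
diagnostic tool could obtain a per-block "mapped" bitmap from the
device; the thread states that neither proposed spec offers this, so
dev_mapped is purely an assumed input, and bit_is_set() and
count_allocated_but_unmapped() are made-up names. The function reports
blocks that the filesystem considers allocated but the device considers
unmapped, which is the case that should never occur if the tools work
correctly.

```c
#include <stddef.h>
#include <stdint.h>

/* Test bit i in a little-endian-bit-order bitmap (bit 0 = LSB of byte 0). */
static int bit_is_set(const uint8_t *bitmap, size_t i)
{
    return (bitmap[i / 8] >> (i % 8)) & 1;
}

/*
 * Count blocks that are allocated in the filesystem's bitmap but unmapped
 * in the (hypothetical) device mapping bitmap. If first_bad is non-NULL
 * and at least one such block exists, *first_bad receives the lowest one.
 */
static size_t count_allocated_but_unmapped(const uint8_t *fs_alloc,
                                           const uint8_t *dev_mapped,
                                           size_t nblocks,
                                           size_t *first_bad)
{
    size_t bad = 0;

    for (size_t i = 0; i < nblocks; i++) {
        if (bit_is_set(fs_alloc, i) && !bit_is_set(dev_mapped, i)) {
            if (bad == 0 && first_bad)
                *first_bad = i;
            bad++;
        }
    }
    return bad;
}
```

A result of zero means every block the filesystem relies on is backed by
the device. Any other result identifies sectors whose contents the
device is allowed to return as garbage.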

Greg
