RE: [LSF/MM TOPIC] online filesystem repair

-----Original Message-----
From: Darrick J. Wong [mailto:darrick.wong@xxxxxxxxxx] 
Sent: Monday, January 16, 2017 10:25 PM
To: Viacheslav Dubeyko <slava@xxxxxxxxxxx>
Cc: lsf-pc@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx; linux-xfs@xxxxxxxxxxxxxxx; Slava Dubeyko <Vyacheslav.Dubeyko@xxxxxxx>
Subject: Re: [LSF/MM TOPIC] online filesystem repair
 
> > How do you imagine a generic way to support repairs for different file 
> > systems? From one point of view, to have generic way of the online 
> > file system repairing could be the really great subsystem.
>
> I don't, sadly.  There's not even a way to /check/ all fs metadata
> in a "generic" manner -- we can use the standard VFS interfaces to read
> all metadata, but this is fraught.  Even if we assume the fs can spot check obviously
> garbage values, that's still not the appropriate place for a full scan.

Let's try to imagine a possible way of generalization. I can see the following critical points:
(1) mount operation;
(2) unmount/fsync operation;
(3) readpage;
(4) writepage;
(5) read metadata block/node;
(6) write/flush metadata block/node;
(7) metadata item modification/access.

Let's imagine that a file system registers every metadata structure with a generic
online file-checking subsystem. The file system would then need to register a set
of checking methods or checking events for every registered metadata structure,
for example (see the sketch after this list):
(1) check_access_metadata();
(2) check_metadata_modification();
(3) check_metadata_node();
(4) check_metadata_node_flush();
(5) check_metadata_nodes_relation().
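
To make this concrete, here is a minimal sketch of what such a registration
interface might look like. Every name in it (fsck_metadata, fsck_metadata_ops,
fsck_register_metadata) is invented purely for illustration; nothing here is an
existing kernel API:

/*
 * Hypothetical sketch only: none of these names exist in the kernel.
 * The idea is that the file system hands the generic checking subsystem
 * one ops table per metadata structure it wants checked online.
 */
#include <stddef.h>

struct fsck_metadata;	/* opaque handle, one per registered structure */

struct fsck_metadata_ops {
	/* called on every access to an item of this structure */
	int (*check_access_metadata)(struct fsck_metadata *md,
				     const void *item, size_t size);
	/* called before a modification is committed */
	int (*check_metadata_modification)(struct fsck_metadata *md,
					   const void *old_item,
					   const void *new_item,
					   size_t size);
	/* called when a whole block/node is read */
	int (*check_metadata_node)(struct fsck_metadata *md,
				   const void *node, size_t size);
	/* called when a block/node is about to be written/flushed */
	int (*check_metadata_node_flush)(struct fsck_metadata *md,
					 const void *node, size_t size);
	/* called to cross-check related nodes (e.g. parent vs. child) */
	int (*check_metadata_nodes_relation)(struct fsck_metadata *md,
					     const void *node1,
					     const void *node2);
};

/* Hypothetical registration call, made by the file system at mount time. */
struct fsck_metadata *fsck_register_metadata(const char *name,
					     const struct fsck_metadata_ops *ops);

The generic subsystem would then invoke the relevant callback at each of the
critical points listed earlier, and a file system could leave any callback NULL
if the corresponding check does not apply to that structure.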

I think it is possible to consider several levels of activity for a generic online
file system checking subsystem: (1) light check mode; (2) regular check mode;
(3) strict check mode.

The "light check mode" can  be resulted in "fast" metadata nodes' check on write operation with
generation of error messages in the syslog with the request to check/recover
file system volume by means of fsck tool.

The "regular check mode" can be resulted in: (1) the checking of any metadata modification
with trying to correct the operation in the modification place; (2) metadata nodes' check
on write operation with generation of error messages in the syslog. 

The "strict check mode" can be resulted in: (1) check mount operation with trying to recover
the affected metadata structures; (2) the checking of any metadata modification
with trying to correct the operation in the modification place; (3) check and recover
metadata nodes on flush operation; (4) check/recover during unmount operation.
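
As a minimal sketch (again, all names are invented for illustration), each mode
simply enables a larger subset of the checking events:

#include <stdbool.h>

/* Hypothetical mode and event enumerations. */
enum fsck_check_mode {
	FSCK_LIGHT_CHECK,	/* check on write, complain in syslog */
	FSCK_REGULAR_CHECK,	/* + check every metadata modification */
	FSCK_STRICT_CHECK,	/* + check/recover on mount, flush, unmount */
};

enum fsck_event {
	FSCK_EV_MOUNT,
	FSCK_EV_MODIFY,
	FSCK_EV_WRITE,
	FSCK_EV_FLUSH,
	FSCK_EV_UNMOUNT,
};

/* Decide whether a given event is checked in the current mode. */
static bool fsck_event_enabled(enum fsck_check_mode mode, enum fsck_event ev)
{
	switch (mode) {
	case FSCK_LIGHT_CHECK:
		return ev == FSCK_EV_WRITE;
	case FSCK_REGULAR_CHECK:
		return ev == FSCK_EV_WRITE || ev == FSCK_EV_MODIFY;
	case FSCK_STRICT_CHECK:
		return true;	/* all events checked, with recovery attempts */
	}
	return false;
}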

What would you like to expose to the VFS level as generalized methods for your implementation?

> > But, from another point of view, every file system has own 
> > architecture, own set of metadata and own way to do fsck 
> > check/recovering.
>
> Yes, and this wouldn't change.  The particular mechanism of fixing a piece of
> metadata will always be fs-dependent, but the thing that I'm interested in
> discussing is how do we avoid having these kinds of things interact badly with the VFS?

Let's start from the simplest case: you have the current implementation. How would
you delegate some of its activity to the VFS in the form of generalized methods?
Let's imagine that the VFS gets some callbacks from the file system side. What
could they be?
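
Purely as a strawman (none of these members or names exist in the kernel's
super_operations today), the callbacks could look something like this:

#include <stdint.h>

struct super_block;	/* stand-in for the kernel's super_block */

/*
 * Strawman only: a table of online-check callbacks a file system could
 * hang off its superblock for the VFS to drive. All names are invented.
 */
struct online_fsck_operations {
	/* verify a single metadata object without stopping the fs */
	int (*check_object)(struct super_block *sb, uint64_t object_id);
	/* try to rebuild a damaged object from redundant metadata */
	int (*repair_object)(struct super_block *sb, uint64_t object_id);
	/* report overall health so the VFS can escalate (e.g. remount RO) */
	int (*query_health)(struct super_block *sb);
};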

> > As far as I can judge, there are significant amount of research 
> > efforts in this direction (Recon [1], [2], for example).
>
> Yes, I remember Recon.  I appreciated the insight that while it's impossible
> to block everything for a full scan, it /is/ possible to check a single object and
> its relation to other metadata items.  The xfs scrubber also takes an incremental
> approach to verifying a filesystem; we'll lock each metadata object and verify that
> its relationships with the other metadata make sense.  So long as we aren't bombarding
> the fs with heavy metadata update workloads, of course.
>
> On the repair side of things xfs added reverse-mapping records, which the repair code
> uses to regenerate damaged primary metadata.  After we land inode parent pointers
> we'll be able to do the same reconstructions that we can now do for block allocations...
>
> ...but there are some sticky problems with repairing the reverse mappings.
> The normal locking order for that part of xfs is sb_writers
> -> inode -> ag header -> rmap btree blocks, but to repair we have to
> freeze the filesystem against writes so that we can scan all the inodes.

Yes, the necessary freezing of the file system is a really tricky point. From one
point of view, it is possible to use the "light checking mode" that would simply
check and complain about possible troubles at the proper time (maybe with a remount
in RO mode). From another point of view, we would need a special file system
architecture and/or a special way of VFS functioning. Let's imagine that the file
system volume is split into groups/aggregations/objects with dedicated metadata.
Then, theoretically, the VFS would be able to freeze such a group/aggregation/object
for checking and recovery without affecting the availability of the whole file
system volume. It means that file system operations would have to be redirected
into the active (not frozen) groups/aggregations/objects. A rough sketch of the
idea follows.
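
Assuming hypothetical per-group freeze/thaw primitives (nothing like this exists
in the VFS today; fs_group_freeze()/fs_group_thaw() are invented), the repair
path might look like:

/*
 * Hypothetical per-group freezing: only the group under repair is
 * frozen, while operations continue in the other groups of the volume.
 */
struct fs_group;	/* one group/aggregation with dedicated metadata */

int fs_group_freeze(struct fs_group *grp);	/* block writes to this group */
void fs_group_thaw(struct fs_group *grp);	/* resume writes */

/* Check and repair one group while the rest of the volume stays live. */
static int fs_repair_group(struct fs_group *grp,
			   int (*check_and_repair)(struct fs_group *))
{
	int err = fs_group_freeze(grp);

	if (err)
		return err;
	err = check_and_repair(grp);	/* fs-specific repair method */
	fs_group_thaw(grp);
	return err;
}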
 
Thanks,
Vyacheslav Dubeyko.
