On Thu, 30 Jan 2014 11:05:35 +0100, Jan Kara <jack@xxxxxxx> wrote:
> On Thu 30-01-14 11:51:20, Dmitry Monakhov wrote:
> > > > B) Reduce fsck time. Theodore Tso has announced an initiative to implement
> > > > ffck for ext4 [3]. I want to discuss perspectives of design and
> > > > implementation of online fsck for ext4.
> > > Well, this comes up every once in a while and the answer is always the
> > > same. Checking might be reasonably doable but comes almost for free when
> > > using LVM snapshots and doing fsck on the snapshot. Fixing read-write
> > > filesystem - good luck.
> > But what about merging data from the fixed snapshot back to the original image?
> >
> > ---time-axis------------------------------------------------->
> > FS0----[Error]---[write-new-data]----------------->X????
> >          |                                         |
> > FS0-snap \-----[start fsck]-----[errors corrected]-/
> > Obviously there is no way we can merge the fixed snapshot into the modified filesystem
> Yes, snapshots are good only for read-only checks. If they find errors,
> you have to bite the bullet, unmount the fs and run fsck. However fsck
> finding errors should be rare enough, or do you have other experience?

Well, most of the errors we observed were caused by instability in the
block layer. But we have run into the law of large numbers: in our case
each HW node has 100-1000 containers, and each container has a dedicated
fs image, so the number of errors is not negligible.

>
> > So the only option we have after we have discovered an error on FS0-snap is
> > to umount FS0 and run fsck on it. As a result we double the disk load and
> > still have big downtime. But if the error was relatively simple (wrong
> > group stats, or wrong i_blocks for an inode), it is possible to fix it
> > online. My proposal is to start a discussion about the list of issues which
> > can be fixed online.
> The trouble is that to reliably check even such simple thing as group
> stats or i_blocks, you have to freeze all modifications to the group /
> inode, make kernel flush all its internal state for these objects, check +
> fix them, make kernel reread the new info, and unfreeze these objects. So a
> lot of work for even the simplest fixes and it's not clear to me why people
> should hit fs corruption often enough to warrant the complications.
>
> There are also other guys who want to be able to make some groups not
> available for allocation so if we spot some inconsistency in group metadata,
> we simply won't do allocation from it anymore and then run fsck to fix the
> damage during scheduled downtime. That is much easier to implement and
> approach like this should go a long way towards making corrupted filesystem
> still usable.

That looks reasonable.

>
> 								Honza
> --
> Jan Kara <jack@xxxxxxx>
> SUSE Labs, CR