Re: [PATCH v1 0/5] ext4: Shut down block groups when damage is detected

Zheng Liu <gnehzuil.liu@xxxxxxxxx> · Tue, 30 Jul 2013 08:31:09 +0800

Hi Jeff,

On Mon, Jul 29, 2013 at 11:28:38AM -0400, Jeff Moyer wrote:
> Zheng Liu <gnehzuil.liu@xxxxxxxxx> writes:
> 
> > My idea is to let the file system ignore corrupted blocks.  Namely,
> > when we meet a corrupted block, we will track it as a bad block in the
> > bad block inode and find another block to save the data.  This
> > corrupted block will never be used again.  The first step in my mind
> > is to detect a corrupted block and mark it as a bad block.  After
> > reading the thread and Darrick's original patch, I think Darrick's
> > patch is a good start.
> 
> I think it's important to call out the exact failure scenario you're
> trying to address.  For hard disks, if you get a read error, it can
> typically be recovered by re-writing the block.  I imagine this is what
> fsck would be doing for metadata repair.  So, I'm not at all sure why
> you'd want to track bad blocks in the file system itself.  Could you
> elaborate, please?

In our production systems at Taobao, we run a large CDN across the
country.  These servers cache most of the web pages, images, etc.
Each server has several disks, and a disk will inevitably fail at some
point.  Currently, when that happens we have to umount the disk, and it
is left unused in the server until the whole server is retired.  But as
you have pointed out, when we hit a disk failure, the disk as a whole
might still work.  So we hope that the file system could track the bad
blocks, avoid allocating them, and keep the rest of the space usable.
This would help us reduce cost.
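
To make the desired behaviour concrete, here is a toy sketch in Python.
It is not ext4 code and does not reflect Darrick's patch; the class and
method names are made up for illustration only.  It just shows the
idea: once a block is recorded as bad it is never handed out again,
while the remaining blocks stay available.

```python
class ToyBlockAllocator:
    """Toy model of an allocator that skips blocks known to be bad.

    Hypothetical illustration only; real ext4 allocation works on
    block group bitmaps and is far more involved.
    """

    def __init__(self, total_blocks):
        self.total_blocks = total_blocks
        self.bad = set()      # blocks recorded as bad, never allocated again
        self.in_use = set()   # blocks currently allocated

    def mark_bad(self, block):
        # Called when an I/O error is seen on `block`.
        self.bad.add(block)
        self.in_use.discard(block)

    def allocate(self):
        # First-fit scan over blocks that are neither bad nor in use.
        for block in range(self.total_blocks):
            if block not in self.bad and block not in self.in_use:
                self.in_use.add(block)
                return block
        raise OSError("no usable blocks left")

    def usable_blocks(self):
        # Capacity that remains after excluding the bad blocks.
        return self.total_blocks - len(self.bad)
```

With a 4-block device where blocks 0 and 2 have been marked bad, the
allocator in this sketch would still hand out blocks 1 and 3, which is
the "rest of the space can still be used" property we are after.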

As you said above, some failure scenarios are hard to address, e.g.
when we cannot read any data from the disk at all.  But in most cases
the disk just has some bad sectors, so it would be great if the disk
could still be used.  In addition, we don't care whether fsck can fix
these bad blocks, because we don't want to reboot the server.  As I
described before, these servers act as a cache for the web site.  If
they are rebooted, they need some time to preload content from other
servers, and cannot provide service in the meantime.  That is no better
than what we do now (umount the disk).

Certainly, this might make no sense for SSD/flash devices, because when
we get an error from these devices, it is possible that they can no
longer be used at all.

Regards,
                                                - Zheng