On 2020/6/12 0:55, Theodore Y. Ts'o wrote:
> On Thu, Jun 11, 2020 at 10:21:03AM +0200, Jan Kara wrote:
>>> I have thought about this solution: we could add a hook in 'struct super_operations'
>>> and call it in blkdev_writepage() like blkdev_releasepage() does, and split out a
>>> wrapper from block_write_full_page() to pass our endio handler in, something like
>>> this.
>>>
>>> static const struct super_operations ext4_sops = {
>>> 	...
>>> 	.bdev_write_page = ext4_bdev_write_page,
>>> 	...
>>> };
>>>
>>> static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
>>> {
>>> 	struct super_block *super = BDEV_I(page->mapping->host)->bdev.bd_super;
>>>
>>> 	if (super && super->s_op->bdev_write_page)
>>> 		return super->s_op->bdev_write_page(page, blkdev_get_block, wbc);
>>>
>>> 	return block_write_full_page(page, blkdev_get_block, wbc);
>>> }
>>>
>>> But I'm not sure it's an optimal idea. So should I continue implementing the
>>> "wb_err" solution for now?
>>
>> The above idea looks good to me. I'm fine with either that solution or
>> "wb_err" idea so maybe let's leave it for Ted to decide...
>
> My preference would be to be able to get the error from the callback
> right away.  My reasoning behind that is (a) it allows the file system
> to be notified about the problem right away, (b) in the case of a file
> system resize, we _really_ want to know about the failure ASAP, so we
> can fail the resize before we start allocating inodes and blocks to
> use the new space, and (c) over time, we might be able to add some
> more intelligent handling of some write errors.
>
> For example, we already have a way of handling CRC errors when we are
> reading an allocation bitmap; we simply avoid allocating blocks and
> inodes from that blockgroup.  Over time, we could theoretically do
> other things to try to recover from some write errors --- for example,
> we could try allocating a new block for an extent tree block, and try
> writing it, and if that succeeds, updating its parent node to point at
> the new location.  Is it worth it to try to add that kind of
> complexity?  I'm really not sure; at the end of the day, it might be
> simpler to just call ext4_error() and stop using the entire file
> system until a system administrator can sort out the mess.  But I
> think (a) and (b) are still reasons for doing this by intercepting the
> writeback error from the buffer head.
>

Yeah, it makes sense to me. I will implement this callback solution.

Thanks,
Yi.
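
For reference, here is a minimal user-space sketch of the dispatch pattern
discussed above: an optional per-filesystem hook that routes the write through
a wrapper taking an end-I/O handler, with a fallback to the default path when
no hook is registered. This is only an illustration under stated assumptions,
not the actual patch. The types are simplified stand-ins for the kernel
structures, and write_full_page_endio(), fs_end_io(), fs_bdev_write_page(),
blkdev_writepage_sketch() and the write_errors counter are hypothetical names
invented for this example. The get_block argument from the quoted snippet is
omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space stand-ins for kernel types; all hypothetical. */
struct page;
struct writeback_control { int sync_mode; };

typedef void (*end_io_t)(struct page *page, int error);

struct super_operations {
	/* Optional hook, modelled on the one proposed in the thread. */
	int (*bdev_write_page)(struct page *page, struct writeback_control *wbc);
};

struct super_block {
	const struct super_operations *s_op;
	int write_errors;	/* errors reported to the file system */
};

struct page {
	struct super_block *sb;	/* NULL when no file system owns the device */
	int fail_io;		/* non-zero simulates a failed write */
};

/* Simulated write: "submits" the page, then runs the completion handler. */
static int write_full_page_endio(struct page *page,
				 struct writeback_control *wbc,
				 end_io_t handler)
{
	int error = page->fail_io ? -5 : 0;	/* -5 mirrors -EIO */

	(void)wbc;
	if (handler)
		handler(page, error);
	return 0;	/* submission itself succeeded */
}

/* Default path: nobody is told about the completion status. */
static int default_write_full_page(struct page *page,
				   struct writeback_control *wbc)
{
	return write_full_page_endio(page, wbc, NULL);
}

/* File-system handler: learns about the write error right away. */
static void fs_end_io(struct page *page, int error)
{
	if (error)
		page->sb->write_errors++;
}

static int fs_bdev_write_page(struct page *page,
			      struct writeback_control *wbc)
{
	return write_full_page_endio(page, wbc, fs_end_io);
}

static const struct super_operations fs_sops = {
	.bdev_write_page = fs_bdev_write_page,
};

/* Dispatch: prefer the file system's hook, else use the default path. */
static int blkdev_writepage_sketch(struct page *page,
				   struct writeback_control *wbc)
{
	struct super_block *sb;

	assert(page != NULL);
	sb = page->sb;
	if (sb && sb->s_op && sb->s_op->bdev_write_page)
		return sb->s_op->bdev_write_page(page, wbc);
	return default_write_full_page(page, wbc);
}
```

In this sketch, a page whose sb points at a super_block using fs_sops bumps
write_errors when fail_io is set, while a page with sb == NULL silently takes
the default path. That is the difference points (a) and (b) above care about.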