On Thu, Jun 11, 2020 at 10:21:03AM +0200, Jan Kara wrote: > > I have thought about this solution, we could add a hook in 'struct super_operations' > > and call it in blkdev_writepage() like blkdev_releasepage() does, and pick out a > > wrapper from block_write_full_page() to pass our endio handler in, something like > > this. > > > > static const struct super_operations ext4_sops = { > > ... > > .bdev_write_page = ext4_bdev_write_page, > > ... > > }; > > > > static int blkdev_writepage(struct page *page, struct writeback_control *wbc) > > { > > struct super_block *super = BDEV_I(page->mapping->host)->bdev.bd_super; > > > > if (super && super->s_op->bdev_write_page) > > return super->s_op->bdev_write_page(page, blkdev_get_block, wbc); > > > > return block_write_full_page(page, blkdev_get_block, wbc); > > } > > > > But I'm not sure it's a optimal ieda. So I continue to realize the "wb_err" > > solution now ? > > The above idea looks good to me. I'm fine with either that solution or > "wb_err" idea so maybe let's leave it for Ted to decide... My preference would be to be able to get the (error from the callback right away. My reasoning behind that is (a) it allows the file system to be notified about the problem right away, (b) in the case of a file system resize, we _really_ want to know about the failure ASAP, so we can fail the resize before we start allocating inodes and blocks to use the new space, and (c) over time, we might be able to add some more intelligence handling of some write errors. For example, we already have a way of handling CRC errors when we are reading an allocation bitmap; we simply avoid allocating blocks and inodes from that blockgroup. Over time, we could theoretically do other things to try to recover from some write errors --- for example, we could try allocating a new block for an extent tree block, and try writing it, and if that succeeds, updating its parent node to point at the new location. Is it worth it to try to add that kind of complexity? I'm really not sure; at the end of the day, it might be simpler to just call ext4_error() and abort using the entire file system until a system administrator can sort out the mess. But I think (a) and (b) are still reasons for doing this by intercepting the writeback error from the buffer head. Cheers, - Ted