On Tue 25-08-20 10:10:20, Theodore Y. Ts'o wrote:
> (Adding the OCFS2 maintainers, since my possibly insane idea proposed
> below would definitely impact them!)
>
> On Tue, Aug 25, 2020 at 01:16:16PM +0100, Christoph Hellwig wrote:
> > On Tue, Aug 25, 2020 at 02:05:54PM +0200, Jan Kara wrote:
> > > Discarding blocks and buffers under a mounted filesystem is hardly
> > > anything admin wants to do. Usually it will confuse the filesystem and
> > > sometimes the loss of buffer_head state (including b_private field) can
> > > even cause crashes like:
> >
> > Doesn't work if the file system uses multiple devices. I think we
> > just really need to split the fs buffer_head address space from the
> > block device one. Everything else is just going to cause a huge mess.
>
> I wonder if we should go a step further, and stop using struct
> buffer_head altogether in jbd2 and ext4 (as well as ocfs2).

What about the cache coherency issues I've pointed out in my reply to
Christoph?

> This would involve moving whatever structure elements from the
> buffer_head struct into journal_head, and manage writeback and reads
> requests directly in jbd2. This would allow us to get detailed write
> errors back, which is currently not possible from the buffer_head
> infrastructure.
>
> The downside is this would be a pretty massive change in terms of LOC,
> since we use struct buffer_head in a *huge* number of places. If
> we're careful, most of it could be handled by a Coccinelle script to
> rename "struct buffer_head" to "struct journal_head". Fortunately, we
> don't actually use that much of the fs/buffer_head functions in
> fs/{ext4,ocfs2}/*.c.
>
> One potentially tricky bit is that ocfs2 hasn't been converted to
> using iomap, so it's still using __blockdev_direct_IO.
> So its data blocks for DIO would still have to use struct buffer_head
> (which means the Coccinelle script won't really work for fs/ocfs2,
> without a lot of manual rework) --- or ocfs2 would have to be switched
> to use iomap at least for DIO support.
>
> What do folks think?

Otherwise yes, this would be doable although pretty invasive as you
mention.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR