On Tue, Sep 19, 2023 at 06:17:21AM +0100, Matthew Wilcox wrote:
> On Tue, Sep 19, 2023 at 11:15:54AM +1000, Dave Chinner wrote:
> > This was easy to do with iomap based filesystems because they don't
> > carry per-block filesystem structures for every folio cached in page
> > cache - we carry a single object per folio that holds the 2 bits of
> > per-filesystem-block state we need for each block the folio maps.
> > Compare that to a bufferhead - it uses 56 bytes of memory per
> > filesystem block that is cached.
>
> 56?!  What kind of config do you have?  It's 104 bytes on Debian:
>
> buffer_head   936  1092  104  39  1 : tunables 0 0 0 : slabdata 28 28 0
>
> Maybe you were looking at a 32-bit system; most of the elements are
> word-sized (pointers, size_t or long).

Perhaps so, it's been years since I actually paid attention to the
exact size of a bufferhead (XFS completely moved away from them back
in 2018). Regardless, underestimating the size of the bufferhead
doesn't materially change the reasons iomap is a better choice for
filesystems running on modern storage hardware...

> > So we have to consider that maybe it is less work to make high-order
> > folios work with bufferheads. And that's where we start to get into
> > the maintenance problems with old filesystems using bufferheads -
> > how do we ensure that the changes for high-order folio support in
> > bufferheads do not break one of these old filesystems that use
> > bufferheads?
>
> I don't think we can do it.  Regardless of the question you're
> proposing here, the model where we complete a BIO, then walk every
> buffer_head attached to the folio to determine if we can now mark the
> folio as being (uptodate / not-under-writeback) just doesn't scale
> when you attach more than tens of BHs to the folio.  It's one bit per
> BH rather than having a summary bitmap like iomap has.

*nod*

I said as much earlier in the email:

"The pointer chasing model that per-block bufferhead iteration requires
to update state and retrieve mapping information just does not scale to
marshalling millions of objects a second through the page cache."

(There's a rough sketch of the summary bitmap model further down.)

> I have been thinking about splitting the BH into two pieces, something
> like this:
>
> struct buffer_head_head {
> 	spinlock_t b_lock;
> 	struct buffer_head *buffers;
> 	unsigned long state[];
> };
>
> and remove BH_Uptodate and BH_Dirty in favour of setting bits in state
> like iomap does.

Yes, that would make it similar to the way iomap works, but I think it
then creates more problems in that bufferhead state is also used for
per-block locking and blocking waits. I don't really want to think much
more about how complex stuff like __block_write_full_folio() becomes
with this model...

> But, as you say, there are a lot of filesystems that would need to be
> audited and probably modified.

Yes, this is the common problem all these "modernise old API" ideas end
up at - this is the primary issue that needs to be sorted out, and we're
no closer to that now than when the thread started.

We can deal with this problem for filesystems that we can test. For
stuff we can't test and verify, we really have to start considering the
larger picture around shipping unverified code to users.
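To make the summary bitmap point a bit more concrete, here's a rough
userspace sketch of the per-folio state model. The names
(folio_block_state and friends) are made up purely for illustration -
this is the shape of the idea, not the actual iomap code:

/*
 * One small object per folio, tracking per-block state in a bitmap,
 * instead of a full buffer_head per block. Illustrative only.
 */
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define BLOCKS_PER_FOLIO	512	/* e.g. 2MB folio, 4kB blocks */
#define BITS_PER_LONG		(sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS	((BLOCKS_PER_FOLIO + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct folio_block_state {
	/*
	 * Bit N set means block N is uptodate. A second bitmap of the
	 * same shape would track per-block dirty state.
	 */
	unsigned long uptodate[BITMAP_LONGS];
};

static void set_block_uptodate(struct folio_block_state *fbs,
			       unsigned int block)
{
	fbs->uptodate[block / BITS_PER_LONG] |= 1UL << (block % BITS_PER_LONG);
}

/* Model of read IO completion for a contiguous range of blocks. */
static void read_completion(struct folio_block_state *fbs,
			    unsigned int first, unsigned int nr)
{
	for (unsigned int i = first; i < first + nr; i++)
		set_block_uptodate(fbs, i);
}

/*
 * Folio-level check: a handful of word compares on the summary bitmap.
 * (Assumes BLOCKS_PER_FOLIO is a multiple of BITS_PER_LONG.)
 */
static bool folio_uptodate(const struct folio_block_state *fbs)
{
	for (size_t i = 0; i < BITMAP_LONGS; i++)
		if (~fbs->uptodate[i])
			return false;
	return true;
}

int main(void)
{
	struct folio_block_state *fbs = calloc(1, sizeof(*fbs));

	read_completion(fbs, 0, BLOCKS_PER_FOLIO / 2);
	printf("half done:  uptodate=%d\n", folio_uptodate(fbs));
	read_completion(fbs, BLOCKS_PER_FOLIO / 2, BLOCKS_PER_FOLIO / 2);
	printf("fully done: uptodate=%d\n", folio_uptodate(fbs));
	free(fbs);
	return 0;
}

Answering "is the folio uptodate?" is a few word-sized compares on the
bitmap. With bufferheads, the same question means walking a list of
104-byte objects hanging off the folio and testing a flag bit in each
one - that's the pointer chasing that doesn't scale. Anyway, back to
the bigger problem of code we can't verify.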
Go read this article on LWN about new EU laws for software development
that aren't that far off being passed into law:

https://lwn.net/Articles/944300/

And it's clear that there are also current policy discussions going
through the US federal government that are, most likely, going to end
up in a similar place with respect to secure development practices for
critical software infrastructure like the Linux kernel.

Now combine that with this one about the problem of bogus CVEs (which
could have been written about syzbot and filesystems!):

https://lwn.net/Articles/944209/

And it's pretty clear that the current issues with unmaintained code
will only get worse from here. All it will take is a CVE to be issued
against one of these unmaintained filesystems, and the safest thing for
us to do will be to remove the code to remove all potential liability
for it.

The basic message is that we aren't going to be able to ignore code
that we can't substantially verify for much longer. We simply won't
have a choice about the code we ship: if it is not testable and
verified to the best of our abilities, then nobody will risk shipping
it regardless of whether it has users or not. That's the model the
cybersecurity-industrial complex is pushing us towards whether we like
it or not.

If this is the future in which we develop software, then it has
substantial impact on any discussion about how to manage old,
unmaintained, untestable code in any project we work on, not just the
Linux kernel...

-Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx