Hello,

I've recently gotten annoyed with the current reference counting rules that exist in the file system arena, specifically the pattern of objects with a reference count of 0 indicating that they're ready to be reclaimed. This pattern consistently bites us in the ass, is error prone, and gives us a lot of complicated logic around when an object is actually allowed to be touched versus when it is not.

We do this everywhere, with inodes, dentries, and folios, but I specifically went to change inodes recently thinking it would be the easiest, and I've run into a few big questions. Currently I've got about 30 patches, and that is mostly just modifying the existing file systems for a new inode_operation. Before I devote more time to this silly path, I figured it'd be good to bring it up to the group to get some input on what possible better solutions there would be.

I'll try to make this as easy to follow as possible, but I spent a full day and a half writing code and thinking about this and it's kind of complicated. I'll break this up into sections to try and make it easier to digest.

WHAT DO I WANT

I want to have refcount 0 == we're freeing the object. This will give us clear "I'm using this object, thus I have a reference count on it" rules, and we can (hopefully) eliminate a lot of the complicated freeing logic (I_FREEING | I_WILL_FREE).

HOW DO I WANT TO DO THIS

Well obviously we always keep a reference count whenever we are using the inode, and we hold a reference when it is on a list. This means the i_io_list holds a reference to the inode, and the LRU list holds a reference to the inode.

This makes LRU handling easier: we just walk the objects and drop our reference to each one. If it was truly the last reference then we free it, otherwise it will get added back onto the LRU list when the next guy does an iput().

POTENTIAL PROBLEM #1

Now we're actively checking to see if this inode is on the LRU list and potentially taking the LRU list lock more often.
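For reference, the list-holds-a-reference scheme from HOW DO I WANT TO DO THIS can be modeled roughly in userspace as below. This is only a toy sketch, not code from the patches: struct toy_inode, its fields, and the toy_* helpers are all made-up names, a plain counter stands in for the LRU lock, and a "freed" flag stands in for actually freeing memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, not kernel code. */
struct toy_inode {
	int refcount;               /* 0 means "being freed" and nothing else */
	bool on_lru;                /* stands in for an inode state flag */
	bool freed;                 /* set instead of freeing, so callers can inspect it */
	struct toy_inode *lru_next;
};

static struct toy_inode *toy_lru_head;
static int toy_lru_lock_taken;      /* counts simulated LRU lock acquisitions */

/* Drop one reference; the object is freed only when the count hits 0. */
static void toy_put(struct toy_inode *inode)
{
	assert(inode->refcount > 0);
	if (--inode->refcount == 0)
		inode->freed = true;
}

/* Put the inode on the LRU; the list itself holds a reference. */
static void toy_lru_add(struct toy_inode *inode)
{
	toy_lru_lock_taken++;
	inode->refcount++;
	inode->on_lru = true;
	inode->lru_next = toy_lru_head;
	toy_lru_head = inode;
}

/* A user drops its reference. The flag is checked before the lock is taken. */
static void toy_iput(struct toy_inode *inode)
{
	if (!inode->on_lru)
		toy_lru_add(inode);
	toy_put(inode);
}

/* Walk the LRU and drop the list's reference on every entry.
 * Returns how many inodes were actually freed. */
static int toy_lru_shrink(void)
{
	struct toy_inode *inode = toy_lru_head;
	int freed = 0;

	toy_lru_lock_taken++;
	toy_lru_head = NULL;
	while (inode) {
		struct toy_inode *next = inode->lru_next;

		inode->on_lru = false;
		inode->lru_next = NULL;
		toy_put(inode);
		if (inode->freed)
			freed++;
		inode = next;
	}
	return freed;
}
```

In this model an inode that still has users survives the shrink and simply goes back on the list at the next toy_iput(), and a toy_iput() on an inode that is already on the list never touches the simulated lock.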
I don't think the extra locking will be significant, as we would check the inode flags before we take the lock, so we would only marginally increase contention on the LRU lock. We could mitigate this by doing the LRU list add at lookup time, where we already have to grab some of these locks, but I don't want to get into premature optimization territory here. I'm just surfacing it as a potential problem.

POTENTIAL PROBLEM #2

We have a fair bit of logic in writeback around when we can just skip writeback, which amounts to checking whether we're currently doing the final truncate on an inode with ->i_nlink set. This is kind of a big problem actually, as we could now potentially end up with a large dirty inode that has an nlink of 0 and no current users, but that would now be written back because it has a reference on it from writeback. Before, we could get into the iput() and clean everything up before writeback would occur. Now writeback would occur, and then we'd clean up the inode.

SOLUTION FOR POTENTIAL PROBLEM #1

I think we ignore this for now, get the patches written, do some benchmarking, and see if this actually shows up in benchmarks. If it does then we come up with strategies to resolve it at that point.

SOLUTION FOR POTENTIAL PROBLEM #2 <--- I would like input here

My initial thought was to just move the final unlink logic outside of evict, and create a new reference count that represents the actual use of the inode. Then when the actual use went to 0 we would do the final unlink, de-coupling the cleanup of the on-disk inode (in the case of local file systems) from the freeing of the memory.

This is a nice to have because the other thing that bites us occasionally is an iput() in a place where we don't necessarily want to do, or where it isn't safe to do, the final truncate on the inode. This would allow us to do the final truncate at a time when it is safe to do so.

However this means adding a different reference count to the inode.
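As a rough illustration of that split, here is a userspace toy, again not the actual patches. struct toy_uinode, its fields, and the toy_* helpers are made-up names, and two things are assumptions of the sketch rather than anything decided: every use reference also pins the memory, and the on-disk cleanup only happens when the use count drops to 0 with nlink already at 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model with two counters, not kernel code. */
struct toy_uinode {
	int memref;         /* keeps the in-memory object alive (users, lists) */
	int useref;         /* counts actual users of the inode */
	int nlink;
	bool dirty;
	bool disk_cleaned;  /* the "final unlink/truncate" has been done */
	bool freed;         /* set instead of freeing, so callers can inspect it */
};

/* Drop a memory reference; the object is freed only when it hits 0. */
static void toy_mem_put(struct toy_uinode *inode)
{
	assert(inode->memref > 0);
	if (--inode->memref == 0)
		inode->freed = true;
}

/* A real user takes both a use reference and a memory reference. */
static void toy_use_get(struct toy_uinode *inode)
{
	inode->memref++;
	inode->useref++;
}

/* Last user of an unlinked inode does the on-disk cleanup right here,
 * independent of when the memory is eventually freed. */
static void toy_use_put(struct toy_uinode *inode)
{
	assert(inode->useref > 0);
	if (--inode->useref == 0 && inode->nlink == 0) {
		inode->dirty = false;       /* truncated, nothing left to write */
		inode->disk_cleaned = true;
	}
	toy_mem_put(inode);
}

/* Writeback owns only a memory reference (from its list).
 * Returns true if it actually had to write data. */
static bool toy_writeback(struct toy_uinode *inode)
{
	bool wrote = false;

	if (inode->dirty) {
		inode->dirty = false;
		wrote = true;
	}
	toy_mem_put(inode);             /* drop the list's reference */
	return wrote;
}
```

With this shape, a dirty unlinked inode whose last user goes away is cleaned up before writeback ever looks at it, even though the writeback list is still holding the memory.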
I started to do this work, but it runs into some ugliness around ->tmpfile and file systems that don't use the normal inode caching things (bcachefs, xfs). I do like this solution, but I'm not sure if it's worth the complexity.

The other solution here is to just say screw it, we'll just always write back dirty inodes, and if they were unlinked then they get unlinked like always. I think this is also a fine solution, because generally speaking if you've got memory pressure on the system and the file is dirty and still open, you'll be writing it back normally anyway. But I don't know how people feel about this.

CONCLUSION

I'd love some feedback on my potential problems and solutions, as well as any other problems people may see. If we can get some discussion beforehand I can finish up these patches and get some testing in before LSFMMBPF, and we can have a proper in-person discussion about the realities of the patchset.

Thanks,

Josef