On Tue, Mar 29, 2016 at 04:15:58PM +1100, Dave Chinner wrote: > On Mon, Mar 28, 2016 at 08:25:46PM -0800, Kent Overstreet wrote: > > Bit of previous discussion: > > http://thread.gmane.org/gmane.linux.file-systems/101201/ > > > > The underlying issue is that we have no mechanism for invalidating a range of > > the pagecache and then _keeping it invalidated_ while we Do Stuff. > > > > The fallocate INSERT_RANGE/COLLAPSE_RANGE situation seems likely to be worse > > than I initially thought. I've been digging into this in the course of bcachefs > > testing - I was hitting assertions that meant state hanging off the page cache > > (in this case, allocation information, i.e. whether we needed to reserve space > > on write) was inconsistent with the btree in writepages(). > > > > Well, bcachefs isn't the only filesystem that hangs additional state off the > > pagecache, and the situation today is that an unpriviliged user can cause > > inconsistencies there by just doing buffered reads concurrently with > > INSERT_RANGE/COLLAPSE_RANGE. I highly highly doubt this is an issue of just > > "oops, you corrupted your file because you were doing stupid stuff" - who knows > > what internal invariants are getting broken here, and I don't particularly care > > to find out. > > I'd like to see a test case for this. Concurrent IO and/or page > faults should not run at the same as fallocate on XFS. Hence I'd > like to see the test cases that demonstrate buffered reads are > causing corruption during insert/collapse range operations. We use > the same locking strategy for fallocate as we use for truncate and > all the other internal extent manipulation operations, so if there's > something wrong, we need to fix it. It's entirely possible I'm wrong about XFS - your fault path locking looked correct, and I did see you had extra locking in your buffered read path but I thought it was a different lock. I'll recheck later, but for the moment I'm just going to assume I misspoke (and tbh always found xfs's locking to be quite rigorous). ext4 uses the generic code in all the places you're hooking into though - .fault, .read_iter, etc. The scheme I've got in this patch should perform quite a bit better than what you're doing - only locking in the slow cache miss path, vs. every time you touch the page cache. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html