On Tue, May 22, 2018 at 10:24:54AM +0200, Christoph Hellwig wrote: > On Tue, May 22, 2018 at 10:07:45AM +1000, Dave Chinner wrote: > > > Something doesn't smell right here. The only pages we need to read in > > > are the first and last pages in the write_begin range, and only if they > > > aren't page aligned and the underlying extent is IOMAP_MAPPED, right? > > > > And not beyond EOF, too. > > > > The bufferhead code handles this via the buffer_new() flag - it > > triggers the skipping of read IO and the states in which it is > > set are clearly indicated in iomap_to_bh(). That same logic needs to > > apply here. > > The buffer_new logic itself isn't really something to copy directly > as it has all kinds of warts.. Sure, my point was that it documents all the cases where we can avoid reading from disk, not that we should copy the logic. > > > I also noticed that speculative preallocation kicks in by the second 80M > > > write() call and writeback for the second call can successfully allocate > > > the entire preallocation, which means that the third (or nth) write call > > > can have a real extent already mapped in, and then we end up reading it. > > > > Yeah, that's because there's no check against EOF here. These writes > > are all beyond EOF, so there shouldn't be any read at all... > > The EOF case is already handled in iomap_block_needs_zeroing. Ok, I missed that detail as it's in a different patch. It looks like if (pos > EOF) it will zeroed. But in this case I think that pos == EOF and so it was reading instead. That smells like off-by-one bug to me. > We just > need to skip the read for ranges entirely covered by the write. Yup. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx