On Tue, Jun 13, 2023 at 03:00:14AM +0100, Matthew Wilcox wrote:
> On Tue, Jun 13, 2023 at 11:30:13AM +1000, Dave Chinner wrote:
> > Indeed, if we do a 1MB write at offset 4KB, we'll get 4kB at 4KB, 8KB
> > and 12kB (because we can't do order-1 folios), then order-2 at 16KB,
> > order-3 at 32kB, and so on until we hit offset 1MB where we will do
> > an order-0 folio allocation again (because the remaining length is
> > 4KB). The next 1MB write will then follow the same pattern, right?
>
> Yes.  Assuming we get another write ...
>
> > I think this ends up being sub-optimal and fairly non-obvious
> > behaviour from the iomap side of the fence, which is clearly asking
> > for high-order folios to be allocated. i.e. a small amount of
> > allocate-around to naturally align large folios when the page cache
> > is otherwise empty would make a big difference to the efficiency of
> > non-large-folio-aligned sequential writes...
>
> At this point we're arguing about what I/O pattern to optimise for.
> I'm going for a "do no harm" approach where we only allocate exactly as
> much memory as we did before.  You're advocating for a
> higher-risk/higher-reward approach.

Not really - I'm just trying to understand the behaviour the change
will result in compared to what would be considered optimal, as it's
not clearly spelled out in either the code or the commit messages.

If I hadn't looked at the code closely and then saw a trace with this
sort of behaviour (i.e. I understood that large folios were in use,
but not exactly how they worked), I'd be very surprised to see a
weird repeated pattern of varying folio sizes. I'd probably think it
was a bug in the implementation....

> I'd prefer the low-risk approach for now; we can change it later!

That's fine by me - just document the limitations and expected
behaviour in the code rather than expecting people to discover this
behaviour for themselves.

> I'd like to see some amount of per-fd write history (as we have per-fd
> readahead history) to decide whether to allocate large folios ahead of
> the current write position.  As with readahead, I'd like to see that
> even doing single-byte writes can result in the allocation of large
> folios, as long as the app has done enough of them.

*nod*

We already have some hints in the iomaps that can tell you this sort
of thing. e.g. if ->iomap_begin() returns a delalloc iomap that
extends beyond the current write, we're performing a sequence of
multiple sequential writes.....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
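
For reference, a minimal userspace sketch of the allocation pattern
discussed above. This is illustrative C only, not the actual iomap or
page cache code: pick_order() is a hypothetical helper that models
the rule described in the thread - pick the largest folio order that
is naturally aligned at the current position and fits within the
remaining write length, with order-1 folios disallowed - assuming
4KiB pages.

#include <stdio.h>

#define PAGE_SHIFT	12	/* assume 4KiB pages */

/* Model of the order-selection behaviour described above. */
static unsigned int pick_order(unsigned long long pos,
			       unsigned long long end)
{
	unsigned int order = 0;

	for (;;) {
		unsigned int next = order + 1;
		unsigned long long size = 1ULL << (next + PAGE_SHIFT);

		if (pos & (size - 1))	/* not naturally aligned at pos */
			break;
		if (pos + size > end)	/* would overrun the write */
			break;
		order = next;
	}

	/* The page cache cannot use order-1 folios; fall back to order-0. */
	return order == 1 ? 0 : order;
}

int main(void)
{
	unsigned long long pos = 4096;			/* 4KB offset */
	unsigned long long end = pos + (1ULL << 20);	/* 1MB write */

	while (pos < end) {
		unsigned int order = pick_order(pos, end);

		printf("offset %4lluKB: order-%u (%lluKB folio)\n",
		       pos >> 10, order,
		       (1ULL << (order + PAGE_SHIFT)) >> 10);
		pos += 1ULL << (order + PAGE_SHIFT);
	}
	return 0;
}

Run for a 1MB write at offset 4KB, this reproduces the sequence from
the quoted text: order-0 folios at 4KB, 8KB and 12KB (8KB is aligned
but order-1 is forbidden), then order-2 at 16KB, order-3 at 32KB,
doubling up to order-7 (512KB) at 512KB, and a final order-0 folio
at the 1MB boundary where only 4KB of the write remains.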