If RWF_UNCACHED is set for a write, mark the folios being written with drop_writeback. Then writeback completion will drop the pages. The write_iter handler simply kicks off writeback for the pages, and writeback completion will take care of the rest. This provides similar benefits to using RWF_UNCACHED with reads. Testing buffered writes on 32 files: writing bs 65536, uncached 0 1s: 196035MB/sec, MB=196035 2s: 132308MB/sec, MB=328147 3s: 132438MB/sec, MB=460586 4s: 116528MB/sec, MB=577115 5s: 103898MB/sec, MB=681014 6s: 108893MB/sec, MB=789907 7s: 99678MB/sec, MB=889586 8s: 106545MB/sec, MB=996132 9s: 106826MB/sec, MB=1102958 10s: 101544MB/sec, MB=1204503 11s: 111044MB/sec, MB=1315548 12s: 124257MB/sec, MB=1441121 13s: 116031MB/sec, MB=1557153 14s: 114540MB/sec, MB=1671694 15s: 115011MB/sec, MB=1786705 16s: 115260MB/sec, MB=1901966 17s: 116068MB/sec, MB=2018034 18s: 116096MB/sec, MB=2134131 where it's quite obvious where the page cache filled, and performance dropped from to about half of where it started, settling in at around 115GB/sec. Meanwhile, 32 kswapds were running full steam trying to reclaim pages. Running the same test with uncached buffered writes: writing bs 65536, uncached 1 1s: 198974MB/sec 2s: 189618MB/sec 3s: 193601MB/sec 4s: 188582MB/sec 5s: 193487MB/sec 6s: 188341MB/sec 7s: 194325MB/sec 8s: 188114MB/sec 9s: 192740MB/sec 10s: 189206MB/sec 11s: 193442MB/sec 12s: 189659MB/sec 13s: 191732MB/sec 14s: 190701MB/sec 15s: 191789MB/sec 16s: 191259MB/sec 17s: 190613MB/sec 18s: 191951MB/sec and the behavior is fully predictable, performing the same throughout even after the page cache would otherwise have fully filled with dirty data. It's also about 65% faster, and using half the CPU of the system compared to the normal buffered write. Signed-off-by: Jens Axboe <axboe@xxxxxxxxx> --- mm/filemap.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/mm/filemap.c b/mm/filemap.c index 1e455ca872b5..d4c5928c5e2a 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -1610,6 +1610,8 @@ EXPORT_SYMBOL(folio_wait_private_2_killable); */ void folio_end_writeback(struct folio *folio) { + bool folio_uncached; + VM_BUG_ON_FOLIO(!folio_test_writeback(folio), folio); /* @@ -1631,6 +1633,7 @@ void folio_end_writeback(struct folio *folio) * reused before the folio_wake_bit(). */ folio_get(folio); + folio_uncached = folio_test_clear_uncached(folio); if (__folio_end_writeback(folio)) folio_wake_bit(folio, PG_writeback); acct_reclaim_writeback(folio); @@ -1639,12 +1642,10 @@ void folio_end_writeback(struct folio *folio) * If folio is marked as uncached, then pages should be dropped when * writeback completes. Do that now. */ - if (folio_test_uncached(folio)) { - folio_lock(folio); - if (invalidate_complete_folio2(folio->mapping, folio, 0)) - folio_clear_uncached(folio); + if (folio_uncached && folio_trylock(folio)) { + if (folio->mapping) + invalidate_complete_folio2(folio->mapping, folio, 0); folio_unlock(folio); - } folio_put(folio); } @@ -4082,6 +4083,9 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i) if (unlikely(status < 0)) break; + if (iocb->ki_flags & IOCB_UNCACHED) + folio_set_uncached(folio); + offset = offset_in_folio(folio, pos); if (bytes > folio_size(folio) - offset) bytes = folio_size(folio) - offset; @@ -4122,6 +4126,12 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i) if (!written) return status; + if (iocb->ki_flags & IOCB_UNCACHED) { + /* kick off uncached writeback, completion will drop it */ + __filemap_fdatawrite_range(mapping, iocb->ki_pos, + iocb->ki_pos + written, + WB_SYNC_NONE); + } iocb->ki_pos += written; return written; } -- 2.45.2