On Wed, Sep 11, 2024 at 10:38:00AM -0700, Shakeel Butt wrote:
> The kernel truncates the page cache in batches of PAGEVEC_SIZE. For each
> batch, it traverses the page cache tree and collects the entries (folio
> and shadow entries) in the struct folio_batch. For the shadow entries
> present in the folio_batch, it has to traverse the page cache tree for
> each individual entry to remove them. This patch optimizes this by
> removing them in a single tree traversal.
>
> On large machines in our production environment which run workloads
> manipulating large amounts of data, we have observed that a significant
> amount of CPU time is spent on the truncation of very large files (file
> sizes in the hundreds of GiBs). More specifically, most of that time is
> spent on shadow entry cleanup, so optimizing the shadow entry cleanup,
> even a little bit, has a good impact.
>
> To evaluate the changes, we created a 200GiB file on a fuse fs inside a
> memcg. We created the shadow entries by triggering reclaim through
> memory.reclaim in that specific memcg, and measured a simple truncation
> operation:
>
> # time truncate -s 0 file
>
>              time (sec)
> Without      5.164 +- 0.059
> With-patch   4.21  +- 0.066 (18.47% decrease)
>
> Signed-off-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>

Looks good to me.

One thing that's a bit subtle is that the tree walk assumes indices[] is
ordered, such that indices[0] and indices[nr-1] reliably denote the range
of interest. AFAICS that's the case for the current callers, but if it
weren't, that could be a painful bug to hunt down. Assessing the lowest
and highest index in that first batch iteration seems a bit of overkill,
though. Maybe just add a comment stating the requirement?

Otherwise,
Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
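
For readers skimming the thread, the single-walk cleanup boils down to
roughly the sketch below. This is a simplified illustration based on the
patch description, not the patch itself: clear_shadow_entries_sketch is
a hypothetical name, and inode locking and workingset accounting are
omitted. Note that the xas_for_each() range is bounded by indices[0] and
indices[nr-1], which is exactly where the ordering assumption above
bites.

	/*
	 * Illustrative sketch only (hypothetical helper; locking and
	 * workingset accounting omitted): remove all shadow (value)
	 * entries in [indices[0], indices[nr - 1]] with one xarray
	 * walk instead of one tree lookup per entry.
	 */
	static void clear_shadow_entries_sketch(struct address_space *mapping,
						pgoff_t *indices, unsigned int nr)
	{
		/* Requires indices[] sorted ascending -- see above. */
		XA_STATE(xas, &mapping->i_pages, indices[0]);
		void *entry;

		xas_lock_irq(&xas);
		xas_for_each(&xas, entry, indices[nr - 1]) {
			/* Shadow entries are stored as xarray value entries. */
			if (xa_is_value(entry))
				xas_store(&xas, NULL);
		}
		xas_unlock_irq(&xas);
	}

The key point is that the per-entry lookups from the root of the tree
are replaced by a single in-order walk, which is what produces the
measured win on large, shadow-heavy mappings.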