On Wed, Sep 11, 2024 at 05:08:24PM GMT, Johannes Weiner wrote:
> On Wed, Sep 11, 2024 at 10:38:00AM -0700, Shakeel Butt wrote:
> > The kernel truncates the page cache in batches of PAGEVEC_SIZE. For each
> > batch, it traverses the page cache tree and collects the entries (folio
> > and shadow entries) in the struct folio_batch. For the shadow entries
> > present in the folio_batch, it has to traverse the page cache tree for
> > each individual entry to remove them. This patch optimizes this by
> > removing them in a single tree traversal.
> >
> > On large machines in our production which run workloads manipulating
> > large amounts of data, we have observed that a large amount of CPU time
> > is spent on truncation of very large files (100s of GiBs file sizes).
> > More specifically, most of that time was spent on shadow entries cleanup,
> > so optimizing the shadow entries cleanup, even a little bit, has good
> > impact.
> >
> > To evaluate the changes, we created a 200GiB file on a fuse fs and in a
> > memcg. We created the shadow entries by triggering reclaim through
> > memory.reclaim in that specific memcg and measured a simple truncation
> > operation.
> >
> > # time truncate -s 0 file
> >
> >                 time (sec)
> > Without         5.164 +- 0.059
> > With-patch      4.21  +- 0.066 (18.47% decrease)
> >
> > Signed-off-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
>
> Looks good to me. One thing that's a bit subtle is that the tree walk
> assumes indices[] are ordered, such that indices[0] and indices[nr-1]
> reliably denote the range of interest. AFAICS that's the case for the
> current callers but if not that could be a painful bug to hunt down.

The current callers use find_get_entries() and find_lock_entries() to
fill the indices array, which provides this guarantee.

> Assessing lowest and highest index in that first batch iteration seems
> a bit overkill though. Maybe just a comment stating the requirement?

I will add a comment in v2.

> Otherwise,
>
> Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>

Thanks for the review.
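
To make the single-walk idea concrete, here is a minimal sketch, not the
actual patch: it assumes the indices[] batch is sorted ascending (as
find_get_entries()/find_lock_entries() guarantee) so that indices[0] and
indices[nr-1] bound the range, and it clears the shadow (value) entries in
one xarray traversal. The function name clear_shadow_entries_sketch is made
up for illustration, and the inode locking and workingset accounting that
the real cleanup path must handle are omitted.

/*
 * Rough sketch only (NOT the actual patch): clear all shadow (value)
 * entries in mapping's page cache tree within [start, end] using a
 * single xarray traversal, instead of one tree descent per shadow
 * entry. Relies on the indices[] batch being sorted so that
 * indices[0] / indices[nr - 1] bound the range of interest.
 * Inode locking and workingset/node accounting are omitted.
 */
#include <linux/pagemap.h>
#include <linux/xarray.h>

static void clear_shadow_entries_sketch(struct address_space *mapping,
					pgoff_t start, pgoff_t end)
{
	XA_STATE(xas, &mapping->i_pages, start);
	void *entry;

	xas_lock_irq(&xas);
	xas_for_each(&xas, entry, end) {
		/* Shadow entries are xarray value entries; skip real folios. */
		if (xa_is_value(entry))
			xas_store(&xas, NULL);
	}
	xas_unlock_irq(&xas);
}

In this sketch the caller would be the truncation batch loop, invoking it
once per folio_batch with start = indices[0] and end = indices[nr - 1]
after the real folios in the batch have been dealt with, rather than
descending the tree once per shadow entry.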