On Mon, Oct 31, 2022 at 9:00 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote: > > On Fri, Oct 28, 2022 at 10:41:17AM -0700, Yosry Ahmed wrote: > > On Fri, Oct 28, 2022 at 7:39 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote: > > > pgscan_user: User-requested reclaim. Could be confusing if we ever > > > have an in-kernel proactive reclaim driver - unless that would then go > > > to another counter (new or kswapd). > > > > > > pgscan_ext: Reclaim activity from extraordinary/external > > > requests. External as in: outside the allocation context. > > > > I imagine if the kernel is doing proactive reclaim on its own, we > > might want a separate counter for that anyway to monitor what the > > kernel is doing. So maybe pgscan_user sounds nice for now, but I also > > like that the latter explicitly says "this is external to the > > allocation context". But we can just go with pgscan_user and document > > it properly. > > Yes, I think you're right. pgscan_user sounds good to me. > > > How would khugepaged fit in this story? Seems like it would be part of > > pgscan_ext but not pgscan_user. I imagine we also don't want to > > pollute proactive reclaim counters with khugepaged reclaim (or other > > non-direct reclaim). > > > > Maybe pgscan_user and pgscan_kernel/pgscan_indirect for things like khugepaged? > > The problem with pgscan_kernel/indirect is that if we add a proactive > > reclaim kthread in the future it would technically fit there but we > > would want a separate counter for it. > > > > I am honestly not sure where to put khugepaged. The reasons I don't > > like a dedicated counter for khugepaged are: > > - What if other kthreads like khugepaged start doing the same, do we > > add one counter per-thread? > > It's unlikely there will be more. > > The reason khugepaged doesn't rely on kswapd is unique to THP > allocations: they can require an exorbitant amount of work to > assemble, but due to fragmentation those requests may fail > permanently. We don't want to burden a shared facility like kswapd > with large amounts of speculative work on behalf of what are (still*) > cornercase requests. > > This isn't true for other allocations. We do have __GFP_NORETRY sites > here and there that rather fall back early than put in the full amount > of work; but overall we expect allocations to succeed - and kswapd to > be able to balance for them!!** - because the alternative tends to be > OOMs, or drivers and workloads aborting on -ENOMEM. > > (* As we evolve the allocator and normalize huge page requests > (folios), kswapd may also eventually balance for THPs again. IOW, > it's more likely for this exception to disappear again than it is > that we'll see more of them.) > > (** This is also why it's no big deal if other kthreads that rely on > kswapd contribute to direct reclaim stats. First, it's highly > error prone to determine on a case by case basis whether userspace > could be waiting behind that direct reclaim - as Yang Shi's > writeback example demonstrates. Second, if kswapd is overwhelmed, > it's likely to impact userspace *anyway*! The benefit of this > classification work is questionable.) Thanks for the explanation :) > > > - What if we deprecate khugepaged (or such threads)? Seems more likely > > than deprecating kswapd. > > If that happens, we can remove the counter again. The bar isn't as > high for vmstat as it for other ABI, and we've updated it plenty of > times to reflect changes in the MM implementation. Good to know! I thought we'd be stuck with it forever. > > > Looks like we want a stat that would group all of this reclaim coming > > from non-direct kthreads, but would not include a future proactive > > reclaim kthread. > > I think the desire to generalize overcomplicates things here in a way > that isn't actually meaningful. > > Think of direct reclaim stats as a signal that either a) kswapd is > broken or b) memory pressure is high enough to cause latencies in the > class of requests that are of interest to userspace. This is true for > all cases but khugepaged. Agreed. I believe moving forward with pgscan_user and pgscan_khugepaged style stats makes sense. Thanks, Johannes!