On 22.03.22 04:12, CGEL wrote:
> On Mon, Mar 21, 2022 at 04:45:40PM +0100, David Hildenbrand wrote:
>> On 20.03.22 07:13, CGEL wrote:
>>> On Fri, Mar 18, 2022 at 09:24:44AM +0100, David Hildenbrand wrote:
>>>> On 18.03.22 02:41, CGEL wrote:
>>>>> On Thu, Mar 17, 2022 at 11:05:22AM +0100, David Hildenbrand wrote:
>>>>>> On 17.03.22 10:48, CGEL wrote:
>>>>>>> On Thu, Mar 17, 2022 at 09:17:13AM +0100, David Hildenbrand wrote:
>>>>>>>> On 17.03.22 03:03, CGEL wrote:
>>>>>>>>> On Wed, Mar 16, 2022 at 03:56:23PM +0100, David Hildenbrand wrote:
>>>>>>>>>> On 16.03.22 14:34, cgel.zte@xxxxxxxxx wrote:
>>>>>>>>>>> From: Yang Yang <yang.yang29@xxxxxxxxxx>
>>>>>>>>>>>
>>>>>>>>>>> Delay accounting does not track the delay of KSM COW. When tasks
>>>>>>>>>>> have many KSM pages, they may spend a considerable amount of time
>>>>>>>>>>> waiting for KSM COW.
>>>>>>>>>>>
>>>>>>>>>>> To capture the impact of KSM COW on tasks, measure the delay when
>>>>>>>>>>> KSM COW happens. This could help users decide whether to use KSM
>>>>>>>>>>> or not.
>>>>>>>>>>>
>>>>>>>>>>> Also update tools/accounting/getdelays.c:
>>>>>>>>>>>
>>>>>>>>>>> / # ./getdelays -dl -p 231
>>>>>>>>>>> print delayacct stats ON
>>>>>>>>>>> listen forever
>>>>>>>>>>> PID     231
>>>>>>>>>>>
>>>>>>>>>>> CPU         count     real total  virtual total    delay total  delay average
>>>>>>>>>>>              6247     1859000000     2154070021     1674255063        0.268ms
>>>>>>>>>>> IO          count    delay total  delay average
>>>>>>>>>>>                 0              0            0ms
>>>>>>>>>>> SWAP        count    delay total  delay average
>>>>>>>>>>>                 0              0            0ms
>>>>>>>>>>> RECLAIM     count    delay total  delay average
>>>>>>>>>>>                 0              0            0ms
>>>>>>>>>>> THRASHING   count    delay total  delay average
>>>>>>>>>>>                 0              0            0ms
>>>>>>>>>>> KSM         count    delay total  delay average
>>>>>>>>>>>              3635      271567604            0ms
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> TBH I'm not sure how particularly helpful this is and if we want
>>>>>>>>>> this.
>>>>>>>>>>
>>>>>>>>> Thanks for replying.
>>>>>>>>>
>>>>>>>>> Users may use KSM by calling madvise(, , MADV_MERGEABLE) when they
>>>>>>>>> want to save memory; it's a tradeoff that accepts delay on KSM COW.
>>>>>>>>> Users can find out how much memory KSM saved by reading
>>>>>>>>> /sys/kernel/mm/ksm/pages_sharing, but they don't know the cost of
>>>>>>>>> the KSM COW delay, and this matters for delay-sensitive tasks. If
>>>>>>>>> users know both the saved memory and the KSM COW delay, they can
>>>>>>>>> make better use of madvise(, , MADV_MERGEABLE).
>>>>>>>>
>>>>>>>> But that happens after the effects, no?
>>>>>>>>
>>>>>>>> IOW a user already called madvise(, , MADV_MERGEABLE) and then gets
>>>>>>>> the results.
>>>>>>>>
>>>>>>> Imagine users developing or porting their applications on an
>>>>>>> experimental machine; they could take those measurements as feedback
>>>>>>> to adjust whether to use madvise(, , MADV_MERGEABLE), or on which
>>>>>>> ranges.
>>>>>>
>>>>>> And why can't they run it with and without and observe performance
>>>>>> using existing metrics (or even application-specific metrics?)?
>>>>>>
>>>>>>
>>>>> I think the reason we need this patch is the same reason we need the
>>>>> swap, reclaim and thrashing getdelays information. When the system is
>>>>> complex, it's hard to tell precisely which kernel activity impacts the
>>>>> observed performance or application-specific metrics: preemption?
>>>>> cgroup throttling? swap? reclaim? IO?
>>>>>
>>>>> So if we can get precise impact data for one factor, tuning that
>>>>> factor (for this patch, KSM) becomes more efficient.
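Side note, since the madvise(, , MADV_MERGEABLE) + pages_sharing workflow
keeps coming up in this thread: the userspace side looks roughly like the
minimal, untested sketch below. It's not taken from the patch and assumes
CONFIG_KSM with /sys/kernel/mm/ksm/run set to 1.

/* ksm-sketch.c: opt an anonymous mapping into KSM and peek at the
 * system-wide sharing counter. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 64 * 1024 * 1024;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	long pages_sharing = 0;
	FILE *f;

	if (buf == MAP_FAILED)
		return 1;
	/* Identical page content gives ksmd something to merge. */
	memset(buf, 0x55, len);

	/* Mark the range as mergeable; ksmd scans and merges lazily. */
	if (madvise(buf, len, MADV_MERGEABLE))
		perror("madvise");

	/* System-wide (not per-task) indicator of how much is deduplicated. */
	f = fopen("/sys/kernel/mm/ksm/pages_sharing", "r");
	if (f) {
		fscanf(f, "%ld", &pages_sharing);
		fclose(f);
	}
	printf("pages_sharing: %ld\n", pages_sharing);
	return 0;
}

Every later write to a page that ksmd merged triggers exactly the KSM COW
being discussed here.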
>>>>
>>>> I'm not convinced that we want to make our write-fault handler more
>>>> complicated for such a corner case with an unclear, eventual use case.
>>>
>>> IIRC, KSM was designed for VMs. But recently we found KSM also works
>>> well for systems with many containers (saving about 10%~20% of total
>>> memory), and container technology is more popular today, so KSM may see
>>> wider use.
>>>
>>> To reduce the impact on the write-fault handler, we could move this job
>>> into a new function with #ifdef CONFIG_KSM inside?
>>
>> Maybe we just want to catch the impact of the write-fault handler when
>> copying more generally?
>>
> We know the kernel has different kinds of COW, some of which are
> transparent to the user. For example, a child process may cause COW, and
> the user shouldn't have to care about that performance impact: it's a
> kernel-internal mechanism the user can hardly do anything about. But KSM
> is different, the user can do the policy tuning in userspace. If we
> measure all COW, wouldn't it be noise?

Only to some degree I think. The other delays (e.g., SWAP, RECLAIM) are
also not completely transparent to the user, no? I mean, user space might
affect them to some degree with some tunables, but it's not completely
transparent for the user either.

IIRC, we have these sources of COW that result in a r/w anon page
(-> MAP_PRIVATE):

(1) R/O-mapped, (possibly) shared anonymous page (fork() or KSM)
(2) R/O-mapped, shared zeropage (e.g., KSM, read-only access to
    unpopulated page in MAP_ANON)
(3) R/O-mapped, shared file/device/... page that requires a private copy
    on modifications (e.g., MAP_PRIVATE !MAP_ANON)

Note that your current patch won't catch when KSM placed the shared
zeropage (use_zero_page). Tracking the overall overhead might be of value
I think, and it would still allow for determining how much KSM is
involved by measuring with and without KSM enabled.

>>>
>>>> IIRC, whenever using KSM you're already agreeing to eventually pay a
>>>> performance price, and the price heavily depends on other factors in
>>>> the system. Simply looking at the number of write-faults might already
>>>> give an indication what changed with KSM being enabled.
>>>>
>>> Regarding "you're already agreeing to pay a performance price": I think
>>> this shortcoming of KSM is what's putting off its wider use. It's not
>>> easy for a user/app to decide how to use madvise(, , MADV_MERGEABLE).
>>
>> ... and my point is that the metric you're introducing might absolutely
>> not be expressive for such users playing with MADV_MERGEABLE. IMHO
>> people will look at actual application performance to figure out what
>> "harm" will be done, no?
>>
>> But I do see value in capturing how many COWs we have in general --
>> either via a counter or via a delay as proposed by you.
>>
> Thanks for the affirmation. As described above, should we instead add a
> vm counter, KSM_COW?

As I'm messing with the COW logic lately (e.g., [1]) I'd welcome vm
counters for all the different kinds of COW-related events, especially:

(1) COW of an anon, !KSM page
(2) COW of a KSM page
(3) COW of the shared zeropage
(4) Reuse instead of COW

I used some VM counters myself to debug/test some of my latest changes.
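Completely untested sketch of what I have in mind; the event names and
the exact placement in wp_page_copy() are made up, not from any posted
patch:

/* include/linux/vm_event_item.h: new events next to the existing ones */
enum vm_event_item {
	/* ... existing items ... */
	COW_ANON,	/* (1) COW of an anon, !KSM page */
	COW_KSM,	/* (2) COW of a KSM page */
	COW_ZERO,	/* (3) COW of the shared zeropage */
	COW_REUSE,	/* (4) reuse instead of COW */
	NR_VM_EVENT_ITEMS
};

/* mm/memory.c: count in the path that decided to copy */
static vm_fault_t wp_page_copy(struct vm_fault *vmf)
{
	/* ... */
	if (!vmf->page)
		count_vm_event(COW_ZERO);	/* zeropage was mapped r/o */
	else if (PageKsm(vmf->page))
		count_vm_event(COW_KSM);
	else
		count_vm_event(COW_ANON);
	/* ... allocate + copy + remap as before ... */
}

with a count_vm_event(COW_REUSE) in the wp_page_reuse() path accordingly.
The counters would then simply show up in /proc/vmstat.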
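And if we rather (or additionally) go the delay-accounting route for COW
in general, I'd assume it would follow the existing
delayacct_thrashing_*() helpers -- again just a sketch with made-up names
("wpcopy" for write-protect copy), nothing I actually implemented:

/* include/linux/delayacct.h */
static inline void delayacct_wpcopy_start(void)
{
	if (current->delays)
		__delayacct_wpcopy_start();
}

static inline void delayacct_wpcopy_end(void)
{
	if (current->delays)
		__delayacct_wpcopy_end();
}

/* mm/memory.c: wrap the actual copying in wp_page_copy(), independent
 * of what kind of page we are copying from */
	delayacct_wpcopy_start();
	/* ... allocate new page, copy contents, remap writable ... */
	delayacct_wpcopy_end();

That would measure any COW, and KSM's share could still be derived by
comparing runs with and without MADV_MERGEABLE.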
>>>
>>> Is there an easier way to use KSM, enjoying the memory savings while
>>> minimizing the performance price for containers? We think it's
>>> possible, and we are working on a new patch: provide a knob for a
>>> cgroup to enable/disable KSM for all tasks in that cgroup. So if your
>>> container is delay-sensitive, just leave it alone, and if not, you can
>>> easily enable KSM without modifying app code.
>>>
>>> Before using the new knob, users might want to know the precise impact
>>> of KSM. I think write-faults are an indirect metric; if indirect
>>> metrics were good enough, why would we need taskstats and PSI? By the
>>> way, getdelays supports container statistics.
>>
>> Would anything speak against making this more generic and capturing the
>> delay for any COW, not just for KSM?
>>
> I think we'd better export data to userspace that is meaningful to the
> user. Users may not need data about kernel-internal mechanisms.

Reading Documentation/accounting/delay-accounting.rst I wonder what we
best put in there:

"Tasks encounter delays in execution when they wait for some kernel
resource to become available."

I mean, in any COW event we are waiting for the kernel to create a copy.

This could be of value even if we add separate VM counters.

[1] https://lore.kernel.org/linux-mm/20220315104741.63071-2-david@xxxxxxxxxx/T/

-- 
Thanks,

David / dhildenb