On 20.03.22 07:13, CGEL wrote:
> On Fri, Mar 18, 2022 at 09:24:44AM +0100, David Hildenbrand wrote:
>> On 18.03.22 02:41, CGEL wrote:
>>> On Thu, Mar 17, 2022 at 11:05:22AM +0100, David Hildenbrand wrote:
>>>> On 17.03.22 10:48, CGEL wrote:
>>>>> On Thu, Mar 17, 2022 at 09:17:13AM +0100, David Hildenbrand wrote:
>>>>>> On 17.03.22 03:03, CGEL wrote:
>>>>>>> On Wed, Mar 16, 2022 at 03:56:23PM +0100, David Hildenbrand wrote:
>>>>>>>> On 16.03.22 14:34, cgel.zte@xxxxxxxxx wrote:
>>>>>>>>> From: Yang Yang <yang.yang29@xxxxxxxxxx>
>>>>>>>>>
>>>>>>>>> Delay accounting does not track the delay of ksm cow. When tasks
>>>>>>>>> have many ksm pages, they may spend a significant amount of time
>>>>>>>>> waiting for ksm cow.
>>>>>>>>>
>>>>>>>>> To get the impact of ksm cow on tasks, measure the delay when ksm
>>>>>>>>> cow happens. This could help users to decide whether to use ksm
>>>>>>>>> or not.
>>>>>>>>>
>>>>>>>>> Also update tools/accounting/getdelays.c:
>>>>>>>>>
>>>>>>>>> / # ./getdelays -dl -p 231
>>>>>>>>> print delayacct stats ON
>>>>>>>>> listen forever
>>>>>>>>> PID     231
>>>>>>>>>
>>>>>>>>> CPU         count     real total  virtual total    delay total  delay average
>>>>>>>>>              6247     1859000000     2154070021     1674255063          0.268ms
>>>>>>>>> IO          count    delay total  delay average
>>>>>>>>>                 0              0              0ms
>>>>>>>>> SWAP        count    delay total  delay average
>>>>>>>>>                 0              0              0ms
>>>>>>>>> RECLAIM     count    delay total  delay average
>>>>>>>>>                 0              0              0ms
>>>>>>>>> THRASHING   count    delay total  delay average
>>>>>>>>>                 0              0              0ms
>>>>>>>>> KSM         count    delay total  delay average
>>>>>>>>>              3635      271567604              0ms
>>>>>>>>>
>>>>>>>>
>>>>>>>> TBH I'm not sure how particularly helpful this is and if we want this.
>>>>>>>>
>>>>>>> Thanks for replying.
>>>>>>>
>>>>>>> Users may use ksm by calling madvise(, , MADV_MERGEABLE) when they
>>>>>>> want to save memory; it's a tradeoff: they suffer delays on ksm cow.
>>>>>>> Users can get to know how much memory ksm saved by reading
>>>>>>> /sys/kernel/mm/ksm/pages_sharing, but they don't know the cost of
>>>>>>> the ksm cow delay, and this is important for some delay-sensitive
>>>>>>> tasks. If users know both the saved memory and the ksm cow delay,
>>>>>>> they could better use madvise(, , MADV_MERGEABLE).
>>>>>>
>>>>>> But that happens after the effects, no?
>>>>>>
>>>>>> IOW a user already called madvise(, , MADV_MERGEABLE) and then gets
>>>>>> the results.
>>>>>>
>>>>> Imagine users are developing or porting their applications on an
>>>>> experiment machine; they could take those benchmarks as feedback to
>>>>> adjust whether to use madvise(, , MADV_MERGEABLE) or its range.
>>>>
>>>> And why can't they run it with and without and observe performance
>>>> using existing metrics (or even application-specific metrics?)?
>>>>
>>> I think the reason why we need this patch is just like why we need the
>>> swap, reclaim and thrashing getdelays information. When the system is
>>> complex, it's hard to tell precisely which kernel activity impacts the
>>> observed performance or application-specific metrics: preemption?
>>> cgroup throttling? swap? reclaim? IO?
>>>
>>> So if we could get the factor's precise impact data, tuning the factor
>>> (for this patch it's ksm) would be more efficient.
>>>
>> I'm not convinced that we want to make our write-fault handler more
>> complicated for such a corner case with an unclear, eventual use case.
>
> IIRC, KSM is designed for VMs. But recently we found KSM works well for
> systems with many containers (saving about 10%~20% of total memory), and
> container technology is more popular today, so KSM may be used more.
>
> To reduce the impact on the write-fault handler, we may write a new
> function with ifdef CONFIG_KSM inside to do this job?

Maybe we just want to catch the impact of the write-fault handler when
copying more generally?
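Something like the following is what I have in mind -- just a rough
sketch, not a real patch: the wpcopy_* names and fields are invented here
and would mirror the existing __delayacct_swapin_start()/end() pair in
kernel/delayacct.c:

    /*
     * kernel/delayacct.c -- sketch only; the wpcopy_* fields would
     * first have to be added to struct task_delay_info.
     */
    void __delayacct_wpcopy_start(void)
    {
            /* stamp the start of the write-protect copy */
            current->delays->wpcopy_start = local_clock();
    }

    void __delayacct_wpcopy_end(void)
    {
            /* fold the elapsed time into the per-task totals */
            delayacct_end(&current->delays->lock,
                          &current->delays->wpcopy_start,
                          &current->delays->wpcopy_delay,
                          &current->delays->wpcopy_count);
    }

The write-fault handler (e.g., wp_page_copy()) would call these around
the actual page copy; no ifdef CONFIG_KSM required, and a KSM COW would
be accounted just like any other COW.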
>> IIRC, whenever using KSM you're already agreeing to eventually pay a
>> performance price, and the price heavily depends on other factors in
>> the system. Simply looking at the number of write-faults might already
>> give an indication of what changed with KSM being enabled.
>>
> While saying "you're already agreeing to pay a performance price", I
> think this is the shortcoming of KSM that is putting off its wider use.
> It's not easy for a user/app to decide how to use
> madvise(, , MADV_MERGEABLE).

... and my point is that the metric you're introducing might absolutely
not be expressive for such users playing with MADV_MERGEABLE. IMHO people
will look at actual application performance to figure out what "harm"
will be done, no?

But I do see value in capturing how many COWs we have in general -- either
via a counter or via a delay as proposed by you.

> Is there an easier way to use KSM, enjoying the memory saving while
> minimizing the performance price for containers? We think it's possible,
> and we are working on a new patch: provide a knob for a cgroup to
> enable/disable KSM for all tasks in this cgroup, so if your container is
> delay-sensitive you just leave it off, and if not you can easily enable
> KSM without modifying app code.
>
> Before using the new knob, users might want to know the precise impact
> of KSM. I think write-faults are an indirect metric. If indirect metrics
> were good enough, why would we need taskstats and PSI? By the way,
> getdelays supports container statistics.

Would anything speak against making this more generic and capturing the
delay for any COW, not just for KSM?

--
Thanks,

David / dhildenb