On Tue, Oct 08, 2013 at 07:08:11PM -0300, Marcelo Tosatti wrote:
> On Tue, Oct 08, 2013 at 09:37:05AM -0400, Don Zickus wrote:
> > On Mon, Oct 07, 2013 at 10:05:17PM -0300, Marcelo Tosatti wrote:
> > > Implement reset of kernel watchdogs at pvclock read time. This avoids
> > > adding special code to every watchdog.
> > >
> > > This is possible for watchdogs which measure time based on sched_clock() or
> > > ktime_get() variants.
> > >
> > > Suggested by Don Zickus.
> > >
> > > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
> >
> > Awesome. Thanks for figuring this out Marcelo. Does that mean we can
> > revert commit 5d1c0f4a now? :-)
>
> Unfortunately no: soft lockup watchdog does not measure time based on
> sched_clock but on hrtimer interrupt count :-(

I believe it does. See __touch_watchdog() which calls get_timestamp() -->
local_clock(). That is how it calculates the duration of the softlockup.

Now with your patch, it just sets the timestamp to zero with
touch_softlockup_watchdog_sync(), which is fine. It will just sync up the
clock, set a new timestamp, and check again in the next hrtimer interrupt.

So I guess I am confused about what that commit does compared to this patch.

> (see the softlockup code in question, perhaps you can point to
> something that i'm missing).
>
> BTW, are you OK with printing additional steal time information?
> https://lkml.org/lkml/2013/6/27/755

Well, I thought this patch was supposed to replace that patch? Why do you
still need that patch?

Perhaps my confusion is centered around which softlockups are the problem:
the VM's or the host's.

From the host perspective, I didn't think you would have any problem
because the VM is just another process that runs in its time slice.

From the VM perspective, the whole overcommit/'wait a couple of minutes to
run again' could easily cause lockups.
But I thought this patch set detected that and touched the watchdogs early
enough that when the next iteration of the hrtimer came through, it would
_not_ cause a softlockup (it would delay it an hrtimer cycle).

So, if I am misunderstanding the problems (which I probably am :-) ), I
could use a pointer or a quick explanation to remind me what the issues are
again and why you think the other patches are still necessary. :-)

Cheers,
Don
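
For reference, the timestamp handling described above
(touch_softlockup_watchdog_sync() zeroing the timestamp, the next hrtimer
interrupt setting a new one and only checking again on the interrupt after
that) can be modeled in user space roughly as follows. This is a minimal
sketch and not the kernel's code: struct wd_model and the wd_* helpers are
hypothetical names, the clock is a plain counter in seconds standing in for
local_clock(), and the threshold is whatever the caller picks.

```c
#include <stdint.h>

/*
 * Hypothetical user-space model of the per-CPU softlockup state.
 * None of these names exist in the kernel; they only illustrate the flow.
 */
struct wd_model {
    uint64_t clock_s;   /* stand-in for local_clock(), in seconds (keep > 0) */
    uint64_t touch_ts;  /* last touch time; 0 means "re-arm on the next tick" */
    uint64_t thresh_s;  /* softlockup threshold, in seconds */
};

/* Models touch_softlockup_watchdog_sync(): only zero the timestamp. */
static void wd_touch_sync(struct wd_model *wd)
{
    wd->touch_ts = 0;
}

/* Models __touch_watchdog(): record the current clock as the new timestamp. */
static void wd_touch(struct wd_model *wd)
{
    wd->touch_ts = wd->clock_s;
}

/*
 * Models one hrtimer interrupt.
 * Returns the stall duration that would be reported, or 0 for "no lockup".
 */
static uint64_t wd_timer_tick(struct wd_model *wd)
{
    if (wd->touch_ts == 0) {
        /* Touched since the last tick: take a fresh timestamp, skip the check. */
        wd_touch(wd);
        return 0;
    }
    if (wd->clock_s > wd->touch_ts + wd->thresh_s)
        return wd->clock_s - wd->touch_ts;
    return 0;
}
```

For example, with touch_ts = 100 and a threshold of 20, jumping clock_s to
400 (the guest was not scheduled for a few minutes) makes wd_timer_tick()
return 300. Calling wd_touch_sync() before that tick makes it return 0 and
re-arm at 400 instead, so the check is pushed back by one hrtimer cycle.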