On Mon, 2020-01-13 at 13:01 +0000, Andrew Cooper wrote:
> On 13/01/2020 11:43, Singh, Balbir wrote:
> > On Mon, 2020-01-13 at 11:16 +0100, Peter Zijlstra wrote:
> > > On Fri, Jan 10, 2020 at 07:35:20AM -0800, Eduardo Valentin wrote:
> > > > Hey Peter,
> > > >
> > > > On Wed, Jan 08, 2020 at 11:50:11AM +0100, Peter Zijlstra wrote:
> > > > > On Tue, Jan 07, 2020 at 11:45:26PM +0000, Anchal Agarwal wrote:
> > > > > > From: Eduardo Valentin <eduval@xxxxxxxxxx>
> > > > > >
> > > > > > System instability are seen during resume from hibernation when system
> > > > > > is under heavy CPU load. This is due to the lack of update of sched
> > > > > > clock data, and the scheduler would then think that heavy CPU hog
> > > > > > tasks need more time in CPU, causing the system to freeze
> > > > > > during the unfreezing of tasks. For example, threaded irqs,
> > > > > > and kernel processes servicing network interface may be delayed
> > > > > > for several tens of seconds, causing the system to be unreachable.
> > > > > > The fix for this situation is to mark the sched clock as unstable
> > > > > > as early as possible in the resume path, leaving it unstable
> > > > > > for the duration of the resume process. This will force the
> > > > > > scheduler to attempt to align the sched clock across CPUs using
> > > > > > the delta with time of day, updating sched clock data. In a post
> > > > > > hibernation event, we can then mark the sched clock as stable
> > > > > > again, avoiding unnecessary syncs with time of day on systems
> > > > > > in which TSC is reliable.
> > > > >
> > > > > This makes no frigging sense what so bloody ever. If the clock is
> > > > > stable, we don't care about sched_clock_data. When it is stable you get
> > > > > a linear function of the TSC without complicated bits on.
> > > > >
> > > > > When it is unstable, only then do we care about the
> > > > > sched_clock_data.
> > > > >
> > > >
> > > > Yeah, maybe what is not clear here is that we covering for situation
> > > > where clock stability changes over time, e.g. at regular boot clock is
> > > > stable, hibernation happens, then restore happens in a non-stable
> > > > clock.
> > >
> > > Still confused, who marks the thing unstable? The patch seems to suggest
> > > you do yourself, but it is not at all clear why.
> > >
> > > If TSC really is unstable, then it needs to remain unstable. If the TSC
> > > really is stable then there is no point in marking is unstable.
> > >
> > > Either way something is off, and you're not telling me what.
> > >
> >
> > Hi, Peter
> >
> > For your original comment, just wanted to clarify the following:
> >
> > 1. After hibernation, the machine can be resumed on a different but compatible
> >    host (these are VM images hibernated)
> > 2. This means the clock between host1 and host2 can/will be different
>
> The guests TSC value is part of all save/migrate/resume state. Given
> this bug, I presume you've actually discarded all register state on
> hibernate, and the TSC is starting again from 0?
>
> The frequency of the new TSC might very likely be different, but the
> scale/offset in the paravirtual clock information should let Linux's
> view of time stay consistent.
>

I am looking for my old dmesg logs to revalidate this, but I seem to have
lost them; in any case, I think Eduardo had a different point. I should
point out that I was only adding to the list of potentially missed
assumptions.

> > In your comments are you making the assumption that the host(s) is/are the
> > same? Just checking the assumptions being made and being on the same page
> > with them.
>
> TSCs are a massive source of "fun". I'm not surprised that there are
> yet more bugs around.
>
> Does anyone actually know what does/should happen to the real TSC on
> native S4? The default course of action should be for virtualisation to
> follow suit.
>
> ~Andrew

Balbir