Re: [PATCH v2 06/28] kernel: Define gettimeofday vdso common code

Will Deacon <will.deacon@xxxxxxx> · Wed, 13 Feb 2019 17:05:44 +0000

Hi Thomas,

On Fri, Feb 08, 2019 at 08:28:07PM +0100, Thomas Gleixner wrote:
> On Fri, 8 Feb 2019, Will Deacon wrote:
> > On Fri, Dec 07, 2018 at 05:53:21PM +0000, Will Deacon wrote:
> > > Anyway, moving the counter read into the protected region is a little fiddly
> > > because the memory barriers we have in there won't give us the ordering we
> > > need. We'll instead need to do something nasty, like create a dependency
> > > from the counter read to the read of the seqlock:
> > > 
> > > Maybe the untested crufty hack below, although this will be a nightmare to
> > > implement in C.
> > 
> > We discussed this in person this week, but you couldn't recall the details
> > off the top of your head so I'm replying here. Please could you clarify what
> > your concern was with the existing code, and whether or not I've got the
> > wrong end of the stick?
> 
> If you just collect the variables under the seqcount protection and have
> the readout of the timer outside of it then you are not guaranteeing
> consistent state.
> 
> The problem is:
> 
> 	do {
> 		seq = read_seqcount_begin(d->seq);
> 		last = d->cycle_last;
> 		mult = d->mult;
> 		shift = d->shift;
> 		ns = d->ns_base;
> 	while (read_seqcount_retry(d->seq, seq));
> 
> 	ns += ((read_clock() - last) * mult);
> 	ns >>= shift;
> 
> So on the first glance this looks consistent because you collect all data
> and then do the read and calc outside the loop.
> 
> But if 'd->mult' gets updated _before_ read_clock() then you can run into a
> situation where time goes backwards with the next read.
> 
> Here is the flow you need for that:
> 
> t1 = gettime()
>      {
> 	collect_data()
> 
>      ---> Interrupt, updates mult (mult becomes smaller)
> 
>      	  This can expand over a very long time when the task is scheduled
>      	  out here and there are multiple updates in between. The farther
>      	  out the read is delayed, the more likely the problem is going to
>      	  observable.
> 
>      	read_clock_and_calc()
>      }
> 
> t2 = gettime()
>      {
> 	collect_data()
>      	read_clock_and_calc()
>      }
> 
> This second read uses updated data, i.e. the smaller mult. So depending on
> the distance of the two reads and the difference of the mult, the caller
> can observe t2 < t1, which is a NONO. You can observe it on a single
> CPU. No virt, SMP, migration needed at all.

Thank you for this explanation, I agree that we have a bug here (and have
done since day 1 on arm64!). I've replied separately on ktime_get().

Will