Re: [PATCH V3 10/10] x86/pks: Add PKS test code

Ira Weiny <ira.weiny@xxxxxxxxx> · Thu, 17 Dec 2020 20:05:09 -0800

On Thu, Dec 17, 2020 at 12:55:39PM -0800, Dave Hansen wrote:
> On 11/6/20 3:29 PM, ira.weiny@xxxxxxxxx wrote:
> > +		/* Arm for context switch test */
> > +		write(fd, "1", 1);
> > +
> > +		/* Context switch out... */
> > +		sleep(4);
> > +
> > +		/* Check msr restored */
> > +		write(fd, "2", 1);
> 
> These are always tricky.  What you ideally want here is:
> 
> 1. Switch away from this task to a non-PKS task, or
> 2. Switch from this task to a PKS-using task, but one which has a
>    different PKS value

Or both...

> 
> then, switch back to this task and make sure PKS maintained its value.
> 
> *But*, there's no absolute guarantee that another task will run.  It
> would not be totally unreasonable to have the kernel just sit in a loop
> without context switching here if no other tasks can run.
> 
> The only way you *know* there is a context switch is by having two tasks
> bound to the same logical CPU and make sure they run one after another.

Ah...  We do that.

...
+       CPU_ZERO(&cpuset);
+       CPU_SET(0, &cpuset);
+       /* Two processes run on CPU 0 so that they go through context switch.  */
+       sched_setaffinity(getpid(), sizeof(cpu_set_t), &cpuset);
...

I think this should be ensuring that both the parent and the child are
running on CPU 0.  At least according to the man page they should be.

<man>
	A child created via fork(2) inherits its parent's CPU affinity mask.
</man>

Perhaps a better method would be to synchronize the 2 threads more to ensure
that we are really running at the 'same time' and forcing the context switch.

>  This just gets itself into a state where it *CAN* context switch and
> prays that one will happen.

Not sure what you mean by 'This'?  Do you mean that running on the same CPU
will sometimes not force a context switch?  Or do you mean that the sleeps
could be badly timed and the 2 threads could run 1 after the other on the same
CPU?  The latter is AFAICT the most likely case.

> 
> You can also run a bunch of these in parallel bound to a single CPU.
> That would also give you higher levels of assurance that *some* context
> switch happens at sleep().

I think more cycles is a good idea for sure.  But I'm more comfortable with
forcing the test to be more synchronized so that it is actually running in the
order we think/want it to be.

> 
> One critical thing with these tests is to sabotage the kernel and then
> run them and make *sure* they fail.  Basically, if you screw up, do they
> actually work to catch it?

I'll try and come up with a more stressful test.

Ira