Re: [PATCH v6 1/3] KVM: selftests: implement random number generation for guest code

Sean Christopherson <seanjc@xxxxxxxxxx> · Tue, 11 Oct 2022 23:47:01 +0000

On Tue, Oct 11, 2022, Colton Lewis wrote:
> Sean Christopherson <seanjc@xxxxxxxxxx> writes:
> > Regardless of whether or not the details are gory, having to be aware of
> > those details unnecessarily impedes understanding the code.  The vast, vast
> > majority of folks that read this code won't care about how PRNGs work.
> > Even if the reader is familiar with PRNGs, those details aren't at all
> > relevant to understanding what the guest code does.  The reader only needs
> > to know "oh, this is randomizing the address".  How the randomization works
> > is completely irrelevant for that level of understanding.
> 
> It is relevant if the reader of the guest code cares about reentrancy
> and thread-safety (good for such things as reproducing the same stream
> of randoms from the same initial conditions), because they will have to
> manage some state to make that work. Whether that state is an integer or
> an opaque struct requires the same level of knowledge to use the API.

By "readers" I'm not (only) talking about developers actively writing code, I'm
I'm talking about people that run this test, encounter a failure, and read the code
to try and understand what went wrong.  For those people, knowing that the guest is
generating a random number is sufficient information during initial triage.  If the
failure ends up being related to the random number, then yes they'll likely need to
learn the details, but that's a few steps down the road.

> > > I agree returning the value could make it easier to use as part of
> > > expressions, but is it that important?
> 
> > Yes, because in pretty much every use case, the random number is going to
> > be immediately consumed.  Readability and robustness aside, returning the
> > value cuts the amount of code need to generate and consume a random number
> > in half.
> 
> Ok. This is trivial to change (or implement on top of what is
> there). Would you be happy with something like the following?
> 
> uint32_t guest_random(uint32_t *seed)
> {
> 	*seed = (uint64_t)*seed * 48271 % ((uint32_t)(1 << 31) - 1);
> 	return *seed;
> }

Honestly, no.  I truly don't understand the pushback on throwing the seed into
a struct and giving the helpers more precise names.  E.g. without looking at the
prototype, guest_random() tells the reader nothing about the size of the random
number returned, whereas <namespace>_random_u32() or <namespace>_get_random_u32()
(if we want to more closely follow kernel nomenclature) tells the reader exactly
what is returned.

As a contrived example, imagine a bug where the intent was to do 50/50 reads/writes,
but the test ends up doing almost all writes.  Code that looks like:

	if (guest_random(...))

is less obviously wrong than

	if (guest_random_u32(...))

even if the user knows nothing about how the random number is generated.

And aside from hiding details and providing abstraction for readers, throwing a
shim struct around the seed means that if we do want to change up the internal
implementation, then we can do so without having to churn users.

The code is trivial to write and I can't think of any meaningful downside.  Worst
case scenario, we end up with an implementation that is slightly more formal than
then we really need.