What I am still missing is (a) why this is arm64 only; and (b) whether this is
something we should really worry about. There are other reasons (e.g.,
speculative references) why migration could temporarily fail. Does it happen
so often that it is really something we have to worry about?
The test fails consistently on arm64. It's my rough understanding that it's
failing due to migration backing off because the fault handler has raised the
ref count? (Dev, correct me if I'm wrong.)
So the real question is: is it a valid test in the first place? Should we just
delete the test, or do we need to strengthen the kernel's guarantees around
migration success?
I think the test should retry migration a number of times in case it
fails. But if it is a persistent migration failure, the test should fail.
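
To make the idea concrete, here is a rough sketch of such a bounded retry
loop. It is not taken from the actual selftest: migrate_with_retries(),
migrate_fn, and the fake_migrate() stand-in are hypothetical names used only
for illustration, and a real test would call its own migration helper in
place of the callback.

```c
#include <assert.h>

/* Hypothetical callback: returns 0 if the migration succeeded, nonzero otherwise. */
typedef int (*migrate_fn)(void *ctx);

/*
 * Attempt the migration up to max_tries times.
 * Returns the number of attempts used on success, or -1 if every
 * attempt failed (i.e. the failure looks persistent, not transient).
 */
static int migrate_with_retries(migrate_fn try_migrate, void *ctx, int max_tries)
{
	assert(max_tries > 0);

	for (int attempt = 1; attempt <= max_tries; attempt++) {
		if (try_migrate(ctx) == 0)
			return attempt;
	}
	return -1;
}

/* Stand-in for a real migration call: fails a fixed number of times first. */
struct fake_state {
	int transient_failures;
};

static int fake_migrate(void *ctx)
{
	struct fake_state *s = ctx;

	if (s->transient_failures > 0) {
		s->transient_failures--;
		return -1;
	}
	return 0;
}
```

With something like this, a temporarily raised reference would only cost a
few extra attempts, while a failure that survives all max_tries attempts
would still make the test fail.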
--
Cheers,
David / dhildenb