On 12/11/20 3:00 PM, Pavel Tatashin wrote:
>> I guess revert what we did (unpin) and return an error. The interesting
>> question is what can make migration/isolation fail
>
> OK. I will make the necessary changes. Let's handle errors properly.
> Whatever the cause of the error, we will know it when it happens and
> the error is returned. I think I will add a 10-time retry instead of
> the infinite retry that we currently have, matching the 10-time retry
> we currently have in the hot-remove path.

It occurs to me that maybe the pre-existing infinite loop shouldn't be
there at all. Would it be better to let the callers retry? Obviously that
would be a separate patch, and I'm not sure it's safe to make that change,
but the current loop seems to be buried too far down.
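
To make the bounded-retry idea concrete, here is a rough userspace sketch,
not kernel code. Every name in it (MAX_MIGRATE_RETRIES, the callback types,
migrate_with_bounded_retry, the fake_* stubs) is made up for illustration,
and the cap of 10 simply mirrors the number mentioned above. It tries a
bounded number of times, then undoes the partial work and hands the error
back so the caller can decide whether to retry:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical cap, mirroring the "10-time retry" discussed above. */
#define MAX_MIGRATE_RETRIES 10

/* Hypothetical callback standing in for one migration/isolation attempt:
 * returns 0 on success or a negative errno on failure. */
typedef int (*migrate_attempt_fn)(void *ctx);

/* Hypothetical hook standing in for "revert what we did (unpin)". */
typedef void (*undo_fn)(void *ctx);

/* Try up to MAX_MIGRATE_RETRIES times. On persistent failure, undo the
 * partial work and return the last error instead of looping forever. */
static int migrate_with_bounded_retry(migrate_attempt_fn attempt,
                                      undo_fn undo, void *ctx)
{
    int ret = -EAGAIN;

    for (int i = 0; i < MAX_MIGRATE_RETRIES; i++) {
        ret = attempt(ctx);
        if (ret == 0)
            return 0;
    }

    if (undo)
        undo(ctx);
    return ret; /* the caller decides whether to retry */
}

/* Stub used only to exercise the sketch: fails with -EBUSY until
 * failures_left reaches zero, and counts attempts and undo calls. */
struct fake_state {
    int failures_left;
    int attempts;
    int undone;
};

static int fake_attempt(void *ctx)
{
    struct fake_state *s = ctx;

    s->attempts++;
    if (s->failures_left > 0) {
        s->failures_left--;
        return -EBUSY;
    }
    return 0;
}

static void fake_undo(void *ctx)
{
    struct fake_state *s = ctx;

    s->undone++;
}
```

With a stub that fails three times, the helper succeeds on the fourth
attempt and never calls the undo hook. With a stub that always fails, it
gives up after 10 attempts, calls the undo hook once, and returns -EBUSY.
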
Thoughts, anyone?
thanks,
--
John Hubbard
NVIDIA