On Wed, Feb 10, 2021 at 05:16:29PM -0800, Mike Kravetz wrote:
> Should probably check for -EBUSY as this means someone started using
> the page while we were allocating a new one.  It would complicate the
> code to try and do the 'right thing'.  Right thing might be getting
> dissolving the new pool page and then trying to isolate this in use
> page.  Of course things could change again while you are doing that. :(

Yeah, I kept the error handling rather minimal just to be clear about
the approach I was leaning towards, but yes, we should definitely check
for -EBUSY on dissolve_free_huge_page().

And it might be that dissolve_free_huge_page() returns -EBUSY on the old
page, and we need to dissolve the freshly allocated one as it is not
going to be used, and that might fail as well, due to reserves for
instance, or maybe because someone started using it?
I have to confess that I need to check the reservation code more closely
to be aware of the corner cases.

We used to try to be clever in such situations in the memory-failure
code, but at some point you end up thinking "ok, how many retries are
considered enough?".
That code was trickier as we were handling uncorrected/corrected memory
errors, so we strived to do our best, but here we can be more flexible
as the whole thing is racy per se, and just fail if we find too many
obstacles.

I shall resume work early next week.

Thanks for the tips ;-)

-- 
Oscar Salvador
SUSE L3