On 2022/5/9 17:58, Oscar Salvador wrote:
> On Mon, May 09, 2022 at 05:04:54PM +0800, Miaohe Lin wrote:
>>>> So that leaves us with either
>>>>
>>>> 1) Fail offlining -> no need to care about reonlining
>>
>> Maybe fail offlining will be a better alternative as we can get rid of many races
>> between memory failure and memory offline? But no strong opinion. :)
>
> If taking care of those races is not an herculean effort, I'd go with
> allowing offlining + disallow re-onlining.
> Mainly because memory RAS stuff.

This does make sense to me. Thanks. We can try to solve those races if
offlining + disallow re-onlining is applied. :)

>
> Now, to the re-onlining thing, we'll have to come up with a way to check
> whether a section contains hwpoisoned pages, so we do not have to go
> and check every single page, as that will be really suboptimal.

Yes, we need a stable and cheap way to do that.

Thanks!

>
>
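
To make the "cheap way to check a section" idea a bit more concrete, here is a
rough userspace sketch of one possible approach: keep a per-section count of
hwpoisoned pages, so the re-onlining check reads one counter per section
instead of walking every page. This is only an illustration under stated
assumptions, not a proposed patch. Every name in it (struct section_info,
mark_page_hwpoison(), clear_page_hwpoison(), section_has_hwpoison(),
can_online_sections()) and both size constants are hypothetical, and it
ignores locking and the memory-failure vs. offline races discussed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes, chosen only to keep the model small. */
#define PAGES_PER_SECTION 8UL
#define NR_SECTIONS       4UL

/* Hypothetical per-section bookkeeping; not a real kernel structure. */
struct section_info {
	unsigned long nr_hwpoison;        /* poisoned pages in this section */
	bool poisoned[PAGES_PER_SECTION]; /* stand-in for a per-page poison flag */
};

static struct section_info sections[NR_SECTIONS];

/* Mark a page poisoned; count it only the first time it is reported. */
static void mark_page_hwpoison(unsigned long pfn)
{
	assert(pfn < NR_SECTIONS * PAGES_PER_SECTION);

	struct section_info *s = &sections[pfn / PAGES_PER_SECTION];
	bool *flag = &s->poisoned[pfn % PAGES_PER_SECTION];

	if (!*flag) {
		*flag = true;
		s->nr_hwpoison++;
	}
}

/* Clear the poison mark (e.g. after unpoisoning) and drop the count. */
static void clear_page_hwpoison(unsigned long pfn)
{
	assert(pfn < NR_SECTIONS * PAGES_PER_SECTION);

	struct section_info *s = &sections[pfn / PAGES_PER_SECTION];
	bool *flag = &s->poisoned[pfn % PAGES_PER_SECTION];

	if (*flag) {
		*flag = false;
		s->nr_hwpoison--;
	}
}

/* Constant-time check: no per-page walk is needed. */
static bool section_has_hwpoison(unsigned long section_nr)
{
	assert(section_nr < NR_SECTIONS);
	return sections[section_nr].nr_hwpoison != 0;
}

/* Refuse to (re-)online a range if any section in it holds poison. */
static bool can_online_sections(unsigned long start, unsigned long nr)
{
	assert(start <= NR_SECTIONS && nr <= NR_SECTIONS - start);

	for (unsigned long i = start; i < start + nr; i++)
		if (section_has_hwpoison(i))
			return false;
	return true;
}
```

In this model, poisoning pfn 9 makes section 1 report poison, so a re-online
request covering sections 0-1 is refused while one covering sections 2-3 is
still allowed. The cost of the check scales with the number of sections in the
range rather than the number of pages.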