On Thu, Mar 30, 2023 at 03:45:01PM +0800, Longlong Xia wrote:
> hwpoison_user_mappings() is updated to support ksm pages, and add
> collect_procs_ksm() to collect processes when the error hit an ksm
> page. The difference from collect_procs_anon() is that it also needs
> to traverse the rmap-item list on the stable node of the ksm page.
> At the same time, add_to_kill_ksm() is added to handle ksm pages. And
> task_in_to_kill_list() is added to avoid duplicate addition of tsk to
> the to_kill list. This is because when scanning the list, if the pages
> that make up the ksm page all come from the same process, they may be
> added repeatedly.
>
> Signed-off-by: Longlong Xia <xialonglong1@xxxxxxxxxx>

I haven't found any specific issue in code review so far, so I'll try
to test your patches.

I have one comment about duplicated KSM pages. It seems that KSM
controls page duplication by limiting the deduplication factor with
max_page_sharing, primarily for performance reasons. But I think it's
important from a memory RAS viewpoint too, because it means we could
allow recovery from memory errors on a KSM page by making the affected
processes switch to the duplicated pages (without killing the
processes!). This might be beyond the scope of this patchset and I'm
not sure how hard it is, but if you are interested in this issue, that
would be really nice.

Thanks,
Naoya Horiguchi
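
For readers following the quoted changelog, the duplicate check it
describes can be modeled in userspace roughly as below. This is only a
minimal sketch, not the code from the patch: struct task, the singly
linked list, the function signatures, and the add_to_kill_once() helper
are hypothetical stand-ins for the kernel's task_struct, list_head, and
add_to_kill_ksm() path. It shows only the idea that a task is queued at
most once, even if several rmap items of the same KSM page point back
to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's task_struct. */
struct task {
    int pid;
};

/* One entry on the to_kill list: a task and the address that was hit. */
struct to_kill {
    struct to_kill *next;
    struct task *tsk;
    unsigned long addr;
};

/* Return true if tsk already has an entry on the list starting at head. */
static bool task_in_to_kill_list(const struct to_kill *head,
                                 const struct task *tsk)
{
    for (const struct to_kill *tk = head; tk != NULL; tk = tk->next) {
        if (tk->tsk == tsk)
            return true;
    }
    return false;
}

/*
 * Queue tsk at the front of the list unless it is already queued.
 * Returns true only when a new entry was added.
 */
static bool add_to_kill_once(struct to_kill **head, struct task *tsk,
                             unsigned long addr)
{
    struct to_kill *tk;

    assert(head != NULL);

    /* Several rmap items may map back to the same task: skip repeats. */
    if (task_in_to_kill_list(*head, tsk))
        return false;

    tk = malloc(sizeof(*tk));
    if (tk == NULL)
        return false;

    tk->tsk = tsk;
    tk->addr = addr;
    tk->next = *head;
    *head = tk;
    return true;
}
```

In this model, calling add_to_kill_once() twice for the same task with
different addresses leaves a single entry on the list, which is the
behavior the changelog attributes to task_in_to_kill_list().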