On 2023/3/31 13:42, HORIGUCHI NAOYA(堀口 直也) wrote:
On Thu, Mar 30, 2023 at 03:45:01PM +0800, Longlong Xia wrote:
hwpoison_user_mappings() is updated to support ksm pages, and
collect_procs_ksm() is added to collect processes when the error hits a
ksm page. The difference from collect_procs_anon() is that it also needs
to traverse the rmap-item list on the stable node of the ksm page.
At the same time, add_to_kill_ksm() is added to handle ksm pages. And
task_in_to_kill_list() is added to avoid duplicate addition of tsk to
the to_kill list. This is because when scanning the list, if the pages
that make up the ksm page all come from the same process, that process
may be added repeatedly.
Signed-off-by: Longlong Xia <xialonglong1@xxxxxxxxxx>
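For readers following the description above, here is a minimal user-space
sketch of the duplicate check. It is not the kernel code from the patch:
the struct layouts (struct task, struct rmap_item, struct stable_node,
struct to_kill_list), the fixed-size MAX_TO_KILL array and the function
signatures are all simplifying assumptions made for illustration. Only the
function names and the idea (walk the rmap-item list on the stable node,
skip a task that is already on the to_kill list) come from the text.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TO_KILL 16 /* hypothetical fixed capacity, for illustration only */

/* Hypothetical stand-ins; these are NOT the kernel's definitions. */
struct task {
    int pid;
};

struct rmap_item {
    struct task *tsk;        /* process that owns this mapping */
    unsigned long addr;      /* virtual address of the mapping */
    struct rmap_item *next;  /* next item on the stable node's list */
};

struct stable_node {
    struct rmap_item *items; /* all mappings that share the ksm page */
};

struct to_kill_list {
    struct task *tsk[MAX_TO_KILL];
    unsigned long addr[MAX_TO_KILL];
    size_t n;
};

/* Return true if tsk has already been queued on the to_kill list. */
static bool task_in_to_kill_list(const struct to_kill_list *tk,
                                 const struct task *tsk)
{
    for (size_t i = 0; i < tk->n; i++) {
        if (tk->tsk[i] == tsk)
            return true;
    }
    return false;
}

/* Queue one (task, address) pair; fails if the sketch's array is full. */
static bool add_to_kill_ksm(struct to_kill_list *tk, struct task *tsk,
                            unsigned long addr)
{
    if (tk->n >= MAX_TO_KILL)
        return false;
    tk->tsk[tk->n] = tsk;
    tk->addr[tk->n] = addr;
    tk->n++;
    return true;
}

/*
 * Walk every rmap item on the stable node and queue each owning task
 * once, even when several of its pages were merged into the ksm page.
 * Returns the number of tasks on the list afterwards.
 */
static size_t collect_procs_ksm(const struct stable_node *node,
                                struct to_kill_list *tk)
{
    for (struct rmap_item *item = node->items; item; item = item->next) {
        if (task_in_to_kill_list(tk, item->tsk))
            continue; /* same process seen before, skip the duplicate */
        add_to_kill_ksm(tk, item->tsk, item->addr);
    }
    return tk->n;
}
```

In this sketch a process that maps the ksm page at three addresses ends up
on the list once, with the first address encountered during the walk.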
I don't find any specific issue by code review for now, so I'll try to
test your patches.
Dear maintainer,
Can you please provide a brief update on the testing status of the patch
and any suggestions you may have for improving it?
Thank you for your time.
Best regards,
Longlong Xia
I have one comment about duplicated KSM pages. It seems that KSM controls
page duplication by limiting the deduplication factor with max_page_sharing,
primarily for performance reasons. But I think it's important from a memory
RAS viewpoint too, because it means we could allow recovery from memory
errors on a KSM page by making affected processes switch to the duplicated
pages (without killing the processes!). This might be beyond the scope
of this patchset and I'm not sure how hard it is, but if you are interested
in this issue, that would be really nice.
Thanks,
Naoya Horiguchi