On Wed, Nov 9, 2022 at 8:16 AM Luck, Tony <tony.luck@xxxxxxxxx> wrote:
>
> > I think that another viewpoint of how we prioritize memory type to scan
> > is kernel vs userspace memory. Current hwpoison mechanism does little to
> > recover from errors in kernel pages (slab, reserved), so there seems
> > little benefit to detect such errors proactively and beforehand. If the
> > resource for scanning is limited, the user might think of focusing on
> > scanning userspace memory.
>
> Page cache is (in many use cases) a large user of kernel memory, and there
> would be options for recovery if errors were pre-emptively found: clean page ->
> re-read from storage, modified page -> mark in some way to force EIO for read()
> and fail(?) mmap().
>
> -Tony

Now that the page cache is part of the discussion, I would like to
separate the memory scanner from mm's recovery mechanism.

We want to build an agnostic in-kernel scanner that safely detects
memory errors in physical memory (e.g. on Intel x86, all usable physical
pages in e820), ideally without needing to know the "memory type" (owned
by user vs kernel? free vs allocated? page cache dirty vs clean? owned
by virtualization guest vs host?).

After the scanner detects that a PFN has a memory error, it reports the
PFN to the memory-failure module, which classifies the type of the
memory page and takes recovery actions accordingly. (For example, page
cache will be handled by me_pagecache_dirty/clean; I believe that's
basically what Tony described.)

So the proactive scanner should always improve the kernel's memory
reliability: more error pages get recovered, and they get recovered
proactively rather than only when someone accesses them.

That being said, prioritizing the scanning of a certain type of memory
is then hard (if not impossible), because the in-kernel background
thread design, to keep things simple, treats all memory as one type:
physical memory.

The alternative is to assume there is a caller that drives the scanner.
This caller can be either userspace or kernel space (our RFC chooses
userspace). The caller can then prioritize, or scan only, a certain type
of memory, but it has to secure the memory regions before passing them
to the scanner. The "How to Scan" section in the RFC has more details.

Please do share your opinion/preference for the two designs.
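
To make the split concrete, here is a rough user-space model of the two
stages. It is only a sketch, not code from the RFC or from
mm/memory-failure.c: every name in it (sim_page, scan_range,
report_error, the KIND_* and ACT_* values) is hypothetical, and the
has_error flag merely stands in for whatever the hardware read would
detect. The scan loop never looks at the page kind; only the reporting
stage does. Passing the whole array models the background-thread design,
while passing a sub-range models a caller that picks what to scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page kinds for a user-space model; not kernel definitions. */
enum page_kind {
    KIND_FREE,
    KIND_USER,
    KIND_PAGECACHE_CLEAN,
    KIND_PAGECACHE_DIRTY,
    KIND_KERNEL,
};

/* Illustrative recovery outcomes, named after the cases in this thread. */
enum action {
    ACT_NONE,                /* no error has been reported for this page */
    ACT_REREAD_FROM_STORAGE, /* clean page cache: drop, re-read later */
    ACT_FORCE_EIO,           /* dirty page cache: later reads should fail */
    ACT_UNRECOVERED,         /* kernel page: little recovery today */
    ACT_OTHER,               /* free/user pages: not modeled in this sketch */
};

struct sim_page {
    enum page_kind kind;
    bool has_error;          /* assumption: stands in for a detected error */
    enum action taken;
};

/* Stage 2: the only place that inspects the page kind. */
enum action report_error(struct sim_page *pages, size_t nr_pages, size_t pfn)
{
    assert(pfn < nr_pages);

    switch (pages[pfn].kind) {
    case KIND_PAGECACHE_CLEAN:
        pages[pfn].taken = ACT_REREAD_FROM_STORAGE;
        break;
    case KIND_PAGECACHE_DIRTY:
        pages[pfn].taken = ACT_FORCE_EIO;
        break;
    case KIND_KERNEL:
        pages[pfn].taken = ACT_UNRECOVERED;
        break;
    default:
        pages[pfn].taken = ACT_OTHER;
        break;
    }
    return pages[pfn].taken;
}

/*
 * Stage 1: type-agnostic scan of PFNs in [start, end).
 * Returns the number of errors newly reported to stage 2.
 */
size_t scan_range(struct sim_page *pages, size_t nr_pages,
                  size_t start, size_t end)
{
    size_t reported = 0;

    if (end > nr_pages)
        end = nr_pages;

    for (size_t pfn = start; pfn < end; pfn++) {
        /* Skip healthy pages and pages that were already handled. */
        if (!pages[pfn].has_error || pages[pfn].taken != ACT_NONE)
            continue;
        report_error(pages, nr_pages, pfn);
        reported++;
    }
    return reported;
}
```

In this model a caller-driven scan is just scan_range() over the ranges
the caller chose, which is also where the "caller has to secure the
memory regions" requirement would sit; the sketch does not model that
part.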