On Fri, Mar 25, 2022 at 2:42 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
>
> On Fri, Mar 25, 2022 at 2:11 PM Jiaqi Yan <jiaqiyan@xxxxxxxxxx> wrote:
> >
> > On Thu, Mar 24, 2022 at 7:51 PM Yang Shi <shy828301@xxxxxxxxx> wrote:
> > >
> > > On Wed, Mar 23, 2022 at 4:29 PM Jiaqi Yan <jiaqiyan@xxxxxxxxxx> wrote:
> > > >
> > > > Problem
> > > > =======
> > > > Memory DIMMs are subject to multi-bit flips, i.e. memory errors.
> > > > As memory size and density increase, the chances of and number of
> > > > memory errors increase. The increasing size and density of server
> > > > RAM in the data center and cloud have shown increased uncorrectable
> > > > memory errors. There are already mechanisms in the kernel to recover
> > > > from uncorrectable memory errors. This series of patches provides
> > > > the recovery mechanism for the particular kernel agent khugepaged.
> > > >
> > > > Impact
> > > > ======
> > > > The main reason we chose to make khugepaged tolerant of memory failures
> > > > was its high possibility of accessing poisoned memory while performing
> > > > functionally optional compaction actions. Standard applications
> > > > typically don't have strict requirements on the size of its pages.
> > > > So they are given 4K pages by the kernel. The kernel is able to improve
> > > > application performance by either 1) giving application 2M pages
> > > > to begin with, or 2) collapsing 4K pages into 2M pages when possible.
> > > > This collapsing operation is done by khugepaged, a kernel agent that
> > > > is constantly scanning memory. When collapsing 4K pages into a 2M page,
> > > > it must copy the data from the 4K pages into a physically contiguous
> > > > 2M page. Therefore, as long as there exists one poisoned cache line in
> > > > collapsible 4K pages, khugepaged will eventually access it. The current
> > > > impact to users is a machine check exception triggered kernel panic.
> > > > However, khugepaged’s compaction operations are not functionally required
> > > > kernel actions. Therefore making khugepaged tolerant to poisoned memory
> > > > will greatly improve user experience.
> > > >
> > > > Solution
> > > > ========
> > > > As stated before, it is less desirable to crash the system only because
> > > > khugepaged accesses poisoned pages while it is collapsing 4K pages.
> > > > The high level idea of this patch series is to skip the group of pages
> > > > (usually 512 4K-size pages) once khugepaged finds one of them is poisoned,
> > > > as these pages have become ineligible to be collapsed.
> > > >
> > > > We are also careful to unwind operations khugepaged has performed before
> > > > it detects memory failures. For example, before copying and collapsing
> > > > a group of anonymous pages into a huge page, the source pages will be
> > > > isolated and their page table is unlinked from their PMD. These operations
> > > > need to be undone in order to ensure these pages are not changed/lost from
> > > > the perspective of other threads (both user and kernel space). As for
> > > > file backed memory pages, there already exists a rollback case. This
> > > > patch just extends it so that khugepaged also correctly rolls back when
> > > > it fails to copy poisoned 4K pages.
> > >
> > > Actually I should have asked the question in the first place before diving
> > > into the implementation details, if uncorrectable memory error
> > > happens, kernel will pin the poisoned page and set hwpoison flag, the
> > > bumped page refcount would prevent the page from being collapsed IIUC.
> >
> > This patch series is for cases where khugepaged is the first guy that detects
> > the memory errors on these poisoned pages. IOW, the pages are not known to
> > have memory errors when khugepaged collapsing gets to them.
> > In our observation, this happens frequently when the huge page ratio of
> > the system is relatively low, which is fairly common in cloud VMs.
>
> Thanks, this is the very important information that needs to be caught
> in the 1st patch's commit log.

Thanks for this valuable feedback. I will add this in the commit msg of v2,
but I will wait for your comments on patch 2/2 before sending out v2.

>
> >
> > >
> > > So I'm wondering why we need this?
> > >
> > > >
> > > > Jiaqi Yan (2):
> > > >   mm: khugepaged: recover from poisoned anonymous memory
> > > >   mm: khugepaged: recover from poisoned file-backed memory
> > > >
> > > >  include/linux/highmem.h |  37 +++++++
> > > >  mm/khugepaged.c         | 211 +++++++++++++++++++++++++++++-----------
> > > >  2 files changed, 189 insertions(+), 59 deletions(-)
> > > >
> > > > --
> > > > 2.35.1.894.gb6a874cedc-goog
> > > >