On Thu, Jan 19, 2023 at 7:10 AM <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:
>
> On Mon, Dec 05, 2022 at 03:40:59PM -0800, Jiaqi Yan wrote:
> > Make collapse_file roll back when copying pages failed. More concretely:
> > - extract copying operations into a separate loop
> > - postpone the updates for nr_none until both scanning and copying
> >   succeeded
> > - postpone joining small xarray entries until both scanning and copying
> >   succeeded
> > - postpone the update operations to NR_XXX_THPS until both scanning and
> >   copying succeeded
> > - for non-SHMEM file, roll back filemap_nr_thps_inc if scan succeeded but
> >   copying failed
> >
> > Tested manually:
> > 0. Enable khugepaged on system under test. Mount tmpfs at /mnt/ramdisk.
> > 1. Start a two-thread application. Each thread allocates a chunk of
> >    non-huge memory buffer from /mnt/ramdisk.
> > 2. Pick 4 random buffer address (2 in each thread) and inject
> >    uncorrectable memory errors at physical addresses.
> > 3. Signal both threads to make their memory buffer collapsible, i.e.
> >    calling madvise(MADV_HUGEPAGE).
> > 4. Wait and then check kernel log: khugepaged is able to recover from
> >    poisoned pages by skipping them.
> > 5. Signal both threads to inspect their buffer contents and make sure no
> >    data corruption.
> >
> > Signed-off-by: Jiaqi Yan <jiaqiyan@xxxxxxxxxx>
>
> Okay, looks sane.

Thanks for your review, :).

>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
>
> --
>   Kiryl Shutsemau / Kirill A. Shutemov