Re: [PATCH v9 2/2] mm/khugepaged: recover from poisoned file-backed memory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 19, 2023 at 7:10 AM <kirill.shutemov@xxxxxxxxxxxxxxx> wrote:
>
> On Mon, Dec 05, 2022 at 03:40:59PM -0800, Jiaqi Yan wrote:
> > Make collapse_file roll back when copying pages failed. More concretely:
> > - extract copying operations into a separate loop
> > - postpone the updates for nr_none until both scanning and copying
> >   succeeded
> > - postpone joining small xarray entries until both scanning and copying
> >   succeeded
> > - postpone the update operations to NR_XXX_THPS until both scanning and
> >   copying succeeded
> > - for non-SHMEM file, roll back filemap_nr_thps_inc if scan succeeded but
> >   copying failed
> >
> > Tested manually:
> > 0. Enable khugepaged on system under test. Mount tmpfs at /mnt/ramdisk.
> > 1. Start a two-thread application. Each thread allocates a chunk of
> >    non-huge memory buffer from /mnt/ramdisk.
> > 2. Pick 4 random buffer address (2 in each thread) and inject
> >    uncorrectable memory errors at physical addresses.
> > 3. Signal both threads to make their memory buffer collapsible, i.e.
> >    calling madvise(MADV_HUGEPAGE).
> > 4. Wait and then check kernel log: khugepaged is able to recover from
> >    poisoned pages by skipping them.
> > 5. Signal both threads to inspect their buffer contents and make sure no
> >    data corruption.
> >
> > Signed-off-by: Jiaqi Yan <jiaqiyan@xxxxxxxxxx>
>
> Okay, looks sane.

Thanks for your review, :).

>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
>
> --
>   Kiryl Shutsemau / Kirill A. Shutemov




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux