Re: [PATCH v12 0/3] Memory poison recovery in khugepaged collapsing


 



On Tue, Apr 4, 2023 at 11:44 AM Jiaqi Yan <jiaqiyan@xxxxxxxxxx> wrote:
>
> Friendly ping for review :)

Both Hugh and I already gave our reviewed-by/acked-by for the previous
version. Since there were only minor changes, you can keep the
reviewed-by/acked-by tags from the previous version.

>
> On Wed, Mar 29, 2023 at 8:11 AM Jiaqi Yan <jiaqiyan@xxxxxxxxxx> wrote:
>>
>> Problem
>> =======
>> Memory DIMMs are subject to multi-bit flips, i.e. memory errors.
>> As memory size and density increase, so do the likelihood and number
>> of memory errors. Server RAM in data centers and the cloud, with its
>> growing size and density, has shown more uncorrectable
>> memory errors. There are already mechanisms in the kernel to recover
>> from uncorrectable memory errors. This series of patches provides
>> the recovery mechanism for the particular kernel agent khugepaged
>> when it collapses memory pages.
>>
>> Impact
>> ======
>> The main reason we chose to make khugepaged collapsing tolerant of
>> memory failures is that it is highly likely to access poisoned memory
>> while performing functionally optional compaction actions.
>> Standard applications typically don't have strict requirements on
>> the size of their pages, so the kernel gives them 4K pages.
>> The kernel is able to improve application performance by either
>>
>>   1) giving applications 2M pages to begin with, or
>>   2) collapsing 4K pages into 2M pages when possible.
>>
>> This collapsing operation is done by khugepaged, a kernel agent that
>> is constantly scanning memory. When collapsing 4K pages into a 2M page,
>> it must copy the data from the 4K pages into a physically contiguous
>> 2M page. Therefore, as long as there exists one poisoned cache line in
>> collapsible 4K pages, khugepaged will eventually access it. The current
>> impact on users is a kernel panic triggered by a machine check exception.
>> However, khugepaged's compaction operations are not functionally required
>> kernel actions. Therefore, making khugepaged tolerant of poisoned memory
>> will greatly improve the user experience.
>>
>> This patch series is for cases where khugepaged is the first agent
>> to detect the memory errors on the poisoned pages. IOW, the pages
>> are not known to have memory errors when khugepaged collapsing gets to
>> them. In our observation, this happens frequently when the huge page
>> ratio of the system is relatively low, which is fairly common in
>> virtual machines running in the cloud.
>>
>> Solution
>> ========
>> As stated before, it is undesirable to crash the system only because
>> khugepaged accesses poisoned pages while it is collapsing 4K pages.
>> The high level idea of this patch series is to skip the group of pages
>> (usually 512 4K-size pages) once khugepaged finds one of them is poisoned,
>> as these pages have become ineligible to be collapsed.
>>
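To make the "skip the whole group on the first poisoned page" idea concrete,
here is a minimal user-space sketch. It is not the kernel code from this
series: every sim_* name, the SIM_* constants, and the `poisoned` flag are
hypothetical stand-ins, and the flag only imitates what a machine-check-safe
copy would report. The point is that the copy loop returns a result code at
the first bad page instead of touching the data and crashing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_PAGE_SIZE 4096u
#define SIM_HPAGE_NR  512u   /* 2M / 4K */

/* Hypothetical result codes, loosely modelled on a scan_result. */
enum sim_scan_result { SIM_SUCCEED = 0, SIM_COPY_MC = 1 };

struct sim_page {
	unsigned char data[SIM_PAGE_SIZE];
	bool poisoned;   /* stands in for an uncorrectable memory error */
};

/* Stand-in for a machine-check-safe copy: reports failure, never crashes. */
static int sim_copy_mc_page(unsigned char *dst, const struct sim_page *src)
{
	if (src->poisoned)
		return -1;
	memcpy(dst, src->data, SIM_PAGE_SIZE);
	return 0;
}

/*
 * Copy a group of 4K pages into one contiguous 2M buffer. Stop at the
 * first poisoned page; *copied tells the caller how far the loop got.
 */
static enum sim_scan_result
sim_collapse_copy(unsigned char *huge, const struct sim_page *pages,
		  size_t *copied)
{
	size_t i;

	assert(huge && pages && copied);
	for (i = 0; i < SIM_HPAGE_NR; i++) {
		if (sim_copy_mc_page(huge + i * SIM_PAGE_SIZE, &pages[i])) {
			*copied = i;
			return SIM_COPY_MC;   /* skip the whole group */
		}
	}
	*copied = SIM_HPAGE_NR;
	return SIM_SUCCEED;
}
```

A caller would treat SIM_COPY_MC as "leave these pages as 4K pages and move
on", which is the behaviour the cover letter describes.
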
>> We are also careful to unwind operations khugepaged has performed before
>> it detects memory failures. For example, before copying and collapsing
>> a group of anonymous pages into a huge page, the source pages will be
>> isolated and their page table is unlinked from their PMD. These operations
>> need to be undone in order to ensure these pages are not changed/lost from
>> the perspective of other threads (both user and kernel space). As for
>> file backed memory pages, there already exists a rollback case. This
>> patch just extends it so that khugepaged also correctly rolls back when
>> it fails to copy poisoned 4K pages.
>>
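The unwinding described above can be pictured with a small user-space
sketch. Again, this is only an illustration under stated assumptions: the
sim_* names and the boolean fields are hypothetical, the group is shrunk to
8 pages, and real isolation and PMD handling involve locking and page table
operations that are not modelled here. It only shows the shape of
"prepare, try to copy, undo everything on failure".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NR_PAGES 8   /* small stand-in for the usual 512 */

struct sim_range {
	bool isolated[SIM_NR_PAGES];  /* page taken out of normal use */
	bool poisoned[SIM_NR_PAGES];  /* page would fail the copy */
	bool pmd_linked;              /* page table still reachable via PMD */
	bool collapsed;               /* range now backed by a huge page */
};

/* Steps taken before copying: isolate pages, unlink the page table. */
static void sim_prepare(struct sim_range *r)
{
	for (size_t i = 0; i < SIM_NR_PAGES; i++)
		r->isolated[i] = true;
	r->pmd_linked = false;
}

/* Undo sim_prepare() so other threads see the range unchanged. */
static void sim_rollback(struct sim_range *r)
{
	for (size_t i = 0; i < SIM_NR_PAGES; i++)
		r->isolated[i] = false;
	r->pmd_linked = true;
}

/* Returns true on collapse, false if a poisoned page forced a rollback. */
static bool sim_try_collapse(struct sim_range *r)
{
	assert(r && r->pmd_linked && !r->collapsed);
	sim_prepare(r);
	for (size_t i = 0; i < SIM_NR_PAGES; i++) {
		if (r->poisoned[i]) {
			sim_rollback(r);
			return false;
		}
	}
	r->collapsed = true;
	return true;
}
```

After a failed attempt the range is in the same state as before the
attempt, which is the property the rollback paths are meant to guarantee.
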
>> Changelog
>> =========
>> v12 changes
>> - Incorporate feedback from Yang Shi <shy828301@xxxxxxxxx>.
>> - Drop unused pmd from __collapse_huge_page_copy_succeeded.
>> - Drop unused address from __collapse_huge_page_copy_failed.
>> - smp_mb() should be after filemap_nr_thps_dec.
>> - This revision is rebased to mm-unstable at commit 9b175ce664d33
>>   ("mm: move free_area_empty() to mm/internal.h")
>>
>> v11 changes
>> - Incorporate feedback from Yang Shi <shy828301@xxxxxxxxx> and Hugh
>>   Dickins <hughd@xxxxxxxxxx>
>> - Replace releasing pages for-loop with release_pte_pages in
>>   __collapse_huge_page_copy_failed.
>> - Rename pte_ptl to ptl in __collapse_huge_page_copy_succeeded.
>> - Fix a bug in __collapse_huge_page_copy_succeeded: ptep_clear should be
>>   used instead of pte_clear.
>> - Drop _address in __collapse_huge_page_copy_succeeded.
>> - Add smp_mb() before updating filemap_nr_thps_dec.
>> - Move `nr = thp_nr_pages()` closer to its references.
>> - Remove an unnecessary goto statement.
>> - This revision is rebased to mm-unstable at commit b4e1277ee31db
>>   ("xtensa: reword ARCH_FORCE_MAX_ORDER prompt and help text")
>>
>> v10 changes
>> - Incorporate feedback from Kirill A. Shutemov
>>   <kirill.shutemov@xxxxxxxxxxxxxxx>
>> - Refactor the 2nd loop (after the loop for copying memory) into 2 helper
>>   functions, one for actions to take when copying succeeded, one for when
>>   copying failed due to #MC.
>> - Use copy_mc_user_highpage for anonymous memory.
>> - Introduce copy_mc_highpage and use it for file-backed memory.
>> - Rename the original PMD from `rollback` to `orig_pmd`.
>> - Some minor changes in comments, e.g. `normal page` to `raw page`.
>> - This revision is rebased to mm-unstable at commit df3ae4347aff9
>>   ("dma-buf: system_heap: avoid reclaim for order 4")
>>
>> v9 changes
>> - Incorporate feedback from Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> - Move copy_mc_highpage into khugepaged.c as a static out-of-line
>>   function copy_mc_page.
>>
>> v8 changes
>> - Incorporate feedback from Tony Luck <tony.luck@xxxxxxxxx>
>> - Rename copy_highpage_mc to copy_mc_highpage.
>> - Update copy_mc_highpage with kmsan changes.
>> - Code style changes:
>>   1) copy_mc_highpage returns int as "copy" is an action and is consistent
>>      with copy_mc_user_highpage.
>>   2) __collapse_huge_page_copy returns scan_result(int) and is consistent
>>      with __collapse_huge_page_isolate/swapin.
>>   3) variables are declared in separate lines in collapse_file.
>>
>> v7 changes
>> - Fix a bug "KASAN: stack-out-of-bounds Read in collapse_file". After
>>   copying all pages into the huge page, clear_highpage should use index
>>   instead of page->index.
>>
>> v6 changes
>> - Address comments from Kirill Shutemov <kirill@xxxxxxxxxxxxx>
>> - Rewrite __collapse_huge_page_copy to make rollback operations
>>   clearer to the reader.
>> - Add detailed test steps in each commit message.
>>
>> v5 changes
>> - Rebase patches to mm-unstable at
>>   commit ffb39098bf87 ("Merge tag 'linux-kselftest-kunit-6.1-rc1' of
>>   git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest").
>> - Resolve conflicts with:
>>   commit 2f55f070e5b8 ("mm/khugepaged: minor cleanup for collapse_file")
>>   commit 1baec203b77c ("mm/khugepaged: try to free transhuge swapcache
>>   when possible")
>>
>> v4 changes
>> - Incorporate feedback from Yang Shi <shy828301@xxxxxxxxx>
>> - Remove tracepoint for __collapse_huge_page_copy, just keep SCAN_COPY_MC
>>   and let trace_mm_collapse_huge_page report it
>> - Remove unnecessary comments
>>
>> v3 changes
>> - Incorporate feedback from Yang Shi <shy828301@xxxxxxxxx>
>> - Add tracepoint for __collapse_huge_page_copy
>> - Restore PMD in collapse_huge_page
>> - Correct comment about mmap_read_lock
>>
>> v2 changes
>> - Incorporate feedback from Yang Shi <shy828301@xxxxxxxxx>
>> - Only keep copy_highpage_mc
>> - Add new scan_result SCAN_COPY_MC
>> - Defer NR_FILE_THPS update until copying succeeded
>>
>> Jiaqi Yan (3):
>>   mm/khugepaged: recover from poisoned anonymous memory
>>   mm/hwpoison: introduce copy_mc_highpage
>>   mm/khugepaged: recover from poisoned file-backed memory
>>
>>  include/linux/highmem.h            |  54 ++++++--
>>  include/trace/events/huge_memory.h |   3 +-
>>  mm/khugepaged.c                    | 200 ++++++++++++++++++++++-------
>>  3 files changed, 198 insertions(+), 59 deletions(-)
>>
>> --
>> 2.40.0.348.gf938b09366-goog
>>




