On 2/11/22 05:07, Muchun Song wrote:
> On Fri, Feb 11, 2022 at 3:34 AM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index cface1d38093..c10df2fd0ec2 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -6666,6 +6666,20 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
>>  	}
>>  }
>>
>> +/*
>> + * With compound page geometry and when struct pages are stored in ram most
>> + * tail pages are reused. Consequently, the amount of unique struct pages to
>> + * initialize is a lot smaller than the total amount of struct pages being
>> + * mapped. This is a paired / mild layering violation with explicit knowledge
>> + * of how the sparse_vmemmap internals handle compound pages in the lack
>> + * of an altmap. See vmemmap_populate_compound_pages().
>> + */
>> +static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
>> +					      unsigned long nr_pages)
>> +{
>> +	return !altmap ? 2 * (PAGE_SIZE/sizeof(struct page)) : nr_pages;
>> +}
>> +
>
> This means only the first 2 pages will be modified, the remaining 6 or 4094
> pages will not. In the HugeTLB case, those tail pages are mapped read-only
> to catch invalid usage on tail pages (e.g. write operations). Quick question:
> should we also do similar things on DAX?
>

What mainly stands in the way of marking deduplicated pages read-only is one
particular CONFIG_DEBUG_VM feature, namely page_init_poison(). HugeTLB gets
its memory from the page allocator, i.e. from system RAM sections that are
already pre-populated (at boot), and needs those to be 'given back' before
they can be hot-unplugged. So I guess it never goes through
page_init_poison(). With device-dax, though, the sections are populated and
dedicated to device-dax when hotplugged, and again on hotunplug when the last
devdax user drops the page reference. So page_init_poison() is called on
those two occasions, and it actually writes to whole sections of memmap, not
just one page.
So either I gate the read-only page protection on CONFIG_DEBUG_VM=n (which
feels very wrong), or I detect inside page_init_poison() that the caller is
trying to init compound devmap backed struct pages that were already
watermarked (i.e. essentially when the pfn offset between the passed page and
the head page is bigger than 128).
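A sketch of what that second option's check could look like, again as a
userspace model rather than kernel code: is_deduped_tail() and the 128
boundary are assumptions from the discussion above (two vmemmap pages' worth
of 64-byte struct pages at 4 KiB page size), not an existing helper.

```c
#include <stdbool.h>

/*
 * Assumed: the first 128 struct pages of a compound devmap page are unique
 * (head vmemmap page + one tail vmemmap page); everything past that offset
 * lives in a deduplicated, potentially read-only, vmemmap mapping.
 */
#define UNIQUE_STRUCT_PAGES	128UL

/*
 * Hypothetical predicate for page_init_poison(): if the pfn offset from
 * the compound head is >= 128, the struct page is backed by a remapped
 * vmemmap page, so poisoning must be skipped -- writing there would either
 * fault (if mapped read-only) or scribble on a shared mapping.
 */
static bool is_deduped_tail(unsigned long head_pfn, unsigned long pfn)
{
	return pfn - head_pfn >= UNIQUE_STRUCT_PAGES;
}
```

page_init_poison() would then poison only the struct pages for which
is_deduped_tail() is false, leaving the watermarked remapped tails untouched.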