On Fri, Feb 11, 2022 at 8:48 PM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
>
> On 2/11/22 05:07, Muchun Song wrote:
> > On Fri, Feb 11, 2022 at 3:34 AM Joao Martins <joao.m.martins@xxxxxxxxxx> wrote:
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index cface1d38093..c10df2fd0ec2 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -6666,6 +6666,20 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
> >>  	}
> >>  }
> >>
> >> +/*
> >> + * With compound page geometry and when struct pages are stored in RAM, most
> >> + * tail pages are reused. Consequently, the amount of unique struct pages to
> >> + * initialize is a lot smaller than the total amount of struct pages being
> >> + * mapped. This is a paired / mild layering violation with explicit knowledge
> >> + * of how the sparse_vmemmap internals handle compound pages in the absence
> >> + * of an altmap. See vmemmap_populate_compound_pages().
> >> + */
> >> +static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap,
> >> +					      unsigned long nr_pages)
> >> +{
> >> +	return !altmap ? 2 * (PAGE_SIZE/sizeof(struct page)) : nr_pages;
> >> +}
> >> +
> >
> > This means only the first 2 pages will be modified; the rest (6 or 4094 pages)
> > are not. In the HugeTLB case, those tail pages are mapped read-only to catch
> > invalid usage of tail pages (e.g. write operations). Quick question: should
> > we also do similar things on DAX?
> >
> What stands in the way of marking deduplicated pages as read-only is one
> particular CONFIG_DEBUG_VM feature, namely page_init_poison(). HugeTLB gets
> its memory from the page allocator or from already pre-populated (at boot)
> system RAM sections, and needs those to be 'given back' before they can be
> hotunplugged. So I guess it never goes through page_init_poison().
> With device-dax, though, the sections are populated and dedicated to
> device-dax when hotplugged, and then hotunplugged when the last devdax user
> drops the page reference.
>
> So page_init_poison() is called on those two occasions. It actually writes
> to whole sections of memmap, not just one page. So either I gate the
> read-only page protection on CONFIG_DEBUG_VM=n (which feels very wrong), or
> I detect inside page_init_poison() that the caller is trying to init
> compound devmap backed struct pages that were already watermarked (i.e.
> essentially when the pfn offset between the passed page and its head page
> is bigger than 128).

Got it. I hadn't realized page_init_poison() would poison the struct pages.
I agree with you that mapping them read-only is wrong. Thanks.