On 05/01/2024 08:56, David Hildenbrand wrote:
>>>>>> If I am not wrong, that triggers:
>>>>>>
>>>>>> VM_WARN_ON_FOLIO(folio_test_large(folio) &&
>>>>>>                  !folio_test_large_rmappable(folio), folio);
>>>>>>
>>>>>> So we are trying to rmap a large folio that did not go through
>>>>>> folio_prep_large_rmappable().
>>
>> Would someone mind explaining the rules to me for this? As far as I can
>> see, folio_prep_large_rmappable() just inits the _deferred_list and sets
>> a flag so we remember to deinit the list on destruction. Why can't we
>> just init that list for all folios order-2 or greater? Then everything
>> is rmappable?
>
> I think we much rather want to look into moving all mapcount-related
> stuff into folio_prep_large_rmappable(). It doesn't make any sense to
> initialize that for any compound pages, especially the ones that will
> never get mapped to user space.
>
>>>>>> net/packet/af_packet.c calls vm_insert_page() on some pages/folios
>>>>>> stored in the "struct packet_ring_buffer". No idea where that comes
>>>>>> from, but I suspect it's simply some compound allocation.
>>>>>
>>>>> Looks like:
>>>>>
>>>>> alloc_pg_vec
>>>>>   alloc_one_pg_vec_page
>>>>>     gfp_t gfp_flags = GFP_KERNEL | __GFP_COMP |
>>>>>                       __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY;
>>>>>
>>>>>     buffer = (char *) __get_free_pages(gfp_flags, order);
>>>>>
>>>>> So you are right here... :).
>>>>
>>>> Hm, but I wonder if this is something that's supposed to work, or is
>>>> this one of the cases where we should actually use a VM_PFNMAP
>>>> mapping?
>>>>
>>>> It's not a pagecache (file/shmem) page after all.
>>>>
>>>> We could relax that check and document why we expect something that
>>>> is not marked rmappable. But it feels wrong. I suspect this should be
>>>> a VM_PFNMAP instead (like recent udmabuf changes).
>>>
>>> VM_PFNMAP looks correct.
>>
>> And why is making the folio rmappable and mapping it the normal way not
>> the right solution here? Because the folio could be order-1? Or
>> something more profound?
>
> Think about it: we are adding/removing a page from rmap handling that
> can *never* be looked up using the rmap because there is no rmap for
> these pages, and folio->index is just completely unexpressive.
> VM_MIXEDMAP doesn't have any linearity constraints.

I guess I was assuming we'd treat it the same way as anon folios. But I
guess that would be VeryBad (TM) because these aren't anon pages and we
don't want to swap them, etc? OK, got it.

> Logically, it doesn't make any sense to involve rmap code, although it
> currently might work. validate_page_before_insert() blocks off most
> pages where the order-0 mapcount would be used for other purposes and
> everything would blow up.
>
> Looking at vm_insert_page(), this interface is only for pages the
> caller allocated. Maybe we should just not do any rmap accounting when
> mapping/unmapping these pages: not involve any rmap code, including
> mapcounts?
>
> vm_normal_page() works on these mappings, so we'd also have to skip
> rmap code when unmapping these pages etc. Maybe that's the whole reason
> we have the rmap handling here: to not special-case the unmap path.

Right. I guess it depends what vm_insert_page() is spec'ed to expect: is
the bug in the implementation, or is the caller providing the wrong type
of folio? I guess there will be many callers providing non-rmappable
folios (including out of tree?).
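FWIW, my boiled-down understanding of the pattern that trips the warning
is the below (a hypothetical sketch only; the function names are made up
and error paths are trimmed). The allocation is compound but never goes
through folio_prep_large_rmappable(), so the first vm_insert_page() of a
sub-page hits the VM_WARN_ON_FOLIO():

/* Sketch: compound allocation, as in alloc_one_pg_vec_page()... */
static char *pg_vec_alloc_sketch(unsigned int order)
{
	gfp_t gfp = GFP_KERNEL | __GFP_COMP | __GFP_ZERO |
		    __GFP_NOWARN | __GFP_NORETRY;

	/* Large folio, but nobody calls folio_prep_large_rmappable(). */
	return (char *)__get_free_pages(gfp, order);
}

/* ...whose sub-pages are later inserted one by one from ->mmap(). */
static int pg_vec_mmap_sketch(struct vm_area_struct *vma, char *buf,
			      unsigned int order, unsigned long addr)
{
	unsigned int i;
	int err;

	for (i = 0; i < (1U << order); i++) {
		/* Triggers the rmap sanity check for order > 0. */
		err = vm_insert_page(vma, addr, virt_to_page(buf) + i);
		if (err)
			return err;
		addr += PAGE_SIZE;
	}
	return 0;
}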
>
> Alternatively, we can:
>
> (1) Require the caller to make sure large folios are rmappable. We
>     already require allocations to be compound. Should be easy to add.

I'm not sure this is practical, given that vm_insert_page() is exported?

> (2) Allow non-rmappable folios in rmap code just for mapcount tracking.
>     Confusing but possible.
>
>>> I do have another question: why do we only check that the large folio
>>> is rmappable? Does that mean an order-0 folio is always rmappable?
>
> We didn't really have a check for that, I believe. We simply reject all
> pages in vm_insert_page() that are problematic because the pagecount is
> overloaded.
>
>>> I ask this because vm_insert_pages() is called in net/ipv4/tcp.c and
>>> drivers call vm_insert_page(). I suppose they would all need to be
>>> VM_PFNMAP.
>
> Right, similar problem.
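For completeness, this is roughly what I understand the VM_PFNMAP
alternative to look like (again a hypothetical, untested sketch; the
function name is made up):

/* Sketch: map the same buffer via remap_pfn_range(), which marks the
 * VMA VM_PFNMAP (plus VM_IO etc.), so neither the map nor the unmap
 * path involves any rmap handling or mapcounts. */
static int pg_vec_mmap_pfnmap_sketch(struct vm_area_struct *vma,
				     char *buf, unsigned int order)
{
	unsigned long pfn = virt_to_phys(buf) >> PAGE_SHIFT;

	return remap_pfn_range(vma, vma->vm_start, pfn,
			       PAGE_SIZE << order, vma->vm_page_prot);
}

The trade-off being that vm_normal_page() then no longer hands back the
page on these mappings, which I guess is exactly the point.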