On Tue, May 7, 2024 at 6:39 PM David Hildenbrand <david@xxxxxxxxxx> wrote: > > On 07.05.24 11:24, Barry Song wrote: > > On Tue, May 7, 2024 at 8:59 PM David Hildenbrand <david@xxxxxxxxxx> wrote: > >> > >>>> Let's assume a single subpage of a large folio is no longer mapped. > >>>> Then, we'd have: > >>>> > >>>> nr_pages == folio_nr_pages(folio) - 1. > >>>> > >>>> You could simply map+reuse most of the folio without COWing. > >>> > >>> yes. This is good but the pte which is no longer mapped could be > >>> anyone within the nr_pages PTEs. so it could be quite tricky for > >>> set_ptes. > >> > >> The swap batching logic should take care of that, otherwise it would be > >> buggy. > > > > When you mention "it would be buggy," are you also referring to the current > > fallback approach? or only refer to the future patch which might be able > > to map/reuse "nr_pages - 1" pages? > > swap_pte_batch() should not skip any holes. So consequently, set_ptes() > should do the right thing. (regarding your comment "could be quite ricky > for set_ptes") > > So I think that should be working as expected. maybe not. take a look at my current code, I am goto check_folio with nr_pages = 1 if swap_pte_batch(folio_ptep, nr, folio_pte) != folio_nr_pages(folio). + nr_pages = 1; + ... + if (folio_test_large(folio) && folio_test_swapcache(folio)) { + int nr = folio_nr_pages(folio); + ... + if (!pte_same(folio_pte, pte_move_swp_offset(vmf->orig_pte, -idx)) || + swap_pte_batch(folio_ptep, nr, folio_pte) != nr) + goto check_folio; /* read here, i am falling back nr_pages = 1 */ + + + ... + nr_pages = nr; The fallback(=1) works. but it seems you are proposing set nr_pages = swap_pte_batch(folio_ptep, nr, folio_pte) if (swap_pte_batch(folio_ptep, nr, folio_pte) > 1 && swap_pte_batch(folio_ptep, nr, folio_pte) < nr_pages) ? > > > > > The current patch falls back to setting nr_pages = 1 without mapping or > > reusing nr_pages - 1. I feel your concern doesn't refer to this fallback? > > > >> > >>> > >>>> > >>>> Once we support COW reuse of PTE-mapped THP we'd do the same. Here, it's > >>>> just easy to detect that the folio is exclusive (folio_ref_count(folio) > >>>> == 1 before mapping anything). > >>>> > >>>> If you really want to mimic what do_wp_page() currently does, you should > >>>> have: > >>>> > >>>> exclusive || (folio_ref_count(folio) == 1 && !folio_test_large(folio)) > >>> > >>> I actually dislike the part that do_wp_page() handles the reuse of a large > >>> folio which is entirely mapped. For example, A forks B, B exit, we write > >>> A's large folio, we get nr_pages CoW of small folios. Ideally, we can > >>> reuse the whole folios for writing. > >> > >> Yes, see the link I shared to what I am working on. There isn't really a > >> question if what we do right now needs to be improved and all these > >> scenarios are pretty obvious clear. > > > > Great! I plan to dedicate more time to reviewing your work. > > Nice! And there will be a lot of follow-up optimization work I won't > tackle immediately regarding COW (COW-reuse around, maybe sometimes we > want to COW bigger chunks). > > I still have making PageAnonExclusive a per-folio flag on my TODO list, > that will help the COW-reuse around case a lot. > > > > >> > >>> > >>>> > >>>> Personally, I think we should keep it simple here and use: > >>>> > >>>> exclusive || folio_ref_count(folio) == 1 > >>> > >>> I feel this is still better than > >>> exclusive || (folio_ref_count(folio) == 1 && !folio_test_large(folio)) > >>> as we reuse the whole large folio. the do_wp_page() behaviour > >>> doesn't have this. > >> > >> Yes, but there is the comment "Same logic as in do_wp_page();". We > >> already ran into issues having different COW reuse logic all over the > >> place. For this case here, I don't care if we leave it as > >> > >> "exclusive || folio_ref_count(folio) == 1" > > > > I'm perfectly fine with using the code for this patchset and maybe > > looking for other > > opportunities for potential optimization as an incremental patchset, > > for example, > > reusing the remaining PTEs as suggested by you - "simply map+reuse most of > > the folio without COWing." > > > >> > >> But let's not try inventing new stuff here. > > > > It seems you ignored and snipped my "16 + 14" pages and "15" pages > > example though. but once we support "simply map+reuse most of the > > folio without COWing", the "16+14" problem can be resolved, instead, > > we consume 16 pages. > > > Oh, sorry for skipping that, for me it was rather clear: the partially > mapped folios will be on the deferred split list and the excess memory > can (and will be) reclaimed when there is need. So this temporary memory > consumption is usually not a problem in practice. But yes, something to > optimize (just like COW reuse in general). > > -- > Cheers, > > David / dhildenb >