On Tue, Feb 21, 2023 at 10:55 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Mon, Feb 20, 2023 at 02:10:24PM -0500, Pasha Tatashin wrote:
> > The discussion should include the following topics:
> > - Interaction with folio and the proposed struct page {memdesc}.
> > - Handling for migrate_pages() and friends.
> > - Handling for FOLL_PIN and FOLL_LONGTERM.
> > - What type of madvise() properties the som memory should handle
>
> Something I didn't see covered was how you'd want to handle memory
> pressure. The answer for memdescs is that we'd treat each userspace

Indeed, this is something that should be covered. I had a few thoughts
about that, but it needs more work. Some possibilities:

1. Under memory pressure, we can migrate pages to normal memory, which
   would allow that memory to become swappable, etc.
2. Teach in-memory compression mechanisms such as zswap/zram to work
   directly with /dev/som.

> allocation as a single object; if you allocate a 256kB folio, that has
> one accessed bit (set every time any of the PTEs which reference that
> folio is accessed), one dirty bit, is aged on the LRU as a single unit
> and will be written to swap as a single unit.
>
> Assuming we're dealing with objects smaller than PMDs, we have a number
> of PTEs each of which has its own A and D bits, so we can determine
> at each revolution of the LRU clock whether it still makes sense to be
> treating the folio as a single unit, or whether pages in the first half
> of the folio are no longer being accessed and we should split the folio
> in half and age the two halves separately.

Interesting.

>
> All of that is still theoretical; we don't allocate anon memory in sizes
> other than PAGE_SIZE and PMD size. And we don't track page cache A and
> D bits to see whether the decision to allocate a particular page size was
> the right one (most page cache memory is never mapped into userspace, so
> it might be of limited value, but I'm sure we could track the equivalent
> information with read() and write()).

Thanks,
Pasha