On Fri, Apr 20, 2018 at 12:57:41PM -0700, Tim Chen wrote: > On 04/04/2018 12:17 PM, jglisse@xxxxxxxxxx wrote: > > From: Jérôme Glisse <jglisse@xxxxxxxxxx> > > > > https://cgit.freedesktop.org/~glisse/linux/log/?h=generic-write-protection-rfc > > > > This is an RFC for LSF/MM discussions. It impacts the file subsystem, > > the block subsystem and the mm subsystem. Hence it would benefit from > > a cross sub-system discussion. > > > > Patchset is not fully bake so take it with a graint of salt. I use it > > to illustrate the fact that it is doable and now that i did it once i > > believe i have a better and cleaner plan in my head on how to do this. > > I intend to share and discuss it at LSF/MM (i still need to write it > > down). That plan lead to quite different individual steps than this > > patchset takes and his also easier to split up in more manageable > > pieces. > > > > I also want to apologize for the size and number of patches (and i am > > not even sending them all). > > > > ---------------------------------------------------------------------- > > The Why ? > > > > I have two objectives: duplicate memory read only accross nodes and or > > devices and work around PCIE atomic limitations. More on each of those > > objective below. I also want to put forward that it can solve the page > > wait list issue ie having each page with its own wait list and thus > > avoiding long wait list traversale latency recently reported [1]. > > > > It does allow KSM for file back pages (truely generic KSM even between > > both anonymous and file back page). I am not sure how useful this can > > be, this was not an objective i did pursue, this is just a for free > > feature (see below). > > > > [1] https://groups.google.com/forum/#!topic/linux.kernel/Iit1P5BNyX8 > > > > ---------------------------------------------------------------------- > > Per page wait list, so long page_waitqueue() ! > > > > Not implemented in this RFC but below is the logic and pseudo code > > at bottom of this email. > > > > When there is a contention on struct page lock bit, the caller which > > is trying to lock the page will add itself to a waitqueue. The issues > > here is that multiple pages share the same wait queue and on large > > system with a lot of ram this means we can quickly get to a long list > > of waiters for differents pages (or for the same page) on the same > > list [1]. > > Your approach seems useful if there are lots of locked pages sharing > the same wait queue. > > That said, in the original workload from our customer with the long wait queue > problem, there was a single super hot page getting migrated, and it > is being accessed by all threads which caused the big log jam while they wait for > the migration to get completed. > With your approach, we will still likely end up with a long queue > in that workload even if we have per page wait queue. > > Thanks. Ok so i re-read the thread, i was writting this cover letter from memory and i had bad recollection of your issue, so sorry. First, do you have a way to reproduce the issue ? Something easy would be nice :) So what i am proposing for per page wait queue would only marginaly help you (it might not even be mesurable in your workload). It would certainly make the code smaller and easier to understand i believe. Now that i have look back at your issue i think there is 2 things we should do. First keep migration page map read only, this would at least avoid CPU read fault. In trace you captured i wasn't able to ascertain if this were read or write fault. Second idea i have is about NUMA, everytime we NUMA migrate a page we could attach a temporary struct to the page (using page->mapping). So if we scan that page again we can inspect information about previous migration and see if we are not over migrating that page (ie bouncing it all over). If so we can mark the page (maybe with a page flag if we can find one) to protect it from further migration. That temporary struct would be remove after a while, ie autonuma would preallocate a bunch of those and keep an LRU of them and recycle the oldest when it needs a new one to migrate another page. LSF/MM slots: Michal can i get 2 slots to talk about this ? MM only discussion, one to talk about doing migration with page map read only but write protected while migration is happening. The other one to talk about attaching auto NUMA tracking struct to page. Cheers, Jérôme