On Fri, Dec 04, 2020 at 11:24:56AM -0500, Pavel Tatashin wrote:
> On Thu, Dec 3, 2020 at 2:36 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >
> > On Thu, Dec 03, 2020 at 02:15:36PM -0500, Pavel Tatashin wrote:
> >
> > > I studied some more, and I think this is not a race:
> > > list_add_tail(&head->lru, &cma_page_list) is called only when
> > > isolate_lru_page(head) succeeds.
> > > isolate_lru_page(head) succeeds only when PageLRU(head) is true.
> > > However, in this function we also clear the LRU flag before
> > > returning success.
> > > This means that if we race with another thread, the other thread
> > > won't get to the unprotected list_add_tail(&head->lru,
> > > &cma_page_list) until head is back on the LRU list.
> >
> > Oh interesting, I totally didn't see how that LRU stuff is
> > working. So.. this creates a ridiculously expensive spin lock? Not
> > broken, but yikes :|
>
> Not really a spin lock, the second thread won't be able to isolate
> this page, and will skip migration of this page.

It looks like the intent is that it will call gup again, then goto
check_again, and once again try to isolate the LRU. ie it loops.

If it gets to a point where all the CMA pages fail to isolate then it
simply exits with success as the cma_page_list will be empty.

Is this a bug? It seems like a bug: the invariant here is to not
return with a CMA page, so why do we have a path that does return
with a CMA page?

Jason
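
To make the control flow concrete, here is a minimal userspace model of
the check_again loop described above. This is not the actual mm/gup.c
code; the struct, the flags, and the failure setup are invented for
illustration, and only the loop shape is the point:

/*
 * Minimal userspace model of the check_again loop -- NOT the real
 * mm/gup.c code.  A CMA page that fails isolation never reaches
 * cma_page_list, so an empty list makes the loop exit with "success"
 * while that CMA page is still among the pinned pages.
 */
#include <stdbool.h>
#include <stdio.h>

#define NR_PAGES 3

struct page {
	bool is_cma;	/* page lives in a CMA region */
	bool on_lru;	/* stands in for PageLRU() */
};

/* Models isolate_lru_page(): only succeeds if the page is on the LRU,
 * and clears the flag so a second caller cannot isolate it again. */
static bool isolate_lru_page(struct page *p)
{
	if (!p->on_lru)
		return false;
	p->on_lru = false;
	return true;
}

int main(void)
{
	struct page pages[NR_PAGES] = {
		{ .is_cma = false, .on_lru = true },
		{ .is_cma = true,  .on_lru = false }, /* e.g. isolated by another thread */
		{ .is_cma = false, .on_lru = true },
	};
	int isolated;

check_again:
	isolated = 0;
	for (int i = 0; i < NR_PAGES; i++) {
		/* Only successfully isolated CMA pages would be added
		 * to cma_page_list for migration. */
		if (pages[i].is_cma && isolate_lru_page(&pages[i]))
			isolated++;
	}

	if (isolated) {
		/* Real code: migrate_pages(), redo gup, then retry. */
		goto check_again;
	}

	/* cma_page_list is empty -> return success, yet a CMA page is
	 * still pinned, which is the question raised above. */
	for (int i = 0; i < NR_PAGES; i++)
		if (pages[i].is_cma)
			printf("returning with CMA page %d still pinned\n", i);

	return 0;
}

Running it prints "returning with CMA page 1 still pinned": the success
path is taken even though a CMA page remains among the pinned pages.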