On Wed, 2021-11-10 at 19:26 -0800, Luck, Tony wrote:
> On Thu, Nov 11, 2021 at 04:55:14AM +0200, Jarkko Sakkinen wrote:
> > On Wed, 2021-11-10 at 10:51 -0800, Reinette Chatre wrote:
> > > sgx_should_reclaim() would only succeed when sgx_nr_free_pages goes
> > > below the watermark. Once sgx_nr_free_pages becomes corrupted there is
> > > no clear way in which it can correct itself since it is only ever
> > > incremented or decremented.
> >
> > So one scenario would be:
> >
> > 1. CPU A does a READ of sgx_nr_free_pages.
> > 2. CPU B does a READ of sgx_nr_free_pages.
> > 3. CPU A does a STORE of sgx_nr_free_pages.
> > 4. CPU B does a STORE of sgx_nr_free_pages.
> >
> > ?
> >
> > That does corrupt the value, yes, but I don't see anything like this
> > in the commit message, so I'll have to check.
> >
> > I think the commit message is lacking a concurrency scenario, and the
> > current transcripts are a bit useless.
>
> What about this part:
>
>    With sgx_nr_free_pages accessed and modified from a few places
>    it is essential to ensure that these accesses are done safely but
>    this is not the case. sgx_nr_free_pages is read without any
>    protection and updated with inconsistent protection by any one
>    of the spin locks associated with the individual NUMA nodes.
>    For example:
>
>       CPU_A                                 CPU_B
>       -----                                 -----
>    spin_lock(&nodeA->lock);              spin_lock(&nodeB->lock);
>    ...                                   ...
>    sgx_nr_free_pages--;  /* NOT SAFE */  sgx_nr_free_pages--;
>
>    spin_unlock(&nodeA->lock);            spin_unlock(&nodeB->lock);
>
> Maybe you missed the "NOT SAFE" hidden in the middle of
> the picture?
>
> -Tony

For me from that the ordering is not clear. E.g. compare to

https://www.kernel.org/doc/Documentation/memory-barriers.txt

/Jarkko
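
For illustration only, the four-step scenario quoted above can be replayed
deterministically in user space. This is a minimal sketch, not kernel code
and not the patch under discussion: nr_free_pages, lost_update() and
atomic_update() are hypothetical names, and C11 <stdatomic.h> is assumed
here as a stand-in for whatever primitive the kernel would use. The first
function spells out the READ/STORE ordering explicitly; the second shows
the same two decrements as indivisible read-modify-write operations.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for sgx_nr_free_pages: a plain, unprotected counter. */
static long nr_free_pages;

/*
 * Replay steps 1-4 by hand. Each "CPU" performs a separate load and
 * store, which is what a non-atomic "counter--" may be compiled to.
 */
long lost_update(long start)
{
	long cpu_a, cpu_b;

	nr_free_pages = start;
	cpu_a = nr_free_pages;		/* 1. CPU A does a READ  */
	cpu_b = nr_free_pages;		/* 2. CPU B does a READ  */
	nr_free_pages = cpu_a - 1;	/* 3. CPU A does a STORE */
	nr_free_pages = cpu_b - 1;	/* 4. CPU B does a STORE, overwriting A's */
	return nr_free_pages;		/* start - 1: one decrement is lost */
}

/*
 * The same two decrements done as atomic read-modify-write operations.
 * No interleaving can separate the read from the store, so neither
 * update is lost regardless of which per-node lock the caller holds.
 */
long atomic_update(long start)
{
	atomic_long counter;

	atomic_init(&counter, start);
	atomic_fetch_sub(&counter, 1);	/* CPU A */
	atomic_fetch_sub(&counter, 1);	/* CPU B */
	return atomic_load(&counter);	/* start - 2 */
}
```

Starting from 10, lost_update() ends at 9 while atomic_update() ends at 8.
Holding nodeA->lock on one CPU and nodeB->lock on the other does not
prevent the first ordering, since the two locks do not exclude each other.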