On Thu, Nov 11, 2021 at 04:55:14AM +0200, Jarkko Sakkinen wrote:
> On Wed, 2021-11-10 at 10:51 -0800, Reinette Chatre wrote:
> > sgx_should_reclaim() would only succeed when sgx_nr_free_pages goes
> > below the watermark. Once sgx_nr_free_pages becomes corrupted there is
> > no clear way in which it can correct itself since it is only ever
> > incremented or decremented.
>
> So one scenario would be:
>
> 1. CPU A does a READ of sgx_nr_free_pages.
> 2. CPU B does a READ of sgx_nr_free_pages.
> 3. CPU A does a STORE of sgx_nr_free_pages.
> 4. CPU B does a STORE of sgx_nr_free_pages.
>
> ?
>
> That does corrupt the value, yes, but I don't see anything like this
> in the commit message, so I'll have to check.
>
> I think the commit message is lacking a concurrency scenario, and the
> current transcripts are a bit useless.

What about this part:

With sgx_nr_free_pages accessed and modified from a few places
it is essential to ensure that these accesses are done safely but
this is not the case. sgx_nr_free_pages is read without any
protection and updated with inconsistent protection by any one
of the spin locks associated with the individual NUMA nodes.
For example:

      CPU_A                                 CPU_B
      -----                                 -----
 spin_lock(&nodeA->lock);              spin_lock(&nodeB->lock);
 ...                                   ...
 sgx_nr_free_pages--;  /* NOT SAFE */  sgx_nr_free_pages--;

 spin_unlock(&nodeA->lock);            spin_unlock(&nodeB->lock);

Maybe you missed the "NOT SAFE" hidden in the middle of the picture?

-Tony
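
For illustration only, here is a minimal userspace sketch of one way this kind of race could be closed: make the shared counter atomic, so the decrement is safe no matter which per-node lock the caller happens to hold. It uses C11 atomics and pthreads rather than kernel primitives, and every name in it (struct node, nr_free_pages, alloc_from_node, run_demo, PAGES_PER_NODE) is a hypothetical stand-in, not the SGX code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define PAGES_PER_NODE 100000L  /* hypothetical: pages each "node" hands out */

/* Hypothetical stand-in for a NUMA node with its own lock. */
struct node {
	pthread_mutex_t lock;
};

static struct node node_a = { .lock = PTHREAD_MUTEX_INITIALIZER };
static struct node node_b = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Global counter shared by all nodes; atomic because no single lock covers it. */
static atomic_long nr_free_pages;

/* Each thread takes only its own node's lock, as CPU_A and CPU_B do above. */
static void *alloc_from_node(void *arg)
{
	struct node *n = arg;

	for (long i = 0; i < PAGES_PER_NODE; i++) {
		pthread_mutex_lock(&n->lock);
		/* ... per-node free list work would go here ... */
		atomic_fetch_sub(&nr_free_pages, 1);  /* safe under either lock */
		pthread_mutex_unlock(&n->lock);
	}
	return NULL;
}

/* Runs both "CPUs" concurrently and returns the final counter value. */
long run_demo(void)
{
	pthread_t a, b;
	int rc;

	atomic_store(&nr_free_pages, 2 * PAGES_PER_NODE);

	rc = pthread_create(&a, NULL, alloc_from_node, &node_a);
	assert(rc == 0);
	rc = pthread_create(&b, NULL, alloc_from_node, &node_b);
	assert(rc == 0);

	pthread_join(a, NULL);
	pthread_join(b, NULL);

	return atomic_load(&nr_free_pages);
}
```

With the atomic counter run_demo() returns 0. If nr_free_pages were a plain long updated with nr_free_pages--, the two threads would hold different locks while touching the same variable, so the READ/READ/STORE/STORE interleaving listed above could drop decrements and leave a non-zero value behind.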