On Sun, Oct 04, 2020 at 11:27:50PM +0100, Matthew Wilcox wrote:
> On Mon, Oct 05, 2020 at 12:50:49AM +0300, Jarkko Sakkinen wrote:
> > What it should take is encl->lock.
> >
> > The loop was pre-v36 like:
> >
> > idx_start = PFN_DOWN(start);
> > idx_end = PFN_DOWN(end - 1);
> >
> > for (idx = idx_start; idx <= idx_end; ++idx) {
> > 	mutex_lock(&encl->lock);
> > 	page = radix_tree_lookup(&encl->page_tree, idx);
> > 	mutex_unlock(&encl->lock);
> >
> > 	if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> > 		return -EACCES;
> > }
> >
> > Looking at xarray.h and filemap.c, I'm thinking something along the
> > lines of:
> >
> > for (idx = idx_start; idx <= idx_end; ++idx) {
> > 	mutex_lock(&encl->lock);
> > 	page = xas_find(&xas, idx + 1);
> > 	mutex_unlock(&encl->lock);
> >
> > 	if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
> > 		return -EACCES;
> > }
> >
> > Does this look about right?
>
> Not really ...
>
> 	int ret = 0;
>
> 	mutex_lock(&encl->lock);
> 	rcu_read_lock();

Right, so xa_*() take the RCU lock implicitly and xas_*() do not.

> 	while (xas.index < idx_end) {
> 		page = xas_next(&xas);

It should iterate through every possible page index within the range,
even the ones that do not have an entry, i.e. this loop also checks
that there are no empty slots. Does xas_next() go through every
possible index, or does it skip the empty ones?

> 		if (!page || (~page->vm_max_prot_bits & vm_prot_bits)) {
> 			ret = -EACCES;
> 			break;
> 		}
> 	}
> 	rcu_read_unlock();
> 	mutex_unlock(&encl->lock);

On my Geminilake NUC the maximum size of the address space for an
enclave is 64 GB, and it is not fixed but can grow in later
microarchitectures. That means that in the (*artificial*) worst case
the locks would be held for 64*1024*1024*1024/4096 = 16777216
iterations.

I just realized that in sgx_encl_load_page() ([1], where encl->lock is
acquired by the caller) I have used xa_load(), which is more or less
compatible with the old radix_tree pattern, i.e.
for (idx = idx_start; idx <= idx_end; ++idx) {
	mutex_lock(&encl->lock);
	page = xa_load(&encl->page_array, idx);
	mutex_unlock(&encl->lock);

	if (!page || (~page->vm_max_prot_bits & vm_prot_bits))
		return -EACCES;
}

To make things stable again, I'll go with this for the immediate
future.

> 	return ret;
>
> ... or you could rework to use the xa_lock instead of encl->lock.
> I don't know how feasible that is for you.

encl->lock is used to protect enclave state, but it is true that
page->vm_max_prot_bits is not modified concurrently once the page has
been added (e.g. the reclaimer gets pages through
sgx_active_page_list, not through the xarray). It's an interesting
idea, but before even considering it I want to fix the bug, even if
the fix turns out to be somewhat suboptimal in terms of performance.

Thanks for helping with this. xarray is still somewhat alien to me,
and most of the code I see just uses the iterator macros, except
mm/*, but I'm slowly adapting to the concepts.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git/tree/arch/x86/kernel/cpu/sgx/encl.c
[2] https://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-sgx.git/tree/arch/x86/kernel/cpu/sgx/main.c

/Jarkko