On Sat, Sep 12, 2020 at 09:35:14AM +0800, Hillf Danton wrote:
> > Migrate is the cause, very tricky:
> >
> >        CPU0                      CPU1
> >  ucma_destroy_id()
> >                                  ucma_migrate_id()
> >                                       ucma_get_ctx()
> >        xa_lock()
> >         _ucma_find_context()
> >         xa_erase()
> >                                       xa_lock()
> >                                        ctx->file = new_file
> >                                        list_move()
> >                                       xa_unlock()
> >                                      ucma_put_ctx
> >                                  ucma_close()
> >                                     _destroy_id()
> >
> >  _destroy_id()
> >   wait_for_completion()
> >   // boom
> >
> >
> > ie the destroy_id() on the initial FD captures the ctx right before
> > migrate moves it, then the new FD closes calling destroy while the
> > other destroy is still running.
>
> More trouble now understanding that the ctx is reported to be freed
> in the write path, while if I don't misread the chart above, you're
> trying to pull another closer after migrate into the race.

migrate moves the ctx between two struct files, so the race is: start
destroying on the first struct file, migrate the ctx to the second
struct file, then close the second struct file.

Now close and destroy_id can race directly, which shouldn't be
allowed.

Jason
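
For illustration only, here is a minimal user-space sketch of the invariant being violated above: the ctx teardown should run once, whichever of destroy_id or close gets there first. Everything in it (struct toy_ctx, toy_ctx_init(), toy_claim_destroy(), toy_destroy()) is hypothetical. It is not the ucma code and not a proposed patch, and it uses a C11 atomic flag as a stand-in for whatever locking the real code would need.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a context object; NOT the kernel's struct ucma_context. */
struct toy_ctx {
	atomic_bool destroying;	/* set by whichever path claims teardown first */
	int teardown_runs;	/* counts how often the teardown body executed */
};

static void toy_ctx_init(struct toy_ctx *ctx)
{
	atomic_init(&ctx->destroying, false);
	ctx->teardown_runs = 0;
}

/*
 * Atomically claim the right to destroy the ctx.
 * atomic_exchange() returns the previous value, so only the first
 * caller sees "false" and gets "true" back from this helper.
 */
static bool toy_claim_destroy(struct toy_ctx *ctx)
{
	return !atomic_exchange(&ctx->destroying, true);
}

/*
 * Common teardown entry point for both the "destroy_id" and the "close"
 * path in this model. Returns true if this caller performed the teardown,
 * false if another path had already claimed it.
 */
static bool toy_destroy(struct toy_ctx *ctx)
{
	if (!toy_claim_destroy(ctx))
		return false;

	/* The teardown body must never be entered twice. */
	assert(ctx->teardown_runs == 0);
	ctx->teardown_runs++;
	return true;
}
```

Calling toy_destroy() twice on the same ctx returns true the first time and false the second, so the teardown body runs once. In the chart above, both _destroy_id() calls proceed, which is the situation this model rules out.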