On Fri, Aug 16, 2024 at 02:52:28PM +0000, Edgecombe, Rick P wrote: > On Fri, 2024-08-16 at 09:44 +0100, Catalin Marinas wrote: > > > After a token is consumed normally, it doesn't set it to zero. Instead it > > > sets it to a "previous-ssp token". I don't think we actually want to do that here > > > though because it involves the old SSP, which doesn't really apply in this > > > case. I don't see any problem with zero, but was there any special thinking behind > > > it? > > > > BTW, since it's the parent setting up the shadow stack in its own > > address space before forking, I think at least the read can avoid > > access_remote_vm() and we could do it earlier, even before the new > > process is created. > > Hmm. Makes sense. It's a bit racy since the parent could consume that token from > another thread, but it would be a race in any case. More on the race below. If we handle it properly, we don't need the separate checks. > > > > + if (access_remote_vm(mm, addr, &val, sizeof(val), > > > > + FOLL_FORCE | FOLL_WRITE) != sizeof(val)) > > > > + goto out; > > > > > > The GUPs still seem a bit unfortunate for a couple reasons: > > > - We could do a CMPXCHG version and are just not (I see ARM has identical > > > code in gcs_consume_token()). It's not the only race like this though FWIW. > > > - I *think* this is the only unprivileged FOLL_FORCE that can write to the > > > current process in the kernel. As is, it could be used on normal RO > > > mappings, at > > > least in a limited way. Maybe another point for the VMA check. We'd want to > > > check that it is normal shadow stack? > > > - Lingering doubts about the wisdom of doing GUPs during task creation. > > > > I don't like the access_remote_vm() either. In the common (practically > > only) case with CLONE_VM, the mm is actually current->mm, so no need for > > a GUP. > > On the x86 side, we don't have a shadow stack access CMPXCHG. We will have to > GUP and do a normal CMPXCHG off of the direct map to handle it fully properly in > any case (CLONE_VM or not). I guess we could do the same here and for the arm64 gcs_consume_token(). Basically get_user_page_vma_remote() gives us the page together with the vma that you mentioned needs checking. We can then do a cmpxchg directly on the page_address(). It's probably faster anyway than doing GUP twice. -- Catalin