On 8/8/19 10:27 AM, Catalin Marinas wrote: > On Wed, Aug 07, 2019 at 01:38:16PM -0700, Dave Hansen wrote: > Extending the interface is still possible even with the current > proposal, by changing arg2 etc. We also don't seem to be consistent in > sys_prctl(). We are not consistent because it took a long time to learn this lesson, but I think this is a best-practice that we follow for new ones. >> Also, shouldn't this be converted over to an arch_prctl()? > > What do you mean by arch_prctl()? We don't have such thing, apart from > maybe arch_prctl_spec_ctrl_*(). We achieve the same thing with the > {SET,GET}_TAGGED_ADDR_CTRL macros. They could be renamed to > arch_prctl_tagged_addr_{set,get} or something but I don't see much > point. Silly me. We have an x86-specific: SYSCALL_DEFINE2(arch_prctl, int , option, unsigned long , arg2) I guess there's no ARM equivalent so you're stuck with the generic one. > What would be better (for a separate patch series) is to clean up > sys_prctl() and move the arch-specific options into separate > arch_prctl() under arch/*/kernel/. But it's not really for this series. I think it does make sense for truly arch-specific features to stay out of the arch-generic prctl(). Yes, I know I've personally violated this in the past. :) >> What is the scope of these prctl()'s? Are they thread-scoped or >> process-scoped? Can two threads in the same process run with different >> tagging ABI modes? > > Good point. They are thread-scoped and this should be made clear in the > doc. Two threads can have different modes. > > The expectation is that this is invoked early during process start (by > the dynamic loader or libc init) while in single-thread mode and > subsequent threads will inherit the same mode. However, other uses are > possible. If that's the expectation, it would be really nice to codify it. Basically, you can't enable the feature if another thread is already been forked off. > That said, do we have a precedent for changing user ABI from the kernel > cmd line? 'noexec32', 'vsyscall' I think come close. With the prctl() > for opt-in, controlling this from the cmd line is not too bad (though my > preference is still for the sysctl). There are certainly user-visible things like being able to select various CPU features. >>> +When a process has successfully enabled the new ABI by invoking >>> +prctl(PR_SET_TAGGED_ADDR_CTRL, PR_TAGGED_ADDR_ENABLE), the following >>> +behaviours are guaranteed: >>> + >>> +- Every currently available syscall, except the cases mentioned in section >>> + 3, can accept any valid tagged pointer. The same rule is applicable to >>> + any syscall introduced in the future. >>> + >>> +- The syscall behaviour is undefined for non valid tagged pointers. >> >> Do you really mean "undefined"? I mean, a bad pointer is a bad pointer. >> Why should it matter if it's a tagged bad pointer or an untagged bad >> pointer? > > Szabolcs already replied here. We may have tagged pointers that can be > dereferenced just fine but being passed to the kernel may not be well > defined (e.g. some driver doing a find_vma() that fails unless it > explicitly untags the address). It's as undefined as the current > behaviour (without these patches) guarantees. It might just be nicer to point out what this features *changes* about invalid pointer handling, which is nothing. :) Maybe: The syscall behaviour for invalid pointers is the same for both tagged and untagged pointers. >>> +- prctl(PR_SET_MM, ``*``, ...) other than arg2 PR_SET_MM_MAP and >>> + PR_SET_MM_MAP_SIZE. >>> + >>> +- prctl(PR_SET_MM, PR_SET_MM_MAP{,_SIZE}, ...) struct prctl_mm_map fields. >>> + >>> +Any attempt to use non-zero tagged pointers will lead to undefined >>> +behaviour. >> >> I wonder if you want to generalize this a bit. I think you're saying >> that parts of the ABI that modify the *layout* of the address space >> never accept tagged pointers. > > I guess our difficulty in specifying this may have been caused by > over-generalising. For example, madvise/mprotect came under the same > category but there is a use-case for malloc'ed pointers (and tagged) to > the kernel (e.g. MADV_DONTNEED). If we can restrict the meaning to > address space *layout* manipulation, we'd have mmap/mremap/munmap, > brk/sbrk, prctl(PR_SET_MM). Did I miss anything?. Other related syscalls > like mprotect/madvise preserve the layout while only changing permissions, > backing store, so the would be allowed to accept tags. shmat() comes to mind. I also did a quick grep for mmap_sem taken for write and didn't see anything else obvious jump out at me. >> It looks like the TAG_SHIFT and tag size are pretty baked into the >> aarch64 architecture. But, are you confident that no future >> implementations will want different positions or sizes? (obviously >> controlled by other TCR_EL1 bits) > > For the top-byte-ignore (TBI), that's been baked in the architecture > since ARMv8.0 and we'll have to keep the backwards compatible mode. As > the name implies, it's the top byte of the address and that's what the > document above refers to. > > With MTE, I can't exclude other configurations in the future but I'd > expect the kernel to present the option as a new HWCAP and the user to > explicitly opt in via a new prctl() flag. I seriously doubt we'd break > existing binaries. So, yes TAG_SHIFT may be different but so would the > prctl() above. Basically, what you have is a "turn tagging on" and "turn tagging off" call which are binary: all on or all off. How about exposing a mask: /* Replace hard-coded mask size/position: */ unsigned long mask = prctl(GET_POSSIBLE_TAGGED_ADDR_BITS); if (mask == 0) // no tagging is supported obviously prctl(SET_TAGGED_ADDR_BITS, mask); // now userspace knows via 'mask' where the tag bits are