On Apr 7, 2016 5:12 AM, "Dmitry Safonov" <dsafonov@xxxxxxxxxxxxx> wrote:
>
> On 04/06/2016 09:04 PM, Andy Lutomirski wrote:
>>
>> [cc Dave Hansen for MPX]
>>
>> On Apr 6, 2016 9:30 AM, "Dmitry Safonov" <dsafonov@xxxxxxxxxxxxx> wrote:
>>>
>>> Now each process that runs natively on x86_64 may execute 32-bit code
>>> by proper setting it's CS selector: either from LDT or reuse Linux's
>>> USER32_CS. The vice-versa is also valid: running 64-bit code in
>>> compatible task is also possible by choosing USER_CS.
>>> So we may switch between 32 and 64 bit code execution in any process.
>>> Linux will choose the right syscall numbers in entries for those
>>> processes. But it still will consider them native/compat by the
>>> personality, that elf loader set on launch. This affects i.e., ptrace
>>> syscall on those tasks: PTRACE_GETREGSET will return 64/32-bit regset
>>> according to process's mode (that's how strace detect task's
>>> personality from 4.8 version).
>>>
>>> This patch adds arch_prctl calls for x86 that make possible to tell
>>> Linux kernel in which mode the application is running currently.
>>> Mainly, this is needed for CRIU: restoring compatible & native
>>> applications both from 64-bit restorer. By that reason I wrapped all
>>> the code in CONFIG_CHECKPOINT_RESTORE.
>>> This patch solves also a problem for running 64-bit code in 32-bit elf
>>> (and reverse), that you have only 32-bit elf vdso for fast syscalls.
>>> When switching between native <-> compat mode by arch_prctl, it will
>>> remap needed vdso binary blob for target mode.
>>
>> General comments first:
>
> Thanks for your comments.
>>
>> You forgot about x32.
>
> Will add x32 support for v2.
>
>> I think that you should separate vdso remapping from "personality".
>> vdso remapping should be available even on native 32-bit builds, which
>> means that either you can't use arch_prctl for it or you'll have to
>> wire up arch_prctl as a 32-bit syscall.
>
> I cant say, I got your point.
> Do you mean by vdso remapping
> mremap for vdso/vvar pages? I think, it should work now.

For 32-bit, the vdso *must* exist in memory at the address that the
kernel thinks it's at.  Even if you had a pure 32-bit restore stub,
you would still need vdso remap, because there's a chance the vdso
could land at an unusable address, say one page off from where you
want it.  You couldn't map a wrapper because there wouldn't be any
space for it without moving the real vdso out of the way.

Remember, you *cannot* mremap() the 32-bit vdso because you will
crash.  It works by luck for 64-bit, but it's plausible that we'd want
to change that some day.  (I have awful patches that speed a bunch of
things up at the cost of a vdso trampoline for 64-bit code and a bunch
of other hacks.  Those patches will never go in for real, but
something else might want the ability to use 64-bit vdso trampolines.)

> I did remapping for vdso as blob for native x86_64 task differs
> to compatible task. So it's just changing blobs, address value
> is there for convenience - I may omit it and just remap
> different vdso blob at the same place where was previous vdso.
> I'm not sure, why do we need possibility to map 64-bit vdso blob
> on native 32-bit builds?

Mapping the 64-bit blob there would fail, but I think the API should
exist anyway, and a native 32-bit program should be able to remap the
32-bit vdso.

IOW, I think you should be able to do, roughly:

map_new_vdso(VDSO_32BIT, addr);

on any kernel.

Am I making sense?

>
>> For "personality", someone needs to enumerate all of the various things
>> that try to track bitness and see how many of them even make sense.
>> On brief inspection:
>>
>> - TIF_IA32: affects signal format and does something to ptrace. I
>> suspect that whatever it does to ptrace is nonsensical, and I don't
>> know whether we're stuck with it.
>>
>> - TIF_ADDR32 affects TASK_SIZE and mmap behavior (and the latter
>> isn't even done in a sensible way).
>>
>> - is_64bit_mm affects MPX and uprobes.
>>
>> On even more brief inspection:
>>
>> - uprobes using is_64bit_mm is buggy.
>>
>> - I doubt that having TASK_SIZE vary serves any purpose. Does anyone
>> know why TASK_SIZE is different for different tasks? It would save
>> code size and speed things up if TASK_SIZE were always TASK_SIZE_MAX.
>> - Using TIF_IA32 for signal processing is IMO suboptimal. Instead,
>> we should record which syscall installed the signal handler and use
>> the corresponding frame format.
>
> Oh, I like it, will do.
>
>> - Using TIF_IA32 of the *target* for ptrace is nonsense. Having
>> strace figure out syscall type using that is actively buggy, and I ran
>> into that bug a few days ago and cursed at it. strace should inspect
>> TS_COMPAT (I don't know how, but that's what should happen). We may
>> be stuck with this for ABI reasons.
>
> ptrace may check seg_32bit for code selector, what do you think?

Not sure.  I have never fully wrapped my head around ptrace.
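
For reference, a minimal sketch (not taken from the patch under
discussion) of how a userspace restorer might check where the kernel
reported the vdso and whether the mapped blob is the 32-bit or the
64-bit one.  It assumes glibc's getauxval() and the AT_SYSINFO_EHDR
auxv entry; the helper names vdso_base() and vdso_bitness() are made
up for illustration.

```c
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/*
 * Hypothetical helper: address of the vdso ELF header as the kernel
 * reported it in the auxiliary vector at exec time, or 0 if absent.
 */
uintptr_t vdso_base(void)
{
	return (uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/*
 * Hypothetical helper: 32 or 64 according to the ELF class of the
 * mapped vdso blob, or 0 if there is no vdso or the ELF magic does
 * not match.  This dereferences the auxv address, so it is only safe
 * while the vdso is still mapped where the auxv says it is.
 */
int vdso_bitness(void)
{
	const unsigned char *ident = (const unsigned char *)vdso_base();

	if (!ident || memcmp(ident, ELFMAG, SELFMAG) != 0)
		return 0;

	switch (ident[EI_CLASS]) {
	case ELFCLASS32:
		return 32;
	case ELFCLASS64:
		return 64;
	default:
		return 0;
	}
}
```

The auxv entry is filled in at exec time, so it would not follow a
later remap; a restorer that moves or swaps the vdso would have to
track the new address itself.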