On 27.04.2015 at 16:03, Alexander Graf wrote:
> On 04/27/2015 03:57 PM, Martin Schwidefsky wrote:
>> On Mon, 27 Apr 2015 15:48:42 +0200
>> Alexander Graf <agraf@xxxxxxx> wrote:
>>
>>> On 04/23/2015 02:13 PM, Martin Schwidefsky wrote:
>>>> On Thu, 23 Apr 2015 14:01:23 +0200
>>>> Alexander Graf <agraf@xxxxxxx> wrote:
>>>>
>>>>> As far as alternative approaches go, I don't have a great idea otoh.
>>>>> We could have an elf flag indicating that this process needs 4k page
>>>>> tables to limit the impact to a single process. In fact, could we
>>>>> maybe still limit the scope to non-global? A personality may work
>>>>> as well. Or ulimit?
>>>> I tried the ELF flag approach, does not work. The trouble is that
>>>> allocate_mm() has to create the page tables with 4K tables if you
>>>> want to change the page table layout later on. We have learned the
>>>> hard way that the direction 2K to 4K does not work due to races
>>>> in the mm.
>>>>
>>>> Now there are two major cases: 1) fork + execve and 2) fork only.
>>>> The ELF flag can be used to reduce from 4K to 2K for 1) but not 2).
>>>> 2) is required for apps that use lots of forking, e.g. database or
>>>> web servers. Same goes for the approach with a personality flag or
>>>> ulimit.
>>>>
>>>> We would have to distinguish the two cases for allocate_mm(),
>>>> if the new mm is allocated for a fork the current mm decides
>>>> 2K vs. 4K. If the new mm is allocated by binfmt_elf, then start
>>>> with 4K and do the downgrade after the ELF flag has been evaluated.
>>> Well, you could also make it a personality flag for example, no? Then
>>> every new process below a certain one always gets 4k page tables until
>>> they drop the personality, at which point each child would only get 2k
>>> page tables again.
>>>
>>> I'm mostly concerned that people will end up mixing VMs and other
>>> workloads on the same LPAR, so I don't think there's a one-shoe-fits-all
>>> solution.
>>
>> If I add an argument to mm_init() to indicate if this context
>> is for fork() or execve() then the ELF header flag approach works.
>
> So you don't need the sysctl?

The ELF header flag alone would not be enough for old userspace binaries
that do not carry it. So we need both the flag and the sysctl to support
old userspace on a new kernel.

Christian
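
To make the fork vs. execve distinction discussed above concrete, here is a
minimal userspace sketch of the decision logic. It is only a model under
stated assumptions, not kernel code: the names mm_model, mm_origin,
mm_model_init(), mm_model_finish_exec(), elf_wants_4k and sysctl_force_4k
are all hypothetical and do not correspond to existing kernel symbols. It
assumes, as described in the thread, that a forked mm inherits the parent's
layout, that an mm created for execve starts with 4K tables, and that only
the 4K to 2K direction is ever taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these names exist in the kernel. */
enum pgtable_size { PGTABLE_2K, PGTABLE_4K };
enum mm_origin { MM_FOR_FORK, MM_FOR_EXEC };

struct mm_model {
    enum pgtable_size pgtable;
};

/*
 * Pick the page table layout for a newly allocated mm.
 * Fork: the parent (current) mm decides 2K vs. 4K.
 * Exec: always start with 4K, because upgrading 2K -> 4K later is
 * not possible; a downgrade may follow once the ELF header is known.
 */
struct mm_model mm_model_init(enum mm_origin origin,
                              const struct mm_model *parent)
{
    struct mm_model mm;

    if (origin == MM_FOR_FORK) {
        assert(parent != NULL); /* a fork always has a parent mm */
        mm.pgtable = parent->pgtable;
    } else {
        mm.pgtable = PGTABLE_4K;
    }
    return mm;
}

/*
 * Called after the ELF header has been evaluated (exec path only).
 * Downgrade 4K -> 2K unless the binary asks for 4K via the assumed
 * ELF flag, or the assumed sysctl forces 4K for old binaries that
 * cannot carry the flag. The layout is never changed from 2K to 4K.
 */
void mm_model_finish_exec(struct mm_model *mm, bool elf_wants_4k,
                          bool sysctl_force_4k)
{
    if (!elf_wants_4k && !sysctl_force_4k)
        mm->pgtable = PGTABLE_2K;
}
```

In this model the sysctl only matters on the exec path: an old binary
without the flag keeps 4K tables solely because sysctl_force_4k is set,
which mirrors the point that both mechanisms are needed.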