> > Anyway, do you see a hole or a serious performance
> > problem with my modified proposal (explicit mmap()
> > to create the necessary storage)?
>
> Same problem as with clone.  I recommend the clone manpage; it says:
>
>   CLONE_VM
>     If CLONE_VM is set, the calling process and the child processes
>     run in the same memory space.  In particular, memory writes
>     performed by the calling process or by the child process are also
>     visible in the other process.  Moreover, any memory mapping or
>     unmapping performed with mmap(2) or munmap(2) by the child or
>     calling process also affects the other process.
>
>     If CLONE_VM is not set, the child process runs in a separate copy
>     of the memory space of the calling process at the time of clone.
>     Memory writes or file mappings/unmappings performed by one of the
>     processes do not affect the other, as with fork(2).
>
> That is, if any memory OR MAPPING is shared, they all are.

Daniel, you didn't read my message.  The per-thread memory
would be allocated *after* the clone() in pthread_create().
More specifically, pthread_create() would set it up so that
the function passed to clone for invocation was in fact a
wrapper that sets up the memory and thread data before
invoking the application function passed to pthread_create().

Now, if the idea is that the clone() system call is supposed
to cause the thread to be born, like Athena, full-grown from
the head of Zeus, with the analog to the thread register already
set up when it leaves the kernel, then I would be inclined to
concede that we need to change the ABI, the kernel, and compilers,
and I would ask just what we get for our trouble.  But if we are
permitted the pthreads abstraction, there's a lot that can be
done transparently.

Regards,

Kevin K.
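
For concreteness, here is a minimal sketch of the wrapper arrangement
described above, assuming Linux and the glibc clone() wrapper.  It is
an illustration only, not the proposed pthread_create().  All sketch_*
names, the sizes, and demo_fn are hypothetical.  Handing the per-thread
block to the application function as a second argument is also an
assumption made for the example; how application code would locate that
block is left open here.  For brevity the creator simply waits for the
thread to finish.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

#define SKETCH_STACK_SIZE (64 * 1024)  /* hypothetical sizes */
#define SKETCH_TDATA_SIZE 4096

/* Hypothetical start record handed from the creator to the wrapper. */
struct sketch_start {
    int (*user_fn)(void *arg, void *thread_data);
    void *user_arg;
};

/* The function actually passed to clone().  It runs in the new thread,
 * so the per-thread block is mapped *after* clone(), by the thread that
 * will own it.  Under CLONE_VM the mapping is visible to every thread,
 * but its address is known only to this one. */
static int sketch_trampoline(void *p)
{
    struct sketch_start *s = p;
    void *tdata = mmap(NULL, SKETCH_TDATA_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (tdata == MAP_FAILED)
        return 127;

    int rc = s->user_fn(s->user_arg, tdata);  /* application function */

    munmap(tdata, SKETCH_TDATA_SIZE);
    return rc;
}

/* Stand-in for the creation path: allocate a stack, clone() with the
 * wrapper, wait for the thread, and return its exit status (or -1). */
static int sketch_run_thread(int (*fn)(void *, void *), void *arg)
{
    struct sketch_start s = { fn, arg };
    char *stack = mmap(NULL, SKETCH_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* Stack top is passed, assuming a downward-growing stack.  SIGCHLD
     * as the exit signal lets plain waitpid() reap the thread. */
    pid_t pid = clone(sketch_trampoline, stack + SKETCH_STACK_SIZE,
                      CLONE_VM | SIGCHLD, &s);

    int status = 0, rc = -1;
    if (pid != -1 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        rc = WEXITSTATUS(status);

    munmap(stack, SKETCH_STACK_SIZE);
    return rc;
}

/* Example application function: uses its private block as scratch and
 * publishes a result through shared memory. */
static int demo_fn(void *arg, void *thread_data)
{
    int *shared = arg;
    int *scratch = thread_data;
    *scratch = 41;
    *shared = *scratch + 1;
    return 7;
}
```

Calling sketch_run_thread(demo_fn, &some_int) runs demo_fn in a
CLONE_VM thread.  The write through the shared pointer is visible to
the caller, while the scratch block exists only between the wrapper's
mmap() and munmap().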