> > Which it is. Fork shares no memory regions; vfork/clone share all
> > memory regions. AFAIK there is no share-heap-but-not-stack option in
> > Linux.
>
> Thats a design decision. At the point you don't have identical mappings for
> both threads you need two sets of page tables and you take all the
> performance hits that go with changing current tables on a schedule.
>
> Its a lot cheaper to use a different %esp for each thread

That's a point of view that reflects the PC-centric origins of Linux.
Large-scale parallel systems, such as the current high-end server
offerings from Sun, IBM/Sequent, and SGI, have a non-uniform memory
access (NUMA) model, under which insisting on maintaining identical
page tables for all threads of a parallel program can result in an
intolerable level of remote memory accesses. The usual technique
employed on such systems is dynamic replication of frequently
accessed pages. This of course implies greater OS overhead for all
operations on virtual memory maps, and additional overhead to
synchronize the copies, but that can be more than compensated for by
reducing the average memory latency seen by the CPUs.

An even more extreme case, though one that is perhaps less relevant
to mainstream parallel servers, is the emulation of shared memory
("virtual shared memory" or VSM as we used to call it) on
message-based, highly parallel machines, which likewise depends on
having distinctly set-up and managed page tables for different
threads/processes within a parallel program.

There's nothing wrong with having an OS design that allows common
page tables to be used across the threads of a parallel program
running on a simple, small-scale SMP platform like a dual or quad
CPU PC. But making that the *only* way that thread parallelism is
supported is, in my opinion, a design error that will have to be
fixed one of these days. And the longer the existing model is
enhanced and maintained, the harder it will be to fix.

All that having been said, please note that this issue is orthogonal
to the question of whether one should have a single "process image"
across all threads of a parallel program.

Regards,

Kevin K.
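
P.S. For anyone who wants to see the all-or-nothing sharing described in
the quoted text, here is a minimal sketch, assuming Linux with glibc's
clone() wrapper and a stack that grows downward (as on x86). The names
counter, bump, value_after_fork, value_after_clone_vm and check_sharing
are hypothetical, made up for this illustration. The child created with
CLONE_VM runs on the same page tables as the parent and differs only in
its stack pointer; the fork()ed child gets its own copy of everything.

```c
#define _GNU_SOURCE             /* needed for glibc's clone() declaration */
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (256 * 1024)   /* arbitrary size for the demo */

/* Hypothetical shared variable; volatile so the parent always re-reads it. */
static volatile int counter = 0;

/* Child body: write to the global and exit. */
static int bump(void *arg)
{
    (void)arg;
    counter = 42;
    return 0;
}

/* Parent's view of counter after a fork()ed child has written to it. */
static int value_after_fork(void)
{
    counter = 0;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {             /* child: private copy of all mappings */
        bump(NULL);
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return counter;
}

/* Parent's view of counter after a clone(CLONE_VM) child has written to it. */
static int value_after_clone_vm(void)
{
    counter = 0;
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;
    /* Pass the top of the buffer: the child only gets a different stack
     * pointer, every mapping (heap, globals, this stack too) is shared.
     * SIGCHLD as the exit signal lets a plain waitpid() reap the child. */
    pid_t pid = clone(bump, stack + CHILD_STACK_SIZE,
                      CLONE_VM | SIGCHLD, NULL);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);
    free(stack);
    return counter;
}

/* Asserts the behaviour described in the quoted text. */
static void check_sharing(void)
{
    assert(value_after_fork() == 0);        /* child's write not visible */
    assert(value_after_clone_vm() == 42);   /* child's write is visible  */
}
```

Calling check_sharing() from a main() should pass silently on a stock
Linux/glibc system. Note that there is no flag combination here that
would share the heap but keep the stacks in separate mappings, which is
the limitation the quoted text points out.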