Re: Process adress space (during context/process switch) + VM question

Tristan Wibberley <maihem@xxxxxxxxxx> · Sun, 14 Aug 2005 11:05:54 +0100

Tom Davis wrote:
> Hi
> 
> I have a question I could not find the answer (a clear answer,
> atleast) to online or in books.
> Before the questions, here is my understanding of process/thread
> address spaces and the stack pointers, please tell me if  I am wrong
> somewhere.
> 
> 1) A process has it's own address space and a fork() provides a new
> address space to the child (maybe COW). Since a process binary has a
> text, data, bss and stack space, the child get's a copy of only the
> text portion of the parent. Am I right in assuming this?

The child gets a COW copy of everything that is writable (but I'm not
sure what happens with shared memory in the parent or the stack - though
I think the stack is COW'd just the same). It gets non-writable stuff
shared in (i.e. the .text section and the .rodata section).
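
A minimal sketch of the difference, assuming a Unix-like system where
Python's os.fork() is available. The function name fork_demo is made up
for illustration. It relies on the documented behaviour that an
anonymous mmap.mmap(-1, ...) is MAP_SHARED by default on Unix, while an
ordinary bytearray is plain private writable memory:

```python
import mmap
import os


def fork_demo():
    """Return (private, shared) as the parent sees them after the child wrote to both."""
    private = bytearray(b"parent")         # ordinary writable memory: copied on write
    shared = mmap.mmap(-1, mmap.PAGESIZE)  # anonymous mapping, MAP_SHARED by default on Unix
    shared[:6] = b"parent"

    pid = os.fork()
    if pid == 0:
        # Child: both writes succeed, but only the shared one reaches the parent.
        try:
            private[:] = b"child!"
            shared[:6] = b"child!"
        finally:
            os._exit(0)                    # never fall through into the parent's code

    os.waitpid(pid, 0)                     # wait until the child has finished writing
    result = (bytes(private), bytes(shared[:6]))
    shared.close()
    return result
```

The parent's bytearray is untouched by the child's write, whereas the
write to the shared mapping is visible to the parent.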

> 2) A thread (inside a process) has it's own stack but it shares the
> rest of the segments (text, bss and data) with the parent, is this
> right?

I believe, from Linux's PoV, that a thread doesn't have a parent. All
threads in a process are equal. A thread's stack is just an area of
writeable memory in the process that the thread "knows" to use as stack,
and the other threads "know" not to (not really, but that's the short
version). They know to use some area as stack due to information in the
program code and program files.
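
As a rough illustration (Python threads rather than raw clone() calls,
and CPython's locals only loosely model a native stack), the sketch
below has several threads append to one list that lives in the shared
address space, while each keeps its own local running total. The names
worker, run_threads and shared_log are hypothetical:

```python
import threading

shared_log = []               # one object in process memory, visible to every thread
log_lock = threading.Lock()   # serialises access to the shared list


def worker(name, count):
    local_total = 0           # local to this call; each thread has its own call stack
    for i in range(count):
        local_total += i
    with log_lock:
        shared_log.append((name, local_total))


def run_threads():
    shared_log.clear()
    threads = [threading.Thread(target=worker, args=(f"t{n}", n + 1))
               for n in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(shared_log)
```

No thread is treated specially here: they all write to the same list,
and none of them can see another's local_total.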

> 3) A thread stack is created INSIDE the address space of the parents
> stack. For example, if a parent's break_value is at 0x60000000 and
> it's stack pointer is at 0x80000000, then if we create another thread,
> it's stack will be somewhere between the break_value and stack pointer
> (i.e., end of parent's stack space) of the parent?

If you create a thread, its stack will be somewhere that the process
could mmap anonymous pages; that's really all the detail required. BTW,
the parent thread doesn't have its own break value; the break value is
per address space.
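
To make "anonymous pages used as a stack" concrete, here is a small
sketch using Python's mmap module. It is not how a threading library
actually sets up a stack; STACK_SIZE, make_stack_like_region and
push_words are invented names, and the 8-byte word size is an
assumption. It only shows that an anonymous mapping is zero-filled
writable memory that code can agree to treat as a downward-growing
stack:

```python
import mmap

STACK_SIZE = 64 * 1024   # hypothetical size chosen for the example


def make_stack_like_region(size=STACK_SIZE):
    # fileno -1 requests an anonymous mapping; MAP_PRIVATE keeps it private to us.
    return mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)


def push_words(region, words):
    """Store 8-byte words from the high end downward; return the final 'stack pointer'."""
    sp = len(region)                          # start at the top of the region
    for word in words:
        sp -= 8                               # the stack grows toward lower addresses
        region[sp:sp + 8] = word.to_bytes(8, "little")
    return sp
```

Nothing about the mapping itself marks it as a stack; that is purely a
convention of the code that uses it.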

> 4) During a context/process switch, the stack pointer is popped from
> the registers. Where does it go (stored on the hard disk?), and which
> registers are popped from the stack? What about all the OTHER values
> in the address space?

The data in the address space may hang around in a physical memory
location, some may be discarded if there is a copy already on disk
somewhere, and some may be written to swap space. The kernel just
remembers where to find that data. The stack pointer (along with the
rest of the important CPU state) is stored in a data structure in the
kernel. Regarding *which* data and the *exact* mechanism used - this is
architecture-specific and you should look at the source code.
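
A toy model of the idea, not real kernel code: the real thing is done
in architecture-specific code and saves far more state. SavedState,
ToyCPU and context_switch are hypothetical names, and the register set
is reduced to a program counter, a stack pointer and a dict:

```python
from dataclasses import dataclass, field


@dataclass
class SavedState:
    """Stand-in for the per-task record the kernel keeps in its own memory."""
    pc: int = 0
    sp: int = 0
    regs: dict = field(default_factory=dict)


class ToyCPU:
    """The single set of live registers that every task takes turns using."""

    def __init__(self):
        self.pc = 0
        self.sp = 0
        self.regs = {}


def context_switch(cpu, prev, nxt):
    # Save the outgoing task's CPU state into its kernel-side record...
    prev.pc, prev.sp, prev.regs = cpu.pc, cpu.sp, dict(cpu.regs)
    # ...then load the state that was saved for the incoming task.
    cpu.pc, cpu.sp, cpu.regs = nxt.pc, nxt.sp, dict(nxt.regs)
```

Note that nothing goes to disk here: the saved stack pointer simply
sits in kernel memory until the task is scheduled again.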

> For example, I am executing process-A and it has a timeslice on the
> CPU. After it's expired, process-B takes over. Process A must be
> having a lot of variables the values of which must be stored in the
> address space of process-A. Now, if we store only the stack pointers
> and a few other register values, where do we get the whole lot of
> variable values from when we reinstate process-A into the CPU?  For
> example, how does the CPU know about the "text" segment or let's say
> the data segment?

The data of process A is stored in physical memory locations or on disk
and they are *mapped* into its address space - they are not stored in
its address space since an address space is a fairly abstract concept.

When you switch to process-B, the old mappings are remembered by the
kernel and the mappings for process-B are loaded in (so that the CPU
"logical address space" is now B's address space + the kernel's address
space). Then, when A is switched to again, B's mappings are remembered
by the kernel and the mappings for A are recalled and loaded in (so that
the CPU "logical address space" is now A's address space + the kernel's
address space).
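
The switching of mappings can be sketched with a toy translation
table. This is a simplification: real hardware uses multi-level page
tables and a TLB, and ToyMMU, PAGE_SIZE and the page and frame numbers
below are all made up for the example:

```python
PAGE_SIZE = 4096   # assumed page size for the toy model


class ToyMMU:
    """Translates virtual addresses using the current process map plus the kernel map."""

    def __init__(self, kernel_map):
        self.kernel_map = kernel_map   # virtual page -> physical frame, always present
        self.user_map = {}             # replaced on every address-space switch

    def switch_to(self, process_map):
        # The previous map is not destroyed; the kernel still holds it for later.
        self.user_map = process_map

    def translate(self, vaddr):
        page, offset = divmod(vaddr, PAGE_SIZE)
        for table in (self.user_map, self.kernel_map):
            if page in table:
                return table[page] * PAGE_SIZE + offset
        raise LookupError(f"no mapping for virtual address {vaddr:#x}")
```

Switching from A to B and back is then just a matter of which map is
installed; A's data never moves.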

> If the text and data segments are also stored as a "snapshot" into
> some place, how does the CPU know which process's snapshot is stored
> where?

The kernel remembers where each piece of data that is "open"/"mapped"
somewhere in the system is stored and it remembers what is mapped to
where for each process.

> 5) In linux VM, all processes think they have the whole range of
> physical memory at their disposal. Can two processes (A and B) have a
> mapping to the "same" physical address? They are not sharing memory,
> btw. Or is it that if a process is not sharing memory, every one of
> it's pages will have to have a different address in the physical space
> (different from all the page frames of ALL other processes).?

No, they each think they have what they have asked to be mapped in (and
subsequently been told has been mapped, and to where) at their disposal
(under restriction of the protection against writing, etc., that the
mapping has been given). Two processes can use the same addresses; the
kernel ensures that each address refers only to A's data while it has
set A running on the CPU, and similarly for B. If a process asks for
memory to be mapped to a particular address and the kernel thinks that
shouldn't happen for some reason, the mapping attempt may fail or be
done at a different address to that requested, and the process will be
notified.
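
One way to see two processes using the same address for different data,
assuming a Unix-like system with os.fork(): after a fork, parent and
child both hold a ctypes integer at the same address in their own
address spaces, yet each sees its own value. The function name is
hypothetical, and the addresses compared are the ones the processes
see, not physical ones:

```python
import ctypes
import os


def same_address_different_data():
    """Return (parent_addr, parent_value, child_addr, child_value)."""
    value = ctypes.c_int(1)
    parent_addr = ctypes.addressof(value)

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: write through the same address, then report what it sees.
        try:
            os.close(read_fd)
            value.value = 2
            report = f"{ctypes.addressof(value)} {value.value}"
            os.write(write_fd, report.encode())
        finally:
            os._exit(0)

    os.close(write_fd)
    with os.fdopen(read_fd) as pipe:
        child_addr, child_value = map(int, pipe.read().split())
    os.waitpid(pid, 0)
    return parent_addr, value.value, child_addr, child_value
```

The two addresses compare equal while the values differ, because each
process's address is resolved through its own mappings.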

-- 
Tristan Wibberley

Opinions expressed are my own and do not necessarily coincide with those
of my employer, etc.
