Re: basic questions of memory management

Joseph A Knapka <jknapka@earthlink.net> · Fri, 26 Oct 2001 14:00:55 +0000

Hong Hsu wrote:
>     Rik,

Rik's a busy guy; I'll try to answer your questions :-)

>     I have basic questions regarding virtual address space and memory
> management, wondering you can give a help.
> 
> 1.   When userland process is created, how big the size of virtual
> address space the kernel assign to it,  4GB or it depends on size of the
> executable code?  If 4GB is used, why is that because 3GB of it will be
> used excessively for the process and it is huge for most of programs.

All user processes running on x86 with a stock kernel
can use up to 3GB of virtual space. Of course, no physical
RAM is allocated for a given virtual page unless the process
actually tries to access it. When a userland process is
created, the first page of the executable is read in and
mapped, and a kernel data structure, the "vm_area" (struct
vm_area_struct in the kernel source), is
set up for the executable code virtual memory area. The
vm_area struct simply tells the kernel how to handle
page faults in the executable's code space, by paging
data in from the executable file on disk. The amount of
virtual space managed using that initial vm_area
corresponds to the size of the executable, but again,
no physical RAM is mapped into the process page tables
until the process actually tries to access it. When that
happens, an arbitrary unused physical RAM page is
selected, the proper page contents are read from disk,
and the page is mapped into the process page tables.
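
If you want to watch this happen from user space, one rough way is to
map a large anonymous region and compare the process's resident set
size before and after touching it. The sketch below is Python, not
kernel code; it assumes a Linux box with /proc/self/statm, and the
helper names (resident_pages, demand_paging_demo) are made up for
illustration. Exact numbers will differ from run to run.

```python
import mmap
import os

STATM = "/proc/self/statm"  # Linux-specific; assumed to exist


def resident_pages():
    """Return this process's resident set size in pages, or None if unavailable."""
    if not os.path.exists(STATM):
        return None
    with open(STATM) as f:
        # Fields: size resident shared text lib data dt (all in pages)
        return int(f.read().split()[1])


def demand_paging_demo(size=32 * 1024 * 1024):
    """Map `size` bytes anonymously, then touch every page.

    Returns a dict of resident-page counts, or None when /proc is missing.
    """
    before = resident_pages()
    if before is None:
        return None
    region = mmap.mmap(-1, size)  # reserves virtual space; nothing touched yet
    try:
        after_map = resident_pages()
        for offset in range(0, size, mmap.PAGESIZE):
            region[offset] = 1  # first write to each page causes a page fault
        after_touch = resident_pages()
    finally:
        region.close()
    return {"before": before, "after_map": after_map, "after_touch": after_touch}


if __name__ == "__main__":
    print(demand_paging_demo())
```

On a typical system "after_map" should stay close to "before", while
"after_touch" should be larger by roughly size / mmap.PAGESIZE pages.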

A process may (almost certainly will) have multiple vm_area
structs, because different areas of its virtual address space
may require different fault-handling strategies (among
other things). For example, anonymous pages (those allocated
by malloc() operations) are paged in either by allocating a
new empty physical page (the first time such a page is
accessed), or by reading the page in from the swap file
(if the page already exists).
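
To make the "one handler per area" idea concrete, here is a toy model
in Python. It is only a sketch of the concept: VmArea, AddressSpace,
file_backed and anonymous are hypothetical names, the "executable" and
"swap" are plain in-memory objects, and the addresses are arbitrary.
The real kernel structures hold far more state than this.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

PAGE_SIZE = 4096


@dataclass
class VmArea:
    """Toy stand-in for a vm_area: an address range plus a fault handler."""
    start: int
    end: int  # exclusive
    handle_fault: Callable[[int], bytes]  # page-aligned address -> page contents


@dataclass
class AddressSpace:
    areas: List[VmArea] = field(default_factory=list)
    page_table: Dict[int, bytes] = field(default_factory=dict)  # touched pages only

    def find_area(self, addr):
        for area in self.areas:
            if area.start <= addr < area.end:
                return area
        return None

    def read(self, addr):
        page = addr - addr % PAGE_SIZE
        if page not in self.page_table:  # simulated page fault
            area = self.find_area(addr)
            if area is None:
                raise MemoryError(f"no vm_area covers {addr:#x}")
            self.page_table[page] = area.handle_fault(page)
        return self.page_table[page][addr % PAGE_SIZE]


def file_backed(image, base):
    """Handler that pages data in from an in-memory 'executable file'."""
    def handler(page):
        chunk = image[page - base: page - base + PAGE_SIZE]
        return chunk.ljust(PAGE_SIZE, b"\0")
    return handler


def anonymous(swap):
    """Handler: zero-filled page on first touch, else read back from 'swap'."""
    def handler(page):
        return swap.get(page, bytes(PAGE_SIZE))
    return handler


TEXT_BASE, HEAP_BASE = 0x08048000, 0x40000000  # arbitrary example addresses
image = b"\x7fELF" + bytes(3 * PAGE_SIZE)
space = AddressSpace([
    VmArea(TEXT_BASE, TEXT_BASE + 4 * PAGE_SIZE, file_backed(image, TEXT_BASE)),
    VmArea(HEAP_BASE, HEAP_BASE + 16 * PAGE_SIZE, anonymous({})),
])

first_text_byte = space.read(TEXT_BASE)      # faults in one file-backed page
first_heap_byte = space.read(HEAP_BASE + 5)  # faults in one zero-filled page
```

After those two reads the page table holds exactly two entries, even
though the two areas together cover twenty pages of virtual space.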

> 2.  Upon the creation of a userland process, a Page Table and a Page
> Directory will be created in main memory and stay there until the
> process is terminated.  Besides that the /proc will have corresponding
> subdirectory for the process.  Does the virtual address space has a copy
> of executable code on the secondary memory, usually on hard disk, or it
> just contains tables which holds addresses of  executable code?  Having
> a copy of every executable code on hard disk could take a lot of space
> if 3GB is used for each.

No, there is no copying of executable code going on. Normally any
particular executable page (the one from offset 4096 to offset
8192 in the /bin/bash file, for example) will have at most
one copy in physical RAM at a time; that page will be shared
via page table mappings by all processes that use it.
Executable pages are never written to swap (we already have
a copy on disk, no sense making another one in the swap
file!). Anonymous pages (for example, pages allocated due
to malloc() operations) are written to and read from the
swap file, if necessary; again, at most one copy of such
a page will ever exist on disk or in RAM at a time.
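
The sharing can be pictured as a cache keyed by (file, offset): every
process that faults on the same file page ends up pointing at the same
RAM page. The Python below is a toy illustration of that idea only;
PageCache and fake_disk are hypothetical names and the virtual address
is arbitrary.

```python
PAGE_SIZE = 4096


class PageCache:
    """Toy page cache: at most one RAM copy per (file, page-aligned offset)."""

    def __init__(self):
        self.frames = {}  # (path, offset) -> bytearray standing in for a RAM page
        self.disk_reads = 0

    def get_page(self, path, offset, read_from_disk):
        key = (path, offset - offset % PAGE_SIZE)
        if key not in self.frames:
            self.disk_reads += 1
            self.frames[key] = read_from_disk(*key)
        return self.frames[key]


def fake_disk(path, offset):
    # Hypothetical stand-in for real file I/O.
    return bytearray(f"{path}@{offset}".encode().ljust(PAGE_SIZE, b"\0"))


cache = PageCache()
# Two "processes" fault on the page at offset 4096 of /bin/bash.
page_tables = {"pid 100": {}, "pid 200": {}}
for pid, table in page_tables.items():
    table[0x08049000] = cache.get_page("/bin/bash", 4096, fake_disk)

shared = page_tables["pid 100"][0x08049000] is page_tables["pid 200"][0x08049000]
print(shared, cache.disk_reads)  # True 1
```

Both page tables refer to the very same object, and the "disk" was read
only once.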

> 3.  If I have enough main memory,  I notice that the size of swap
> (maximum is 128 MB) always zero.  Does that means no virtual address
> space on hard disk?

That means the processes running on your machine are not
even using all the available RAM pages, so there is no need
to use swap space.
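
If you want to check this yourself, /proc/meminfo reports SwapTotal and
SwapFree. Here is a small Python sketch that works out how much swap is
in use from that text. The function name swap_usage_kb is made up, and
the sample below is illustrative rather than real output from any
machine.

```python
def swap_usage_kb(meminfo_text):
    """Return (total, used) swap in kB from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        try:
            fields[name.strip()] = int(rest.split()[0])
        except (IndexError, ValueError):
            continue  # skip header or malformed lines
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    return total, total - free


# Illustrative sample, not real output: 128 MB of swap, none of it in use.
sample = """MemTotal:  262144 kB
MemFree:    65536 kB
SwapTotal: 131072 kB
SwapFree:  131072 kB
"""
print(swap_usage_kb(sample))  # (131072, 0)
```

On a live Linux system you would pass it open("/proc/meminfo").read()
instead of the sample string.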

> 4.  In Windows 98/ME, upon the termination of a userland process, this
> portion of occupied main memory doesn't get released, instead the main
> memory still hold contents of processes until page fault handler forces
> it out.  Does Linux kernel use similar approach.  If so, how these
> contents of processes in main memory can be reused assume same program
> starts to run again as the Page Table and Page Directory are gone?

Linux keeps track of all the physical RAM pages available
on the system. Each page is either "free" (available for
new data and VM mappings), or else it's in use by some
number of processes; that number is the page's reference
count. When a process exits, the reference counts on all
of the pages it's using are decremented. If a page's
reference count reaches 0, it is free and can be used
again for some other purpose. (Things are actually somewhat
more complicated, but conceptually that's what's going on.)
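
The bookkeeping can be sketched in a few lines of Python. This is a toy
model of the concept only; PageFrame, PhysicalMemory and exit_process
are hypothetical names, and the real kernel does a good deal more.

```python
class PageFrame:
    def __init__(self, number):
        self.number = number
        self.refcount = 0


class PhysicalMemory:
    """Toy allocator: frames with refcount 0 sit on a free list."""

    def __init__(self, nframes):
        self.frames = [PageFrame(n) for n in range(nframes)]
        self.free = list(self.frames)

    def alloc(self):
        frame = self.free.pop()  # raises IndexError when no frame is free
        frame.refcount = 1
        return frame

    def get(self, frame):
        frame.refcount += 1  # another process maps the same frame
        return frame

    def put(self, frame):
        frame.refcount -= 1
        if frame.refcount == 0:
            self.free.append(frame)  # nobody uses it any more


def exit_process(mem, mapped_frames):
    """Drop every reference a process holds, as happens at exit."""
    for frame in mapped_frames:
        mem.put(frame)
    mapped_frames.clear()


mem = PhysicalMemory(4)
shared = mem.alloc()            # e.g. a page of program text
proc_a = [shared, mem.alloc()]  # A also has one private page
proc_b = [mem.get(shared)]      # B shares the text page

exit_process(mem, proc_a)
free_after_a = len(mem.free)    # 3: private page freed, shared page still in use
exit_process(mem, proc_b)
free_after_b = len(mem.free)    # 4: last reference dropped, everything is free
```

The shared page only returns to the free list when the second process
exits, because that is when its count finally reaches 0.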

> 5. In your 'Memory Management Talk', you mentioned the main memory is
> very slow.  As speed of Intel processor grows rapidly, speed difference
> between cpu and main memory is getting big and bigger.  How the issue
> could be solved in future?  Does RAM reached its limitation of speed
> theoretically or L2 cache reached its limitation in terms of cost and
> size?

Make all RAM as fast as the CPU. How hard could it be? :-)

HTH,

-- Joe
# "You know how many remote castles there are along the
#  gorges? You can't MOVE for remote castles!" - Lu Tze re. Uberwald
# (Obsolete as of 2.4.12) Linux MM docs:
http://home.earthlink.net/~jknapka/linux-mm/vmoutline.html