Re: bootmem allocator

William Lee Irwin III <wli@holomorphy.com> · Wed, 19 Dec 2001 21:34:44 -0800

On Wed, Dec 19, 2001 at 08:33:35PM -0800, Manoj Ekbote wrote:
> I am trying to understand the bootmem allocator. Nothing much is
> written about those functions except for a few lines.

I did a rewrite of this allocator that does a fairly good job of
respecting its semantics, so I think I might have some answers.

On Wed, Dec 19, 2001 at 08:33:35PM -0800, Manoj Ekbote wrote:
> I believe the virtual memory above PAGE_OFFSET is reserved for the kernel.

Yes. The way paging works is that there is a data structure
(essentially a radix tree) pointed at by the register %cr3, in which the
CPU looks up the physical address corresponding to the virtual address
being dereferenced. Every entry in the page table points at something
page-aligned, so the low bits of an entry aren't needed for the address,
and the CPU uses them to store extra information such as the present,
read/write, and user/supervisor bits. To make things easier for the
kernel, Linux creates mappings so that the virtual addresses from
PAGE_OFFSET up to 4GB map to the physical addresses from 0 up to 1GB.
(In practice the direct mapping stops at about 896MB, which leaves the
top of the virtual address space for vmalloc() and other special
mappings.)
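To make the address arithmetic concrete, here is a small user-space
sketch. It assumes the i386 defaults of 4KB pages, two-level paging
without PAE, and PAGE_OFFSET at 0xC0000000; all the function names are
made up for illustration. The last two show the constant-offset
arithmetic that the kernel's __pa() and __va() boil down to for the
direct mapping.

```c
#include <stdint.h>

/* Assumed i386 defaults: 4KB pages, two-level paging without PAE, and
 * the usual 3GB/1GB split, so PAGE_OFFSET is 0xC0000000. */
#define PAGE_SHIFT   12
#define PAGE_OFFSET  0xC0000000u

/* Flag bits kept in the low 12 bits of a page directory/table entry. */
#define PTE_PRESENT  0x001u
#define PTE_RW       0x002u
#define PTE_USER     0x004u   /* the user/supervisor bit */

/* A 32-bit virtual address splits 10 + 10 + 12: page directory index,
 * page table index, and byte offset within the 4KB page. */
static inline uint32_t pgd_index(uint32_t vaddr)      { return vaddr >> 22; }
static inline uint32_t pte_index(uint32_t vaddr)      { return (vaddr >> PAGE_SHIFT) & 0x3ffu; }
static inline uint32_t offset_in_page(uint32_t vaddr) { return vaddr & 0xfffu; }

/* Entries point at 4KB-aligned frames, so the low 12 bits are free for
 * flags: mask them off to get the frame, keep them to get the flags. */
static inline uint32_t entry_frame(uint32_t entry) { return entry & ~0xfffu; }
static inline uint32_t entry_flags(uint32_t entry) { return entry & 0xfffu; }

/* The direct mapping is a constant offset (hypothetical helpers that
 * mirror what __pa() and __va() compute on i386). */
static inline uint32_t virt_to_phys_direct(uint32_t vaddr) { return vaddr - PAGE_OFFSET; }
static inline uint32_t phys_to_virt_direct(uint32_t paddr) { return paddr + PAGE_OFFSET; }
```

For example, kernel virtual address 0xC0100000 is page directory slot
0x300, page table slot 0x100, and physical address 0x00100000 (1MB).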

On Wed, Dec 19, 2001 at 08:33:35PM -0800, Manoj Ekbote wrote:
> Here is my analysis of the setup_arch() from kernel/setup.c. Please
> correct me if I am wrong.
> 1.    /*
>         * Partially used pages are not usable - thus
>         * we are rounding upwards.
>         */
>         start_pfn = PFN_UP(__pa(&_end));
> 
> This is the page frame number in the physical memory above which the
> bootmem would be allocated.

_end is a symbol that the linker script places at the end of the loaded
kernel image. It takes up no space of its own; its address (&_end) is
how C code can figure out where the kernel's machine code and statically
allocated data structures end. This location changes with the size of
the kernel, the number of drivers you have compiled in, and the sizes of
the statically allocated tables that you can adjust (e.g. ptys).

The bootmem itself doesn't really have information about this until the
kernel passes the bootmem functions the right arguments. This is
basically saying "bootmem, don't look at anything below the end of where
the kernel is loaded", where _end is the marker for the end of that.
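The rounding in start_pfn = PFN_UP(__pa(&_end)) is easy to try in user
space. This sketch assumes 4KB pages and writes the PFN_* macros the way
I understand setup.c defines them; first_free_pfn() is a hypothetical
helper that takes the physical address __pa(&_end) would give.

```c
#define PAGE_SHIFT   12
#define PAGE_SIZE    (1UL << PAGE_SHIFT)

/* PFN_UP rounds a physical address up to a page frame number,
 * PFN_DOWN rounds it down, PFN_PHYS turns a frame number back into
 * a physical address. */
#define PFN_UP(x)    (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x)  ((x) >> PAGE_SHIFT)
#define PFN_PHYS(x)  ((x) << PAGE_SHIFT)

/* Hypothetical helper: kernel_end_phys stands in for __pa(&_end), the
 * physical address just past the kernel's code, data and bss. */
static unsigned long first_free_pfn(unsigned long kernel_end_phys)
{
    /* A page the kernel only partly occupies is not free, so round up. */
    return PFN_UP(kernel_end_phys);
}
```

With the kernel ending at physical 0x3a4f10, for instance, this gives
page frame 0x3a5, so everything below physical 0x3a5000 is left alone.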

On Wed, Dec 19, 2001 at 08:33:35PM -0800, Manoj Ekbote wrote:
> 2. The next part of the code is a for loop that calculates the highest
> page frame number which would be nothing but the end of physical memory.

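The bootmem uses a direct-mapped table to track whether pages are
usable or not. Basically it's a huge array of bits, one per page, with a
starting page frame number and an ending page frame number. The highest
page frame number found by that loop (capped at the end of directly
mapped memory) is what becomes the ending page frame number, so it
decides how large the table is. If you have a page frame number
(basically a physical address divided by PAGE_SIZE), to look up whether
the page is available you would do this; a set bit means the page is
reserved and a clear bit means it is free:

	test_bit(pfn - bdata->node_boot_start/PAGE_SIZE,
			bdata->node_bootmem_map)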
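Written out without the kernel's bitops, the lookup is ordinary bit
arithmetic on an array of unsigned long. struct bootmem_sketch is a
hypothetical cut-down version of the bootmem data holding only the two
fields the lookup needs, and the sketch assumes bit n lives in word
n / BITS_PER_LONG, as with test_bit().

```c
#include <limits.h>
#include <stdbool.h>

#define PAGE_SHIFT    12
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical cut-down bootmem bookkeeping. */
struct bootmem_sketch {
    unsigned long node_boot_start;   /* physical address of the first tracked page */
    unsigned long *node_bootmem_map; /* one bit per page; set = reserved */
};

/* Same question as test_bit(pfn - node_boot_start/PAGE_SIZE, map). */
static bool page_reserved(const struct bootmem_sketch *b, unsigned long pfn)
{
    /* Subtract the table's first page frame number to get the bit index. */
    unsigned long idx = pfn - (b->node_boot_start >> PAGE_SHIFT);
    unsigned long word = b->node_bootmem_map[idx / BITS_PER_LONG];
    return (word >> (idx % BITS_PER_LONG)) & 1UL;
}
```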

On Wed, Dec 19, 2001 at 08:33:35PM -0800, Manoj Ekbote wrote:
> 3. Then there is a call to init_bootmem.

The call to init_bootmem() is there so the bootmem knows
	(1) what number to subtract from a page frame number so it can
		look it up in the table
	(2) how large the table is
	(3) where in memory the table is located

On Wed, Dec 19, 2001 at 08:33:35PM -0800, Manoj Ekbote wrote:
> 4. In the init_bootmem routine, I would like to know what is the
> "mapsize" for. What does it indicate?

mapsize is for part (2) above: it is the size in bytes of the table,
one bit per page frame, rounded up to a whole number of bytes and then
to a multiple of sizeof(long).
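As a worked example, this sketch computes the table size for a range of
page frames. It assumes 4KB pages and that the table is rounded up to
whole bytes and then to a multiple of sizeof(long), which is what the
2.4 code does as far as I can tell; bootmap_bytes() and bootmap_pages()
are hypothetical names.

```c
#define PAGE_SIZE 4096UL

/* Bytes of bitmap needed to track pages [start_pfn, end_pfn): one bit per
 * page, rounded up to whole bytes and then to a multiple of sizeof(long). */
static unsigned long bootmap_bytes(unsigned long start_pfn, unsigned long end_pfn)
{
    unsigned long mapsize = ((end_pfn - start_pfn) + 7) / 8;
    return (mapsize + (sizeof(long) - 1UL)) & ~(sizeof(long) - 1UL);
}

/* Whole pages the bitmap occupies once it is placed in memory. */
static unsigned long bootmap_pages(unsigned long start_pfn, unsigned long end_pfn)
{
    return (bootmap_bytes(start_pfn, end_pfn) + PAGE_SIZE - 1) / PAGE_SIZE;
}
```

For 896MB of directly mapped memory (229376 pages) this comes to 28672
bytes, or 7 pages of bitmap.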

On Wed, Dec 19, 2001 at 08:33:35PM -0800, Manoj Ekbote wrote:
> 5. In the same routine, a bootmem structure is filled with values for
> the virtual address of start_pfn, the start address and an entry for
> the highest page frame number. Then the memory starting at bootmem_map
> is initialized.

What you're seeing here are the values being filled in for (1), (2), and
(3) above. The memset() fills every position in the table with a known
starting value: all ones, meaning every page starts out reserved until
setup_arch() frees the RAM it actually finds.
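Here is a user-space sketch of that initialization, assuming (as in the
2.4 code) that the known value is all ones, so every page starts out
reserved. struct bootmem_sketch and sketch_init_bootmem() are
hypothetical, and a plain buffer stands in for the memory the kernel
reaches through the bitmap's physical address.

```c
#include <string.h>

#define PAGE_SHIFT 12

/* Hypothetical cut-down bootmem data: the three things init records. */
struct bootmem_sketch {
    unsigned long node_boot_start;   /* (1) physical address of the first tracked page */
    unsigned long node_low_pfn;      /* (2) page frame number just past the last one */
    unsigned char *node_bootmem_map; /* (3) where the bitmap lives */
};

/* Set up tracking for pages [start_pfn, end_pfn). 'map' stands in for the
 * memory at the bitmap's location and must hold the returned byte count. */
static unsigned long sketch_init_bootmem(struct bootmem_sketch *b, unsigned char *map,
                                         unsigned long start_pfn, unsigned long end_pfn)
{
    /* One bit per page, rounded to bytes and then to sizeof(long). */
    unsigned long mapsize = ((end_pfn - start_pfn) + 7) / 8;
    mapsize = (mapsize + (sizeof(long) - 1UL)) & ~(sizeof(long) - 1UL);

    b->node_bootmem_map = map;
    b->node_boot_start = start_pfn << PAGE_SHIFT;
    b->node_low_pfn = end_pfn;

    /* All bits set: every page is reserved until explicitly freed. */
    memset(map, 0xff, mapsize);
    return mapsize;
}
```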

On Wed, Dec 19, 2001 at 08:33:35PM -0800, Manoj Ekbote wrote:
> 6. Returning to setup_arch(), there is a for loop that does not do a lot.

It's making one more pass over the E820 table provided by the i386 BIOS
(so named because the BIOS returns it in response to INT 15h with
AX=E820h) to work out where each range of usable RAM begins and ends.
For each such range, rounded inward to whole pages and cut off at the
end of the table, the loop then calls free_bootmem() to tell the bootmem
allocator that the memory in that range is available.
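A sketch of that loop, assuming a hypothetical struct e820_entry_sketch,
a bitmap that starts at page frame 0, and that only whole pages inside
each usable-RAM range (E820 type 1) and below max_low_pfn get freed. The
bit clearing stands in for what free_bootmem() does.

```c
#include <limits.h>

#define PAGE_SHIFT    12
#define PAGE_SIZE     (1UL << PAGE_SHIFT)
#define PFN_UP(x)     (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define PFN_DOWN(x)   ((x) >> PAGE_SHIFT)
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

#define E820_RAM 1   /* E820 type code for usable RAM */

/* Hypothetical copy of one E820 table entry. */
struct e820_entry_sketch {
    unsigned long long addr;
    unsigned long long size;
    unsigned int type;
};

static void clear_page_bit(unsigned long *map, unsigned long pfn)
{
    map[pfn / BITS_PER_LONG] &= ~(1UL << (pfn % BITS_PER_LONG));
}

/* Mark every whole page of usable RAM below max_low_pfn as free. */
static void free_e820_ram(unsigned long *map, unsigned long max_low_pfn,
                          const struct e820_entry_sketch *e, int nr)
{
    for (int i = 0; i < nr; i++) {
        if (e[i].type != E820_RAM)
            continue;
        /* Partial pages at either end are not usable: round the start
         * up and the end down. */
        unsigned long long first = PFN_UP(e[i].addr);
        unsigned long long last = PFN_DOWN(e[i].addr + e[i].size);
        if (last > max_low_pfn)
            last = max_low_pfn;   /* nothing past the end of the table */
        for (unsigned long long pfn = first; pfn < last; pfn++)
            clear_page_bit(map, (unsigned long)pfn);
    }
}
```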

On Wed, Dec 19, 2001 at 08:33:35PM -0800, Manoj Ekbote wrote:
> 7.Then there is a call to reserve_bootmem() with the physical address
> of the first usable page frame as one of its arguments.

Well, there are two calls. The first reserves the bootmem bitmap along
with the memory below it back to 1MB, which is where the kernel image
itself is loaded, partly as a precaution; the second reserves physical
page zero, which the BIOS needs on many machines.
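To see which physical ranges those two calls cover, here is a sketch of
the arithmetic. It assumes, as in 2.4-era setup_arch(), that the first
call starts at HIGH_MEMORY (1MB, where the kernel is loaded) and runs to
the end of the bitmap placed at start_pfn; struct phys_range and both
helpers are hypothetical.

```c
#define PAGE_SHIFT   12
#define PAGE_SIZE    (1UL << PAGE_SHIFT)
#define PFN_PHYS(x)  ((x) << PAGE_SHIFT)
#define HIGH_MEMORY  0x100000UL   /* 1MB: where the i386 kernel image is loaded */

/* Hypothetical (address, size) pair as passed to reserve_bootmem(). */
struct phys_range { unsigned long addr, size; };

/* Kernel image plus the bitmap just past it. The size is not page-aligned;
 * reserve_bootmem() treats a partially covered page as fully reserved. */
static struct phys_range kernel_and_bitmap(unsigned long start_pfn,
                                           unsigned long bootmap_size)
{
    struct phys_range r;
    r.addr = HIGH_MEMORY;
    r.size = (PFN_PHYS(start_pfn) + bootmap_size + PAGE_SIZE - 1) - HIGH_MEMORY;
    return r;
}

/* Physical page 0, kept away from the allocator for the BIOS's sake. */
static struct phys_range zero_page(void)
{
    struct phys_range r = { 0, PAGE_SIZE };
    return r;
}
```

With the bitmap at page frame 0x3a5 taking 28672 bytes, the first range
starts at 0x100000 and ends at 0x3acfff.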

On Wed, Dec 19, 2001 at 08:33:35PM -0800, Manoj Ekbote wrote:
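> 8. In the reserve_bootmem routine, what do sidx, eidx and end point
> to? I am not sure if sidx and eidx point to the physical page numbers
> needed for bootmem.

"sidx" and "eidx" stand for "starting index" and "ending index". They're
indices into the bitmap, i.e. page frame numbers minus the page frame
number the table starts at, rather than raw physical page frame numbers
(on i386 the table starts at page frame 0, so the two happen to
coincide). eidx is one past the last page to reserve, rounded up so that
a partially covered page is reserved in full. end is the absolute page
frame number just past the range, used only to check that the range
doesn't run off the end of the table. Mostly they're temporary variables
that help simplify the loop code.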
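Here is a sketch of that index arithmetic, modelled on my reading of
2.4's reserve_bootmem_core(). struct bootmem_sketch and sketch_reserve()
are hypothetical, and assert() stands in for the kernel's BUG() checks.

```c
#include <assert.h>
#include <limits.h>

#define PAGE_SIZE     4096UL
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical cut-down bootmem data. */
struct bootmem_sketch {
    unsigned long node_boot_start;   /* physical address of bit 0 */
    unsigned long node_low_pfn;      /* page frame number just past the table */
    unsigned long *node_bootmem_map; /* set bit = reserved */
};

/* Reserve [addr, addr + size); returns how many of those pages were
 * already reserved. Assumes addr >= node_boot_start. */
static unsigned long sketch_reserve(struct bootmem_sketch *b,
                                    unsigned long addr, unsigned long size)
{
    /* sidx, eidx: bitmap indices relative to node_boot_start; eidx is one
     * past the last page and rounded up, so partial pages count in full. */
    unsigned long sidx = (addr - b->node_boot_start) / PAGE_SIZE;
    unsigned long eidx = (addr + size - b->node_boot_start + PAGE_SIZE - 1) / PAGE_SIZE;
    /* end: absolute page frame number just past the range, only used to
     * check the range against the end of the table. */
    unsigned long end = (addr + size + PAGE_SIZE - 1) / PAGE_SIZE;
    unsigned long twice = 0;

    assert(size != 0 && sidx < eidx && end <= b->node_low_pfn);
    for (unsigned long i = sidx; i < eidx; i++) {
        unsigned long mask = 1UL << (i % BITS_PER_LONG);
        unsigned long *word = &b->node_bootmem_map[i / BITS_PER_LONG];
        if (*word & mask)
            twice++;
        *word |= mask;
    }
    return twice;
}
```

For a table starting at 0x100000, reserving 0x1000 bytes at 0x101800
gives sidx 1 and eidx 3: the range straddles two pages, so both are
reserved.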

Cheers,
Bill
--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive:       http://mail.nl.linux.org/kernelnewbies/
IRC Channel:   irc.openprojects.net / #kernelnewbies
Web Page:      http://www.kernelnewbies.org/