Re: VMAs and "offset"s?

"Peter Teoh" <htmldeveloper@xxxxxxxxx> · Thu, 10 Apr 2008 07:57:42 +0800

On Wed, Apr 9, 2008 at 5:52 PM, Robert P. J. Day <rpjday@xxxxxxxxxxxxxx> wrote:
>
> On Tue, 8 Apr 2008, Johannes Weiner wrote:
>
>  > Hi Robert,
>  >
>  > "Robert P. J. Day" <rpjday@xxxxxxxxxxxxxx> writes:
>  >
>  > > On Tue, 8 Apr 2008, Peter Kerpedjiev wrote:
>  > >
>  > >> Robert P. J. Day wrote:
>  > >> > # pmap -d 1
>  > >> > 1:   init [5]
>  > >> > Address   Kbytes Mode  Offset           Device    Mapping
>  > >> > ... snip ...
>  > >> > 00c55000      12 r-x-- 0000000000000000 0fd:00000 libdl-2.7.so
>  > >> > 00c58000       4 r-x-- 0000000000002000 0fd:00000 libdl-2.7.so
>  > >> > 00c59000       4 rwx-- 0000000000003000 0fd:00000 libdl-2.7.so
>  > >> > ...
>  > >> >
>  > >> > if you look at the second VMA for that shared lib, its address
>  > >> > shows that it's 0x3000 higher up in memory, but the Offset
>  > >> > field shows an offset of only 0x2000.  what does that mean?
>  > >> > thanks.
>  > >> >
>  > >> AFAICT, the region from offset 2000 to offset 3000 in libdl-2.7.so
>  > >> is mapped by both of the first two memory areas.
>  > >>
>  > >> I'm not sure why two memory regions would map the same part of a
>  > >> file with the same permissions.
>  > >
>  > > that's what was confusing me, since i read in love's book, p. 256:
>  > >
>  > > "Intervals in different memory areas in the same address space cannot
>  > > overlap."
>  > >
>  > > and that sure looks like overlapping to me, but only if you take the
>  > > "offset" field seriously.
>  >
>  > The areas do not overlap, their file mappings do.
>
>  sorry, i still don't understand the explanation.  here's the output
>  for my entire VMA set for "init", and the closer i look, the more
>  confused i get:
>
>
>  # pmap -d 1
>  1:   init [5]
>  Address   Kbytes Mode  Offset           Device    Mapping
>  00110000       4 r-x-- 0000000000110000 000:00000   [ anon ]
>  00967000     220 r-x-- 0000000000000000 0fd:00000 libsepol.so.1
>  0099e000       4 rwx-- 0000000000036000 0fd:00000 libsepol.so.1
>  00a3d000     100 r-x-- 0000000000000000 0fd:00000 libselinux.so.1
>  00a56000       8 rwx-- 0000000000018000 0fd:00000 libselinux.so.1
>  00ab0000     108 r-x-- 0000000000000000 0fd:00000 ld-2.7.so
>  00acb000       4 r-x-- 000000000001a000 0fd:00000 ld-2.7.so
>  00acc000       4 rwx-- 000000000001b000 0fd:00000 ld-2.7.so
>  00acf000    1356 r-x-- 0000000000000000 0fd:00000 libc-2.7.so
>  00c22000       8 r-x-- 0000000000153000 0fd:00000 libc-2.7.so
>  00c24000       4 rwx-- 0000000000155000 0fd:00000 libc-2.7.so
>  00c25000      12 rwx-- 0000000000c25000 000:00000   [ anon ]
>  00c55000      12 r-x-- 0000000000000000 0fd:00000 libdl-2.7.so
>  00c58000       4 r-x-- 0000000000002000 0fd:00000 libdl-2.7.so
>  00c59000       4 rwx-- 0000000000003000 0fd:00000 libdl-2.7.so
>  08048000      32 r-x-- 0000000000000000 0fd:00000 init
>  08050000       4 rw--- 0000000000008000 0fd:00000 init
>  08678000     132 rw--- 0000000008678000 000:00000   [ anon ]
>  b7f41000       8 rw--- 00000000b7f41000 000:00000   [ anon ]
>  bfafd000      84 rw--- 00000000bffea000 000:00000   [ stack ]
>  mapped: 2112K    writeable/private: 264K    shared: 0K
>  #
>
>   take those two lines corresponding to libsepol.so.1:
>
>  Address                Offset
>  00967000     220 r-x-- 0000000000000000 0fd:00000 libsepol.so.1
>  0099e000       4 rwx-- 0000000000036000 0fd:00000 libsepol.so.1
>
>   but 0x967000+0x36000=0x99d000, not 99e000.  close, but not quite.
>  and if you look at some of the other shared libs, you see that same
>  "off by 0x1000" amount for later VMAs.  so why that small difference?
>  i've checked the source and i still don't see a rationale for that.
>

The domain of knowledge you are pursuing right now is called the
"loader", or specifically the ELF loader (the kernel's for the
executable itself; shared libs are mapped by the dynamic linker,
ld.so, through mmap()).   The way to load a binary has many rules (I
wrote a loader for the Windows kernel in the past; you cannot find an
open source one :-)).   But the logic for the PE format and the ELF
format is similar:

(I have not read Mulyadi's article in depth, but I guess it is along
the same lines):

Every ELF file has many sections, and for a shared lib (your case in
point) the compiler normally compiles the code as position-independent
code (PIC).   Because the lib can end up at any address, the file
carries relocation entries that tell the loader what to patch once the
load address is known.   A normal executable is linked to load at a
fixed address, so it does not need to be relocated as a whole.   Where
to load each part of the file into memory is normally specified INSIDE
THE ELF file (in its program headers).   A shared library can be
loaded ANYWHERE in memory; the ELF file specifies the relative
position of each part, and then you apply the relocation fixups.
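
As a rough illustration of "specified INSIDE THE ELF file": the load
instructions are the PT_LOAD entries of the program header table.   The
sketch below is a minimal reader that assumes a 32-bit little-endian
ELF image (matching the 32-bit addresses in the pmap output) and does
no bounds checking.   load_segments is a name made up for this example,
not a kernel or libc function.

```python
import struct

PT_LOAD = 1  # program header type for a loadable segment

# Field layouts from the ELF32 specification (little-endian assumed).
EHDR32 = struct.Struct("<16sHHIIIIIHHHHHH")  # 52-byte ELF header
PHDR32 = struct.Struct("<IIIIIIII")          # 32-byte program header


def load_segments(image):
    """Return (p_offset, p_vaddr, p_filesz, p_memsz) for each PT_LOAD entry.

    `image` is the raw bytes of a 32-bit little-endian ELF file.
    """
    ehdr = EHDR32.unpack_from(image, 0)
    ident = ehdr[0]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    if ident[4] != 1 or ident[5] != 1:
        raise ValueError("this sketch handles only ELFCLASS32, little-endian")
    e_phoff, e_phentsize, e_phnum = ehdr[5], ehdr[9], ehdr[10]

    segments = []
    for i in range(e_phnum):
        (p_type, p_offset, p_vaddr, _p_paddr,
         p_filesz, p_memsz, _p_flags, _p_align) = PHDR32.unpack_from(
            image, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_offset, p_vaddr, p_filesz, p_memsz))
    return segments
```

Calling load_segments(open(path, "rb").read()) on a 32-bit shared lib
should list one tuple per loadable segment; p_vaddr there is relative
to wherever the lib ends up being loaded.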

There are other details, like import/export section fixing, but they
do not determine the address to load to.   Another important point is
this:   the file offset (which is what the "Offset" column above
displays) DOES NOT CORRESPOND TO the memory offset.   (Again, these are
common terms once you go into "loader".)   And lastly, the memory
offset is also determined by rounding, since a mapping has to start
and end on a multiple of PAGE_SIZE.   So your binary can be very
compact in a physical file (say 1K) but upon loading it can be mapped
to a much larger memory image (say 10M), as you can see in all your
pmap output.
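
The rounding can be made concrete with a small calculation.   The
sketch below assumes 4K pages and assumes each loadable segment is
mapped by rounding its start down and its end up to a page boundary.
The two segment descriptions are hypothetical values picked to
reproduce the libsepol.so.1 lines quoted above, not numbers read from
the real library, and page_mapping is an illustrative helper, not a
kernel function.

```python
PAGE_SIZE = 0x1000  # assumption: 4K pages, as on 32-bit x86


def page_mapping(base, p_offset, p_vaddr, p_filesz):
    """Return (address, length, file_offset) of the page-aligned mapping
    that covers one file-backed segment loaded at load address `base`."""
    mask = PAGE_SIZE - 1
    start = (base + p_vaddr) & ~mask                   # round start down
    end = (base + p_vaddr + p_filesz + mask) & ~mask   # round end up
    file_offset = p_offset & ~mask                     # round offset down
    return start, end - start, file_offset


BASE = 0x00967000  # load address taken from the pmap output

# Hypothetical segments: text ends, and data begins, inside the file page
# at offset 0x36000; data is placed one page higher in memory.
text = page_mapping(BASE, p_offset=0x00000, p_vaddr=0x00000, p_filesz=0x36a00)
data = page_mapping(BASE, p_offset=0x36a00, p_vaddr=0x37a00, p_filesz=0x400)

for addr, length, off in (text, data):
    print(f"{addr:08x} {length // 1024:7d} {off:016x}")
```

Under those assumptions the text mapping comes out as 00967000, 220K,
offset 0, and the data mapping as 0099e000, 4K, offset 0x36000.   The
file page at offset 0x36000 is mapped twice, once as the last page of
the text and once as the first page of the data, which would account
for the "off by 0x1000" between address and offset.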

-- 
Regards,
Peter Teoh