Re: dom0 analysis for IA64

"Itsuro ODA" <oda@xxxxxxxxxxxxx> · Sat, 12 May 2007 21:46:07 +0900 (JST)

Hi,

Thanks for your explanation, Isaku.

> But, in any case, Itsuro, can you do what is possible with
> your patch, and re-submit it?

I understand that the first 1GB (in this example) is necessary
for looking at ordinary Linux kernel structures.

I now think it is better to build the p2m_frame table only for the
first contiguous physical address space that is treated as RAM,
and to add a special command for looking at the shared_info area
or I/O space if that is necessary.

Let me think about it for a while. (I can't access any IA64
environment right now.)
I will reply early next week.

Thanks.

Dave Anderson said:
> Isaku Yamahata wrote:
>
>> Hi Dave.
>> I think I can explain it.
>>
>> Sometimes Xen needs to share pages with dom0,
>> for example shared_info, grant table pages, another domain's pages,
>> etc.
>> In such a case, Xen/IA64 puts those pages into the dom0 pseudo-physical
>> address space, i.e. it updates the dom0 p2m table so that dom0 can
>> access those pages.
>> Pseudo-physical addresses are predefined or given by Xen or dom0.
>> Currently shared_info is assigned at the pseudo-physical address
>> 1UL << 40 = 1TB.
>> This corresponds to the following entry.
>> >     f00000007d8b0080:  000000007f428000 0000000000000000   ..B.............
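The correspondence between 1UL << 40 and the pgd entry at
f00000007d8b0080 can be checked with a little arithmetic. The sketch
below is only illustrative: it assumes 16KB pages and 8-byte table
entries (2048 entries per table page), which matches the "256 * 2048"
figure and the 16k page size seen elsewhere in this thread. The helper
names are hypothetical and are not taken from Xen or crash.

```python
# Assumed layout: 16KB pages, 8-byte entries, 3-level table (pgd -> pmd -> pte).
PAGE_SHIFT = 14
ENTRY_SIZE = 8
BITS_PER_LEVEL = PAGE_SHIFT - 3                  # 2048 entries per table page
PGDIR_SHIFT = PAGE_SHIFT + 2 * BITS_PER_LEVEL    # 36: one pgd entry spans 64GB


def pgd_index(pseudo_phys):
    """Index of the pgd entry covering a pseudo-physical address."""
    return pseudo_phys >> PGDIR_SHIFT


def pgd_entry_address(pgd_base, pseudo_phys):
    """Address of that pgd entry, given the base address of the pgd."""
    return pgd_base + pgd_index(pseudo_phys) * ENTRY_SIZE


if __name__ == "__main__":
    pgd = 0xF00000007D8B0000        # arch.mm.pgd value quoted in this thread
    shared_info_paddr = 1 << 40     # 1TB
    print(pgd_index(shared_info_paddr))
    print(hex(pgd_entry_address(pgd, shared_info_paddr)))
```

Under these assumptions the 1TB address selects pgd slot 16, which is
byte offset 0x80 from the pgd base, i.e. the entry quoted above.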
>>
>> Dom0 controls devices, so it needs to access the I/O area.
>> For that purpose, the dom0 p2m table has an entry that points to the
>> I/O area such that dom0 pseudo-physical address = machine address.
>> I guess that the following entry corresponds to the I/O area.
>> >     f00000007d8b07f0:  0000000000000000 000000007bed4000   .........@.{....
>> To confirm this, the native Linux /proc/iomem is necessary.
>>
>> thanks.
>>
>
> OK, thanks for that explanation...
>
> It *still* seems to be a huge waste of memory.  Taking the
> example dump, the 1GB of "normal" memory requires 32 p2m_mfn
> values for address translation, plus -- if I understand you
> correctly -- 1 for the shared_info, plus 1 for the I/O area.
> That's a total of 34 8-byte values, or 272 bytes.  Whereas
> this first patch uses 524288 entries, or 4MB of memory.
> It seems to me there should be a better way to handle it,
> even if those two particular pseudo-physical regions are
> "special-cased" for ia64.
>
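To put numbers on the two sizes compared above, here is a small sketch.
It assumes only what the paragraph states (8-byte entries, 32 + 1 + 1
values needed, 256 * 2048 entries allocated); the names are
illustrative.

```python
ENTRY_SIZE = 8  # bytes per p2m_mfn value, as stated in the paragraph above


def table_bytes(entries):
    """Memory used by a table holding the given number of 8-byte entries."""
    return entries * ENTRY_SIZE


if __name__ == "__main__":
    needed = 32 + 1 + 1        # normal memory + shared_info + I/O area
    allocated = 256 * 2048     # entries used by the first patch
    print(needed, table_bytes(needed))
    print(allocated, table_bytes(allocated) // (1024 * 1024), "MB")
```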
> But, in any case, Itsuro, can you do what is possible with
> your patch, and re-submit it?
>
> Thanks guys,
>   Dave
>
>
>>
>> On Fri, May 11, 2007 at 10:02:39AM -0400, Dave Anderson wrote:
>> > Itsuro ODA wrote:
>> >
>> >     Hi Dave,
>> >
>> >     > This all sounds good, and I agree that the p2m_mfn should
>> >     > be added to the ia64 XEN_ELFNOTE_CRASH_INFO.
>> >     >
>> >     > However, there's something incorrect in your calculation of
>> >     > "xkd->p2m_frames" in your ia64_xen_kdump_p2m_create()
>> >     > implementation.  It looks like it should be 32, but it's set
>> >     > to 524288.  As a result that wastes a lot of memory, and
>> >     > "help -n" is pretty much unusable since it wants to dump all
>> >     > ~512k entries:
>> >
>> >     This is because of IA64's pseudo-physical memory map (which is
>> >     specific to domains on Xen).
>> >
>> >     The phys-to-machine mapping is managed as a 3-level page table.
>> >     The pgd looks like:
>> >     -------------------------------------------------------------
>> >     crash> doms
>> >        DID       DOMAIN      ST T  MAXPAGE  TOTPAGE VCPU     SHARED_I      P2M_MFN
>> >       32753 f000000007dac080 ?? O     0        0      0          0            ----
>> >       32754 f000000007ff0080 ?? X     0        0      0          0            ----
>> >       32767 f000000007ff4080 ?? I     0        0      1          0            ----
>> >     >*    0 f000000007da4080 ?? 0   10000    f986     1   f000000007d90000   1f62c
>> >
>> >     crash> domain f000000007da4080
>> >     struct domain {
>> >       domain_id = 0,
>> >       shared_info = 0xf000000007d90000,
>> >     ...
>> >       arch = {
>> >         mm = {
>> >           pgd = 0xf00000007d8b0000
>> >         },
>> >     ...
>> >     crash> rd 0xf00000007d8b0000 256
>> >     f00000007d8b0000:  000000007c8d8000 0000000000000000   ...|............
>> >     f00000007d8b0010:  0000000000000000 0000000000000000   ................
>> >     f00000007d8b0020:  0000000000000000 0000000000000000   ................
>> >     f00000007d8b0030:  0000000000000000 0000000000000000   ................
>> >     f00000007d8b0040:  0000000000000000 0000000000000000   ................
>> >     f00000007d8b0050:  0000000000000000 0000000000000000   ................
>> >     f00000007d8b0060:  0000000000000000 0000000000000000   ................
>> >     f00000007d8b0070:  0000000000000000 0000000000000000   ................
>> >     f00000007d8b0080:  000000007f428000 0000000000000000   ..B.............
>> >     f00000007d8b0090:  0000000000000000 0000000000000000   ................
>> >     ...
>> >     f00000007d8b07c0:  0000000000000000 0000000000000000   ................
>> >     f00000007d8b07d0:  0000000000000000 0000000000000000   ................
>> >     f00000007d8b07e0:  0000000000000000 0000000000000000   ................
>> >     f00000007d8b07f0:  0000000000000000 000000007bed4000   .........@.{....
>> >     -------------------------------------------------------------------------
>> >     (256 * 2048 = 524288)
>> >
>> >     It is certain that (pseudo-)physical memory "256GB-" and "-4TB"
>> >     exists.
>> >     These areas are shared by domain-0 and the Xen hypervisor.
>> >     These areas should be accessible in dom0's analysis session.
>> >
>> >     (I said:)
>> >     > > But this patch is a bit tricky. And the memory usage is
>> >     > > large if the machine memory layout is sparse.
>> >
>> >     That was wrong. It should have been "the memory usage is large
>> >     if the pseudo-physical memory layout is sparse."
>> >     And it is actually always sparse...
>> >
>> >     Thanks.
>> >
>> >
>> > Hi Itsuro,
>> >
>> > I now understand the difference in the 3rd-level p2m
>> > frame contents being page table entries instead of mfn
>> > values.
>> >
>> > However, I still do not understand what you mean regarding
>> > the concept of the pseudo-physical memory being "sparse".
>> > Looking at the dumpfile again, it appears to have the same
>> > type of flat pseudo-physical memory layout just like the
>> > other architectures.
>> >
>> > Dom0 has ~1GB of pseudo-physical memory:
>> >
>> > crash> sys
>> >       KERNEL: ../20070510-sample-dump-2/vmlinux-xen-ia64
>> >     DUMPFILE: ../20070510-sample-dump-2/vmcore.tiger.iomem_machine
>> >         CPUS: 1
>> >         DATE: Mon May  7 04:07:43 2007
>> >       UPTIME: 00:01:47
>> > LOAD AVERAGE: 0.11, 0.04, 0.01
>> >        TASKS: 21
>> >     NODENAME: (none)
>> >      RELEASE: 2.6.18-xen
>> >      VERSION: #3 SMP Mon May 7 13:14:41 JST 2007
>> >      MACHINE: ia64  (1296 Mhz)
>> >       MEMORY: 1 GB
>> >        PANIC: "SysRq : Trigger a crashdump"
>> > crash>
>> >
>> > And as far as dom0's VM is concerned, its memory map only knows
>> > about the 64512 pages in DMA zone 0:
>> >
>> > crash> kmem -n
>> > NODE    SIZE      PGLIST_DATA       BOOTMEM_DATA       NODE_ZONES
>> >   0    64512    a000000100482f80  a000000100608950  a000000100482f80
>> >                                                     a000000100483500
>> >                                                     a000000100483a80
>> >                                                     a000000100484000
>> >     MEM_MAP       START_PADDR  START_MAPNR
>> > e0000000010b0000       0            0
>> >
>> > ZONE  NAME         SIZE       MEM_MAP      START_PADDR  START_MAPNR
>> >   0   DMA         64512  e0000000010b0000            0            0
>> >   1   DMA32           0                 0            0            0
>> >   2   Normal          0                 0            0            0
>> >   3   HighMem         0                 0            0            0
>> > crash>
>> >
>> > So the "end of memory" would be just below 1GB:
>> >
>> > crash> eval 64512 * 16k
>> > hexadecimal: 3f000000  (1008MB)
>> >     decimal: 1056964608
>> >       octal: 7700000000
>> >      binary: 0000000000000000000000000000000000111111000000000000000000000000
>> > crash>
>> >
>> > So, with respect to dom0, how would it ever go beyond 32
>> > p2m_frames?  Putting a debug printf in xen_kdump_p2m, it
>> > shows this:
>> >
>> > crash> rd -p 3f000000
>> > xen_kdump_p2m: mfn_idx for 3f000000: 31
>> >         3f000000:  0000000000000000                    ........
>> > crash>
>> >
>> > So that shows that there only needs to be 32 p2m_frames
>> > for accessing all of dom0 pseudo-physical memory.
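The index values printed by the debug printf are consistent with each
p2m frame covering 32MB of pseudo-physical space (2048 eight-byte
entries times 16KB pages). The following is a sketch under that
assumption; p2m_frame_index and frames_needed are hypothetical helpers
and are not the code in crash's xen_kdump_p2m().

```python
PAGE_SIZE = 16 * 1024                        # 16KB pages, as in "eval 64512 * 16k"
ENTRIES_PER_FRAME = PAGE_SIZE // 8           # 2048 eight-byte entries per frame
FRAME_SPAN = ENTRIES_PER_FRAME * PAGE_SIZE   # 32MB covered by one p2m frame


def p2m_frame_index(paddr):
    """Index of the p2m frame that would translate this address."""
    return paddr // FRAME_SPAN


def frames_needed(end_paddr):
    """Number of p2m frames needed to cover [0, end_paddr), rounded up."""
    return -(-end_paddr // FRAME_SPAN)


if __name__ == "__main__":
    end_of_dom0 = 64512 * PAGE_SIZE          # 0x3f000000, from "kmem -n"
    print(p2m_frame_index(0x3F000000), frames_needed(end_of_dom0))
    print(p2m_frame_index(0x7FED9000))
```

With these assumptions 0x3f000000 falls in frame 31, so 32 frames cover
all of dom0's ~1GB, and 0x7fed9000 falls in frame 63, matching the
debug output in this thread.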
>> >
>> > But it also shows that you are allowing access to memory
>> > that is *beyond* the end of dom0 pseudo-physical memory,
>> > since 3f000000 should not be readable.  There is not a
>> > page structure associated with 3f000000:
>> >
>> > crash> kmem -p | tail
>> > e000000001421dd0 3efd8000      -------       -----   1 0
>> > e000000001421e08 3efdc000      -------       -----   1 0
>> > e000000001421e40 3efe0000      -------       -----   1 60
>> > e000000001421e78 3efe4000      -------       -----   1 60
>> > e000000001421eb0 3efe8000      -------       -----   1 60
>> > e000000001421ee8 3efec000      -------       -----   1 60
>> > e000000001421f20 3eff0000      -------       -----   2 0
>> > e000000001421f58 3eff4000      -------       -----   1 80
>> > e000000001421f90 3eff8000      -------       -----   1 80
>> > e000000001421fc8 3effc000      -------       -----   1 80
>> > crash>
>> >
>> > By doing a few other "rd -p" commands, I see that you seem
>> > to be allowing memory accesses based upon what's in the ELF
>> > header PT_LOAD segments, which are "machine" physical memory
>> > descriptors:
>> >
>> > crash> help -n | grep phys_end
>> >                phys_end: 1000
>> >                phys_end: 7000
>> >                phys_end: 9000
>> >                phys_end: 82000
>> >                phys_end: 85000
>> >                phys_end: a0000
>> >                phys_end: 4000000
>> >                phys_end: 81b3000
>> >                phys_end: ffc0000
>> >                phys_end: 10000000
>> >                phys_end: 7ab06000
>> >                phys_end: 7c8d2000
>> >                phys_end: 7c92e000
>> >                phys_end: 7c938000
>> >                phys_end: 7c97e000
>> >                phys_end: 7cdf6000
>> >                phys_end: 7cdfc000
>> >                phys_end: 7ce2a000
>> >                phys_end: 7d001000
>> >                phys_end: 7d002000
>> >                phys_end: 7d044000
>> >                phys_end: 7d045000
>> >                phys_end: 7d37e000
>> >                phys_end: 7d700000
>> >                phys_end: 7d77e000
>> >                phys_end: 7d8b4000
>> >                phys_end: 7f980000
>> >                phys_end: 7fa00000
>> >                phys_end: 7feda000
>> > crash>
>> >
>> > So it appears that the physical machine running the
>> > dom0 and hypervisor has almost 2GB of "real" physical
>> > memory.  And if I try to read the limit address of
>> > 7feda000, it fails:
>> >
>> > crash> rd -p 7feda000
>> > xen_kdump_p2m: mfn_idx for 7feda000: 63
>> > rd: read error: physical address: 7feda000  type: "64-bit PHYSADDR"
>> > crash>
>> >
>> > But the last page of physical memory can be read:
>> >
>> > crash> rd -p 7fed9000
>> > xen_kdump_p2m: mfn_idx for 7fed9000: 63
>> >         7fed9000:  000000007f9da0a0                    ........
>> > crash>
>> >
>> > "rd -p" is supposed to read pseudo-physical memory in xen
>> > kernels, but it seems to be allowing reads based upon the
>> > PT_LOAD segment contents?  In other words, it seems to
>> > be mixing dom0 pseudo-physical memory and the system's
>> > machine memory, because 7fed9000 is not a legitimate dom0
>> > pseudo-physical address.
>> >
>> > (And even with that happening, the maximum p2m_frame index
>> > is still only 63 -- how can it ever be 512k with respect
>> > to dom0's pseudo-physical memory?)
>> >
>> > So I'm sorry, but this does not make sense to me...
>> >
>> > Dave
>> >
>> >
>> >
>>
>> > --
>> > Crash-utility mailing list
>> > Crash-utility@xxxxxxxxxx
>> > https://www.redhat.com/mailman/listinfo/crash-utility
>>
>> --
>> yamahata
>>
>

-- 
Itsuro ODA <oda@xxxxxxxxxxxxx>
