----- "Dave Anderson" <anderson@xxxxxxxxxx> wrote:

> ----- "xiaowei hu" <xiaowei.hu@xxxxxxxxxx> wrote:
>
> > Hi all,
> >
> > There is a bug when using crash to process a xen domU core dump that
> > is larger than 4GB (it was found when processing a 10GB guest core
> > dump file). crash reports these errors:
> >
> > crash: cannot find mfn 8392757 (0x801035) in page index
> >
> > crash: cannot read/find cr3 page
> >
> > This is caused by a variable overflow in this structure:
> >
> > typedef struct xc_core_header {
> >     unsigned int xch_magic;
> >     unsigned int xch_nr_vcpus;
> >     unsigned int xch_nr_pages;
> >     unsigned int xch_ctxt_offset;
> >     unsigned int xch_index_offset;
> >     unsigned int xch_pages_offset;
> > } xc_core_header_t;
> >
> > The xch_ctxt_offset, xch_index_offset and xch_pages_offset fields are
> > offsets into the core dump file. When they are defined as unsigned
> > int, the file can't be more than 4GB, so processing core dump files
> > larger than 4GB may fail (I encountered an overflow on that 10GB
> > file). Changing those offset fields to unsigned long makes sure crash
> > can seek to the right position.
> > BTW, please reply directly to me, I am not on the mailing list.
> >
> >
> > Signed-off-by: Xiaowei Hu <xiaowei.hu@xxxxxxxxxx>
> >
> >
> > diff -up crash-5.0.0/xendump.h.org crash-5.0.0/xendump.h
> > --- crash-5.0.0/xendump.h.org 2010-02-04 03:48:04.000000000 +0800
> > +++ crash-5.0.0/xendump.h 2010-02-04 05:41:27.000000000 +0800
> > @@ -28,9 +28,9 @@ typedef struct xc_core_header {
> >      unsigned int xch_magic;
> >      unsigned int xch_nr_vcpus;
> >      unsigned int xch_nr_pages;
> > -    unsigned int xch_ctxt_offset;
> > -    unsigned int xch_index_offset;
> > -    unsigned int xch_pages_offset;
> > +    unsigned long xch_ctxt_offset;
> > +    unsigned long xch_index_offset;
> > +    unsigned long xch_pages_offset;
> >  } xc_core_header_t;
> >
> >  struct pfn_offset_cache {
>
> First question -- are you saying that the change above works for you?
>
> And second -- in your dumpfile, even with 10GB of memory, wouldn't
> the base offset value of all three indexes still fit well below
> the 4GB mark?
>
> The xc_core_header in crash is a copy of the one found in
> "tools/libxc/xenctrl.h", and is presumptively the beginning/header of
> the dumpfile. And so making the wholesale change above breaks all
> earlier (?) versions.
>
> But what is confusing is that the latest/final version of "xenctrl.h"
> used in RHEL5 (3.0.3 vintage), as well as the current version in
> Fedora (3.4.0-2.fc12), still use unsigned int offsets. I just checked
> with one of our xen masters, and the Xensource git tree also still has
> unsigned int values in the header data structure:
>
> typedef struct xc_core_header {
>     unsigned int xch_magic;
>     unsigned int xch_nr_vcpus;
>     unsigned int xch_nr_pages;
>     unsigned int xch_ctxt_offset;
>     unsigned int xch_index_offset;
>     unsigned int xch_pages_offset;
> } xc_core_header_t;
>
> #define XC_CORE_MAGIC 0xF00FEBED
> #define XC_CORE_MAGIC_HVM 0xF00FEBEE
>
> Are your xen userspace tools an Oracle hybrid?

Ah -- it's becoming clearer now...

The evolution of the various xendump formats is the cause of the
confusion and of the issue at hand.

In the beginning, the "xm dump-core" facility used its own unique
dumpfile format, where the xc_core_header shown above was at the
beginning of the dumpfile and served as its primary header.

Much later, "xm dump-core" started using an ELF format, where it
carried forward the first 3 of the old xc_core_header fields above
into this ELF note:

  struct xen_dumpcore_elfnote_header_desc {
      uint64_t xch_magic;
      uint64_t xch_nr_vcpus;
      uint64_t xch_nr_pages;
      uint64_t xch_page_size;
  };

The remaining 3 "offset" fields are stored in ELF section headers,
like so:

  xch_ctxt_offset:  in the ".xen_prstatus" ELF section header
  xch_index_offset: in the ".xen_pfn" or ".xen_p2m" ELF section header,
                    depending on whether it's fully-virtualized or
                    paravirtualized.
  xch_pages_offset: in the ".xen_pages" ELF section header

The offsets in the ELF section headers are the "sh_offset" fields of
the Elf64_Shdr (or Elf32_Shdr if ELFCLASS32):

  typedef struct
  {
    Elf64_Word    sh_name;       /* Section name (string tbl index) */
    Elf64_Word    sh_type;       /* Section type */
    Elf64_Xword   sh_flags;      /* Section flags */
    Elf64_Addr    sh_addr;       /* Section virtual addr at execution */
    Elf64_Off     sh_offset;     /* Section file offset */
    Elf64_Xword   sh_size;       /* Section size in bytes */
    Elf64_Word    sh_link;       /* Link to another section */
    Elf64_Word    sh_info;       /* Additional section information */
    Elf64_Xword   sh_addralign;  /* Section alignment */
    Elf64_Xword   sh_entsize;    /* Entry size if section holds table */
  } Elf64_Shdr;

FWIW, I don't know (or recall) whether ELFCLASS32 is ever used, even
with 32-bit xen hosts/guests, because the "sh_offset" in the Elf32_Shdr
is of type Elf32_Off, which is 32 bits:

  /* Type of file offsets.  */
  typedef uint32_t Elf32_Off;
  typedef uint64_t Elf64_Off;

Anyway, the problem is that the crash utility started using the old
xc_core_header data structure when it was the only header. When
"xm dump-core" started writing ELF format dumpfiles, the sh_offset
values from the ELF section headers were copied into the old
xc_core_header data structure in the crash utility so that the old
code base could still be used. But if any of the sh_offset values
overflowed into the upper 32 bits, they would be truncated when the
copy was made.

In any case, getting back to the crash utility issue, the patch that
you proposed cannot be used alone because it will break
backwards-compatibility. What could be done is to have the
xc_core_verify() initialization code read the dumpfile header into an
"original" xc_core_header structure type, verify it as one of the
"old-style" dumpfiles, and then store the offsets into your updated
xc_core_header structure.
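
To make that suggestion concrete, here is a minimal sketch, not the
actual crash code. The names xc_core_header_disk, xc_core_header_wide,
widen_old_header() and old_field_value() are hypothetical and exist
only for this illustration. The on-disk structure keeps the unsigned
int layout quoted above so old-style dumpfiles still parse, while the
in-memory copy uses uint64_t offsets (chosen here instead of unsigned
long, which is only 32 bits on 32-bit hosts). It assumes unsigned int
is 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XC_CORE_MAGIC     0xF00FEBED
#define XC_CORE_MAGIC_HVM 0xF00FEBEE

/* On-disk layout of the old-style header; must stay unchanged. */
struct xc_core_header_disk {
    unsigned int xch_magic;
    unsigned int xch_nr_vcpus;
    unsigned int xch_nr_pages;
    unsigned int xch_ctxt_offset;
    unsigned int xch_index_offset;
    unsigned int xch_pages_offset;
};

/* Hypothetical in-memory header with offsets wide enough for >4GB files. */
struct xc_core_header_wide {
    unsigned int xch_magic;
    unsigned int xch_nr_vcpus;
    unsigned int xch_nr_pages;
    uint64_t     xch_ctxt_offset;
    uint64_t     xch_index_offset;
    uint64_t     xch_pages_offset;
};

/*
 * Interpret buf as an old-style header and copy it into the wide form.
 * Returns 0 on success, -1 if buf is too short or the magic is unknown.
 */
int widen_old_header(const void *buf, size_t len,
                     struct xc_core_header_wide *out)
{
    struct xc_core_header_disk disk;

    assert(buf != NULL && out != NULL);
    if (len < sizeof(disk))
        return -1;
    memcpy(&disk, buf, sizeof(disk));

    if (disk.xch_magic != XC_CORE_MAGIC &&
        disk.xch_magic != XC_CORE_MAGIC_HVM)
        return -1;

    out->xch_magic        = disk.xch_magic;
    out->xch_nr_vcpus     = disk.xch_nr_vcpus;
    out->xch_nr_pages     = disk.xch_nr_pages;
    out->xch_ctxt_offset  = disk.xch_ctxt_offset;   /* zero-extended */
    out->xch_index_offset = disk.xch_index_offset;
    out->xch_pages_offset = disk.xch_pages_offset;
    return 0;
}

/*
 * What an unsigned int field holds after a 64-bit sh_offset is copied
 * into it: only the low 32 bits survive.
 */
unsigned int old_field_value(uint64_t sh_offset)
{
    return (unsigned int)sh_offset;
}
```

For an ELF-format dumpfile, the 64-bit sh_offset values would be stored
directly into the wide structure. With the old structure, a
hypothetical sh_offset of 0x100002000 (just past 4GB) would be kept as
0x2000, so crash would seek to the wrong place in the file.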

Dave

--
Crash-utility mailing list
Crash-utility@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/crash-utility