On 04 Jul 2007 08:29:39 +0800, Zou Nan hai <nanhai.zou at intel.com> wrote:
> On Wed, 2007-07-04 at 04:24, Eric W. Biederman wrote:
> > "Natalie Protasevich" <protasnb at gmail.com> writes:
> >
> > > I came across a report about panics on an IA64 system that happen when
> > > kexec is being executed. The FSB parity error gets generated:
> > >
> > > BRLD / UC to x8208208208, A43:41 = x0, FSB Parity Error detected on Processor Request
> > > BRLC / UC to xFFFF2000000, A43:41 = x7, FSB Parity Error detected on the Deferred Reply
> > > BRLD / WB to xFFFFFFF0028, A43:41 = x7, FSB Parity Error detected on the Deferred Reply
> > > BRLD / WB to xFFFFFFF0028, A43:41 = x7, FSB Parity Error detected on the Deferred Reply
> > > BRLC / UC to xFFFF2000000, A43:41 = x7, FSB Parity Error detected on the Deferred Reply
> > > BRLD / UC to x8208208208, A43:41 = x0, FSB Parity Error detected on Processor Request
> > >
> > >
> > > And the pattern of the address on the bus is actually coming from the
> > > piece of code in arch/ia64/kernel/gate.S, calculating ar.bspstore:
> > >
> > > ...
> > > sub r14=r14,r17 // r14 <- -rse_num_regs(bspstore1, bsp1)
> > > movl r17=0x8208208208208209
> > > ;;
> > > add r18=r18,r14 // r18 (delta) <- rse_slot_num(bsp0) - rse_num_regs(bspstore1,bsp1)
> > > setf.sig f7=r17
> > > cmp.lt p7,p0=r14,r0 // p7 <- (r14 < 0)?
> > > ;;
> > > ...
> > >
>
> Hi,
>
> Is the problem reproducible? Is there any special configuration or kexec
> command line option to reproduce it?
> On which platform and which version of kernel did you see the issue?
>
> It looks like there may be something wrong with the memory map setting
> of the second kernel.
> Can you send me copies of /proc/iomem of the first kernel and the second
> kernel?
>

Thanks! I will try to get as much information as I can. It is 100%
reproducible but intermittent; in other words, it happens with each run,
but not predictably (I will get a more precise scenario).
This is a large ES7000 server with up to 512 processors; I will find out
whether this happens only with large configurations or with any. The
kernel is SLES10 or RHEL4U5; they use both. I will provide the iomem,
though I am not sure how soon: either tomorrow or after the holiday...

Regards,
--Natalie

> Thanks
> Zou Nan hai
>
> > > Have you seen such an error before? What would you recommend for debugging this?
> >
> > Not really.
> >
> > However this sounds fairly deterministic on the hardware involved.
> > So I would recommend a code audit.
> >
> > With low-level kexec code like this it really requires someone who knows
> > the architecture to think through the code.
> >
> > Adding in serial output into the assembly and what not can help to
> > isolate the piece of the code causing the problem. But it looks
> > like you have done that.
> >
> > You haven't provided quite enough context for me to understand how
> > this code sequence is reproduced. I would certainly need more
> > information than you have given to even locate the code path this is
> > coming from, as it has been a long time since I looked at ia64.
> >
> > I have CC'd a few likely suspects and the kexec list so with a little
> > luck if anyone is familiar with this they can answer you.
> >
> > Eric
>
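
The observation quoted above, that the address pattern on the bus
resembles the constant loaded in the gate.S excerpt, can be checked by
comparing each reported address against the leading hex digits of that
constant. The following is only a sketch in Python; the helper names are
made up for illustration, and a match by itself says nothing about how
the value ended up on the bus.

```python
# Constant loaded by "movl r17=0x8208208208208209" in the quoted gate.S excerpt.
MAGIC = 0x8208208208208209

# Address from the "BRLD / UC to x8208208208" lines of the quoted error log.
BUS_ADDR = 0x8208208208


def leading_hex_digits(value: int, digits: int, width: int = 16) -> int:
    """Return the top `digits` hex digits of a `width`-hex-digit value.

    Hypothetical helper, not part of any kernel or kexec tooling.
    """
    return value >> (4 * (width - digits))


def matches_magic_prefix(addr: int) -> bool:
    """True if `addr` equals the leading hex digits of MAGIC of the same length."""
    digits = len(f"{addr:x}")
    return leading_hex_digits(MAGIC, digits) == addr


if __name__ == "__main__":
    print(hex(leading_hex_digits(MAGIC, 10)))
    print(matches_magic_prefix(BUS_ADDR))
```

Of the three distinct addresses in the log, only x8208208208 lines up with
the constant this way; xFFFF2000000 and xFFFFFFF0028 do not.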