"Natalie Protasevich" <protasnb at gmail.com> writes: > I came across a report about panics on a IA64 system that happen when > kexec is being executed. The FSB parity error gets generated: > > BRLD / UC to x8208208208, A43:41 = x0, FSB Parity Error detected on > Processor Request > BRLC / UC to xFFFF2000000, A43:41 = x7, FSB Parity Error detected on > the Deferred Reply > BRLD / WB to xFFFFFFF0028, A43:41 = x7, FSB Parity Error detected on > the Deferred Reply > BRLD / WB to xFFFFFFF0028, A43:41 = x7, FSB Parity Error detected on > the Deferred Reply > BRLC / UC to xFFFF2000000, A43:41 = x7, FSB Parity Error detected on > the Deferred Reply > BRLD / UC to x8208208208, A43:41 = x0, FSB Parity Error detected on > Processor Request > > > And the pattern of the address on the bus is actually coming from the > piece of code in arch/ia64/kernel/gate.S, calculating ar.bpstore: > > ... > sub r14=r14,r17 // r14 <- -rse_num_regs(bspstore1, bsp1) > movl r17=0x8208208208208209 > ;; > add r18=r18,r14 // r18 (delta) <- rse_slot_num(bsp0) - > rse_num_regs(bspstore1,bsp1) > setf.sig f7=r17 > cmp.lt p7,p0=r14,r0 // p7 <- (r14 < 0)? > ;; > ... > > Have you seen such error before? What would you recommend for debugging this? Not really. However this sounds fairly deterministic on the hardware involved. So I would recommend a code audit. With low-level kexec code like this it really requires someone who knows the architecture to think through the code. Adding in serial output into the assembly and what not can help to isolate the piece of the code causing the problem. But it looks like you have done that. You haven't provided quite enough context for me to understand how this code sequence is reproduced. I would certainly need more information then you have given to even locate the code path this is coming from, as it has been a long time since I looked at ia64. 
I have CC'd a few likely suspects and the kexec list, so with a little
luck anyone who is familiar with this can answer you.

Eric