On Thursday 03 September 2009 10:45:25 pm Hidetoshi Seto wrote: > Bjorn Helgaas wrote: > > Since you're doing a lot of cleanup in aerdrv_errprint.c anyway, > > the format of the messages it prints is pretty strange compared > > to typical Linux messages. Here's a sample: > > > > +------ PCI-Express Device Error ------+ > > Error Severity : Corrected > > PCIE Bus Error type : Physical Layer > > Receiver Error : First > > Receiver ID : 0f00 > > VendorID=103ch, DeviceID=403bh, Bus=0fh, Device=00h, Function=00h > > > > I'd like this a lot better if it used dev_printk and other Linux > > conventions, e.g., "pci 0000:0f:00.0" instead of "Bus=0fh, Device=00h, > > Function=00h" and "device[103c:403b]" instead of "VendorID=103ch, > > DeviceID=403bh". > > Good point. > Such kind of format change is worth to do, I think. > > > Also, it currently calls printk many times, so the output is often > > interspersed with other messages, and there's no nice way to put it > > back together or even tell where the end is. There's no loop, so > > there is a clear maximum output size; maybe everything could be > > put into a buffer and printed with a single dev_printk to make it > > atomic. > > Unfortunately there could be some loop if multiple errors are found. > I think the "interspersed message" is a common issue on where need > to print long strings in multiple lines, and it will be more serious > on where uses KERN_CONT and/or pr_cont(). Oh, I see, you got rid of the static errmsg_buff[] (which limited the output size before), so the loop in aer_get_error_source_name() determines the output size. That's a nice change; there was nothing in the the old code to keep us from overflowing errmsg_buff[] and corrupting something. > Buffering messages might be a bit over-engineered if it is just to > avoid this problem. Since there is no KERN_CONT in aerdrv_errprint.c, > I think using dev_printk would be enough since message lines are printed > each with prefix "pci 0000:00:00.0: ..." so it would not be so hard to > put it back together. Yep, if we have the dev_printk prefixes, it probably would be over- engineering to make everything atomic. > OK, I'll make one more patch to change the format. > Following is a blueprint. Feel free to comment on it: > (Note: handwritten, actual error not used) > > Before: > +------ PCI-Express Device Error ------+ > Error Severity : Corrected > PCIE Bus Error type : Data Link Layer > Bad TLP : > Bad DLLP : First > Receiver ID : 0f00 > VendorID=103ch, DeviceID=403bh, Bus=0fh, Device=00h, Function=00h > Error of this Agent(0f00) is reported first > +------ PCI-Express Device Error ------+ > Error Severity : Corrected > PCIE Bus Error type : Data Link Layer > Bad TLP : First > Receiver ID : 1700 > VendorID=103ch, DeviceID=403bh, Bus=17h, Device=00h, Function=00h > > After: > pci 0000:01:00.0: AER: Multiple Corrected error received: id=0f00 > pci 0000:0f:00.0: PCIE Bus Error: severity=Corrected, type=Data Link Laye > r, id=0f00(Receiver ID) > pci 0000:0f:00.0: device[103c:403b] error status/mask=000020c0/00002000 I should have put a space after "device". > pci 0000:0f:00.0: [ 6] Bad TLP > pci 0000:0f:00.0: [ 7] Bad DLLP (First) > pci 0000:0f:00.0: Error of this Agent(0f00) is reported first Are "id=0f00" in the first line and "Agent(0f00)" references to PCI device 0f:00? If so, it'd be nice to print them in the conventional Linux format, including the domain. > pci 0000:17:00.0: PCIE Bus Error: severity=Corrected, type=Data Link Laye > r, id=1700(Receiver ID) > pci 0000:17:00.0: device[103c:403b] error status/mask=00000040/00002000 > pci 0000:17:00.0: [ 6] Bad TLP (First) I think that's great. Thanks a lot for considering this change. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-pci" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html