On 08/27/2013 06:27 PM, Bjorn Helgaas wrote:
> On Tue, Aug 27, 2013 at 09:53:35AM +0200, Ludwig Petrosyan wrote:
>> So: I use a microTCA system with a PCIe bus. There are two AMC cards
>> (PCIe endpoints), let's call them card A and card B, and there are
>> two device drivers, one for A and one for B. Card B has a bug: after
>> a PCIe memory write operation (MWr) the card sends back a Completion
>> packet without data (Cpl). (I know it is wrong, but the card was
>> designed this way and has to be changed.)
>> User process Ua reads data from card A in a loop and everything is
>> OK, but when I start a second user process Ub, which writes data to
>> card B (the bugged card) in a loop, Ua gets wrong data. After fixing
>> card B the problem was solved, but it may be that this has to be
>> checked at the PCIe driver level as well.
> PCIe transactions (MWr, MRd, Cpl, etc.) are not directly visible
> to the OS or the driver.
>
> The only thing I can think of that we could do is add a quirk to
> blacklist the broken version of card B. You can look at existing
> quirks in drivers/pci/quirks.c. Most of them work around issues
> that aren't quite as severe as this one, but we could probably
> figure out a way to make the device completely unusable.
>
> Or do you have something else in mind?
>
> Bjorn

We have fixed the bug in card B and now it is OK, but the question is
still open: what will happen if we get some other PCIe endpoint card
with the same bug? Read operations from other PCIe devices could be
broken.
I think this problem should be solved at the OS level, but I am not
sure. I will try to explain how I think things are going:

User process Ub sends a Memory Write request to card B. This is a
Posted request, so right after sending it Ub forgets about it. The TLP
of this request carries the Requester ID of the Root Complex. At the
same time user process Ua (the Root Complex is free now) sends a
non-Posted Memory Read request to card A and waits for the Completion
packet. Meanwhile card B (the bugged card, which should not send a
Completion for a Posted Memory Write request) sends a Completion packet
without data to the Root Complex, and somehow Ua gets this as the
result of its Memory Read request.

It seems that the Completer ID (or the Tag field) in the Completion
packet is not checked, so a completion from one PCIe endpoint is
returned as the completion of a read request sent to another PCIe
endpoint.

I want to say that this is only an assumption. I just want to be sure
that a bugged PCIe device won't influence the operation of other
devices. But it could be that this problem has to be solved on the
PCIe Switch or Root Complex side, not on the OS side...

with best regards

Ludwig
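
P.S. To make the assumption above concrete, here is a minimal sketch in
C of completion matching on the requester side. It is a toy model, not
kernel code and not real Root Complex logic: every name in it
(struct requester, issue_read, handle_completion, the check_completer
flag, MAX_TAGS) is hypothetical. It relies only on the fact that a
Completion carries a Requester ID, a Completer ID and a Tag, and that a
Completion matching no outstanding request counts as an Unexpected
Completion. Whether a given Root Complex also compares the Completer ID
is exactly the open question, so it is modelled as an optional check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_TAGS 32 /* assumption: 5-bit Tag field, 32 outstanding requests */

/* One slot per Tag: a non-posted read waiting for its completion. */
struct outstanding_read {
    bool     in_use;
    uint16_t target_id; /* bus/device/function the MRd was sent to */
};

/* Only the Completion header fields needed for matching. */
struct completion {
    uint16_t requester_id;
    uint16_t completer_id;
    uint8_t  tag;
};

enum cpl_result { CPL_ACCEPTED, CPL_UNEXPECTED };

struct requester {
    uint16_t id; /* Requester ID placed in outgoing requests */
    struct outstanding_read reads[MAX_TAGS];
};

/* Record a non-posted read. Posted writes (MWr) are never recorded,
 * because no completion is expected for them. */
static void issue_read(struct requester *rq, uint8_t tag, uint16_t target_id)
{
    assert(tag < MAX_TAGS && !rq->reads[tag].in_use);
    rq->reads[tag].in_use = true;
    rq->reads[tag].target_id = target_id;
}

/* Match a completion against the outstanding reads.
 * check_completer is a hypothetical switch: when false, only the
 * Requester ID and Tag are compared, which is the behaviour suspected
 * above; when true, the Completer ID must also be the read's target. */
static enum cpl_result handle_completion(struct requester *rq,
                                         const struct completion *c,
                                         bool check_completer)
{
    if (c->requester_id != rq->id || c->tag >= MAX_TAGS)
        return CPL_UNEXPECTED;

    struct outstanding_read *rd = &rq->reads[c->tag];
    if (!rd->in_use)
        return CPL_UNEXPECTED;
    if (check_completer && c->completer_id != rd->target_id)
        return CPL_UNEXPECTED;

    rd->in_use = false; /* the read is now considered complete */
    return CPL_ACCEPTED;
}
```

In this model, a read to card A is outstanding on some Tag and card B
sends a stray Completion that happens to carry the same Tag. Without the
Completer ID check the stray Completion is accepted and retires Ua's
read; with the check it is rejected and the read stays outstanding
until card A answers.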