On 09/05/2015 04:13 PM, Jon Masters wrote:
> On 08/11/2015 03:28 PM, Bjorn Helgaas wrote:
>> On Mon, Aug 10, 2015 at 2:07 PM, Duc Dang <dhdang@xxxxxxx> wrote:
>>> On Mon, Aug 10, 2015 at 10:42 AM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
>>>> On Mon, Aug 10, 2015 at 12:16 PM, Duc Dang <dhdang@xxxxxxx> wrote:
>>>>> On Monday, August 10, 2015, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On Fri, Jul 31, 2015 at 12:00 PM, Duc Dang <dhdang@xxxxxxx> wrote:
>>>>>>> On Wed, Jul 29, 2015 at 8:55 AM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
>>>>>>>> On Tue, Jul 28, 2015 at 08:22:55PM -0500, Bjorn Helgaas wrote:
>>>>>>>>> On Tue, Jul 28, 2015 at 02:50:39PM -0700, Duc Dang wrote:
>>>>>>>>
>>>>>>>>>> Do you have another PCIe card to try on the same reboot test on this
>>>>>>>>>> board?
>>>>>>>>>
>>>>>>>>> I've seen this on at least two Mellanox cards. I'm running similar
>>>>>>>>> tests on a different type of card now.
>>>>>>>>
>>>>>>>> FWIW, reboot tests on two machines with Mellanox cards failed, while
>>>>>>>> the same test on a machine with a different proprietary card succeeded.
>>>>>>>
>>>>>>> Thanks, Bjorn.
>>>>>>>
>>>>>>> I don't have the same Mellanox card as yours, but I will also run
>>>>>>> similar reboot test to see if I hit the same issue with my card.
>>>>>>
>>>>>> Any more hints on this? Nothing has changed on my end, so of course
>>>>>> I'm still seeing this, always on machines with Mellanox, and never on
>>>>>> other machines. Could this be a hardware issue like a signal
>>>>>> integrity or margin issue? I don't know where to go from here because
>>>>>> I'm not a hardware person, and I don't know anything to do in
>>>>>> software.
>>>>>
>>>>> Hi Bjorn,
>>>>>
>>>>> I tried to run similar reboot tests on 2 different Mellanox cards (Connect-X
>>>>> family, one card has 2 10G interfaces, the other one has 1 port that
>>>>> supports InfiniBand) with U-Boot 1.15.12 and linux 4.2-rc5 and I did not see
>>>>> the crash that you encountered.
>>>>>
>>>>> Did you check if your Mellanox cards have the latest firmware? I did see some
>>>>> link issues on my Mellanox cards with their old firmware before.
>>>>
>>>> Good idea; I'll check that, too. Also, I just learned that these
>>>> cards are installed with an extender card because of some space issues,
>>>> so we're going to test again without the extender.
>>>
>>> Hi Bjorn,
>>>
>>> Are other cards that passed your test installed directly to the
>>> on-board PCIe slot?
>>> If yes, then this is a good data point and it will be useful to test
>>> the case where your Mellanox cards are directly installed into the
>>> on-board PCIe slot.
>>
>> The cards that passed the test were installed directly, with no
>> extender. We removed the extender from one of the machines with the
>> Mellanox card and have not seen this issue since then. I think it's
>> very likely that the problem is related to using the extender.
>
> If you're trying to use Mellanox cards in (for example) an APM
> Mustang-like system with a PCIe extender card (for example a 90 degree
> angle adjustment for a low profile server case), you might want to ping me
> offline. I have procured a number of these over the past couple of years
> for my home lab and have found one that works (almost) reliably on that
> particular hardware platform and does 10G in my home lab.

Traveling for the holiday, but I guess it doesn't need to be a secret.
I think I have found some success with this one (but I have ordered many
different ones over the past year so will confirm next week):

http://www.amazon.com/gp/product/B00H8VVD00?psc=1&redirect=true&ref_=oh_aui_search_detailpage

Specifically, the fixed angle adapter brackets generally DO NOT work.

Jon.