On Wed, Aug 16, 2017 at 12:52:52PM -0400, Doug Ledford wrote:
> On Wed, 2017-08-16 at 11:05 +0300, Leon Romanovsky wrote:
> > On Thu, Aug 03, 2017 at 07:57:39AM -0400, Doug Ledford wrote:
> > > On Wed, 2017-08-02 at 08:45 +0300, Leon Romanovsky wrote:
> > > > On Tue, Aug 01, 2017 at 08:31:49AM -0400, Doug Ledford wrote:
> > > > > Here's the information from the tag:
> > > > >
> > > > > tag v15-rc1
> > > > > Tagger: Doug Ledford <dledford@xxxxxxxxxx>
> > > > > Date: Tue Aug 1 08:18:05 2017 -0400
> > > > >
> > > > > rdma-core-15-rc1
> > > >
> > > > Isn't the release supposed to be without "-rc1"?
> > >
> > > It is. This is an rc; the release should follow soon.
> >
> > Doug,
> >
> > Did the RDMA cluster return to operation?
>
> Mostly. All of the easy fixes have been done. Now we are down to
> debugging/fixing the things that aren't so easy. For instance, if you
> have an older ConnectX-2 card in IB/Eth mode, you are using PFC on
> two different no-drop priorities, and you have two separate vlans, each
> with its own egress priority map, then the second vlan will refuse to
> work. This is true for one of the card models we have in the test lab:
>
> Model: MHQH29B-XTR
> PSID: MT_0D80120009
>
> Latest firmware available at Mellanox.com (without a support login):
> 2_9_1000
>
> That firmware is broken for this test case. It didn't show up before
> the cluster move because the card was plugged into a different switch
> (different brand, entirely different switch OS), and that switch
> allowed this card to get away with whatever it isn't doing right. I
> was able to isolate the problem: when you add the egress mapping to
> the second vlan, that vlan stops working, but if you remove the egress
> mapping on the second vlan and otherwise leave the vlan intact, it
> starts working (albeit without the egress mapping, so you won't
> actually get PFC on that vlan like you should).
> Some time ago I used my Mellanox-provided support login to get the
> latest unofficial/unreleased OEM firmware kit, so I built new firmware
> for the card from that unreleased kit, and that solved the problem.
>
> So, it's progressing, and we are slowly marking machines as fully
> operational again, but you know the saying: the first 90% takes 10% of
> the time and the last 10% takes 90% of the time, and that's how things
> are playing out here. I worked on it as my main focus Monday and
> Tuesday; today I'm going to refocus on patch processing while a few
> hardware changes are being made, and go from there. My first priority
> today is getting the -rc pull request ready. After that I want to get
> some more -next stuff pulled in.
>
> And I know this particular thread is about the rdma-core package. I
> haven't forgotten it; I just haven't had a chance to test it yet :-/.
>

Thanks for the update. I asked in the context of rdma-core because
Yishai stopped submission of all our user space patches until the
official release, and that included my one-liner fix to
ibv_xsrq_pingpong :)

> --
> Doug Ledford <dledford@xxxxxxxxxx>
> GPG KeyID: B826A3330E572FDD
> Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD