On Mon, Mar 26, 2018 at 02:37:03PM +0200, Wei Lin Guay wrote:
> When the master SM fails over to the slave SM, the following sequence of
> events causes a segfault in ibacm. We are using Mellanox CX3 in a
> virtualized environment.
>
> 1. IBV_EVENT_CLIENT_REREGISTRATION
> 2. IBV_EVENT_GID_CHANGE
>
> During the client reregistration, a multicast join request is sent. Then,
> when handling the GID table change async event, the acm logical port is
> reset (the acm port is shut down and brought back up again). Because
> the physical IB port is ACTIVE, the multicast join response can come
> back before the acm logical port has transitioned back to the up state.
> When the multicast join response is handled while the acm logical port is
> not up, the acm process hits a null pointer dereference when it tries to
> associate the gid with the associated acm port via acm_gid_index. To avoid
> this, this patch delays the handling of the MAD until the next event if
> there is no acm port provider for the associated IB port.
>
> ibacm[39381]: segfault at bc ip 0000000000402e59 sp 00007fc77c267bf8 error
> 4 in ibacm[400000+c000]
>
> back trace
> ==========
> 0 acm_gid_index (port=0x0, gid=0x1088b30) at src/acm.c:374
> 1 0x00007f1e0b18f93c in acmp_record_mc_av (sa_mad=0x1088aa0) at
> prov/acmp/src/acmp.c:658
> 2 acmp_process_join_resp (sa_mad=0x1088aa0) at prov/acmp/src/acmp.c:734
> 3 0x00000000004038c1 in acmc_recv_mad (port=0x107bc78) at src/acm.c:2888
> 4 0x0000000000403b0d in acm_sa_handler (context=<value optimized out>) at
> src/acm.c:2926
>
> Signed-off-by: Wei Lin Guay <wei.lin.guay@xxxxxxxxxx>
> Reviewed-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx>
> ---
> ibacm/src/acm.c | 5 +++++
> 1 file changed, 5 insertions(+)
>

Thanks, applied.
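For readers following the thread without the diff at hand, the idea in the commit message (defer the MAD while the IB port has no acm port provider, and pick it up on the next event) can be sketched roughly as below. This is only an illustration under assumptions, not the applied patch: the struct layouts and the names sketch_prov_port, sketch_port, queued_mads and sketch_sa_handler are hypothetical and do not come from ibacm/src/acm.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the provider side of an acm logical port. */
struct sketch_prov_port {
    int unused;
};

/* Hypothetical stand-in for the per-IB-port state kept by the core. */
struct sketch_port {
    struct sketch_prov_port *prov_port; /* NULL while the logical port is reset */
    int queued_mads;                    /* responses waiting to be handled */
};

/*
 * Called once per receive event. Returns the number of MADs handled.
 * If no provider is attached, nothing is consumed: the queued responses
 * stay where they are and are retried on the next event, so no handler
 * ever runs with a NULL provider port.
 */
int sketch_sa_handler(struct sketch_port *port)
{
    int handled = 0;

    assert(port != NULL);

    if (port->prov_port == NULL)
        return 0; /* defer: logical port is not up yet */

    while (port->queued_mads > 0) {
        port->queued_mads--; /* stands in for the real response handling */
        handled++;
    }
    return handled;
}
```

With one response queued and prov_port still NULL, the first call handles nothing and leaves the response queued; once a provider is attached, the next call drains it. The point of the guard is the ordering seen in the backtrace: the join response arrives before the logical port is back up, so the check has to sit in front of any code that dereferences the provider port.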