Re: infiniband and redundant mode

Hi,
thanks for investigating the problem. I'm also trying to find out WHAT
the main problem is, but sadly it has been more like "try to find IBA HW" ;)
(and sadly neither SoftiWARP nor Soft-RoCE was working).


Evgeny Barskiy wrote:
> Hello,
> 
> Here are some changes to allow corosync to run in iba + rrp mode
> (problem #2, described in
> http://lists.corosync.org/pipermail/discuss/2012-October/002086.html):
> 
> ------ totemiba.c line 1031 send_token_unbind
> +if(instance->send_token_ah)
> +{
> +       ibv_destroy_ah(instance->send_token_ah);
> +       instance->send_token_ah = 0;
> +}
> 
> ------ totemiba.c line 1419 totemiba_token_send
> 
> +if(instance->send_token_ah)
>         res = ibv_post_send (instance->send_token_cma_id->qp, &send_wr,
> &failed_send_wr);
> 
> 

I'm unsure whether this is really the correct solution (maybe it is), but
send_token_ah shouldn't really be NULL at that point.
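
To make the two hunks above easier to discuss, here is a minimal sketch of
the same guard pattern using stand-in types instead of the real libibverbs
calls. Everything in it (struct fake_ah, struct fake_instance,
fake_destroy_ah, token_unbind, token_send) is hypothetical and only for
illustration; it is not the totemiba.c code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ibv_ah. */
struct fake_ah {
	int destroyed;
};

/* Hypothetical, heavily reduced stand-in for the totemiba instance. */
struct fake_instance {
	struct fake_ah *send_token_ah;	/* NULL when no address handle exists */
	int sends_posted;
};

/* Stand-in for ibv_destroy_ah(): only records that it was called. */
static void fake_destroy_ah (struct fake_ah *ah)
{
	ah->destroyed = 1;
}

/* Mirrors the unbind hunk: destroy the handle only if present, then clear it. */
static void token_unbind (struct fake_instance *instance)
{
	assert (instance != NULL);
	if (instance->send_token_ah != NULL) {
		fake_destroy_ah (instance->send_token_ah);
		instance->send_token_ah = NULL;
	}
}

/*
 * Mirrors the send hunk: skip the send when there is no handle.
 * Returns 0 if a send was posted, -1 if it was skipped.
 */
static int token_send (struct fake_instance *instance)
{
	assert (instance != NULL);
	if (instance->send_token_ah == NULL) {
		return (-1);
	}
	instance->sends_posted++;
	return (0);
}
```

One difference from the proposed patch: the sketch returns an explicit
error when the send is skipped, so the caller can see it. In the patch,
res is assigned only inside the if, so whether the skipped case is visible
depends on how res is initialized before that line.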

> 
> It looks like it's initializing, joining cpg, and running normally after
> this small fix.
> 
> 
> The other problem
> (problem #1, described in
> http://lists.corosync.org/pipermail/discuss/2012-October/002086.html)
> is rather more interesting. Yes, it only occurs during program startup,
> but that is just a side effect.
> 
> What if one switch is down when corosync starts?
> If it's the first switch - assert in memb_ring_id_create_or_load
> If it's any other one - infinite loop or even segfault
> 
> In that case this event will never arrive:
> 
> case RDMA_CM_EVENT_MULTICAST_JOIN:
> instance->mcast_qpn = event->param.ud.qp_num;
> instance->mcast_qkey = event->param.ud.qkey;
> instance->mcast_ah = ibv_create_ah (instance->mcast_pd,
> &event->param.ud.ah_attr);
> instance->totemiba_iface_change_fn (instance->rrp_context,
> &instance->my_id);
> break;
> 
> so main_iface_change_fn will not be called enough times and we will
> never enter the gather state.
> 
> In totemudp there is a check for whether the interface is down, and even
> if it is down we still call the main_iface_change function.
> 
> So I think somewhere here
> 
> static void timer_function_netif_check_timeout (
> void *data)
> {
> struct totemiba_instance *instance = (struct totemiba_instance *)data;
> int res;
> int interface_up;
> int interface_num;
> int addr_len;
> totemip_iface_check (&instance->totem_interface->bindnet,
> &instance->totem_interface->boundto, &interface_up, &interface_num,
> instance->totem_config->clear_node_high_bit);
> 
> we should at least check the "interface_up" variable, as in the udp version.
> We should probably also set up a timer that retries the initialization later.
> 

exactly
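
A rough sketch of what that could look like, with stand-in types only.
struct sketch_instance, notify_iface_change and netif_check are
hypothetical names, the timer is reduced to a flag, and interface_up is
passed in directly (in totemiba.c it would come from
totemip_iface_check()). Notifying only once while the interface is down
is also just an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified instance; not the real struct totemiba_instance. */
struct sketch_instance {
	int netif_bound;	/* 1 once the interface has been bound */
	int down_notified;	/* 1 once the "down" state was reported */
	int iface_change_calls;	/* how often the upper layer was notified */
	int retry_scheduled;	/* 1 while a retry timer would be pending */
};

/* Stand-in for totemiba_iface_change_fn: only counts notifications. */
static void notify_iface_change (struct sketch_instance *instance)
{
	instance->iface_change_calls++;
}

/* Body of a netif check timer, reduced to the interface_up decision. */
static void netif_check (struct sketch_instance *instance, int interface_up)
{
	assert (instance != NULL);
	instance->retry_scheduled = 0;

	if (instance->netif_bound) {
		return;
	}
	if (!interface_up) {
		/*
		 * Interface is down: still tell the upper layer (once), so it
		 * is not left waiting for a join event that cannot arrive,
		 * and try again later.
		 */
		if (!instance->down_notified) {
			notify_iface_change (instance);
			instance->down_notified = 1;
		}
		instance->retry_scheduled = 1;
		return;
	}
	/*
	 * Interface is up: the real code would bind and join the multicast
	 * group here; the notification then comes with the join event.
	 */
	instance->netif_bound = 1;
}
```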

> The next question is whether it is possible to simply lose the
> RDMA_CM_EVENT_MULTICAST_JOIN event.
> If so, we would loop forever this way, so probably some timer is
> required?
> 

I'm really unsure there.
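
If a timer does turn out to be needed, one possible shape is a small
watchdog around the join. This is only a sketch under that assumption:
struct join_watchdog and the join_watchdog_* functions are hypothetical,
time is advanced by hand instead of by a real timer, and what to do on
timeout (retry the join, report the interface as faulty) is left open.

```c
#include <assert.h>
#include <stddef.h>

enum join_state {
	JOIN_PENDING,
	JOIN_DONE,
	JOIN_TIMED_OUT
};

/* Hypothetical watchdog for one outstanding multicast join. */
struct join_watchdog {
	enum join_state state;
	unsigned int deadline_ms;
	unsigned int elapsed_ms;
};

/* Arm the watchdog when the join request is issued. */
static void join_watchdog_start (struct join_watchdog *w, unsigned int deadline_ms)
{
	assert (w != NULL);
	w->state = JOIN_PENDING;
	w->deadline_ms = deadline_ms;
	w->elapsed_ms = 0;
}

/* Call from the RDMA_CM_EVENT_MULTICAST_JOIN handler. */
static void join_watchdog_on_join_event (struct join_watchdog *w)
{
	assert (w != NULL);
	/* A join that arrives after the timeout is ignored here. */
	if (w->state == JOIN_PENDING) {
		w->state = JOIN_DONE;
	}
}

/* Call periodically; step_ms is the time since the previous call. */
static enum join_state join_watchdog_tick (struct join_watchdog *w, unsigned int step_ms)
{
	assert (w != NULL);
	if (w->state == JOIN_PENDING) {
		w->elapsed_ms += step_ms;
		if (w->elapsed_ms >= w->deadline_ms) {
			w->state = JOIN_TIMED_OUT;
		}
	}
	return (w->state);
}
```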

> Evgeny
> 
> 
> 

Thanks for your investigation. Hopefully we will be able to find a proper
solution soon.

Regards,
  Honza

> _______________________________________________
> discuss mailing list
> discuss@xxxxxxxxxxxx
> http://lists.corosync.org/mailman/listinfo/discuss
