Re: NULL primary_path

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Mar 8, 2013 at 10:31 AM, Vlad Yasevich <vyasevich@xxxxxxxxx> wrote:
> On 03/08/2013 09:31 AM, Karl Heiss wrote:
>>
>> On Fri, Mar 8, 2013 at 8:52 AM, Karl Heiss <kheiss@xxxxxxxxx> wrote:
>>>
>>> On Thu, Mar 7, 2013 at 6:09 PM, Vlad Yasevich <vyasevich@xxxxxxxxx>
>>> wrote:
>>>>
>>>> On 03/07/2013 04:51 PM, Karl Heiss wrote:
>>>>>
>>>>>
>>>>> On Thu, Mar 7, 2013 at 12:17 PM, Vlad Yasevich <vyasevich@xxxxxxxxx>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On 03/07/2013 12:06 PM, Karl Heiss wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> The issue appears to manifest itself when the connection is closed
>>>>>>> from the remote end and getsockopt(SCTP_STATUS) is called within a
>>>>>>> small window in which the association is still valid but
>>>>>>> asoc->peer.primary_path is NULL.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Aha!  Thanks.  There was a bug in the rcu clean-up that allowed the
>>>>>> association to remain while all transports have been removed.
>>>>>>
>>>>>> Here is a patch that should have addressed this condition:
>>>>>>
>>>>>> commit 8c98653f05534acd1cb07ea4929702a3659177d1
>>>>>> Author: Daniel Borkmann <dborkman@xxxxxxxxxx>
>>>>>> Date:   Fri Feb 1 04:37:43 2013 +0000
>>>>>>
>>>>>>       sctp: sctp_close: fix release of bindings for deferred
>>>>>> call_rcu's
>>>>>>
>>>>>> Full patch is here:
>>>>>>
>>>>>>
>>>>>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=8c98653f05534acd1cb07ea4929702a3659177d1
>>>>>>
>>>>>> Make sure that you have this patch in the kernel you are running
>>>>>>
>>>>>> -vlad
>>>>>>
>>>>>>
>>>>>>>
>>>>>
>>>>> Unfortunately this patch wont apply to the version of the SCTP stack
>>>>> that we are using (2.6.36.2) since it does not have a
>>>>> sctp_transport_destroy_rcu() function.  Is there any chance that
>>>>> simply swapping the order of the instructions without moving them
>>>>> would have any effect?  I ask this hypothetically because the race
>>>>> condition window seems to be difficult to recreate, thus nothing to
>>>>> test against (aside from in the field!).
>>>>>
>>>>> Karl
>>>>>
>>>>
>>>> Hi Karl
>>>>
>>>> I think I see the problem now.  The problem happens when the association
>>>> is
>>>> destroyed.  We delay removing the association from
>>>> the association id pool until all references on the association
>>>> have dropped.  As a result, it is possible (for a very short
>>>> period of time) for an association structure to still exist in
>>>> the kernel and still be found via the association id, but that
>>>> association
>>>> has no transports and is about to be completely destroyed.
>>>>
>>>> This is a really interesting race and I need to figure out if it is
>>>> there on purpose or not?
>>>>
>>>> In the mean time, here is a patch that should solve it for you.
>>>>
>>>> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
>>>> index b907073..2d92c89 100644
>>>> --- a/net/sctp/socket.c
>>>> +++ b/net/sctp/socket.c
>>>> @@ -223,7 +223,7 @@ struct sctp_association *sctp_id2assoc(struct sock
>>>> *sk,
>>>> sctp_assoc_t id)
>>>>                  if (!list_empty(&sctp_sk(sk)->ep->asocs))
>>>>                          asoc = list_entry(sctp_sk(sk)->ep->asocs.next,
>>>>                                            struct sctp_association,
>>>> asocs);
>>>> -               return asoc;
>>>> +               goto done;
>>>>          }
>>>>
>>>>          /* Otherwise this is a UDP-style socket. */
>>>> @@ -234,6 +234,7 @@ struct sctp_association *sctp_id2assoc(struct sock
>>>> *sk,
>>>> sctp_assoc_t id)
>>>>          asoc = (struct sctp_association *)idr_find(&sctp_assocs_id,
>>>> (int)id);
>>>>          spin_unlock_bh(&sctp_assocs_id_lock);
>>>>
>>>> +done:
>>>>          if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
>>>>                  return NULL;
>>>>
>>>
>>> Vlad,
>>>
>>> Looking at the kdump from the panic, I am seeing that your patch above
>>> may not work in this case since the asoc is valid, the base.sk is
>>> valid, and base.dead is 0.  Unless base.sk is valid but doesn't match
>>> sk, this wouldn't appear to fix this issue.
>
>
> Hm..  If the association is not marked "dead", it should still have all its
> transports present.  If you look at the peer.transport_addr_list in
> you kdump, is that list empty or not?
>
> Are any other peer transport pointers set (active_path, retran_path)?
>

crash> p ((struct sctp_association *) 0xffff8107670e3000).peer
$14 = {
  rwnd = 65535,
  transport_addr_list = {
    next = 0xffff8107670e3180,
    prev = 0xffff8107670e3180
  },
  transport_count = 0,
  port = 3868,
  primary_path = 0x0,
  primary_addr = {
    v4 = {
      sin_family = 0,
      sin_port = 0,
      sin_addr = {
        s_addr = 0
      },
      __pad = "\000\000\000\000\000\000\000"
    },
    v6 = {
      sin6_family = 0,
      sin6_port = 0,
      sin6_flowinfo = 0,
      sin6_addr = {
        in6_u = {
          u6_addr8 =
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
          u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0},
          u6_addr32 = {0, 0, 0, 0}
        }
      },
      sin6_scope_id = 0
    },
    sa = {
      sa_family = 0,
      sa_data = "\000\000\000\000\000\000\000\000\000\000\000\000\000"
    }
  },
  active_path = 0x0,
  retran_path = 0x0,
  last_sent_to = 0x0,
  last_data_from = 0x0,
  tsn_map = {
    tsn_map = 0x0,
    base_tsn = 0,
    cumulative_tsn_ack_point = 0,
    max_tsn_seen = 0,
    len = 0,
    pending_data = 0,
    num_dup_tsns = 0,
    dup_tsns = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
  },
  sack_needed = 1 '\001',
  sack_cnt = 0,
  ecn_capable = 0 '\0',
  ipv4_address = 1 '\001',
  ipv6_address = 0 '\0',
  hostname_address = 0 '\0',
  asconf_capable = 0 '\0',
  prsctp_capable = 0 '\0',
  auth_capable = 0 '\0',
  adaptation_ind = 0,
  addip_disabled_mask = 0,
  i = {
    init_tag = 0,
    a_rwnd = 0,
    num_outbound_streams = 0,
    num_inbound_streams = 0,
    initial_tsn = 0
  },
  cookie_len = 0,
  cookie = 0x0,
  addip_serial = 0,
  peer_random = 0x0,
  peer_chunks = 0x0,
  peer_hmacs = 0x0
}

>
>>>
>>> Karl
>>
>>
>> Vlad,
>>
>> One other thing, with the difficulty we are having recreating this
>> issue, is there any generic way to increase the likelihood for the
>> transport to be cleared out while delaying the association cleanup?
>> Is there any way that the association is initialized without any
>> transport information?
>
>
> When the association is initialized, the lists are empty, but the next
> thing that happens is that we add transport of the destination we are
> sending to or receiving from to the association and mark it as primary and
> active.  All this happens under a socket lock, so getsockopt can't
> access the association until all actions on that association complete.
>
>
>> The reason I ask; we believe the issue is
>> happening very shortly after the association is brought up (we bring
>> it up and then do the getsockopt()).
>
>
> Can you check what the association state is?  Alternately, can you provide
> the kdump and the kernel so I can dig around.

crash> p ((struct sctp_association *) 0xffff8107670e3000).state
$15 = SCTP_STATE_CLOSED

>
> Thanks
> -vlad
>>
>>
>> Thanks,
>> Karl
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Networking Development]     [Linux OMAP]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux