On Thu, Sep 10, 2015 at 01:24:54PM -0300, Marcelo Ricardo Leitner wrote: > On Thu, Sep 10, 2015 at 11:50:06AM -0400, Vlad Yasevich wrote: > > On 09/10/2015 10:22 AM, Marcelo Ricardo Leitner wrote: > > > Em 10-09-2015 10:24, Vlad Yasevich escreveu: > ... > > >> Then you can order sctp_net_init() such that it happens first, then protosw registration > > >> happens, then control socket initialization happens, then inet protocol registration > > >> happens. > > >> > > >> This way, we are always guaranteed that by the time user calls socket(), protocol > > >> defaults are fully initialized. > > > > > > Okay, that works for module loading stage, but then how would we handle new netns's? We > > > have to create the control socket per netns and AFAICT sctp_net_init() is the only hook > > > called when a new netns is being created. > > > > > > Then if we move it a workqueue that is scheduled by sctp_net_init(), we loose the ability > > > to handle its errors by propagating through sctp_net_init() return value, not good. > > > > Here is kind of what I had in mind. It's incomplete and completely untested (not even > > compiled), but good enough to describe the idea: > ... > > Ohh, ok now I get it, thanks. If having two pernet_subsys for a given > module is fine, that works for me. It's clearer and has no moment of > temporary failure. > > I can finish this patch if everybody agrees with it. > > > >>> I used the list pointer because that's null as that memory is entirely zeroed when alloced > > >>> and, after initialization, it's never null again. Works like a lock/condition without > > >>> using an extra field. > > >>> > > >> > > >> I understand this a well. What I don't particularly like is that we are re-using > > >> a list without really stating why it's now done this way. Additionally, it's not really > > >> the last that happens so it's seems kind of hacky... If we need to add new > > >> per-net initializers, we now need to make sure that the code is put in the right > > >> place. I'd just really like to have a cleaner solution... > > > > > > Ok, got you. We could add a dedicated flag/bit for that then, if reusing the list is not > > > clear enough. Or, as we are discussing on the other part of thread, we could make it block > > > and wait for the initialization, probably using some wait_queue. I'm still thinking on > > > something this way, likely something more below than sctp then. > > > > > > > I think if we don the above, the second process calling socket() will either find the > > the protosw or will try to load the module also. I think either is ok after > > request_module returns we'll look at the protosw and will find find things. > > Seems so, yes. Nice. I was testing with it, something is not good. I finished your patch and testing with a flooder like: # for j in {1..5}; do for i in {1234..1280}; do \ sctp_darn -H 192.168.122.147 -P $j$i -l & done & done The system didn't crash, but I got: [1] 13507 [2] 13508 [3] 13510 [4] 13513 [5] 13517 [mrl@localhost ~]$ sctp_darn: failed to create socket: Socket type not supported. sctp_darn: failed to create socket: Socket type not supported. sctp_darn: failed to create socket: Socket type not supported. sctp_darn: failed to create socket: Socket type not supported. sctp_darn: failed to create socket: Socket type not supported. sctp_darn: failed to create socket: Socket type not supported. sctp_darn: failed to create socket: Socket type not supported. sctp_darn: failed to create socket: Socket type not supported. sctp_darn: failed to create socket: Socket type not supported. sctp_darn: failed to create socket: Socket type not supported. sctp_darn: failed to create socket: Socket type not supported. sctp_darn: failed to create socket: Socket type not supported. sctp_darn: failed to create socket: Socket type not supported. sctp_darn listening... sctp_darn listening... sctp_darn listening... sctp_darn listening... ... sctp_darn listening... sctp_darn listening... sctp_darn listening... sctp_darn listening... sctp_darn: failed to create socket: Socket type not supported. sctp_darn listening... sctp_darn listening... And with this applied: --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -289,6 +289,7 @@ lookup_protocol: if (unlikely(err)) { if (try_loading_module < 2) { rcu_read_unlock(); + printk("%p loading proto module\n", sock); /* * Be more specific, e.g. net-pf-2-proto-132-type-1 * (net-pf-PF_INET-proto-IPPROTO_SCTP-type-SOCK_STREAM) @@ -303,6 +304,8 @@ lookup_protocol: else request_module("net-pf-%d-proto-%d", PF_INET, protocol); + + printk("%p done loading proto module\n", sock); goto lookup_protocol; } else goto out_rcu_unlock; During that test, it showed: [ 732.261730] ffff8800da2ce300 loading proto module ^^^^^^^^^^^^^^^^ (1) (first printed line) [ 732.262077] ffff8800da2ca680 loading proto module [ 732.262285] ffff8800da2c8b00 loading proto module [ 732.262421] ffff8800da2ccd00 loading proto module [ 732.263763] ffff8800da2c8580 loading proto module [ 732.265872] ffff8800da2cd280 loading proto module [ 732.270517] ffff8800da2cf900 loading proto module [ 732.270626] ffff8800da2c9080 loading proto module [ 732.272170] ffff8800da2c8580 done loading proto module [ 732.273042] ffff8800da2c8580 loading proto module [ 732.277248] ffff8800da2cf380 loading proto module [ 732.278495] ffff8800da2ca680 done loading proto module [ 732.278916] ffff8800da2cd280 done loading proto module [ 732.278918] ffff8800da2cd280 loading proto module ... [ 732.281171] ffff8800da2ce300 done loading proto module ^^^^^^^^^^^^^^^^ (1) [ 732.281173] ffff8800da2ce300 loading proto module ^^^^^^^^^^^^^^^^ (1) ... [ 732.321299] ffff880034995800 loading proto module [ 732.461481] ffff880034995800 done loading proto module [ 732.461482] ffff880034995800 loading proto module [ 732.461483] ffff880034995800 done loading proto module ^^^^^^^^^^^^^^^^ an attempt that tried both aliases quickly and returned before sctp was initialized ... [ 732.489413] sctp: Hash tables configured (established 1638 bind 1638) ... [ 732.634034] ffff8800da2ce300 done loading proto module ^^^^^^^^^^^^^^^^ (1) nearly 400ms later Also got: $ dmesg | grep runaw [ 732.439598] request_module: runaway loop modprobe net-pf-2-proto-132-type-5 [ 732.442017] request_module: runaway loop modprobe net-pf-2-proto-132 [ 732.449195] request_module: runaway loop modprobe net-pf-2-proto-132-type-5 [ 732.451946] request_module: runaway loop modprobe net-pf-2-proto-132 [ 732.458970] request_module: runaway loop modprobe net-pf-2-proto-132-type-5 [ 732.460380] request_module: runaway loop modprobe net-pf-2-proto-132-type-5 Such socket() calls failed (and some other too), as per (kernel/kmod.c): __request_module() { ... max_modprobes = min(max_threads/2, MAX_KMOD_CONCURRENT); atomic_inc(&kmod_concurrent); if (atomic_read(&kmod_concurrent) > max_modprobes) { /* We may be blaming an innocent here, but unlikely */ if (kmod_loop_msg < 5) { printk(KERN_ERR "request_module: runaway loop modprobe %s\n", module_name); kmod_loop_msg++; } atomic_dec(&kmod_concurrent); return -ENOMEM; } ... } FWIW, my test system has 8 threads. It seems that request_module will not serialize it as we wanted and we would be putting unexpected pressure on it, yet it fixes the original issue. Maybe we can place a semaphore at inet_create(), protecting the request_module()s so only one socket can do it at a time and, after it is released, whoever was blocked on it re-checks if the module isn't already loaded before attempting again. It makes the loading of different modules slower, though, but I'm not sure if that's really a problem. Not many modules are loaded at the same time like that. What do you think? Marcelo -- To unsubscribe from this list: send the line "unsubscribe linux-sctp" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html