> -----Original Message-----
> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> Sent: Monday, October 31, 2016 3:05 AM
> To: KY Srinivasan <kys@xxxxxxxxxxxxx>
> Cc: devel@xxxxxxxxxxxxxxxxxxxxxx; Van De Ven, Arjan
> <arjan.van.de.ven@xxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx; Haiyang Zhang
> <haiyangz@xxxxxxxxxxxxx>
> Subject: Re: [PATCH] Drivers: hv: vmbus: Raise retry/wait limits in
> vmbus_post_msg()
>
> KY Srinivasan <kys@xxxxxxxxxxxxx> writes:
>
> >> -----Original Message-----
> >> From: Vitaly Kuznetsov [mailto:vkuznets@xxxxxxxxxx]
> >> Sent: Wednesday, October 26, 2016 4:12 AM
> >> To: devel@xxxxxxxxxxxxxxxxxxxxxx
> >> Cc: linux-kernel@xxxxxxxxxxxxxxx; KY Srinivasan <kys@xxxxxxxxxxxxx>;
> >> Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> >> Subject: [PATCH] Drivers: hv: vmbus: Raise retry/wait limits in
> >> vmbus_post_msg()
> >>
> >> DoS protection conditions were altered in WS2016 and now it's easy to get
> >> -EAGAIN returned from vmbus_post_msg() (e.g. when we try changing MTU on a
> >> netvsc device in a loop). None of the vmbus_post_msg() callers retry the
> >> operation and we usually end up with a non-functional device or a crash.
> >>
> >> While the host's DoS protection conditions are unknown to me, my tests
> >> show that it can take up to 46 attempts to send a message after changing
> >> udelay() to mdelay() and capping msec at '256'. This means we can wait up
> >> to 10 seconds before the message is sent, so we need to use msleep()
> >> instead. Almost all vmbus_post_msg() callers are ready to sleep but there
> >> is one special case: vmbus_initiate_unload(), which can be called from
> >> interrupt/NMI context, and we can't sleep there. I'm also not sure about
> >> the lonely vmbus_send_tl_connect_request(), which has no in-tree users,
> >> but its external users are most likely waiting for the host to reply so
> >> sleeping there is also appropriate.
> >
> > Vitaly,
> >
> > One of the reasons why the delay was in microseconds was to make sure
> > that the boot time was not adversely affected by the delay we had in
> > setting up the channel. The change to microsecond delay and other changes
> > in this code reduced the time it took to initialize netvsc from
> > 200 milliseconds to about 12 milliseconds. This is important for us as we
> > look at achieving sub-second boot times.
> > The situations you are trying to address are test cases where you are
> > hitting the host with requests that trigger the host's DoS prevention
> > code. Perhaps we could have a hybrid approach: we retain the microsecond
> > wait until we hit a threshold and then we use millisecond delays. This
> > way, the normal boot path is still fast while we can handle some of the
> > other cases where the host's DoS prevention code kicks in.
> >
>
> Ok,
>
> I actually tested boot time with my patch and didn't see a difference
> (so I guess our first attempt to send messages usually succeeds) but if
> we're concerned about less-than-a-second boot time we'd rather keep the
> microsecond delay for the first several attempts. I'll do v2.

Thank you.

K. Y
>
> Thanks,
>
> --
> Vitaly
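
The hybrid approach discussed above (microsecond waits first, millisecond waits once a threshold is crossed) could be sketched roughly as follows. This is a userspace illustration only, not the v2 patch: the function names and the 1000 us threshold are assumptions made for the example, and only the 256 ms cap comes from the patch description. In the driver, the busy-wait phase would correspond to udelay() and the sleeping phase to msleep(), with callers that cannot sleep handled separately.

```c
/* All names and the threshold below are hypothetical, for illustration. */
#define MSEC_THRESHOLD_USEC 1000U    /* assumed switch-over point: 1 ms */
#define MAX_DELAY_USEC      256000U  /* 256 ms cap from the patch description */

enum wait_kind {
	WAIT_BUSY_USEC,   /* short wait, udelay()-style busy wait */
	WAIT_SLEEP_MSEC   /* long wait, msleep()-style sleep */
};

/*
 * Delay to apply after failed attempt number 'attempt' (0-based):
 * start at 1 us, double after every failure, never exceed the cap.
 */
unsigned int retry_delay_usec(unsigned int attempt)
{
	unsigned int usec = 1;
	unsigned int i;

	for (i = 0; i < attempt && usec < MAX_DELAY_USEC; i++)
		usec *= 2;

	return usec > MAX_DELAY_USEC ? MAX_DELAY_USEC : usec;
}

/* Pick the waiting primitive for a given delay. */
enum wait_kind wait_kind_for(unsigned int usec)
{
	return usec < MSEC_THRESHOLD_USEC ? WAIT_BUSY_USEC : WAIT_SLEEP_MSEC;
}

/* Total time spent waiting if the first 'failures' attempts are rejected. */
unsigned long long total_wait_usec(unsigned int failures)
{
	unsigned long long total = 0;
	unsigned int i;

	for (i = 0; i < failures; i++)
		total += retry_delay_usec(i);

	return total;
}
```

With these assumed values the first ten failed attempts wait between 1 us and 512 us each, about 1 ms in total, so a boot path whose messages succeed on the first few tries stays in the microsecond phase. Only a host that keeps rejecting messages pushes the loop into millisecond sleeps.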