On Sat, Mar 4, 2017 at 12:51 PM, Xin Long <lucien.xin@xxxxxxxxx> wrote: > On Sat, Mar 4, 2017 at 1:57 AM, Xin Long <lucien.xin@xxxxxxxxx> wrote: >> On Sat, Mar 4, 2017 at 12:31 AM, David Laight <David.Laight@xxxxxxxxxx> wrote: >>> From: Xin Long >>>> Sent: 03 March 2017 15:43 >>> ... >>>> > It is much more important to get MSG_MORE working 'properly' for SCTP >>>> > than for TCP. For TCP an application can always use a long send. >>> >>>> "long send" ?, you mean bigger data, or keeping sending? >>>> I didn't get the difference between SCTP and TCP, they >>>> are similar when sending data. >>> >>> With tcp an application can always replace two send()/write() >>> calls with a single call to writev(). >>> For sctp two send() calls must be made in order to generate two >>> data chunks. >>> So it is much easier for a tcp application to generate 'full' >>> ethernet packets. >> okay, it should not be a important reason, and sctp might also support >> it one day. :-) >> >>> >>>> >>>> > >>>> > ... >>>> >> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len) >>>> >> * breaks. >>>> >> */ >>>> >> err = sctp_primitive_SEND(net, asoc, datamsg); >>>> >> + asoc->force_delay = 0; >>>> >> /* Did the lower layer accept the chunk? */ >>>> >> if (err) { >>>> >> sctp_datamsg_free(datamsg); >>>> > >>>> > I don't think this is right - or needed. >>>> > You only get to the above if some test has decided to send data chunks. >>>> > So it just means that the NEXT time someone tries to send data all the >>>> > queued data gets sent. >>> >>>> the NEXT time someone tries to send data with "MSG_MORE clear", >>>> yes, but with "MSG_MORE set", it will still delay. >>>> >>>> > I'm guessing that the whole thing gets called in a loop (definitely needed >>>> > for very long data chunks, or after the window is opened). >>> >>>> yes, if users keep sending data chunks with MSG_MORE set, no >>>> data with "MSG_MORE clear" gap. >>>> >>>> > Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE >>>> > set it would expect to see a lot of full ethernet frames be sent. >>> >>>> right. >>> >>>> > With the above a frame will be sent (containing all but 1 chunk) when the >>>> > amount of queued data becomes too large for an ethernet frame, and immediately >>>> > followed by a second ethernet frame with 1 chunk in it. >>> >>>> "followed by a second ethernet frame with 1 chunk in it.", I think this's >>>> what you're really worried about, right ? >>>> But sctp flush data queue NOT like what you think, it's not keep traversing >>>> the queue untill the queue is empty. >>>> once a packet with chunks in one ethernet frame is sent, sctp_outq_flush >>>> will return. it will pack chunks and send the next packet again untill some >>>> other 'event' triggers it, like retransmission or data received from peer. >>>> I don't think this is a problem. >>> >>> Erm.... that can't work. >>> I think there is code to convert a large user send into multiple data chunks. >>> So if the user does a 4k (say) send several large chunks get queued. >>> These would need to all be sent at once. >>> >>> Similarly when the transmit window is received. >>> So somewhere there ought to be a loop that will send more than one packet. >> As far as I can see, no loop like you said, mostly, the incoming >> chunk (like SACK) from peer will trigger the next flush out. >> I can try to trace the path in kernel for sure tomorrow. > okay, you are right, I missed sctp_packet_transmit_chunk also call > sctp_packet_transmit to send the current packet. :) > > But if we keep sending data with "MSG_MORE", after one ethernet frame > is sent, "followed by a second ethernet frame with 1 chunk in it" will NOT > happen, as in this loop the asoc's msg_more flag is still set, and this flush > is called by sctp_sendmsg(the function msg_more should care more). > > did I miss something ? Hi, David Laight As now sctp MSG_MORE is broken with msg->force_day, I hope we can use this patch before you get a better one to fix it, so that users would not be confused with the strange behaviour. what do you say ? > >> >>> >>>> > Now it might be that the flag needs clearing when retransmissions are queued. >>>> > OTOH they might get sent for other reasons. >>> >>>> Before we really overthought about MSG_MORE, no need to care about >>>> retransmissions, define MSG_MORE, in my opinion, it works more for >>>> *inflight is 0*, if it's not 0, we shouldn't stop other places flushing them. >>> >>> Eh? and when nagle disabled. >>> If 'inflight' isn't 0 then most paths don't flush data. >> I knew, but MSG_MORE is different thing, it should only try to work for the >> current and following data. >> >>> >>>> We cannot let asoc's more_more flag work as global, it will block elsewhere >>>> sending data chunks, not only sctp_sendmsg. >>> >>> If the connection was flow controlled off, and more 'credit' arrives and there >>> is less that an ethernet frame's worth of data pending, and the last send >>> said 'MSG_MORE' there is no point sending anything until the application >>> does a send with MSG_MORE clear. >> got you, I think you have different understanding about MSG_MORE >> while this patch just try to make it work like TCP's msg_more, but what >> you mentioned here is the same as TCP thing, seems you also want >> to improve TCP's MSG_MORE :-) >> >>> >>> I'm not sure what causes a retransmission to send data, I suspect that 'inflight' >>> can easily be non-zero at that time. >> The thing that causes a retransmission to send data is that both tx and >> rtx send data through sctp_outq_flush, in which it will try to send rtx queue, >> then rx queue. >> >> yes, once a packet is sent out and not yet be SACKed, "inflight" will not be >> zero, so when retransmiting, "inflight" must be non-zero. >> >>> Likely something causes a packet be generated - which then collects the data chunks. >>> >>> David >>> >>> -- To unsubscribe from this list: send the line "unsubscribe linux-sctp" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html