From: Xin Long
> Sent: 03 March 2017 15:43
...
> > It is much more important to get MSG_MORE working 'properly' for SCTP
> > than for TCP. For TCP an application can always use a long send.
> "long send" ?, you mean bigger data, or keeping sending?
> I didn't get the difference between SCTP and TCP, they
> are similar when sending data.

With tcp an application can always replace two send()/write()
calls with a single call to writev().
For sctp two send() calls must be made in order to generate two
data chunks.
So it is much easier for a tcp application to generate 'full'
ethernet packets.

> >
> >
> ...
> >> @@ -1982,6 +1982,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
> >>          * breaks.
> >>          */
> >>         err = sctp_primitive_SEND(net, asoc, datamsg);
> >> +       asoc->force_delay = 0;
> >>         /* Did the lower layer accept the chunk? */
> >>         if (err) {
> >>                 sctp_datamsg_free(datamsg);
> >
> > I don't think this is right - or needed.
> > You only get to the above if some test has decided to send data chunks.
> > So it just means that the NEXT time someone tries to send data all the
> > queued data gets sent.
> the NEXT time someone tries to send data with "MSG_MORE clear",
> yes, but with "MSG_MORE set", it will still delay.
>
> > I'm guessing that the whole thing gets called in a loop (definitely needed
> > for very long data chunks, or after the window is opened).
> yes, if users keep sending data chunks with MSG_MORE set, no
> data with "MSG_MORE clear" gap.
>
> > Now if an application sends a lot of (say) 100 byte chunks with MSG_MORE
> > set it would expect to see a lot of full ethernet frames be sent.
> right.
> > With the above a frame will be sent (containing all but 1 chunk) when the
> > amount of queued data becomes too large for an ethernet frame, and immediately
> > followed by a second ethernet frame with 1 chunk in it.
> "followed by a second ethernet frame with 1 chunk in it.", I think this's
> what you're really worried about, right ?
> But sctp flush data queue NOT like what you think, it's not keep traversing
> the queue untill the queue is empty.
> once a packet with chunks in one ethernet frame is sent, sctp_outq_flush
> will return. it will pack chunks and send the next packet again untill some
> other 'event' triggers it, like retransmission or data received from peer.
> I don't think this is a problem.

Erm.... that can't work.
I think there is code to convert a large user send into multiple data chunks.
So if the user does a 4k (say) send several large chunks get queued.
These would need to all be sent at once.
Similarly when the transmit window is received.
So somewhere there ought to be a loop that will send more than one packet.

> > Now it might be that the flag needs clearing when retransmissions are queued.
> > OTOH they might get sent for other reasons.
> Before we really overthought about MSG_MORE, no need to care about
> retransmissions, define MSG_MORE, in my opinion, it works more for
> *inflight is 0*, if it's not 0, we shouldn't stop other places flushing them.

Eh? and when nagle disabled.
If 'inflight' isn't 0 then most paths don't flush data.

> We cannot let asoc's more_more flag work as global, it will block elsewhere
> sending data chunks, not only sctp_sendmsg.

If the connection was flow controlled off, and more 'credit' arrives and there
is less than an ethernet frame's worth of data pending, and the last send
said 'MSG_MORE' there is no point sending anything until the application
does a send with MSG_MORE clear.

I'm not sure what causes a retransmission to send data, I suspect that
'inflight' can easily be non-zero at that time.
Likely something causes a packet to be generated - which then collects the
data chunks.

	David
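
For reference, the writev() point made earlier in this mail can be sketched
as below. This is only an illustration, not code from the patch: the helper
names send_gathered() and demo_roundtrip() are made up, and an AF_UNIX stream
socketpair stands in for a TCP connection so the example runs without a
network. It shows two separate buffers being handed to the kernel in one
call, which is the option a stream (tcp) sender has. An sctp sender that
wants two data chunks has no such option, which is why MSG_MORE matters
more there.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): pass two buffers to the
 * kernel with ONE system call, so a stream socket sees a single long
 * send instead of two short ones.
 */
ssize_t send_gathered(int fd, const void *a, size_t a_len,
                      const void *b, size_t b_len)
{
    struct iovec iov[2];

    assert(a != NULL && b != NULL);
    iov[0].iov_base = (void *)a;    /* writev() only reads the data */
    iov[0].iov_len  = a_len;
    iov[1].iov_base = (void *)b;
    iov[1].iov_len  = b_len;

    return writev(fd, iov, 2);
}

/*
 * Demo harness (assumption: an AF_UNIX stream socketpair is a good
 * enough stand-in for a TCP connection here).  Returns the number of
 * bytes delivered by a single read() on the peer, or -1 on error.
 */
ssize_t demo_roundtrip(char *out, size_t out_len)
{
    int sv[2];
    ssize_t sent, got = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    /* 4 + 7 = 11 bytes, gathered from two buffers in one call. */
    sent = send_gathered(sv[0], "hdr:", 4, "payload", 7);
    if (sent == 11)
        got = read(sv[1], out, out_len);

    close(sv[0]);
    close(sv[1]);
    return got;
}
```

Calling demo_roundtrip() with a 32 byte buffer should hand back both pieces
joined as one contiguous run of stream data, since a byte stream keeps no
record boundaries between the two buffers.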