On Wed, Mar 9, 2011 at 7:30 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> On Wed, 9 Mar 2011 18:33:20 -0600
> Steve French <smfrench@xxxxxxxxx> wrote:
>
>> On Wed, Mar 9, 2011 at 5:54 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>> > On Wed, 9 Mar 2011 16:01:30 -0600
>> > Steve French <smfrench@xxxxxxxxx> wrote:
>> >
>> >>
>> >> Except we don't need to wait that long with the page locked
>> >> ie for a response from the cifs server (such as Samba or Windows
>> >> or NetApp), just need to wait for it to get on the wire.
>> >> Waiting for us to get the server response would
>> >> take 10 or 100 times longer. In any case we can't resend
>> >> the same request to the server (the signature changes on the
>> >> resend since the sequence number is incremented on every
>> >> request/response so we have to recalc the checksum anyway) and
>> >> cifs requests can't get lost (as with nfs over udp). Keeping
>> >> a page locked for 10milliseconds seems like a bad idea - but
>> >> it is a little more complicated to implement (for the cifs case)
>> >> so that we end page writeback (for the non-WB_SYNC)
>> >> as quickly as reasonably possible so we don't kill perf.
>> >>
>> >
>> > The problem here is that the socket layer doesn't have a mechanism
>> > to notify us of a TCP ACK. So, we have to wait for the next-best thing
>> > -- a response from the server.
>>
>> But ... we can stop writeback as soon as kernel_sendmsg returns - once
>> we return from kernel_sendmsg the buffers can (and often will) be
>> freed so we know those pages could not still be used by tcp (below
>> cifs) once kernel_sendmsg returns. We can minimize the delay further
>> by making sure we set TCP_NODELAY on the socket (we probably ought to
>> make that the default instead of an option).
>>
>
> That's not correct. A return from kernel_sendmsg just means that the
> data has been buffered up, not that it has been sent and acked.
> We shouldn't use that as an indicator to mean that the pages no longer
> need to be stable.

We can't free the page from the cache until the server responds that it
has written the data (otherwise, if the server crashes, we have no way to
resend the dirty page), but I don't see any reason to block redirtying a
page as long as we don't break the signing mechanism. We can allow writes
to a page once its data has been buffered (i.e. once kernel_sendmsg has
returned).

Although cifs has stricter guarantees than nfs in some cases (stricter
than close-to-open), we could hang on to pages longer, as nfs does, until
the server returns the equivalent of fsync.

-- 
Thanks,

Steve
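A userspace sketch of the two socket-level points argued above, assuming Linux and a loopback TCP connection: setting TCP_NODELAY on a socket, and observing that send() (the userspace counterpart of kernel_sendmsg) returns once the data has been copied into the socket's send queue. The SIOCOUTQ ioctl reports how many bytes the kernel still holds for the socket (unsent plus sent-but-not-yet-ACKed). Over a real network link that value is typically nonzero right after send() returns; on loopback it may already be 0. The helper names (set_nodelay, loopback_pair, send_and_query) are hypothetical, and this is not cifs code: it does not model how cifs sets up its socket in the kernel.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <netinet/tcp.h>      /* TCP_NODELAY */
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/sockios.h>    /* SIOCOUTQ (Linux-specific) */

/* Disable Nagle's algorithm so small segments are sent without waiting
 * to coalesce them with later data. Returns 0 on success. */
static int set_nodelay(int fd)
{
    int one = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Hypothetical helper: create a connected loopback TCP pair.
 * Returns 0 on success, -1 on failure. */
static int loopback_pair(int *client, int *server)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    *client = *server = -1;
    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* kernel picks a free port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        goto fail;
    *client = socket(AF_INET, SOCK_STREAM, 0);
    if (*client < 0 ||
        connect(*client, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto fail;
    *server = accept(lfd, NULL, NULL);  /* handshake already completed */
    if (*server < 0)
        goto fail;
    close(lfd);
    return 0;
fail:
    if (*client >= 0)
        close(*client);
    *client = -1;
    close(lfd);
    return -1;
}

/* send() returns once the bytes are copied into the socket's send queue,
 * not when the peer has ACKed them. SIOCOUTQ then reports how many bytes
 * the kernel still holds for this socket: unsent plus sent-but-unACKed. */
static ssize_t send_and_query(int fd, const void *buf, size_t n, int *queued)
{
    ssize_t sent = send(fd, buf, n, 0);

    if (sent < 0 || ioctl(fd, SIOCOUTQ, queued) < 0)
        return -1;
    return sent;
}
```

In use: build the pair with loopback_pair(), call set_nodelay() on the client, then send_and_query(). The peer still has to recv() the data, because a successful send() only guarantees the bytes were queued locally.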
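A toy illustration of the signing constraint discussed in this thread, assuming a simplified signature computed over (session key, sequence number, payload). It uses 64-bit FNV-1a purely as a stand-in. Real SMB signing uses a keyed cryptographic MAC, and toy_sign/toy_verify are hypothetical names. The sketch shows two things: a resend needs a fresh signature once the sequence number has moved on, and a page must not change between computing the signature and copying the data into the socket, because in either case the receiver's check fails.

```c
#include <stddef.h>
#include <stdint.h>

/* One FNV-1a 64-bit pass over a byte range. Stand-in only: NOT a MAC and
 * NOT the real SMB signing algorithm. */
static uint64_t fnv1a_update(uint64_t h, const void *data, size_t n)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;              /* FNV 64-bit prime */
    }
    return h;
}

/* Hypothetical toy signature over (session key, sequence number, payload). */
static uint64_t toy_sign(uint64_t session_key, uint32_t seq,
                         const void *payload, size_t n)
{
    uint64_t h = 14695981039346656037ULL;  /* FNV 64-bit offset basis */

    h = fnv1a_update(h, &session_key, sizeof(session_key));
    h = fnv1a_update(h, &seq, sizeof(seq));
    return fnv1a_update(h, payload, n);
}

/* What the receiver does: recompute over what arrived and compare. */
static int toy_verify(uint64_t session_key, uint32_t seq,
                      const void *payload, size_t n, uint64_t sig)
{
    return toy_sign(session_key, seq, payload, n) == sig;
}
```

For example, sign a 64-byte page with sequence number 7. Verifying the unchanged page with sequence 7 succeeds; verifying it with sequence 8 (a resend after the counter has advanced) fails, and so does verifying after changing one byte of the page (a redirty after signing).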