Re: stable page writes: wait_on_page_writeback and packet signing

Steve French <smfrench@xxxxxxxxx> · Thu, 10 Mar 2011 07:44:17 -0600

On Wed, Mar 9, 2011 at 7:41 PM, Trond Myklebust
<Trond.Myklebust@xxxxxxxxxx> wrote:
> On Wed, 2011-03-09 at 16:01 -0600, Steve French wrote:
>> On Wed, Mar 9, 2011 at 3:51 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> > On Wed, Mar 09, 2011 at 01:44:24PM -0600, Steve French wrote:
>> >> Following up on the discussion about how to avoid the copy into a
>> >> temporary buffer for the case when a file system has to sign a page
>> >> (or list of pages) that is going to be passed in an iovec to be
>> >> written to the network or disk, I noticed that a few file systems do
>> >> issue wait_on_page_writeback (nfs in nfs_writepages for example).
>> >> Apparently some areas are being investigated to add something similar
>> >> for ext4 for disk adapters that do crc checks on data being sent down
>> >> to the disk.   In the cifs case it looks like cifs_writepages already
>> >> does:
>> >>
>> >> if (wbc->sync_mode != WB_SYNC_NONE)
>> >>                                 wait_on_page_writeback(page);
>>
>> <snip>
>>
>> > Sounds like a case for the same dirty page lifecycle as NFS: clean
>> > -> dirty -> writeback -> unstable -> clean. i.e. the page is
>> > unstable after the issuing of the IO until the response from the
>> > server so the page can't be reclaimed while the IO is still in
>> > progress at the server...
>>
>> Except we don't need to wait that long with the page locked
>> i.e. for a response from the cifs server (such as Samba or Windows
>> or NetApp), just need to wait for it to get on the wire.
>> Waiting for us to get the server response would
>> take 10 or 100 times longer.   In any case we can't resend
>> the same request to the server (the signature changes on the
>> resend since the sequence number is incremented on every
>> request/response so we have to recalc the checksum anyway) and
>> cifs requests can't get lost (as with nfs over udp).  Keeping
>> a page locked for 10 milliseconds seems like a bad idea - but
>> it is a little more complicated (for the cifs case) to end page
>> writeback (for the WB_SYNC_NONE case) as quickly as reasonably
>> possible so that we don't kill performance.
>
> So what if the server crashes, or you get some other transient error?
>
> The NFS unstable write mechanism is there in order to deal with
> imperfect servers that occasionally crash and lose cached data. If all
> we had to deal with was perfect situations where all WRITE requests
> succeed, then life would be much simpler...
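
Dave's clean -> dirty -> writeback -> unstable -> clean lifecycle, together with Trond's point about servers that crash and lose cached data, can be sketched as a small state machine. This is a user-space model under stated assumptions, not kernel code. The names page_state, page_event, next_state() and can_reclaim() are hypothetical, and the EV_SERVER_REBOOT transition only loosely models NFS resending unstable data when the server may have lost it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page states following the NFS-style lifecycle
 * clean -> dirty -> writeback -> unstable -> clean. */
enum page_state { PAGE_CLEAN, PAGE_DIRTY, PAGE_WRITEBACK, PAGE_UNSTABLE };

/* Hypothetical events; the names are illustrative, not kernel APIs. */
enum page_event {
    EV_APP_WRITE,     /* application modifies the cached page */
    EV_SEND_WRITE,    /* writeback puts the WRITE request on the wire */
    EV_WRITE_REPLY,   /* server acked the WRITE but may not have it on disk */
    EV_COMMIT_OK,     /* commit succeeded: data is on stable storage */
    EV_SERVER_REBOOT  /* server restarted: unstable data may be lost */
};

/* Return the page's state after 'ev'. Events that do not apply to the
 * current state leave it unchanged. */
static enum page_state next_state(enum page_state s, enum page_event ev)
{
    assert(s <= PAGE_UNSTABLE);   /* sanity check: one of the four states */

    switch (ev) {
    case EV_APP_WRITE:
        /* Stable pages: a writer has to wait while the page is under
         * writeback, so the state does not change in that case. */
        return s == PAGE_WRITEBACK ? s : PAGE_DIRTY;
    case EV_SEND_WRITE:
        return s == PAGE_DIRTY ? PAGE_WRITEBACK : s;
    case EV_WRITE_REPLY:
        return s == PAGE_WRITEBACK ? PAGE_UNSTABLE : s;
    case EV_COMMIT_OK:
        return s == PAGE_UNSTABLE ? PAGE_CLEAN : s;
    case EV_SERVER_REBOOT:
        /* The client still holds the data, so it redirties the page
         * and will resend it. */
        return s == PAGE_UNSTABLE ? PAGE_DIRTY : s;
    }
    return s;
}

/* Only a clean page may be reclaimed: in every other state the client's
 * copy may be the only one that survives a server crash. */
static bool can_reclaim(enum page_state s)
{
    return s == PAGE_CLEAN;
}
```

Walking a page through EV_APP_WRITE, EV_SEND_WRITE and EV_WRITE_REPLY leaves it PAGE_UNSTABLE and not reclaimable. A server reboot sends it back to PAGE_DIRTY for resending, and only EV_COMMIT_OK makes it clean again. Steve's proposal for cifs would instead end writeback once the request is on the wire, with no unstable stage.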

If the server crashes, we have to resend a new request with a different
sequence number, so the signature changes. It therefore doesn't matter
whether the page was mmapped and modified in the meantime: we have to
recalculate the checksum anyway.
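
The argument rests on two properties of signed requests: the signature covers both the data and a per-request sequence number, and it is computed over the page contents at the moment the request is built. The user-space sketch below illustrates both. It assumes a hypothetical toy_sign() (FNV-1a, a stand-in that is neither cryptographic nor the real CIFS signing algorithm), a hypothetical struct signed_req that points at the page instead of copying it, and a simplified sequence counter that advances by one per request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the real signature: FNV-1a over the sequence number
 * followed by the payload. NOT a cryptographic MAC and NOT the CIFS
 * algorithm; it only shows that the result depends on both inputs. */
static uint64_t toy_sign(uint32_t seq, const unsigned char *buf, size_t len)
{
    const uint64_t fnv_prime = 1099511628211ULL;
    uint64_t h = 14695981039346656037ULL;   /* FNV-1a offset basis */

    /* Mix in the sequence number first, one byte at a time. */
    for (int i = 0; i < 4; i++) {
        h ^= (seq >> (8 * i)) & 0xffu;
        h *= fnv_prime;
    }
    /* Then the payload bytes. */
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= fnv_prime;
    }
    return h;
}

/* Hypothetical signed request: the signature is computed once, when the
 * request is built, over whatever the page holds at that moment. */
struct signed_req {
    uint32_t seq;
    uint64_t sig;
    const unsigned char *data;   /* points at the cached page: no copy */
    size_t len;
};

/* Build a request with the next sequence number and sign it. */
static struct signed_req build_req(uint32_t *next_seq,
                                   const unsigned char *page, size_t len)
{
    assert(page != NULL);
    struct signed_req r = { .seq = (*next_seq)++, .data = page, .len = len };
    r.sig = toy_sign(r.seq, page, len);
    return r;
}

/* What the receiver checks once the bytes arrive: does the signature
 * still match the data that was actually transmitted? */
static int server_verifies(const struct signed_req *r)
{
    return toy_sign(r->seq, r->data, r->len) == r->sig;
}
```

Modifying the page after the original request is signed but before it is fully sent makes the check fail, which is why the page has to stay under writeback (or be copied) until it is on the wire. A resend after a crash builds a fresh request with the next sequence number and signs the current contents, so a change made in the meantime does no harm.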

-- 
Thanks,

Steve