Re: [PATCH] mark rbd requiring stable pages

On 10/22/15, 11:52 AM, Ilya Dryomov wrote:
> On Thu, Oct 22, 2015 at 5:37 PM, Mike Christie <michaelc@xxxxxxxxxxx> wrote:
>> On 10/22/2015 06:20 AM, Ilya Dryomov wrote:
>>
>> If we are just talking about stable pages not being used, and someone
>> is re-writing data to a page after the page has already been submitted
>> to the block layer (I mean the page is on some bio which is on a
>> request which is on some request_queue scheduler list, or basically
>> anywhere in the block layer), then I was saying this can occur with
>> any block driver. There is nothing preventing this from happening with
>> a FC driver, nvme, cciss, dm, or whatever. The app/user can rewrite as
>> late as when we are in the make_request_fn/request_fn.
>>
>> I think I am misunderstanding your question, because I thought this
>> was expected behavior, and there is nothing drivers can do if the app
>> is not doing a flush/sync between these types of write sequences.
>
> I don't see a problem with rewriting as late as when we are in
> request_fn() (or in a wq after being put there by request_fn()). Where
> I thought there *might* be an issue is rewriting after sendpage(), if
> sendpage() is used - perhaps some sneaky sequence similar to that
> retransmit bug that would cause us to *transmit* incorrect bytes (as
> opposed to *re*transmit) or something of that nature?
>
>> Just to make sure we are on the same page:
>>
>> Are you concerned about the tcp/net layer retransmitting because it
>> detects an issue as part of the tcp protocol, or are you concerned
>> about rbd/libceph initiating a retry, like with the nfs issue?
>
> The former, tcp/net layer. I'm just conjecturing though.
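Ilya's sendpage() concern can be illustrated with a toy model: sendpage() hands the network stack a reference to the page rather than a copy, so if the page is rewritten while TCP still holds it (for example, with a retransmit pending), the bytes that go out on the wire can differ between the original transmission and the retransmission. A minimal Python sketch of that race - the names here are illustrative, not kernel APIs:

```python
# Toy model of the zero-copy sendpage() path vs a copying send path.
# 'page' stands in for a page cache page; the memoryview models the
# network stack holding a reference to the page instead of a copy.
page = bytearray(b"hello world, this is block data!")

zero_copy_ref = memoryview(page)   # sendpage(): reference, no copy
copied = bytes(page)               # copying path: snapshot at submit time

first_tx = bytes(zero_copy_ref)    # bytes sent on the first transmission

# Without stable pages, the app may rewrite the page while TCP still
# references it (e.g. data not yet acked, retransmit still possible).
page[0:5] = b"HELLO"

retransmit = bytes(zero_copy_ref)  # a retransmit reads the *current* page

print(first_tx == retransmit)      # False: wire bytes changed mid-flight
print(copied == first_tx)          # True: a copying path is immune
```

The copying path is immune because it snapshots the data at submission, which is essentially what stable pages guarantee from the other direction: the page cannot change until writeback completes.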


For iscsi, we normally use the sendpage path. Data digests are off by
default, and some distros do not even allow you to turn them on, so our
sendpage path has gotten a lot of testing, and we have not seen any
corruptions. I am not saying it is not possible, just that we have not
seen any.
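For context on why digests matter here: an iSCSI data digest is a CRC computed over the data segment when the PDU is built; if the underlying page is rewritten before the zero-copy send actually pulls the bytes, the target computes a different CRC over what arrived and rejects the PDU. A rough sketch of that window, using zlib's CRC32 as a stand-in for iSCSI's CRC32C:

```python
import zlib

page = bytearray(b"A" * 4096)        # one page of a data-out PDU's payload

digest_at_submit = zlib.crc32(page)  # digest computed when the PDU is built

# Without stable pages, the application can rewrite the page after the
# digest was computed but before sendpage() transmits the data segment.
page[0:4] = b"BBBB"

digest_on_wire = zlib.crc32(page)    # what the transmitted bytes hash to

print(digest_at_submit == digest_on_wire)  # False: target sees a digest error
```

With digests off, the mismatch is invisible at the transport level, which is consistent with the sendpage path getting heavy testing without corruption reports.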

It could be due to a recent change. Ronny, tell us about the workload, and I will check iscsi.

Oh yeah, for the tcp/net retransmission case: as I had said offlist, I thought there might be an issue with iscsi, but I guess I was wrong, so I have not seen any issues with that either.

iSCSI just has that bug I mentioned offlist where we close the socket and fail commands upwards in the wrong order. That is an iscsi-specific bug though.
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


