On Thu, Oct 22, 2015 at 7:22 PM, Mike Christie <michaelc@xxxxxxxxxxx> wrote:
> On 10/22/15, 11:52 AM, Ilya Dryomov wrote:
>> On Thu, Oct 22, 2015 at 5:37 PM, Mike Christie <michaelc@xxxxxxxxxxx> wrote:
>>> On 10/22/2015 06:20 AM, Ilya Dryomov wrote:
>>>>>> If we are just talking about stable pages not being used, and someone
>>>>>> rewriting data in a page after that page has already been submitted
>>>>>> to the block layer (I mean the page is on some bio, which is on a
>>>>>> request, which is on some request_queue scheduler list, or basically
>>>>>> anywhere in the block layer), then I was saying this can occur with
>>>>>> any block driver. There is nothing preventing it from happening with
>>>>>> an FC driver, nvme, cciss, dm, or whatever. The app/user can rewrite
>>>>>> as late as when we are in make_request_fn/request_fn.
>>>>>>
>>>>>> I think I am misunderstanding your question, because I thought this
>>>>>> was expected behavior, and there is nothing drivers can do if the app
>>>>>> is not doing a flush/sync between these types of write sequences.
>>>>
>>>> I don't see a problem with rewriting as late as when we are in
>>>> request_fn() (or in a wq after being put there by request_fn()). Where
>>>> I thought there *might* be an issue is rewriting after sendpage(), if
>>>> sendpage() is used - perhaps some sneaky sequence similar to that
>>>> retransmit bug that would cause us to *transmit* incorrect bytes (as
>>>> opposed to *re*transmit), or something of that nature?
>>>
>>> Just to make sure we are on the same page:
>>>
>>> are you concerned about the tcp/net layer retransmitting because it
>>> detected an issue as part of the tcp protocol, or about rbd/libceph
>>> initiating a retry, like with the nfs issue?
>>
>> The former, tcp/net layer. I'm just conjecturing though.
>
> For iscsi, we normally use the sendpage path. Data digests are off by
> default, and some distros do not even allow you to turn them on, so our
> sendpage path has gotten a lot of testing and we have not seen any
> corruptions. Not saying it is not possible, just that we have not seen
> any.

Great, that's reassuring.

> It could be due to a recent change. Ronny, tell us about the workload
> and I will check iscsi.
>
> Oh yeah, for the tcp/net retransmission case, I had said offlist that I
> thought there might be an issue with iscsi, but I guess I was wrong, so
> I have not seen any issues with that either.

I'll drop my concerns then. Those corruptions could be a bug in the ceph
reconnect code or something else - regardless, that's separate from the
issue at hand.

Thanks,

                Ilya
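[Editor's note: the in-flight-rewrite hazard discussed in this thread can be sketched in userspace. This is a toy model, not kernel, iSCSI, or ceph code; all names here are illustrative. It mimics a zero-copy path like sendpage(): submission records a reference to the page plus a digest of its contents at submission time, the actual "transmit" happens later, and a rewrite in between means the wire sees bytes that no longer match the digest - exactly what a data digest would catch on the receiving end.]

```python
import queue
import zlib

# Toy model: the "I/O stack" holds a reference to the page (no copy),
# analogous to sendpage()-style zero-copy submission.
submitted = queue.Queue()

def submit_write(page):
    # Record a digest of the contents at submission time, but keep only
    # a reference to the data itself (no stable-page copy is made).
    submitted.put((zlib.crc32(page), page))

def transmit():
    # Transmission happens later and reads the page as it is NOW.
    digest, page = submitted.get()
    return digest, bytes(page)

page = bytearray(b"AAAA")
submit_write(page)
page[0:4] = b"BBBB"          # application rewrites while the write is in flight
digest, wire_bytes = transmit()

print(wire_bytes)                        # the rewritten bytes go on the wire
print(digest == zlib.crc32(wire_bytes))  # submission-time digest no longer matches
```

With a digest-checking receiver this mismatch is rejected as corruption; with digests off (the iSCSI default Mike mentions), the rewritten bytes are simply accepted.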