> what should the daemon see in such a situation?

I agree that it looks like the client-core doesn't notice that a
process doing IO was cancelled. I don't think the client-core keeps
track of what slots are in use; it just trusts that the buffer-index
in any IO upcall is safe to use.

I believe that wait_for_cancellation_downcall has its roots in the
old AIO code, and ended up at some point getting used outside of
that context.

-Mike

On Tue, Feb 9, 2016 at 4:06 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Feb 09, 2016 at 05:40:49PM +0000, Al Viro wrote:
>
>> Could you try, on top of those fixes, comment the entire
>>         if (op->downcall.type == ORANGEFS_VFS_OP_FILE_IO) {
>>                 long n = wait_for_completion_interruptible_timeout(&op->done,
>>                                                         op_timeout_secs * HZ);
>>                 if (unlikely(n < 0)) {
>>                         gossip_debug(GOSSIP_DEV_DEBUG,
>>                                 "%s: signal on I/O wait, aborting\n",
>>                                 __func__);
>>                 } else if (unlikely(n == 0)) {
>>                         gossip_debug(GOSSIP_DEV_DEBUG,
>>                                 "%s: timed out.\n",
>>                                 __func__);
>>                 }
>>         }
>> in orangefs_devreq_write_iter() out and see if the corruption happens?
>
> Another thing: what are the protocol rules regarding the cancels?  The current
> code looks very odd - if we get hit by a signal after the daemon has
> picked e.g. read request but before it had replied, we will call
> orangefs_cancel_op_in_progress(), which will call service_operation() with
> ORANGEFS_OP_CANCELLATION.  And that'll insert the cancel request
> into the list and practically immediately notice that we have a pending signal,
> remove the cancel request from the list and bugger off.  With daemon almost
> certainly *not* getting to see it at all.
>
> I've asked that before; if anybody has explained that, I've missed that reply.
> How the fuck is that supposed to work?  Forget the kernel-side implementation
> details, what should the daemon see in such a situation?
>
> I would expect something like "you can't reuse a slot until operation has
> been either completed or purged or a cancel had been sent and ACKed by
> the daemon".  Is that what is intended?  If so, the handling of cancels might
> be better off asynchronous - let the slot freeing be done after the cancel
> had been ACKed and _not_ in the context of original syscall...
>
> There are some traces of AIO support in that thing; could this be a victim of
> trimming async parts for submission into the mainline?
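
For illustration only, the slot rule proposed in the quoted text ("you
can't reuse a slot until operation has been either completed or purged
or a cancel had been sent and ACKed by the daemon") can be written down
as a tiny userspace state machine. This is a sketch under assumptions,
not the orangefs implementation: the names slot_state, slot_event and
slot_step are hypothetical and do not exist in the kernel module or in
the client-core, and whether a late completion should free a slot that
already has a cancel outstanding is a guess read off the wording of the
rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical slot states; none of these names exist in orangefs. */
enum slot_state {
	SLOT_FREE,		/* may be handed to a new I/O request */
	SLOT_IN_FLIGHT,		/* request issued, no reply yet */
	SLOT_CANCEL_SENT,	/* submitter interrupted, cancel not yet ACKed */
};

/* Hypothetical events that can happen to a slot. */
enum slot_event {
	EV_SUBMIT,		/* a new I/O request wants the slot */
	EV_COMPLETED,		/* daemon replied to the I/O request */
	EV_PURGED,		/* daemon went away, request was purged */
	EV_SIGNAL,		/* submitter got a signal while waiting */
	EV_CANCEL_ACKED,	/* daemon acknowledged the cancel */
};

/*
 * Apply one event to a slot.  Returns true and updates *s if the event
 * is allowed in the current state; returns false and leaves *s alone
 * otherwise.  The point of the sketch: EV_SUBMIT only succeeds from
 * SLOT_FREE, and a signal alone never makes the slot free.
 */
bool slot_step(enum slot_state *s, enum slot_event ev)
{
	assert(s != NULL);

	switch (*s) {
	case SLOT_FREE:
		if (ev == EV_SUBMIT) {
			*s = SLOT_IN_FLIGHT;
			return true;
		}
		return false;
	case SLOT_IN_FLIGHT:
		if (ev == EV_COMPLETED || ev == EV_PURGED) {
			*s = SLOT_FREE;
			return true;
		}
		if (ev == EV_SIGNAL) {
			*s = SLOT_CANCEL_SENT;
			return true;
		}
		return false;
	case SLOT_CANCEL_SENT:
		/* Assumption: any of the three outcomes in the rule frees it. */
		if (ev == EV_CANCEL_ACKED || ev == EV_COMPLETED ||
		    ev == EV_PURGED) {
			*s = SLOT_FREE;
			return true;
		}
		return false;
	}
	return false;
}
```

Starting from SLOT_FREE, the sequence EV_SUBMIT, EV_SIGNAL leaves the
slot in SLOT_CANCEL_SENT, where a second EV_SUBMIT is refused; only
after EV_CANCEL_ACKED (or a completion or purge) does EV_SUBMIT succeed
again. In this model the freeing step is driven by the daemon's reply
rather than by the interrupted submitter, which is the asynchronous
handling the quoted text suggests.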