Yes... I remember... I think you are referring to my reply in
Message-ID: CAOg9mSSH=LuKyGiVthVajZFc6d=hGWGeLE8G9Y9d5B+g1-2sEg@xxxxxxxxxxxxxx
in this thread...

I just commented those lines out again, and ran tests... both with and
without signaling the client-core to restart. dbench never complained and
completed normally across restarts every time except the last, where it
failed and the "Failed to allocate orangefs file inode" error was emitted
from orangefs_create.

Until recently I ran everything, including the server, on the same VM.
Currently I am mounting my Orangefs filesystem from a four-server setup
from other VMs... it can be pretty bad news for a userspace filesystem
when the kernel crashes on the machine it is running on <g>...

-Mike

On Tue, Feb 9, 2016 at 12:40 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Tue, Feb 09, 2016 at 09:34:12AM -0500, Mike Marshall wrote:
>> > Objections?
>>
>> Heck no... I've been trying to keep from changing the protocol so as to
>> avoid making a whole nother project out of keeping the out-of-tree
>> Frankenstein version of the kernel module going, but getting this version
>> of the kernel module upstream and getting it infused with ideas from you
>> depth-of-knowledge folks is the real goal here.
>>
>> You're talking about changing orangefs_kernel_op_s (pvfs2_kernel_op_t
>> out of tree) and it doesn't cross the boundary into userspace... even if
>> it did, that "completion" structure looks like it has been around
>> as long as any of the Linux versions we try to run on...
>
> OK. While we are at it... Remember the question about the need for devreq
> ->write_iter() to wait for wait_for_direct_io() to finish copying the data
> from slots to final destination? You said that removing that wait ends up
> with daemon somehow stomping on those slots and I wonder if that was
> another effect of that double-free bug.
>
> Could you try, on top of those fixes, comment the entire
>         if (op->downcall.type == ORANGEFS_VFS_OP_FILE_IO) {
>                 long n = wait_for_completion_interruptible_timeout(&op->done,
>                                                         op_timeout_secs * HZ);
>                 if (unlikely(n < 0)) {
>                         gossip_debug(GOSSIP_DEV_DEBUG,
>                                 "%s: signal on I/O wait, aborting\n",
>                                 __func__);
>                 } else if (unlikely(n == 0)) {
>                         gossip_debug(GOSSIP_DEV_DEBUG,
>                                 "%s: timed out.\n",
>                                 __func__);
>                 }
>         }
> in orangefs_devreq_write_iter() out and see if the corruption happens?