On Fri, Jan 22, 2016 at 03:30:02PM -0500, Mike Marshall wrote:
> The userspace daemon (client-core) that reads/writes to the device
> restarts automatically if it stops for some reason... I believe active
> ops are marked "purged" when this happens, and when client-core
> restarts "purged" ops are retried (once)... see the comment
> in waitqueue.c "if the operation was purged in the meantime..."
>
> I've tried to rattle Walt and Becky's chains to see if they
> can describe it better...

What I mean is the following sequence:

	Syscall: puts op into request list, sleeps in wait_for_matching_downcall()
	Daemon: exits, marks op purged, wakes Syscall up
	Daemon gets restarted
	Daemon calls read(), finds op still on the list
	Syscall: finally gets the timeslice, removes op from the list, decides to resubmit

This is very hard to hit - normally, by the time we get around to read() from the restarted daemon, the waiter has already been woken up and has already removed the purged op from the list. So in practice you have probably never hit that case. However, it is theoretically possible.

What I propose is to have orangefs_devreq_read() and orangefs_devreq_remove_op() skip purged requests that are still in the lists. IOW, pretend that the race had been won by whatever had been waiting on that request and got woken up when it was purged.

Note that by the time it gets resubmitted, it already has the 'purged' flag removed - set_op_state_waiting(op) is done when we insert it into the request list, and it leaves no trace of OP_VFS_STATE_PURGED. So I'm not talking about the resubmitted stuff; just the op that had been in the queue since before the daemon restart and hadn't been removed from there yet.
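To make the proposal concrete, here is a user-space model of it, not kernel code. Everything in it is hypothetical: enum op_state, struct op, submit(), purge_all(), devreq_read() and waiter_resubmit() are invented names standing in for the real request list, OP_VFS_STATE_PURGED, set_op_state_waiting() and orangefs_devreq_read(). The point it illustrates is that read() hands the daemon only non-purged ops, leaving a stale purged op on the list for its woken waiter to remove and resubmit (once) with a clean state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the device request list; NOT the orangefs API. */
enum op_state { OP_WAITING, OP_IN_PROGRESS, OP_PURGED };

struct op {
    int tag;
    enum op_state state;
    struct op *next;            /* singly linked request list */
};

/* Append op to the tail as a fresh waiting request.  Like
 * set_op_state_waiting(), this leaves no trace of the purged state. */
static void submit(struct op **head, struct op *op)
{
    struct op **pp = head;
    while (*pp)
        pp = &(*pp)->next;
    op->state = OP_WAITING;
    op->next = NULL;
    *pp = op;
}

/* Unlink op if it is still on the list; returns 1 if it was found. */
static int remove_from_list(struct op **head, struct op *op)
{
    for (struct op **pp = head; *pp; pp = &(*pp)->next) {
        if (*pp == op) {
            *pp = op->next;
            op->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* Daemon exit: everything still queued is marked purged
 * (in the real code its waiter would also be woken up). */
static void purge_all(struct op *head)
{
    for (struct op *op = head; op; op = op->next)
        op->state = OP_PURGED;
}

/* Proposed read(): give the daemon the first op that is NOT purged.
 * Purged ops stay queued, as if their waiter had already won the race. */
static struct op *devreq_read(struct op **head)
{
    for (struct op *op = *head; op; op = op->next) {
        if (op->state == OP_PURGED)
            continue;
        remove_from_list(head, op);
        op->state = OP_IN_PROGRESS;
        return op;
    }
    return NULL;
}

/* Woken waiter: drop the stale purged op and resubmit it once. */
static void waiter_resubmit(struct op **head, struct op *op)
{
    assert(op->state == OP_PURGED);
    remove_from_list(head, op);
    submit(head, op);
}
```

Walking the sequence above through it: submit op A, purge_all() on daemon exit, then submit a new op B after the restart. devreq_read() returns B and skips A; only after waiter_resubmit() on A does the next devreq_read() return A, now in the waiting state rather than purged.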