This thing is invulnerable now!

Nothing hangs when I kill the client-core, and the client-core
always restarts.

Sometimes, if you hit it right with a kill while dbench is running,
a file create will fail.

I've been trying to trace down why all day, in case there's
something that can be done...

Here's what I see:

  orangefs_create
    service_operation
      wait_for_matching_downcall purges op and returns -EAGAIN
      orangefs_clean_up_interrupted_operation
      if (EAGAIN)
        ...
        goto retry_servicing
      wait_for_matching_downcall returns 0
    service_operation returns 0
  orangefs_create has good return value from service_operation

  op->khandle: 00000000-0000-0000-0000-000000000000
  op->fs_id: 0

  subsequent getattr on bogus object fails orangefs_create on EINVAL.

Seems like the second time around, wait_for_matching_downcall
must have seen op_state_serviced, but I don't see how yet...

I pushed the new patches out to

gitolite.kernel.org:pub/scm/linux/kernel/git/hubcap/linux
for-next

I made a couple of additional patches that make it easier to read
the flow of gossip statements, and also removed a few lines of vestigial
ASYNC code.

-Mike

On Mon, Feb 15, 2016 at 6:04 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Feb 15, 2016 at 05:32:54PM -0500, Martin Brandenburg wrote:
>
>> Something that used a slot, such as reader, would call
>> service_operation while holding a bufmap. Then the client-core would
>> crash, and the kernel would get run_down waiting on the slots to be
>> given up. But the slots are not given up until someone wakes all the
>> processes waiting in service_operation up, which happens after all the
>> slots are given up. Then client-core hangs until someone sends a
>> deadly signal to all the processes waiting in service_operation or
>> presumably the timeout expires.
>>
>> This splits finalize and run_down so that orangefs_devreq_release can
>> mark the slot map as killed, then purge waiting ops, then wait for all
>> the slots to be released.
>> Meanwhile, processes which were waiting will
>> get into orangefs_bufmap_get which will see that the slot map is
>> shutting down and wait for the client-core to come back.
>
> D'oh.  Yes, that was exactly the point of separating mark_dead and run_down -
> the latter should've been done after purging all requests.  Fixes folded,
> branch force-pushed.
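
For anyone following the ordering argument in the quoted text, here is a
rough single-threaded model of it. This is only a sketch, not the orangefs
code: struct slot_map, struct waiting_op, and every function name below are
made up for illustration, and the real run_down sleeps where this model just
reports whether it would block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real orangefs structures. */
struct slot_map {
    bool killed;     /* set once the client-core has gone away */
    int  in_use;     /* number of slots currently held */
};

struct waiting_op {
    bool holds_slot; /* op went to sleep in service_operation holding a slot */
    bool purged;     /* op has been woken up with an error */
};

/* Put the model in the state described above: two sleepers hold slots. */
static void model_init(struct slot_map *m, struct waiting_op ops[3])
{
    m->killed = false;
    m->in_use = 2;
    ops[0] = (struct waiting_op){ .holds_slot = true,  .purged = false };
    ops[1] = (struct waiting_op){ .holds_slot = true,  .purged = false };
    ops[2] = (struct waiting_op){ .holds_slot = false, .purged = false };
}

static void mark_killed(struct slot_map *m)
{
    m->killed = true;
}

/* Wake every waiting op; a woken op gives its slot back on the way out. */
static void purge_waiting_ops(struct slot_map *m, struct waiting_op *ops,
                              size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ops[i].purged)
            continue;
        ops[i].purged = true;
        if (ops[i].holds_slot) {
            assert(m->in_use > 0);
            ops[i].holds_slot = false;
            m->in_use--;
        }
    }
}

/* The real thing would sleep; the model only says whether it would block. */
static bool run_down_would_block(const struct slot_map *m)
{
    return m->in_use > 0;
}

/* After the map is killed, a new slot request has to wait for a restart. */
static bool slot_get_must_wait(const struct slot_map *m)
{
    return m->killed;
}

/* Old order: wait for slots before waking the holders. Returns false if stuck. */
static bool release_old_order(struct slot_map *m, struct waiting_op *ops,
                              size_t n)
{
    mark_killed(m);
    if (run_down_would_block(m))
        return false;            /* nobody has been purged yet */
    purge_waiting_ops(m, ops, n);
    return true;
}

/* New order: mark killed, purge waiting ops, then wait for the slots. */
static bool release_new_order(struct slot_map *m, struct waiting_op *ops,
                              size_t n)
{
    mark_killed(m);
    purge_waiting_ops(m, ops, n);
    return !run_down_would_block(m);
}
```

Calling model_init and then release_old_order gets stuck, because the two
slot holders are never woken. Calling model_init and then release_new_order
completes with in_use at zero, and slot_get_must_wait stays true until
something (not modelled here) brings the client-core back.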