On Thu, 18 Feb 2016, Mike Marshall wrote:

> Still busted, exactly the same, I think. The doomed op gets a good
> return code from is_daemon_in_service in service_operation but
> gets EAGAIN from wait_for_matching_downcall... an edge case kind of
> problem.
>
> Here's the raw (well, slightly edited for readability) logs showing
> the doomed op and subsequent failed op that uses the bogus handle
> and fsid from the doomed op.
>
> Alloced OP (ffff880012898000: 10889 OP_CREATE)
> service_operation: orangefs_create op:ffff880012898000:
>
> wait_for_matching_downcall: operation purged (tag 10889, ffff880012898000, att 0
> service_operation: wait_for_matching_downcall returned -11 for ffff880012898000
> Interrupted: Removed op ffff880012898000 from htable_ops_in_progress
> tag 10889 (orangefs_create) -- operation to be retried (1 attempt)
> service_operation: orangefs_create op:ffff880012898000:
> service_operation:client core is NOT in service, ffff880012898000
>
> service_operation: wait_for_matching_downcall returned 0 for ffff880012898000
> service_operation orangefs_create returning: 0 for ffff880012898000
> orangefs_create: PPTOOLS1.PPA:
> handle:00000000-0000-0000-0000-000000000000: fsid:0:
> new_op:ffff880012898000: ret:0:
>
> Alloced OP (ffff880012888000: 10958 OP_GETATTR)
> service_operation: orangefs_inode_getattr op:ffff880012888000:
> service_operation: wait_for_matching_downcall returned 0 for ffff880012888000
> service_operation orangefs_inode_getattr returning: -22 for ffff880012888000
> Releasing OP (ffff880012888000: 10958
> orangefs_create: Failed to allocate inode for file :PPTOOLS1.PPA:
> Releasing OP (ffff880012898000: 10889
>
> What I'm testing with differs from what is at kernel.org#for-next by
> - diffs from Al's most recent email
> - 1 souped up gossip message
> - changed 0 to OP_VFS_STATE_UNKNOWN one place in service_operation
> - reinit_completion(&op->waitq) in orangefs_clean_up_interrupted_operation

Mike, what
error do you get from userspace (i.e. from dbench)? Here is what I see:

open("./clients/client0/~dmtmp/EXCEL/5D7C0000", O_RDWR|O_CREAT, 0600) = -1 ENODEV (No such device)

Interestingly, I can't reproduce it at all with only one dbench process;
there doesn't seem to be enough load.

I don't see how the kernel could return ENODEV at all. This may be coming
from our client-core.

-- 
Martin
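For illustration only, here is a minimal user-space sketch of the invariant the quoted logs suggest is being violated: orangefs_create reported ret:0 while carrying an all-zero handle and fsid 0, and the follow-up OP_GETATTR built from that reference then failed with -22. The struct object_ref type, the helper names, and the choice of -EIO are hypothetical and are not the OrangeFS kernel API. The check also assumes that an all-zero handle combined with fsid 0 is never a legitimate create result.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an object reference: a 16-byte handle
 * (printed as a UUID in the logs) plus a filesystem id.
 * This is NOT the kernel's actual struct layout. */
struct object_ref {
	uint8_t handle[16];
	int32_t fsid;
};

/* True if the reference matches the null value seen in the log:
 * handle:00000000-0000-0000-0000-000000000000: fsid:0: */
static bool object_ref_is_null(const struct object_ref *ref)
{
	static const uint8_t zero[16] = {0};

	return ref->fsid == 0 &&
	       memcmp(ref->handle, zero, sizeof(zero)) == 0;
}

/* Hypothetical post-create guard: a create that claims success but
 * carries a null reference is reported as a failure (-EIO is an
 * assumed choice) instead of being used for a later getattr.
 * Non-zero return codes are passed through unchanged. */
static int check_create_result(int ret, const struct object_ref *ref)
{
	if (ret == 0 && object_ref_is_null(ref))
		return -EIO;
	return ret;
}
```

A guard like this would presumably only hide the race, since the underlying problem appears to be how service_operation treats a purged op that is retried while the client core is not in service. It would, however, turn a silently bogus handle into an explicit error at the point of creation.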