I haven't edited up a list of how the debug output looked,
but most importantly: the WARN_ON is hit... it appears that
the client-core is sending over fsid:0:

-Mike

On Thu, Feb 18, 2016 at 3:08 PM, Mike Marshall <hubcap@xxxxxxxxxxxx> wrote:
> I haven't been trussing it... it reports EINVAL to stderr... I find
> the ops to look at in the debug output by looking for the -22...
>
> (373) open ./clients/client8/~dmtmp/PARADOX/STUDENTS.DB failed for
> handle 9981 (Invalid argument)
>
> I just got the whacky code <g> from Al's last message to compile, I'll
> have results from that soon...
>
> -Mike
>
> On Thu, Feb 18, 2016 at 2:49 PM, Martin Brandenburg <martin@xxxxxxxxxxxx> wrote:
>> On Thu, 18 Feb 2016, Mike Marshall wrote:
>>
>>> Still busted, exactly the same, I think. The doomed op gets a good
>>> return code from is_daemon_in_service in service_operation but
>>> gets EAGAIN from wait_for_matching_downcall... an edge case kind of
>>> problem.
>>>
>>> Here's the raw (well, slightly edited for readability) logs showing
>>> the doomed op and subsequent failed op that uses the bogus handle
>>> and fsid from the doomed op.
>>>
>>> Alloced OP (ffff880012898000: 10889 OP_CREATE)
>>> service_operation: orangefs_create op:ffff880012898000:
>>>
>>> wait_for_matching_downcall: operation purged (tag 10889, ffff880012898000, att 0
>>> service_operation: wait_for_matching_downcall returned -11 for ffff880012898000
>>> Interrupted: Removed op ffff880012898000 from htable_ops_in_progress
>>> tag 10889 (orangefs_create) -- operation to be retried (1 attempt)
>>> service_operation: orangefs_create op:ffff880012898000:
>>> service_operation:client core is NOT in service, ffff880012898000
>>>
>>> service_operation: wait_for_matching_downcall returned 0 for ffff880012898000
>>> service_operation orangefs_create returning: 0 for ffff880012898000
>>> orangefs_create: PPTOOLS1.PPA:
>>> handle:00000000-0000-0000-0000-000000000000: fsid:0:
>>> new_op:ffff880012898000: ret:0:
>>>
>>> Alloced OP (ffff880012888000: 10958 OP_GETATTR)
>>> service_operation: orangefs_inode_getattr op:ffff880012888000:
>>> service_operation: wait_for_matching_downcall returned 0 for ffff880012888000
>>> service_operation orangefs_inode_getattr returning: -22 for ffff880012888000
>>> Releasing OP (ffff880012888000: 10958
>>> orangefs_create: Failed to allocate inode for file :PPTOOLS1.PPA:
>>> Releasing OP (ffff880012898000: 10889
>>>
>>> What I'm testing with differs from what is at kernel.org#for-next by
>>>   - diffs from Al's most recent email
>>>   - 1 souped up gossip message
>>>   - changed 0 to OP_VFS_STATE_UNKNOWN one place in service_operation
>>>   - reinit_completion(&op->waitq) in orangefs_clean_up_interrupted_operation
>>>
>>
>> Mike,
>>
>> what error do you get from userspace (i.e. from dbench)?
>>
>> open("./clients/client0/~dmtmp/EXCEL/5D7C0000", O_RDWR|O_CREAT, 0600) = -1 ENODEV (No such device)
>>
>> An interesting note is that I can't reproduce at all
>> with only one dbench process. It seems there's not
>> enough load.
>>
>> I don't see how the kernel could return ENODEV at all.
>> This may be coming from our client-core.
>>
>> -- Martin
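
The approach described in the thread, finding the ops to look at by looking for the -22 in the debug output, can be sketched as a small script. This is only a minimal sketch: it assumes the log lines have the same shape as the "service_operation ... returning: ..." lines quoted in this thread, and the names `RETURNING` and `failed_ops` are hypothetical helpers, not part of the OrangeFS tree.

```python
import re

# Matches lines shaped like the ones quoted in this thread, e.g.:
#   service_operation orangefs_inode_getattr returning: -22 for ffff880012888000
# The exact log format is an assumption based on those quoted lines.
RETURNING = re.compile(
    r"service_operation\s+(?P<name>\w+)\s+returning:\s+(?P<rc>-?\d+)"
    r"\s+for\s+(?P<op>[0-9a-f]+)"
)


def failed_ops(lines, rc=-22):
    """Return (op name, op address) for every op that returned rc."""
    hits = []
    for line in lines:
        m = RETURNING.search(line)
        if m and int(m.group("rc")) == rc:
            hits.append((m.group("name"), m.group("op")))
    return hits


# Two lines taken from the log excerpt in this thread.
sample = [
    "service_operation orangefs_create returning: 0 for ffff880012898000",
    "service_operation orangefs_inode_getattr returning: -22 for ffff880012888000",
]
print(failed_ops(sample))
```

Passing `rc=-11` instead would not match the "wait_for_matching_downcall returned -11" lines, since those use a different message shape; the pattern would need to be extended for them.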