Re: Orangefs ABI documentation

Yeah, it looks like the fault is entirely with the client-core...

orangefs-kernel.h:      OP_VFS_STATE_UNKNOWN = 0,
orangefs-kernel.h:      OP_VFS_STATE_WAITING = 1,
orangefs-kernel.h:      OP_VFS_STATE_INPROGR = 2,
orangefs-kernel.h:      OP_VFS_STATE_SERVICED = 4,
orangefs-kernel.h:      OP_VFS_STATE_PURGED = 8,
orangefs-kernel.h:      OP_VFS_STATE_GIVEN_UP = 16,
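
The state values are powers of two, so a value such as the "state -> 10" in the trace below can be read as 8|2 (PURGED on top of INPROGR). That reading assumes set_op_state_purged ORs the flag into the existing state rather than assigning it. A minimal sketch of a decoder for reading the gossip output follows; decode_op_state is a made-up helper name, not something in the kernel tree, and only the enum values are taken from the grep output above.

```c
#include <stdio.h>
#include <string.h>

/* Values copied from the orangefs-kernel.h grep output above. */
enum {
	OP_VFS_STATE_UNKNOWN = 0,
	OP_VFS_STATE_WAITING = 1,
	OP_VFS_STATE_INPROGR = 2,
	OP_VFS_STATE_SERVICED = 4,
	OP_VFS_STATE_PURGED = 8,
	OP_VFS_STATE_GIVEN_UP = 16,
};

/*
 * Hypothetical helper (not a kernel function): write the names of the
 * bits set in "state" into buf, joined by '|', and return buf.
 */
static const char *decode_op_state(int state, char *buf, size_t len)
{
	static const struct { int bit; const char *name; } names[] = {
		{ OP_VFS_STATE_WAITING,  "WAITING"  },
		{ OP_VFS_STATE_INPROGR,  "INPROGR"  },
		{ OP_VFS_STATE_SERVICED, "SERVICED" },
		{ OP_VFS_STATE_PURGED,   "PURGED"   },
		{ OP_VFS_STATE_GIVEN_UP, "GIVEN_UP" },
	};
	size_t i;

	if (len == 0)
		return buf;
	buf[0] = '\0';

	/* Zero has no bits set, so it needs its own case. */
	if (state == OP_VFS_STATE_UNKNOWN) {
		snprintf(buf, len, "UNKNOWN");
		return buf;
	}

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		if (!(state & names[i].bit))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", len - strlen(buf) - 1);
		strncat(buf, names[i].name, len - strlen(buf) - 1);
	}
	return buf;
}
```

With a 64-byte buffer, decode_op_state(10, buf, sizeof(buf)) yields "INPROGR|PURGED", which matches the purged-while-in-progress reading of the trace.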


Alloced OP (ffff880011078000: 20210 OP_CREATE)
service_operation: orangefs_create op:ffff880011078000:
service_op: orangefs_create op:ffff880011078000: process:dbench state -> 1

orangefs_devreq_read: op:ffff880011078000: process:pvfs2-client-co state -> 2

set_op_state_purged: op:ffff880011078000: process:pvfs2-client-co state -> 10

wait_for_matching_downcall: operation purged (tag 20210, ffff880011078000, att 0
service_operation: wait_for_matching_downcall returned -11 for ffff880011078000
Interrupted: Removed op ffff880011078000 from htable_ops_in_progress
tag 20210 (orangefs_create) -- operation to be retried (1 attempt)
service_operation: orangefs_create op:ffff880011078000: process:dbench: pid:1171
service_op: orangefs_create op:ffff880011078000: process:dbench state -> 1
service_operation:client core is NOT in service, ffff880011078000

orangefs_devreq_read: op:ffff880011078000: process:pvfs2-client-co state -> 2

WARNING: CPU: 0 PID: 1216 at fs/orangefs/devorangefs-req.c:423
set_op_state_serviced: op:ffff880011078000: process:pvfs2-client-co state -> 4
service_operation: wait_for_matching_downcall returned 0 for ffff880011078000
service_operation orangefs_create returning: 0 for ffff880011078000
orangefs_create: BENCHS.LWP:
handle:00000000-0000-0000-0000-000000000000: fsid:0:
new_op:ffff880011078000: ret:0:
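
The search for ops by return code (the -22 hunt mentioned in the quoted message below) could be done mechanically. Here is a rough sketch that pulls the function name, return code and op pointer out of a "service_operation ... returning:" line. parse_service_return is a hypothetical name; the line format is assumed from the gossip output in this thread and may differ in other builds.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical log-reading helper; not part of the kernel module or the
 * client-core. Matches lines of the form
 *   "service_operation <name> returning: <ret> for <op pointer>"
 * Returns 1 and fills name/ret/op on a match, 0 otherwise.
 */
static int parse_service_return(const char *line, char *name, size_t name_len,
				int *ret, unsigned long long *op)
{
	char tmp_name[64];
	int tmp_ret;
	unsigned long long tmp_op;

	/* All three fields must convert, else the line is another kind. */
	if (sscanf(line, "service_operation %63s returning: %d for %llx",
		   tmp_name, &tmp_ret, &tmp_op) != 3)
		return 0;

	snprintf(name, name_len, "%s", tmp_name);
	*ret = tmp_ret;
	*op = tmp_op;
	return 1;
}
```

Feeding each log line through this and keeping the ones where ret is -22 gives the op pointers to look for in the rest of the output; lines such as "service_operation: wait_for_matching_downcall returned -11 ..." do not match and are skipped.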

-Mike

On Thu, Feb 18, 2016 at 3:22 PM, Mike Marshall <hubcap@xxxxxxxxxxxx> wrote:
> I haven't edited up a list of how the debug output looked,
> but most importantly: the WARN_ON is hit... it appears that
> the client-core is sending over fsid:0:
>
> -Mike
>
> On Thu, Feb 18, 2016 at 3:08 PM, Mike Marshall <hubcap@xxxxxxxxxxxx> wrote:
>> I haven't been trussing it... it reports EINVAL to stderr... I find
>> the ops to look at in the debug output by looking for the -22...
>>
>> (373) open ./clients/client8/~dmtmp/PARADOX/STUDENTS.DB failed for
>> handle 9981 (Invalid argument)
>>
>> I just got the whacky code <g> from Al's last message to compile, I'll
>> have results from that soon...
>>
>> -Mike
>>
>> On Thu, Feb 18, 2016 at 2:49 PM, Martin Brandenburg <martin@xxxxxxxxxxxx> wrote:
>>> On Thu, 18 Feb 2016, Mike Marshall wrote:
>>>
>>>> Still busted, exactly the same, I think. The doomed op gets a good
>>>> return code from is_daemon_in_service in service_operation but
>>>> gets EAGAIN from wait_for_matching_downcall... an edge case kind of
>>>> problem.
>>>>
>>>> Here's the raw (well, slightly edited for readability) logs showing
>>>> the doomed op and subsequent failed op that uses the bogus handle
>>>> and fsid from the doomed op.
>>>>
>>>>
>>>>
>>>> Alloced OP (ffff880012898000: 10889 OP_CREATE)
>>>> service_operation: orangefs_create op:ffff880012898000:
>>>>
>>>>
>>>>
>>>> wait_for_matching_downcall: operation purged (tag 10889, ffff880012898000, att 0
>>>> service_operation: wait_for_matching_downcall returned -11 for ffff880012898000
>>>> Interrupted: Removed op ffff880012898000 from htable_ops_in_progress
>>>> tag 10889 (orangefs_create) -- operation to be retried (1 attempt)
>>>> service_operation: orangefs_create op:ffff880012898000:
>>>> service_operation:client core is NOT in service, ffff880012898000
>>>>
>>>>
>>>>
>>>> service_operation: wait_for_matching_downcall returned 0 for ffff880012898000
>>>> service_operation orangefs_create returning: 0 for ffff880012898000
>>>> orangefs_create: PPTOOLS1.PPA:
>>>> handle:00000000-0000-0000-0000-000000000000: fsid:0:
>>>> new_op:ffff880012898000: ret:0:
>>>>
>>>>
>>>>
>>>> Alloced OP (ffff880012888000: 10958 OP_GETATTR)
>>>> service_operation: orangefs_inode_getattr op:ffff880012888000:
>>>> service_operation: wait_for_matching_downcall returned 0 for ffff880012888000
>>>> service_operation orangefs_inode_getattr returning: -22 for ffff880012888000
>>>> Releasing OP (ffff880012888000: 10958
>>>> orangefs_create: Failed to allocate inode for file :PPTOOLS1.PPA:
>>>> Releasing OP (ffff880012898000: 10889
>>>>
>>>>
>>>>
>>>>
>>>> What I'm testing with differs from what is at kernel.org#for-next by
>>>>   - diffs from Al's most recent email
>>>>   - 1 souped up gossip message
>>>>   - changed 0 to OP_VFS_STATE_UNKNOWN one place in service_operation
>>>>   - reinit_completion(&op->waitq) in orangefs_clean_up_interrupted_operation
>>>>
>>>>
>>>>
>>>
>>> Mike,
>>>
>>> what error do you get from userspace (i.e. from dbench)?
>>>
>>> open("./clients/client0/~dmtmp/EXCEL/5D7C0000", O_RDWR|O_CREAT, 0600) = -1 ENODEV (No such device)
>>>
>>> An interesting note is that I can't reproduce at all
>>> with only one dbench process. It seems there's not
>>> enough load.
>>>
>>> I don't see how the kernel could return ENODEV at all.
>>> This may be coming from our client-core.
>>>
>>> -- Martin