Re: Caught the first erroneous translated errorcode

On 19-6-2017 17:45, Willem Jan Withagen wrote:
> On 19-6-2017 16:55, Gregory Farnum wrote:
>> On Mon, Jun 19, 2017 at 7:46 AM, Willem Jan Withagen <wjw@xxxxxxxxxxx> wrote:
>>> Op 19-6-2017 om 16:31 schreef Sage Weil:
>>>>
>>>> On Mon, 19 Jun 2017, Willem Jan Withagen wrote:
>>>>>
>>>>> On 19-6-2017 14:56, Jason Dillaman wrote:
>>>>>>
>>>>>> On Sun, Jun 18, 2017 at 1:18 PM, Willem Jan Withagen <wjw@xxxxxxxxxxx>
>>>>>> wrote:
>>>>>>>
>>>>>>> librbd/io/AioCompletion.cc:199:ssize_t
>>>>>>> AioCompletion::get_return_value() {
>>>>>>
>>>>>>
>>>>>> librbd just wraps librados, so I would think all the error codes
>>>>>> should have already been properly translated before it reaches this
>>>>>> level since otherwise any internal librbd error logging will output
>>>>>> the incorrect failure reason. I'd suspect most of the client-side
>>>>>> handling should probably be handled inside osdc/Objecter.h/cc..
>>>>>
>>>>> Hi Jason,
>>>>>
>>>>> Thanks for the pointer. Changing the librbd code did indeed not
>>>>> result in a working rados-striper.sh.
>>>>>
>>>>> Objecter.{h,cc} already had the forward error rewrite. I added the
>>>>> reverse in the original patch, but obviously that is not enough (yet).
>>>>> So I'll start digging a bit more in the librados files as you suggested.
>>>>
>>>> I think the place to do this is in MOSDOpReply.. that alone should be
>>>> enough to do the translate as the value passes over the wire.
>>>
>>>
>>> Hi Sage,
>>>
>>> The interesting part of this is that ALL tests but one actually work. So
>>> all tests that start a cluster through vstart do work, EXCEPT for
>>> rados-striper.sh.
>>>
>>> Now this makes me question what is different in the striper code that
>>> causes an ECANCELED to not be translated back to the FreeBSD code.
>>
>> I'm not sure exactly how it's arranged, but libradosstriper is layered
>> on top of librados and I don't think anybody's done any of the errno
>> translation work for other platforms that you got pointed at.
>> Depending on how it's done that may mean it's missing big chunks --
>> for instance, if libradosstriper embeds error codes that aren't
>> touched by librados, it will need to do its own translation.
> 
> Hi Greg,
> 
> The error is on the path server -> client.
> 
> How do I know: FreeBSD's highest error number is currently 96.
> ECANCELED is an expected return value in the striper code.
> So server-side translation seems to be doing what it should.
> Client-side code is:
> 
> 1260 ./src/libradosstriper/RadosStriperImpl.cc
> ====
>   bl.append(oss.str());
>   writeOp.setxattr(XATTR_SIZE, bl);
>   rc = m_ioCtx.operate(firstObjOid, &writeOp);
>   // return current size
>   *size = curSize;
>   // handle case where objectsize is already bigger than size
>   if (-ECANCELED == rc)
>     rc = 0;
>   if (rc) {
>     unlockObject(soid, *lockCookie);
>     lderr(cct()) << "RadosStriperImpl::openStripedObjectForWrite : "
>                    << "could not set new size for "
>                    << soid << " : rc = " << rc << dendl;
>   }
>   return rc;
> ====
> 
> So I have to drill down into m_ioCtx.operate.
> But I'll first look at Sage's suggestion.

I have not been able to find the right spot yet, so I upped the logging.
This is the first place where any reference to -125 is made:
116: 2017-06-20 01:24:21.556950 80fc18800  5 -- 127.0.0.1:0/1969737172 >> 127.0.0.1:6804/60048 conn(0x81065c000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2 cs=1 l=1). rx osd.1 seq 6 0x810696e00 osd_op_reply(5 toyfile.0000000000000000 [cmpxattr (8) op 3 mode 2,setxattr (4)] v19'4 uv3 ondisk = -125 ((125) Unknown error: 125)) v8
116: 2017-06-20 01:24:21.556985 80fc18800  1 -- 127.0.0.1:0/1969737172 <== osd.1 127.0.0.1:6804/60048 6 ==== osd_op_reply(5 toyfile.0000000000000000 [cmpxattr (8) op 3 mode 2,setxattr (4)] v19'4 uv3 ondisk = -125 ((125) Unknown error: 125)) v8 ==== 210+0+0 (669224781 0 0) 0x810696e00 con 0x81065c000
116: 2017-06-20 01:24:21.557009 80fc18800 10 client.4115.objecter ms_dispatch 0x80fc33000 osd_op_reply(5 toyfile.0000000000000000 [cmpxattr (8) op 3 mode 2,setxattr (4)] v19'4 uv3 ondisk = -125 ((125) Unknown error: 125)) v8
116: 2017-06-20 01:24:21.557024 80fc18800 10 client.4115.objecter in handle_osd_op_reply
116: 2017-06-20 01:24:21.557031 80fc18800  7 client.4115.objecter handle_osd_op_reply 5 ondisk uv 3 in 1.3 attempt 0
116: 2017-06-20 01:24:21.557038 80fc18800 10 client.4115.objecter  op 0 rval -85 len 0
116: 2017-06-20 01:24:21.557043 80fc18800 10 client.4115.objecter  op 1 rval 0 len 0
116: 2017-06-20 01:24:21.557047 80fc18800 15 client.4115.objecter handle_osd_op_reply completed tid 5
116: 2017-06-20 01:24:21.557050 80fc18800 15 client.4115.objecter finish_op 5
116: 2017-06-20 01:24:21.557056 80fc18800 20 client.4115.objecter put_session s=0x810695800 osd=1 4
116: 2017-06-20 01:24:21.557060 80fc18800 15 client.4115.objecter _session_op_remove 1 5
116: 2017-06-20 01:24:21.557073 80fc18800  5 client.4115.objecter 0 in flight
116: 2017-06-20 01:24:21.557085 80fc18800 20 client.4115.objecter put_session s=0x810695800 osd=1 3

This makes me wonder: does this osd_op_reply contain the numeric error
value, or is it a formatted text error report of some event on the
server? In the latter case there would already be a translation problem
on the server, and not in the client.
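
To make the mismatch concrete, here is a minimal sketch of the kind of
translation that seems to be missing for the overall result. It is not
the actual Ceph code: the function names are hypothetical, and only
ECANCELED is mapped (Linux uses 125, FreeBSD uses 85), whereas a real
table would cover every errno that differs between the two platforms.
The constants are spelled out rather than taken from <cerrno> so the
sketch behaves the same on either host.

```cpp
#include <cassert>

// Assumed values: ECANCELED as numbered on Linux (used on the wire)
// and as numbered natively on FreeBSD.
constexpr int WIRE_ECANCELED = 125;
constexpr int HOST_ECANCELED = 85;

// Hypothetical helper: map one code to another, preserving the sign,
// and pass every other value through untouched.
constexpr int translate(int r, int from, int to) {
  return r == from ? to : (r == -from ? -to : r);
}

// Hypothetical names for the two directions of the rewrite.
constexpr int wire_to_host_errno(int r) {
  return translate(r, WIRE_ECANCELED, HOST_ECANCELED);
}
constexpr int host_to_wire_errno(int r) {
  return translate(r, HOST_ECANCELED, WIRE_ECANCELED);
}

// Mirrors the check in the striper snippet quoted above:
// a host-numbered -ECANCELED is treated as success.
constexpr bool striper_treats_as_ok(int rc) {
  return rc == 0 || rc == -HOST_ECANCELED;
}

// Small self-check of the two cases seen in the log.
inline void self_check() {
  // Untranslated overall result, as logged: ondisk = -125.
  assert(!striper_treats_as_ok(-125));
  // After translation it matches the per-op rval of -85.
  assert(striper_treats_as_ok(wire_to_host_errno(-125)));
}
```

With this, the per-op rval of -85 in the log is what the striper check
expects, while the untranslated overall result of -125 falls through to
the error path.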

--WjW