On 19-6-2017 17:45, Willem Jan Withagen wrote:
> On 19-6-2017 16:55, Gregory Farnum wrote:
>> On Mon, Jun 19, 2017 at 7:46 AM, Willem Jan Withagen <wjw@xxxxxxxxxxx> wrote:
>>> On 19-6-2017 at 16:31, Sage Weil wrote:
>>>>
>>>> On Mon, 19 Jun 2017, Willem Jan Withagen wrote:
>>>>>
>>>>> On 19-6-2017 14:56, Jason Dillaman wrote:
>>>>>>
>>>>>> On Sun, Jun 18, 2017 at 1:18 PM, Willem Jan Withagen <wjw@xxxxxxxxxxx>
>>>>>> wrote:
>>>>>>>
>>>>>>> librbd/io/AioCompletion.cc:199:ssize_t
>>>>>>> AioCompletion::get_return_value() {
>>>>>>
>>>>>>
>>>>>> librbd just wraps librados, so I would think all the error codes
>>>>>> should have already been properly translated before it reaches this
>>>>>> level since otherwise any internal librbd error logging will output
>>>>>> the incorrect failure reason. I'd suspect most of the client-side
>>>>>> handling should probably be handled inside osdc/Objecter.h/cc..
>>>>>
>>>>> Hi Jason,
>>>>>
>>>>> Thanx for the pointer. Changing any of the librbd stuff did indeed not
>>>>> result in a working rados-striper.sh
>>>>>
>>>>> Objecter.{h,cc} already had the forward error rewrite. I added the
>>>>> reverse in the original patch. But obviously that is not enough (yet).
>>>>> So I'll start digging a bit more in the librados files as you suggested.
>>>>
>>>> I think the place to do this is in MOSDOpReply.. that alone should be
>>>> enough to do the translate as the value passes over the wire.
>>>
>>>
>>> Hi Sage,
>>>
>>> The interesting part of this is that ALL tests but one actually work. So
>>> all tests that start a cluster thru vstart actually do work. EXCEPT for
>>> rados-striper.sh.
>>>
>>> Now this makes me question what is different with the striper code that
>>> causes an ECANCELED to not be translated back to the FreeBSD code.
>>
>> I'm not sure exactly how it's arranged, but libradosstriper is layered
>> on top of librados and I don't think anybody's done any of the errno
>> translation work for other platforms that you got pointed at.
>> Depending on how it's done that may mean it's missing big chunks --
>> for instance, if libradosstriper embeds error codes that aren't
>> touched by librados, it will need to do its own translation.
>
> Hi Greg,
>
> The error is on the path server -> client.
>
> How do I know: FreeBSD's highest error number atm is 96.
> ECANCELED is an expected return value in the striper code.
> So server-side translation seems to be doing what it should.
> Client-side code is:
>
> 1260 ./src/libradosstriper/RadosStriperImpl.cc
> ====
>   bl.append(oss.str());
>   writeOp.setxattr(XATTR_SIZE, bl);
>   rc = m_ioCtx.operate(firstObjOid, &writeOp);
>   // return current size
>   *size = curSize;
>   // handle case where objectsize is already bigger than size
>   if (-ECANCELED == rc)
>     rc = 0;
>   if (rc) {
>     unlockObject(soid, *lockCookie);
>     lderr(cct()) << "RadosStriperImpl::openStripedObjectForWrite : "
>                  << "could not set new size for "
>                  << soid << " : rc = " << rc << dendl;
>   }
>   return rc;
> ====
>
> So I have to drill down into m_ioCtx.operate.
> But I'll first look at Sage's suggestion.

I have not been able to find the right spot....
So I upped the logging, and this is the first place where any reference
to -125 is made:

116: 2017-06-20 01:24:21.556950 80fc18800 5 -- 127.0.0.1:0/1969737172 >> 127.0.0.1:6804/60048 conn(0x81065c000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=2 cs=1 l=1). rx osd.1 seq 6 0x810696e00 osd_op_reply(5 toyfile.0000000000000000 [cmpxattr (8) op 3 mode 2,setxattr (4)] v19'4 uv3 ondisk = -125 ((125) Unknown error: 125)) v8
116: 2017-06-20 01:24:21.556985 80fc18800 1 -- 127.0.0.1:0/1969737172 <== osd.1 127.0.0.1:6804/60048 6 ==== osd_op_reply(5 toyfile.0000000000000000 [cmpxattr (8) op 3 mode 2,setxattr (4)] v19'4 uv3 ondisk = -125 ((125) Unknown error: 125)) v8 ==== 210+0+0 (669224781 0 0) 0x810696e00 con 0x81065c000
116: 2017-06-20 01:24:21.557009 80fc18800 10 client.4115.objecter ms_dispatch 0x80fc33000 osd_op_reply(5 toyfile.0000000000000000 [cmpxattr (8) op 3 mode 2,setxattr (4)] v19'4 uv3 ondisk = -125 ((125) Unknown error: 125)) v8
116: 2017-06-20 01:24:21.557024 80fc18800 10 client.4115.objecter in handle_osd_op_reply
116: 2017-06-20 01:24:21.557031 80fc18800 7 client.4115.objecter handle_osd_op_reply 5 ondisk uv 3 in 1.3 attempt 0
116: 2017-06-20 01:24:21.557038 80fc18800 10 client.4115.objecter op 0 rval -85 len 0
116: 2017-06-20 01:24:21.557043 80fc18800 10 client.4115.objecter op 1 rval 0 len 0
116: 2017-06-20 01:24:21.557047 80fc18800 15 client.4115.objecter handle_osd_op_reply completed tid 5
116: 2017-06-20 01:24:21.557050 80fc18800 15 client.4115.objecter finish_op 5
116: 2017-06-20 01:24:21.557056 80fc18800 20 client.4115.objecter put_session s=0x810695800 osd=1 4
116: 2017-06-20 01:24:21.557060 80fc18800 15 client.4115.objecter _session_op_remove 1 5
116: 2017-06-20 01:24:21.557073 80fc18800 5 client.4115.objecter 0 in flight
116: 2017-06-20 01:24:21.557085 80fc18800 20 client.4115.objecter put_session s=0x810695800 osd=1 3

This makes me wonder: does this osd_op_reply contain the numeric error
value, or is it a formatted text error report of some event on the
server? In the latter case there is already a translation problem on the
server, and not in the client.
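
To make the two numbers in the log concrete: the overall reply result is
printed as -125 while op 0 reports rval -85. ECANCELED is 125 on Linux and
85 on FreeBSD, so one of the two values appears to have been mapped to the
host numbering and the other not. Below is a minimal sketch of what a
client-side mapping over both fields could look like. It assumes the wire
carries Linux numbering; wire_to_host_errno, translate_reply and
ReplySketch are hypothetical names for illustration, not existing Ceph
functions or types, and only the one errno from this log is mapped.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumption for illustration: the wire uses Linux numbering (ECANCELED = 125)
// and the receiving host is FreeBSD (ECANCELED = 85).
constexpr int32_t WIRE_ECANCELED = 125;
constexpr int32_t HOST_ECANCELED = 85;

// Hypothetical helper: map a negative wire errno to the host numbering.
// Only ECANCELED is handled here; every other value passes through unchanged.
constexpr int32_t wire_to_host_errno(int32_t r) {
  return (r == -WIRE_ECANCELED) ? -HOST_ECANCELED : r;
}

// Simplified stand-in for a decoded reply: one overall result plus one
// return value per op. This is not the real MOSDOpReply layout.
struct ReplySketch {
  int32_t result;
  std::vector<int32_t> op_rvals;
};

// Translate the overall result and every per-op rval, so that no field is
// left behind in wire numbering.
inline void translate_reply(ReplySketch& reply) {
  reply.result = wire_to_host_errno(reply.result);
  for (auto& rval : reply.op_rvals)
    rval = wire_to_host_errno(rval);
}

// Small self-check using the values from the log above.
inline void self_check() {
  ReplySketch reply{-125, {-125, 0}};
  translate_reply(reply);
  assert(reply.result == -HOST_ECANCELED);
  assert(reply.op_rvals[0] == -HOST_ECANCELED);
  assert(reply.op_rvals[1] == 0);
}
```

If only the per-op rvals went through such a mapping and the overall
result did not, the rc coming back from m_ioCtx.operate() would still be
-125, and the `-ECANCELED == rc` check in RadosStriperImpl.cc would not
match on FreeBSD. That is a guess from the log, not something I have
confirmed in the code.
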
--WjW