Re: librados3

+ ceph-devel

Jason, thanks for the analysis and explanation. so the bottom line is
that, after upgrading with the librados2-compat and librbd1 packages
offered by nautilus, QEMU will continue working. so the solution will be:

1. we will offer librados2-compat, which provides the minimal set of
symbols previously provided by librados2 that are used by QEMU
2. the new librbd1 will conflict with librados2

i will work on this this week.
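
fwiw, the mixed-library state that Jason's "info sharedlibrary" output
shows below (libceph-common.so.0 and libceph-common.so.1 mapped into the
same process) can also be spotted without gdb. what follows is only a
minimal sketch in python and not part of ceph: the helper name
duplicate_sonames is made up for illustration, and it assumes that each
input line ends with the library path, which holds for gdb's
"info sharedlibrary" output and for /proc/<pid>/maps.

```python
import os
import re
from collections import defaultdict

# "libceph-common.so.1" -> name "libceph-common", major "1"
SONAME_RE = re.compile(r"^(?P<name>.+?)\.so\.(?P<major>\d+)")


def duplicate_sonames(lines):
    """Return {library name: sorted paths} for every library that appears
    under more than one major soname version in the given lines.

    Hypothetical helper: only the last whitespace-separated field of each
    line is inspected, and only if it looks like an absolute path.
    """
    seen = defaultdict(dict)  # name -> {major version: first path seen}
    for line in lines:
        fields = line.split()
        if not fields or not fields[-1].startswith("/"):
            continue
        path = fields[-1]
        match = SONAME_RE.match(os.path.basename(path))
        if match:
            seen[match.group("name")].setdefault(match.group("major"), path)
    return {
        name: sorted(by_major.values())
        for name, by_major in seen.items()
        if len(by_major) > 1
    }
```

to check a running qemu process, the lines can come from
open("/proc/<pid>/maps"). a non-empty result is only a hint that two
generations of a library share one address space; it does not prove
that their symbols actually clash.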

On Sat, Jan 12, 2019 at 1:49 AM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>
> Definitely appears to be symbol stomping:
>
> The "rados_create" is using librados2 which in turn calls
> common_preinit. However, I have two different versions of that
> function:
>
> break common_preinit
> Breakpoint 1 at 0x7fffdd181ba0: common_preinit. (4 locations)
> (gdb) info breakpoints
> Num     Type           Disp Enb Address            What
> 1       breakpoint     keep y   <MULTIPLE>
> 1.1                         y     0x00007fffdd181ba0 in
> common_preinit(CephInitParameters const&, code_environment_t, int) at
> ./src/common/common_init.cc:28
> 1.2                         y     0x00007fffe5d4afe0
> <common_preinit(CephInitParameters const&, code_environment_t,
> int)@plt>
> 1.3                         y     0x00007fffe63d0940 in
> common_preinit(CephInitParameters const&, code_environment_t, int) at
> ./src/common/common_init.cc:33
> 1.4                         y     0x00007fffef12df00
> <common_preinit(CephInitParameters const&, code_environment_t,
> int)@plt>
>
> (gdb) info sharedlibrary
> ...
> 0x00007fffe62a75f0  0x00007fffe686e3be  Yes
> /usr/lib/ceph/libceph-common.so.1
> ...
> 0x00007fffdcec79f0  0x00007fffdd40a3fe  Yes
> /usr/lib/x86_64-linux-gnu/ceph/libceph-common.so.0
>
> If I step through to the invocation of "common_preinit", it seems to
> call the new version within the librados2 library:
>
> Thread 1 "qemu-img" hit Breakpoint 2, rados_create_cct
> (clustername=clustername@entry=0x7fffef1dc87c "",
> iparams=iparams@entry=0x7fffd95fedc0) at
> ./src/librados/librados.cc:2769
> 2769 ./src/librados/librados.cc: No such file or directory.
> (gdb) step
> Thread 1 "qemu-img" hit Breakpoint 1, 0x00007fffef12df00 in
> common_preinit(CephInitParameters const&, code_environment_t, int)@plt
> () from /usr/lib/x86_64-linux-gnu/librados.so.2
>
> This allocates a new version of CephContext that isn't compatible w/
> librados2. Of course, if you somehow reversed the order of symbol
> lookups so that the old librados2 created an old CephContext, librbd1
> would crash when attempting to use it, since it would expect the new
> version.
>
>
> On Fri, Jan 11, 2019 at 12:30 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> >
> > Here is an example of the failures that started as soon as librados3
> > was introduced [1]:
> >
> > 2018-12-25T03:46:32.581 INFO:teuthology.orchestra.run.smithi174:>
> > qemu-img convert -f qcow2 -O raw
> > /home/ubuntu/cephtest/qemu/base.client.0.qcow2 rbd:rbd/client.0.0
> > 2018-12-25T03:46:32.612 DEBUG:teuthology.orchestra.run:got remote
> > process result: None
> > 2018-12-25T03:46:32.613 ERROR:teuthology.contextutil:Saw exception
> > from nested tasks
> > Traceback (most recent call last):
> >   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py",
> > line 30, in nested
> >     vars.append(enter())
> >   File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
> >     return self.gen.next()
> >   File "/home/teuthworker/src/git.ceph.com_ceph_master/qa/tasks/qemu.py",
> > line 242, in download_image
> >     base_file, 'rbd:rbd/{image_name}'.format(image_name=image_name)
> >   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py",
> > line 194, in run
> >     r = self._runner(client=self.ssh, name=self.shortname, **kwargs)
> >   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py",
> > line 430, in run
> >     r.wait()
> >   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py",
> > line 162, in wait
> >     self._raise_for_status()
> >   File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py",
> > line 180, in _raise_for_status
> >     raise CommandCrashedError(command=self.command)
> > CommandCrashedError: Command crashed: 'qemu-img convert -f qcow2 -O
> > raw /home/ubuntu/cephtest/qemu/base.client.0.qcow2 rbd:rbd/client.0.0'
> >
> >
> > Running this in my environment I can see:
> >
> > Thread 1 "qemu-img" received signal SIGSEGV, Segmentation fault.
> > std::__cxx11::basic_string<char, std::char_traits<char>,
> > std::allocator<char> >::_M_replace (this=0x7ffff651f5a8
> > <_nl_global_locale+72>, __pos=<optimized out>, __len1=<optimized out>,
> >     __s=0x7fffef1dc87c "", __len2=<optimized out>) at
> > /usr/include/c++/7/bits/basic_string.tcc:468
> > 468 /usr/include/c++/7/bits/basic_string.tcc: No such file or directory.
> > (gdb) bt
> > #0  std::__cxx11::basic_string<char, std::char_traits<char>,
> > std::allocator<char> >::_M_replace (this=0x7ffff651f5a8
> > <_nl_global_locale+72>, __pos=<optimized out>, __len1=<optimized out>,
> >     __s=0x7fffef1dc87c "", __len2=<optimized out>) at
> > /usr/include/c++/7/bits/basic_string.tcc:468
> > #1  0x00007fffef130017 in ?? () from /usr/lib/x86_64-linux-gnu/librados.so.2
> > #2  0x00007fffef134d0a in rados_create () from
> > /usr/lib/x86_64-linux-gnu/librados.so.2
> > #3  0x00007fffefbfafbd in ?? () from /usr/lib/x86_64-linux-gnu/qemu/block-rbd.so
> > #4  0x00005555555af677 in bdrv_create_co_entry (opaque=0x7fffd95fef10)
> > at ./block.c:420
> > #5  0x00005555555b01d4 in bdrv_create (drv=0x7fffefdfd020,
> > filename=filename@entry=0x555555dec3c0 "rbd:rbd/base",
> > opts=opts@entry=0x555555dcb700, errp=errp@entry=0x7fffd95fef60)
> >     at ./block.c:447
> > #6  0x00005555555b0639 in bdrv_create_file (filename=0x555555dec3c0
> > "rbd:rbd/base", opts=0x555555dcb700, errp=0x7fffd95fef90) at
> > ./block.c:481
> > #7  0x00005555555af677 in bdrv_create_co_entry (opaque=0x7fffffffe100)
> > at ./block.c:420
> > #8  0x00005555556854c6 in coroutine_trampoline (i0=<optimized out>,
> > i1=<optimized out>) at ./util/coroutine-ucontext.c:79
> > #9  0x00007ffff618b6b0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> > #10 0x00007fffffffd970 in ?? ()
> > #11 0x0000000000000000 in ?? ()
> >
> >
> > $ dpkg --list | grep RADOS
> > ii  librados2                             12.2.8-0ubuntu0.18.04.1
> >      amd64        RADOS distributed object store client library
> > ii  librados3                             14.0.1-2445-g5dfcb55-1bionic
> >      amd64        RADOS distributed object store client library
> > ii  libradospp1                           14.0.1-2445-g5dfcb55-1bionic
> >      amd64        RADOS distributed object store client C++ library
> > ii  libradosstriper1                      14.0.1-2445-g5dfcb55-1bionic
> >      amd64        RADOS striping interface
> > ii  librbd1                               14.0.1-2445-g5dfcb55-1bionic
> >      amd64        RADOS block device client library
> > ii  librgw2                               14.0.1-2445-g5dfcb55-1bionic
> >      amd64        RADOS Gateway client library
> >
> >
> > [1] http://qa-proxy.ceph.com/teuthology/teuthology-2018-12-20_02:01:03-rbd-master-distro-basic-smithi/3382335/teuthology.log
> >
> > On Fri, Jan 11, 2019 at 11:52 AM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> > >
> > > How would that work? You'd end up w/ qemu-kvm linked against librados2
> > > from an older Ceph and librbd1 from the new Ceph. librbd1 would be
> > > linked against librados3 which would pull in librados3 which would
> > > provide the same symbols for things like "rados_connect" used by QEMU.
> > > Not to mention that both would be linked to "libceph-common.so.1"?
> > >
> > > On Fri, Jan 11, 2019 at 11:44 AM kefu chai <tchaikov@xxxxxxxxx> wrote:
> > > >
> > > > On Sat, Jan 12, 2019 at 12:15 AM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, Jan 11, 2019 at 11:12 AM kefu chai <tchaikov@xxxxxxxxx> wrote:
> > > > > >
> > > > > > On Tue, Jan 8, 2019 at 9:42 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Tue, Jan 8, 2019 at 6:26 AM kefu chai <tchaikov@xxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Tue, Jan 8, 2019 at 9:27 AM Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > > Apologies for resurrecting this thread, but do we have a plan forward for
> > > > > > > > > > QEMU's librados2 dependencies post-Nautilus upgrade? Do we need to
> > > > > > > > > > create a librados2-compat RPM/DEB that has a dummy librados2 library?
> > > > > > > >
> > > > > > > > Jason, thanks for raising this concern. yeah, i think we should offer
> > > > > > > > an upgrade path for QEMU and probably other existing librados2's C API
> > > > > > > > consumers, i will try to whip up a librados2-compat package before
> > > > > > > > nautilus rolls out.
> > > > > > >
> > > > > > > Thanks. Let me know if you need / want help.
> > > > > > >
> > > > > >
> > > > > > > Jason, i just realized that you are talking about runtime
> > > > > > > compatibility instead of build-time compatibility. i thought it was
> > > > > > something like libmariadbclient-dev-compat [0], which provides
> > > > > > symbolic links like
> > > > > >
> > > > > > /usr/lib/x86_64-linux-gnu/libmariadbclient.so <=
> > > > > > /usr/lib/x86_64-linux-gnu/libmysqlclient.so
> > > > > >
> > > > > > > but apparently, librados2-compat is not a dev package. and we do
> > > > > > > offer librados.so so the existing librados2 application can compile
> > > > > > > just fine with the new librados-dev package.
> > > > > >
> > > > > > so presumably, you are suggesting a package which offers a just-enough
> > > > > > librados.so.2 which wraps (or even links to!) librados.so.3. and just
> > > > > > like its predecessor, this new librados.so.2 will work with QEMU
> > > > > > > compiling against the old librados-dev. am i right? but this approach
> > > > > > > is a little bit controversial, IMHO. as we'd be shipping a librados2
> > > > > > > which is not strictly backward compatible with the librados2 API.
> > > > > >
> > > > > > so back to your question, i don't see the need to provide the
> > > > > > runtime-compatible package for existing librados2 clients.
> > > > >
> > > > > Well, we *cannot* release Nautilus if an upgrade will remove QEMU on
> > > > > systems -- I think we can all agree to that. We also are not going to
> > > > > be able to update N-number of distro operating systems to remove the
> > > > > QEMU package's (soft) dependency on librados2. Therefore, I *need* to
> > > > > provide a librados2 package with Nautilus.
> > > >
> > > > > i don't follow you. nautilus does not conflict with librados2. as
> > > > > librados3 is a different package, which can co-exist with librados2.
> > > >
> > > > >
> > > > > > ---
> > > > > >
> > > > > > [0] https://packages.debian.org/sid/libmariadbclient-dev-compat
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Regards
> > > > > > Kefu Chai
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Jason
> > > >
> > > >
> > > >
> > > > --
> > > > Regards
> > > > Kefu Chai
> > >
> > >
> > >
> > > --
> > > Jason
> >
> >
> >
> > --
> > Jason
>
>
>
> --
> Jason



-- 
Regards
Kefu Chai


