(circling back for the wider audience) I've opened a new PR [1] to revert back from librados3 + libradospp1 to librados2. I've added explicit symbol versioning to all librados API methods and fixed some backwards compatibility issues in the librados C API that were introduced in the Nautilus cycle. The C++ API, however, will be non-ABI compatible with prior releases but now documented as a non-stable ABI between Ceph major releases. Note that we have broken the librados C++ ABI in the past (e.g. explicitly when switching to C++11 ABI mode and accidentally numerous times due to ceph::buffer::list tweaks) without any external complaints. One big question is whether or not it would still be worthwhile to still pull the C++ API from the librados2 shared library into its own libradospp1 shared library. The plan would still be that it would not be a stable API between releases, so I'm not sure of the long-term value besides adding an extra initial hurdle when 3rd party programs attempt to develop against it. On Wed, Jan 16, 2019 at 5:19 AM kefu chai <tchaikov@xxxxxxxxx> wrote: > > On Mon, Jan 14, 2019 at 6:52 PM kefu chai <tchaikov@xxxxxxxxx> wrote: > > > > + ceph-devel > > > > Jason, thanks for the analysis and explanation. so the bottom-line is > > that, after upgrading with librados2-compat and librbd1 offered by > > nautilus, QEMU will continue working. so the solution will be: > > > > 1. will offer librados2-compact which offers the minimal set of > > symbols previously provided by librados2 that used by QEMU > > 2. the new librbd1 will conflict with librados2 > > > > i will work on this this week. > > https://github.com/ceph/ceph/pull/25982 > > > > > On Sat, Jan 12, 2019 at 1:49 AM Jason Dillaman <jdillama@xxxxxxxxxx> wrote: > > > > > > Definitely appears to be symbol stomping: > > > > > > The "rados_create" is using librados2 which in turn calls > > > common_preinit. However, I have two different versions of that > > > function: > > > > > > break common_preinit > > > Breakpoint 1 at 0x7fffdd181ba0: common_preinit. (4 locations) > > > (gdb) info breakpoints > > > Num Type Disp Enb Address What > > > 1 breakpoint keep y <MULTIPLE> > > > 1.1 y 0x00007fffdd181ba0 in > > > common_preinit(CephInitParameters const&, code_environment_t, int) at > > > ./src/common/common_init.cc:28 > > > 1.2 y 0x00007fffe5d4afe0 > > > <common_preinit(CephInitParameters const&, code_environment_t, > > > int)@plt> > > > 1.3 y 0x00007fffe63d0940 in > > > common_preinit(CephInitParameters const&, code_environment_t, int) at > > > ./src/common/common_init.cc:33 > > > 1.4 y 0x00007fffef12df00 > > > <common_preinit(CephInitParameters const&, code_environment_t, > > > int)@plt> > > > > > > (gdb) info sharedlibrary > > > ... > > > 0x00007fffe62a75f0 0x00007fffe686e3be Yes > > > /usr/lib/ceph/libceph-common.so.1 > > > ... > > > 0x00007fffdcec79f0 0x00007fffdd40a3fe Yes > > > /usr/lib/x86_64-linux-gnu/ceph/libceph-common.so.0 > > > > > > If I step through to the invokation of "common_preinit", it seems to > > > call the new version within the librados2 library: > > > > > > Thread 1 "qemu-img" hit Breakpoint 2, rados_create_cct > > > (clustername=clustername@entry=0x7fffef1dc87c "", > > > iparams=iparams@entry=0x7fffd95fedc0) at > > > ./src/librados/librados.cc:2769 > > > 2769 ./src/librados/librados.cc: No such file or directory. > > > (gdb) step > > > Thread 1 "qemu-img" hit Breakpoint 1, 0x00007fffef12df00 in > > > common_preinit(CephInitParameters const&, code_environment_t, int)@plt > > > () from /usr/lib/x86_64-linux-gnu/librados.so.2 > > > > > > This allocates a new version of CephContext that isn't compatible w/ > > > librados2. Of course, if you reversed the order of symbol lookups > > > somehow, if the old librados2 creates an old CephContext, librbd1 > > > would crash when attempting to use it since it would expect the new > > > version. > > > > > > > > > On Fri, Jan 11, 2019 at 12:30 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote: > > > > > > > > Here is an example of the failures that started as soon as librados3 > > > > was introduced [1]: > > > > > > > > 2018-12-25T03:46:32.581 INFO:teuthology.orchestra.run.smithi174:> > > > > qemu-img convert -f qcow2 -O raw > > > > /home/ubuntu/cephtest/qemu/base.client.0.qcow2 rbd:rbd/client.0.0 > > > > 2018-12-25T03:46:32.612 DEBUG:teuthology.orchestra.run:got remote > > > > process result: None > > > > 2018-12-25T03:46:32.613 ERROR:teuthology.contextutil:Saw exception > > > > from nested tasks > > > > Traceback (most recent call last): > > > > File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/contextutil.py", > > > > line 30, in nested > > > > vars.append(enter()) > > > > File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__ > > > > return self.gen.next() > > > > File "/home/teuthworker/src/git.ceph.com_ceph_master/qa/tasks/qemu.py", > > > > line 242, in download_image > > > > base_file, 'rbd:rbd/{image_name}'.format(image_name=image_name) > > > > File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/remote.py", > > > > line 194, in run > > > > r = self._runner(client=self.ssh, name=self.shortname, **kwargs) > > > > File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", > > > > line 430, in run > > > > r.wait() > > > > File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", > > > > line 162, in wait > > > > self._raise_for_status() > > > > File "/home/teuthworker/src/git.ceph.com_git_teuthology_master/teuthology/orchestra/run.py", > > > > line 180, in _raise_for_status > > > > raise CommandCrashedError(command=self.command) > > > > CommandCrashedError: Command crashed: 'qemu-img convert -f qcow2 -O > > > > raw /home/ubuntu/cephtest/qemu/base.client.0.qcow2 rbd:rbd/client.0.0' > > > > > > > > > > > > Running this in my environment I can see: > > > > > > > > Thread 1 "qemu-img" received signal SIGSEGV, Segmentation fault. > > > > std::__cxx11::basic_string<char, std::char_traits<char>, > > > > std::allocator<char> >::_M_replace (this=0x7ffff651f5a8 > > > > <_nl_global_locale+72>, __pos=<optimized out>, __len1=<optimized out>, > > > > __s=0x7fffef1dc87c "", __len2=<optimized out>) at > > > > /usr/include/c++/7/bits/basic_string.tcc:468 > > > > 468 /usr/include/c++/7/bits/basic_string.tcc: No such file or directory. > > > > (gdb) bt > > > > #0 std::__cxx11::basic_string<char, std::char_traits<char>, > > > > std::allocator<char> >::_M_replace (this=0x7ffff651f5a8 > > > > <_nl_global_locale+72>, __pos=<optimized out>, __len1=<optimized out>, > > > > __s=0x7fffef1dc87c "", __len2=<optimized out>) at > > > > /usr/include/c++/7/bits/basic_string.tcc:468 > > > > #1 0x00007fffef130017 in ?? () from /usr/lib/x86_64-linux-gnu/librados.so.2 > > > > #2 0x00007fffef134d0a in rados_create () from > > > > /usr/lib/x86_64-linux-gnu/librados.so.2 > > > > #3 0x00007fffefbfafbd in ?? () from /usr/lib/x86_64-linux-gnu/qemu/block-rbd.so > > > > #4 0x00005555555af677 in bdrv_create_co_entry (opaque=0x7fffd95fef10) > > > > at ./block.c:420 > > > > #5 0x00005555555b01d4 in bdrv_create (drv=0x7fffefdfd020, > > > > filename=filename@entry=0x555555dec3c0 "rbd:rbd/base", > > > > opts=opts@entry=0x555555dcb700, errp=errp@entry=0x7fffd95fef60) > > > > at ./block.c:447 > > > > #6 0x00005555555b0639 in bdrv_create_file (filename=0x555555dec3c0 > > > > "rbd:rbd/base", opts=0x555555dcb700, errp=0x7fffd95fef90) at > > > > ./block.c:481 > > > > #7 0x00005555555af677 in bdrv_create_co_entry (opaque=0x7fffffffe100) > > > > at ./block.c:420 > > > > #8 0x00005555556854c6 in coroutine_trampoline (i0=<optimized out>, > > > > i1=<optimized out>) at ./util/coroutine-ucontext.c:79 > > > > #9 0x00007ffff618b6b0 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 > > > > #10 0x00007fffffffd970 in ?? () > > > > #11 0x0000000000000000 in ?? () > > > > > > > > > > > > $ dpkg --list | grep RADOS > > > > ii librados2 12.2.8-0ubuntu0.18.04.1 > > > > amd64 RADOS distributed object store client library > > > > ii librados3 14.0.1-2445-g5dfcb55-1bionic > > > > amd64 RADOS distributed object store client library > > > > ii libradospp1 14.0.1-2445-g5dfcb55-1bionic > > > > amd64 RADOS distributed object store client C++ library > > > > ii libradosstriper1 14.0.1-2445-g5dfcb55-1bionic > > > > amd64 RADOS striping interface > > > > ii librbd1 14.0.1-2445-g5dfcb55-1bionic > > > > amd64 RADOS block device client library > > > > ii librgw2 14.0.1-2445-g5dfcb55-1bionic > > > > amd64 RADOS Gateway client library > > > > > > > > > > > > [1] http://qa-proxy.ceph.com/teuthology/teuthology-2018-12-20_02:01:03-rbd-master-distro-basic-smithi/3382335/teuthology.log > > > > > > > > On Fri, Jan 11, 2019 at 11:52 AM Jason Dillaman <jdillama@xxxxxxxxxx> wrote: > > > > > > > > > > How would that work? You'd end up w/ qemu-kvm linked against librados2 > > > > > from an older Ceph and librbd1 from the new Ceph. librbd1 would be > > > > > linked against librados3 which would pull in librados3 which would > > > > > provide the same symbols for things like "rados_connect" used by QEMU. > > > > > Not to mention that both would be linked to "libceph-common.so.1"? > > > > > > > > > > On Fri, Jan 11, 2019 at 11:44 AM kefu chai <tchaikov@xxxxxxxxx> wrote: > > > > > > > > > > > > On Sat, Jan 12, 2019 at 12:15 AM Jason Dillaman <jdillama@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > On Fri, Jan 11, 2019 at 11:12 AM kefu chai <tchaikov@xxxxxxxxx> wrote: > > > > > > > > > > > > > > > > On Tue, Jan 8, 2019 at 9:42 PM Jason Dillaman <jdillama@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > > > > > On Tue, Jan 8, 2019 at 6:26 AM kefu chai <tchaikov@xxxxxxxxx> wrote: > > > > > > > > > > > > > > > > > > > > On Tue, Jan 8, 2019 at 9:27 AM Jason Dillaman <jdillama@xxxxxxxxxx> wrote: > > > > > > > > > > > > > > > > > > > > > > Apologies for resurrecting this thread, but do have a plan forward for > > > > > > > > > > > QEMU's librados2 dependencies post-Nautilus upgrade? Do we need to > > > > > > > > > > > create a librados2-compat RPM/DEB that has a dummy librados2 library? > > > > > > > > > > > > > > > > > > > > Jason, thanks for raising this concern. yeah, i think we should offer > > > > > > > > > > an upgrade path for QEMU and probably other existing librados2's C API > > > > > > > > > > consumers, i will try to whip up a librados2-compat package before > > > > > > > > > > nautilus rolls out. > > > > > > > > > > > > > > > > > > Thanks. Let me know if you need / want help. > > > > > > > > > > > > > > > > > > > > > > > > > Jason, i just realized that you are talking about runtime > > > > > > > > compatibility instead of built-time compatibility. i thought it was > > > > > > > > something like libmariadbclient-dev-compat [0], which provides > > > > > > > > symbolic links like > > > > > > > > > > > > > > > > /usr/lib/x86_64-linux-gnu/libmariadbclient.so <= > > > > > > > > /usr/lib/x86_64-linux-gnu/libmysqlclient.so > > > > > > > > > > > > > > > > but apparently, librados2-compat is not a dev package. and we does > > > > > > > > offer librados.so so the existing librados2 application can compile > > > > > > > > just fine with the new librados-dev package. > > > > > > > > > > > > > > > > so presumably, you are suggesting a package which offers a just-enough > > > > > > > > librados.so.2 which wraps (or even links to!) librados.so.3. and just > > > > > > > > like its predecessor, this new librados.so.2 will work with QEMU > > > > > > > > compiling against the old librados-dev. am i right? but this approach > > > > > > > > is a little bit controversial, IMHO. as we'd shipping a librados2 > > > > > > > > which is not strictly backward compatible to librados2 API. > > > > > > > > > > > > > > > > so back to your question, i don't see the need to provide the > > > > > > > > runtime-compatible package for existing librados2 clients. > > > > > > > > > > > > > > Well, we *cannot* release Nautilus if an upgrade will remove QEMU on > > > > > > > systems -- I think we can all agree to that. We also are not going to > > > > > > > be able to update N-number of distro operating systems to remove the > > > > > > > QEMU package's (soft) dependency on librados2. Therefore, I *need* to > > > > > > > provide a librados2 package with Nautilus. > > > > > > > > > > > > i don't follow you. nautilus does not conflict with librados2. as > > > > > > librados3 is a different packge, which can co-exists with librados2. > > > > > > > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > [0] https://packages.debian.org/sid/libmariadbclient-dev-compat > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > Regards > > > > > > > > Kefu Chai > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Jason > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Regards > > > > > > Kefu Chai > > > > > > > > > > > > > > > > > > > > -- > > > > > Jason > > > > > > > > > > > > > > > > -- > > > > Jason > > > > > > > > > > > > -- > > > Jason > > > > > > > > -- > > Regards > > Kefu Chai > > > > -- > Regards > Kefu Chai [1] https://github.com/ceph/ceph/pull/26237 -- Jason