Re: Add a common library

Fabio M. Di Nitto wrote:
On 2/8/2012 9:41 PM, Steven Dake wrote:
On 02/08/2012 12:27 PM, Fabio M. Di Nitto wrote:
On 2/8/2012 8:19 PM, Steven Dake wrote:

Assuming you are upgrading from version X to X+1, the symbols will
simply move from static inline to the shared object. Especially in this
case, where the application is not affected directly.

application -> libfoo (with static inline)
application -> libfoo -> libcommon

Even in a downgrade case, libcorosync_common would not be referenced by
any of the libraries.

I don't have a strong opinion here, but I would prefer to have it as a
shared lib. We had some nasty issues with static linking before (ask Jan how
long it took him to debug that handle_ corruption when linking statically).


Hey, that handle_ thing was a *real* pain. It took about a week (40 hours) of extra hard work.

I don't want someone who has -lcpg in their app to have to put in
-lcorosync_common to get access to the symbols.  It used to work
transparently, but was recently changed in Fedora.
I tested exactly this case in rawhide today and it didn't show the
problem. Unless they are enforcing it only in mock builds, which would
be wrong on Fedora's part, since those options should be the default
everywhere in a Fedora environment.

Then again, this is upstream and not Fedora.

Fabio
mock may not match f16?

Not sure, but it does behave this way on my f16 platform.

So these are my test results on F16:

Environment is F16, fresh install, packages have been scratch built in koji.

We start from the last version of corosync without any common library
concept.

[root@fedora16-node1 nolib]# ldd /usr/lib64/libcpg.so.4.1.0
        linux-vdso.so.1 => (0x00007fffc18f0000)
        libqb.so.0 => /usr/lib64/libqb.so.0 (0x00007f96eef97000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f96eed8f000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f96eeb8a000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f96ee96e000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f96ee5b8000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f96ef409000)

[root@fedora16-node1 tests]# cat test.c
#include <corosync/cpg.h>

int main() {
        cpg_handle_t foo;
        cpg_initialize(&foo, NULL);
        return 0;
}

[root@fedora16-node1 tests]# gcc -Wall test.c -lcpg
[root@fedora16-node1 tests]# ldd a.out
        linux-vdso.so.1 => (0x00007fff52eef000)
        libcpg.so.4 => /usr/lib64/libcpg.so.4 (0x00007f1c9a8ab000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f1c9a4f5000)
        libqb.so.0 => /usr/lib64/libqb.so.0 (0x00007f1c9a294000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f1c9a08c000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f1c99e88000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1c99c6b000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f1c9aabd000)

[root@fedora16-node1 tests]# ./a.out
[root@fedora16-node1 tests]# echo $?
0

Then we move to the corosync with the shared common library. test.c is NOT
rebuilt and no pkg-config is used.

[root@fedora16-node1 tests]# ldd /usr/lib64/libcpg.so.4.1.0
        linux-vdso.so.1 => (0x00007fff373ff000)
        libqb.so.0 => /usr/lib64/libqb.so.0 (0x00007f2bddc53000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f2bdda4b000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f2bdd846000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f2bdd62a000)
        libcorosync_common.so.4 => /usr/lib64/libcorosync_common.so.4 (0x00007f2bdd428000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f2bdd071000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f2bde0c5000)

[root@fedora16-node1 tests]# ldd a.out
        linux-vdso.so.1 => (0x00007fff531ff000)
        libcpg.so.4 => /usr/lib64/libcpg.so.4 (0x00007fa50c191000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fa50bddb000)
        libqb.so.0 => /usr/lib64/libqb.so.0 (0x00007fa50bb7a000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fa50b972000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fa50b76e000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa50b551000)
        libcorosync_common.so.4 => /usr/lib64/libcorosync_common.so.4 (0x00007fa50b34f000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fa50c3a3000)

[root@fedora16-node1 tests]# ./a.out
[root@fedora16-node1 tests]# echo $?
0

Now, the reasons I have grasped so far for not doing a proper shared
library are:

1) it needs application rebuild (-lcorosync_common) based on
   http://fedoraproject.org/wiki/UnderstandingDSOLinkChange

2) fear of some symbol collisions

3) need to do -lcorosync_common on some applications

So let's break this down a bit:

#1 does not apply to this case, and here is why.

By default any application should (must really) link with the libraries
it uses.

The DSOLinkChange above makes sure that this happens in the correct way.

Here is an example that is easier to understand than the one in the wiki
(I think, at least):

libfoo exports foo_init();
libbar exports bar_init(); and in bar_init it calls foo_init();

libbar links (correctly) with libfoo.

_before_ the DSO change:

application baz could link with libbar and still call foo_init();
directly. This is clearly wrong, since an application must link with all
its dependencies (this is, for example, mandated by libtool).

_after_ the DSO changes:

application baz can no longer call foo_init(); even if linked with
libbar, but must link libfoo directly.
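
The foo/bar/baz relationship above can be sketched in C. This is only an
illustration: the three pieces are collapsed into one translation unit so
that it compiles on its own, the return values are arbitrary, and baz_run()
is a hypothetical name. In the real scenario foo_init() would live in
libfoo.so, bar_init() in libbar.so and baz_run() in the application.

```c
#include <assert.h>

/* Would live in libfoo.so. */
int foo_init(void)
{
        return 1;       /* arbitrary "initialised" marker */
}

/* Would live in libbar.so, which is itself linked against libfoo. */
int bar_init(void)
{
        int rc = foo_init();    /* libbar's own, legitimate use of libfoo */

        assert(rc == 1);
        return rc + 1;
}

/*
 * Would live in application baz.
 * The call to bar_init() needs -lbar.
 * The direct call to foo_init() makes libfoo a first-level dependency,
 * so after the DSO change baz must be linked with -lfoo as well.
 */
int baz_run(void)
{
        return bar_init() + foo_init();
}
```

If baz_run() only called bar_init(), linking with -lbar alone would stay
sufficient, which is the situation of an application that links -lcpg and
never touches a corosync_common symbol itself.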

Our applications have never used any symbol from corosync_common; at
best they have it compiled in as static inline (see below).

Our applications are linking directly to various corosync libraries and
that dependency does not change at all.

What changes is the dependency list of libcpg (to match the example
above), which will now load corosync_common.

test.c is not affected at all: it doesn't need a rebuild and doesn't need
code changes.

#2 symbol clashing tests

First we build the new test application with the old corotypes.h headers
(cs_strerror is still static inline)

[root@fedora16-node1 tests]# cat test.c
#include <corosync/cpg.h>
#include <corosync/corotypes.h>
#include <stdio.h>

int main() {
        const char *err;
        cpg_handle_t foo;
        cpg_initialize(&foo, NULL);
        err = cs_strerror(CS_OK);
        printf("error: %s\n", err); // just some funny code ok?
        return 0;
}

[root@fedora16-node1 tests]# gcc -Wall test.c -lcpg
[root@fedora16-node1 tests]# ldd a.out
        linux-vdso.so.1 => (0x00007fffd7f61000)
        libcpg.so.4 => /usr/lib64/libcpg.so.4 (0x00007f708e9cd000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f708e617000)
        libqb.so.0 => /usr/lib64/libqb.so.0 (0x00007f708e3b6000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f708e1ae000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f708dfaa000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f708dd8d000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f708ebdf000)

[root@fedora16-node1 tests]# ./a.out
error: success

now we upgrade to the new corosync with common shared library (no test.c
rebuild yet)

[root@fedora16-node1 tests]# ldd a.out
        linux-vdso.so.1 => (0x00007fff71bff000)
        libcpg.so.4 => /usr/lib64/libcpg.so.4 (0x00007f05d0714000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f05d035e000)
        libqb.so.0 => /usr/lib64/libqb.so.0 (0x00007f05d00fd000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f05cfef5000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f05cfcf1000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f05cfad4000)
        libcorosync_common.so.4 => /usr/lib64/libcorosync_common.so.4 (0x00007f05cf8d2000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f05d0926000)

[root@fedora16-node1 tests]# ./a.out
error: success

The reason there is no collision is that test.c has it as static
inline, so it is referenced directly in the object. The DSO doesn't even
know about it and doesn't attempt to resolve it, AFAICT.

So runtime is not affected at all.
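
A minimal sketch of why the old binary keeps working, assuming a header
helper shaped like the old static inline cs_strerror. demo_strerror(),
demo_is_success() and the demo_error values are hypothetical stand-ins, not
the real corotypes.h contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the error codes in corotypes.h. */
enum demo_error { DEMO_OK = 1, DEMO_ERR_LIBRARY = 2 };

/*
 * Old-style header helper. "static inline" gives it internal linkage, so
 * every object file that includes the header carries its own private copy
 * and no dynamic symbol lookup is ever made for it.
 */
static inline const char *demo_strerror(enum demo_error err)
{
        switch (err) {
        case DEMO_OK:
                return "success";
        case DEMO_ERR_LIBRARY:
                return "library error";
        }
        return "unknown error";
}

/* Application code: the call below is bound when the object is compiled. */
int demo_is_success(enum demo_error err)
{
        const char *msg = demo_strerror(err);

        assert(msg != NULL);    /* the helper always returns a string */
        return strcmp(msg, "success") == 0;
}
```

Because the helper is private to the object that was built with the old
header, a library that later exports a function of the same name has nothing
in the old binary to bind to.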

#3 The flip side of the coin is that applications using cs_strerror (for
example) will have to do the correct linking.

Rebuilding the test.c that uses cs_strerror against the shared lib:

[root@fedora16-node1 tests]# gcc -Wall test.c -lcpg
/usr/bin/ld: /tmp/cckzipCf.o: undefined reference to symbol 'cs_strerror'
/usr/bin/ld: note: 'cs_strerror' is defined in DSO /usr/lib64/libcorosync_common.so.4 so try adding it to the linker command line
/usr/lib64/libcorosync_common.so.4: could not read symbols: Invalid operation
collect2: ld returned 1 exit status

This is absolutely correct in this transition case. The symbol has
moved, and the linker is helping you find the new symbol (nothing to
do with that Fedora feature).

[root@fedora16-node1 tests]# gcc -Wall test.c -lcpg -lcorosync_common

[root@fedora16-node1 tests]# ldd a.out
        linux-vdso.so.1 => (0x00007fff85fff000)
        libcpg.so.4 => /usr/lib64/libcpg.so.4 (0x00007fef2874b000)
        libcorosync_common.so.4 => /usr/lib64/libcorosync_common.so.4 (0x00007fef28549000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fef28192000)
        libqb.so.0 => /usr/lib64/libqb.so.0 (0x00007fef27f32000)
        librt.so.1 => /lib64/librt.so.1 (0x00007fef27d2a000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fef27b25000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fef27909000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fef2895d000)

In this case, libcorosync_common becomes a first level dependency for
test.c (as explained above).

Rebuilding against the static lib will work, but it introduces a subtle
runtime problem that could make debugging rather interesting.

The common functions that are linked into libcpg or lib* are all visible
because we don't filter exported symbols (something we should consider
doing).

[root@fedora16-node1 tests]# objdump -T /usr/lib64/libcpg.so.4.1.0 |grep cs_str
0000000000003290 g    DF .text  00000000000001f8  Base        cs_strerror

(this is from the statically linked libcpg with common)
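
One way exported symbols could be filtered is the visibility attribute,
sketched below under the assumption of GCC or Clang. The function names are
hypothetical and this is not how corosync is built today; it only shows the
mechanism.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hidden visibility: callable from anywhere inside the library, but left
 * out of the library's dynamic symbol table.
 */
__attribute__((visibility("hidden")))
const char *demo_common_strerror(int err)
{
        return err == 1 ? "success" : "unknown error";
}

/* Default visibility: part of the library's public API. */
__attribute__((visibility("default")))
int demo_lib_check(int err)
{
        const char *msg = demo_common_strerror(err);

        assert(msg != NULL);
        return msg[0] == 's';   /* 1 for "success", 0 otherwise */
}
```

With a library built this way, objdump -T would be expected to list
demo_lib_check but not demo_common_strerror.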

[root@fedora16-node1 tests]# cat test.c
#include <corosync/corotypes.h>
#include <stdio.h>

int main() {
        const char *err;
        err = cs_strerror(CS_OK);
        printf("error: %s\n", err);
        return 0;
}

[root@fedora16-node1 tests]# gcc -Wall test.c -lcpg -lquorum -lvotequorum
[root@fedora16-node1 tests]# ldd a.out
        linux-vdso.so.1 => (0x00007fff0d3ff000)
        libcpg.so.4 => /usr/lib64/libcpg.so.4 (0x00007f613076c000)
        libquorum.so.5 => /usr/lib64/libquorum.so.5 (0x00007f6130568000)
        libvotequorum.so.5 => /usr/lib64/libvotequorum.so.5 (0x00007f6130363000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f612ffad000)
        libqb.so.0 => /usr/lib64/libqb.so.0 (0x00007f612fd4d000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f612fb44000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f612f940000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f612f724000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f613097f000)
[root@fedora16-node1 tests]# ./a.out
error: success

So now my question is: which of the 3 copies of cs_strerror is test.c going to use?

This situation can make development and debugging _very_ hard (never mind
that cs_strerror is just a translator; I mean in general).
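
When several loaded libraries export the same name, a process can ask at
run time which object a symbol resolves to. The helper below is a rough
debugging sketch, not part of corosync: demo_symbol_owner() is a
hypothetical name, and it relies on the glibc extensions dlsym(RTLD_DEFAULT,
...) and dladdr(). On glibc older than 2.34 it also needs -ldl.

```c
#define _GNU_SOURCE     /* for RTLD_DEFAULT and dladdr() on glibc */
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/*
 * Return the path of the loaded object that the default (global) lookup
 * scope resolves `symbol` to, or NULL if the symbol cannot be found.
 */
const char *demo_symbol_owner(const char *symbol)
{
        Dl_info info;
        void *addr;

        assert(symbol != NULL);

        /* Search the same scope the dynamic linker uses for the program. */
        addr = dlsym(RTLD_DEFAULT, symbol);
        if (addr == NULL)
                return NULL;

        /* Map the resolved address back to the object that contains it. */
        if (dladdr(addr, &info) == 0)
                return NULL;

        return info.dli_fname;
}
```

Calling demo_symbol_owner("cs_strerror") from the test program would show
which library's copy was picked. With default ELF symbol resolution that is
normally the first object in load order that defines the name.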

Let's assume the following scenario:

install statically linked libcpg and libquorum in version 2.0
application foo uses both libcpg and libquorum
a bug fix lands in libcpg, and the common library is changed for some
unrelated reason.
install new libcpg.

Now, which symbols is application foo going to use? New ones or old
ones? With a proper shared library approach this wouldn't be an issue at
all.

You also need to take into account that not all distributions ship all
libraries in one package (corosynclib vs libcpg4 libquorum4). So it's
not as uncommon as you might think, and it leaves a window open for
lots of headaches.

You make the call, but static is bad, versus a very small pain to get
applications linked correctly (as they should be), and pkg-config can be
used to ease the pain (though I am still not sure we should enforce linking
with -lcorosync_common, since not all applications use symbols from it).


OK, it looks like I misunderstood http://fedoraproject.org/wiki/UnderstandingDSOLinkChange. So as long as it works as you said (an app using -lcpg but not cs_strerror does NOT need to link with -lcorosync_common), the shared library solution seems to be better. But if it doesn't work (i.e. libcpg is linked with libcorosync_common and the app needs to link with both), the static approach is better.

Also, I don't see a reason for pkg-config to include -lcorosync_common.

So one question is: where did Angus make the mistake that caused the DSO warning to appear (and started this whole thread)?

Honza

Fabio
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
