Re: gcc binary output differs whether it is built from *.o or *.a

----- "Cedric Roux" <cedric.roux@xxxxxxxxxx> wrote:
> beaugy.a@xxxxxxx wrote:
> > Hi all,
> > So, to put it in a nutshell, all my generated object files are
> > identical on dev1 and dev2, and the object files contained in my
> > convenience libraries are all identical. The only remaining
> > difference, before I generate my binary, is in the generated
> > convenience libraries, which are not identical even though their
> > contents are. So AFAIK, this slight difference should not matter.
> > Hence my questions: "why does the gcc output (MD5 checksum) differ
> > when I build a binary from the project object files (*.o) rather
> > than from the project convenience libraries (*.a)?" and "what can
> > I do to fix that?".
> 
> Your *.o files are processed in a different order when they
> are on the command line than when they are in the .a archive,
> which puts symbols at different addresses, so different
> binaries are obviously produced.
> 
> Even if you do:
> gcc 1.o 2.o
> And:
> gcc 2.o 1.o
> you get binaries with different md5.

As you (and, later, Eljay) suggested, the order of the object files
on my link command line matters. I had in fact already noticed that
(as the workaround at the bottom of this mail shows). I assume it has
to do with addressing, which changes when the order of the object
files changes. Nevertheless, since I'm using the autotools, the order
of the object files on the command line stays the same, so the
checksum should too.

As for Eljay's suggestion of using "-frandom-seed=foo" when generating
C++ objects: I am already using this option for C++ objects, and in
any case my project is mainly C-based.

As for Eljay's second remark, that debugging information could make
checksums vary: I had noticed this behaviour too, so I removed all
"-g*" options from the compilation flags.

In the end, with "-frandom-seed=foo" set and debugging information
removed, I always get reproducibly identical object files.
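The "always identical" check above is easy to script. A minimal POSIX shell sketch, assuming the build command writes its result to a known path (the function name "is_reproducible" is hypothetical, not part of any tool), runs the same command twice and compares the two results byte for byte with cmp:

```shell
#!/bin/sh
# Hypothetical helper: run a build command twice and report whether the
# file it writes is byte-for-byte identical both times.
# Usage:  is_reproducible OUTPUT COMMAND [ARGS...]
# Exit status: 0 = identical, 1 = different, 2 = command or copy failed.
is_reproducible() (
    out=$1
    shift
    "$@" || exit 2                       # first build
    cp "$out" "$out.first" || exit 2     # keep a copy of the first result
    if "$@"; then                        # second build, same command
        status=0
        cmp -s "$out" "$out.first" || status=$?   # 1 = bytes differ
    else
        status=2
    fi
    rm -f "$out.first"
    exit "$status"
)

# Example with the options discussed above (foo.c is a placeholder):
#   is_reproducible foo.o gcc -c -O2 -frandom-seed=foo foo.c -o foo.o \
#       && echo "foo.o is reproducible"
```

This only compares two runs on the same machine; comparing dev1 with dev2 still means running md5sum (or cmp) on the files built on each host.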

After that, if I link all the object files directly into a binary, I
always get the same CRC checksum for the binary (anywhere, at any
time). But if I choose to group my object files into several static
libraries, and then link those static libraries into a binary, I get:
    - for some binaries, the same checksum every time (anywhere, at any time);
    - for others, a different checksum from one build to the next.

I assume the problem does not come from the ar packaging itself: the
ar archives never have the same checksum, yet some of the binaries
built from them always do. I think the archives differ because ar,
like tar, stores metadata about each archived object file (e.g.
modification date, owner, group, etc.), information that may vary
between the build platforms.
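For reference, in the common ar format (GNU and System V) an archive starts with the 8-byte magic string "!<arch>" plus a newline, and every member is preceded by a 60-byte ASCII header holding its name, modification time, owner uid, group gid, mode and size, as declared in <ar.h>. The POSIX shell sketch below (the function name "ar_headers" is hypothetical; "ar tv" reports much the same information) prints those fields for each member in archive order, so that both the metadata and the member order of the dev1 and dev2 archives can be compared with diff. GNU ar also has a "D" modifier (deterministic mode, e.g. "ar rcD") that writes zero uid, gid and timestamp fields.

```shell
#!/bin/sh
# Hypothetical helper (a sketch; "ar tv" reports similar information):
# print the header fields of every member of a common-format ar archive,
# in archive order. Each member header is 60 ASCII bytes (see <ar.h>):
#   name[16] date[12] uid[6] gid[6] mode[8] size[10] fmag[2]
# Output, one line per member: name date uid gid mode size
# GNU ar ends short names with "/"; its symbol index ("/") and long-name
# table ("//") are printed like any other member, and long names show up
# as "/offset" into that table.
ar_headers() (
    archive=$1
    # The file must start with the magic string "!<arch>" plus a newline.
    magic=$(dd if="$archive" bs=1 count=7 2>/dev/null)
    if [ "$magic" != '!<arch>' ]; then
        echo "not an ar archive: $archive" >&2
        exit 1
    fi
    total=$(($(wc -c < "$archive")))
    # Print one header field (a cut(1) character range), trailing blanks removed.
    field() { printf '%s\n' "$hdr" | cut -c"$1" | sed 's/ *$//'; }
    off=8
    while [ "$off" -lt "$total" ]; do
        # Read the 58 bytes of fields, leaving out the 2-byte terminator.
        hdr=$(dd if="$archive" bs=1 skip="$off" count=58 2>/dev/null)
        size=$(field 49-58)
        printf '%s %s %s %s %s %s\n' "$(field 1-16)" "$(field 17-28)" \
            "$(field 29-34)" "$(field 35-40)" "$(field 41-48)" "$size"
        # Member data is padded with one byte when its size is odd.
        off=$((off + 60 + size + size % 2))
    done
)
```

Running it on the same library built on each host (for example "ar_headers libfoo.a > dev1.txt", with libfoo.a as a placeholder name) and diffing the two outputs shows at a glance whether only dates and owners differ, or whether the member order differs as well.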

Therefore, I think that the difference between the binaries produced
on my two platforms comes from (possibly) different orders in which
object files are taken out of the ar archives, producing different
object file lists. That brings us back to your point about the order
on the command line.

This is confirmed by the workaround I had to put in place in some of
my projects to generate reproducibly identical binaries:

[BEGIN: Makefile.am extract]
foo$(EXEEXT): $(foo_OBJECTS) $(foo_DEPENDENCIES) 
	@rm -f foo$(EXEEXT)
	@rm -rf tmp-ar-o
	@mkdir -p tmp-ar-o
	@pattern="[[:blank:]]*\([^[:blank:]]\+\)/lib\([^[:blank:]]\+\)\.la" ; \
	 lib_list=`echo "$(foo_LDADD)" | \
	           sed "s#$$pattern# \1/.libs/lib\2.a#g"` ; \
	 pattern="[[:blank:]]*\([\.][^[:blank:]]\+\)\.a" ; \
	 lib_list=`echo "$$lib_list" | \
	           sed "/[[:blank:]]\+-l/d ; s#$$pattern# ../\1.a#g"` ; \
	 cd tmp-ar-o ; \
	 for lib in $$lib_list ; do \
	   echo "ar x $$lib" ; \
	   ar x $$lib ; \
	 done ; \
	 rm -f *-2.o ; \
	 for dupobj in `ls *-1.o` ; do \
	   obj=`echo $$dupobj | sed 's/-1//'` ; \
	   mv $$dupobj $$obj ; \
	 done ; \
	 cd ..
	$(LINK) $(foo_LDFLAGS) $(foo_OBJECTS) tmp-ar-o/*.o $(LIBS)
	@rm -rf tmp-ar-o
[END: Makefile.am extract]

(It extracts the contents of the project's local static libraries into
 a temporary folder and then does the linking. To be clear, LDFLAGS
 only contains external shared libraries (e.g. libm.so, libpthread.so,
 etc.).)
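One detail of this workaround that may be worth pinning down: POSIX specifies that the names produced by a pattern such as "tmp-ar-o/*.o" are sorted according to the collating sequence of the current locale, so two hosts with different LANG/LC_ALL settings can expand it in a different order. A minimal sketch, assuming object names without whitespace (the function name "sorted_objects" is hypothetical), that always yields plain byte order (the C locale):

```shell
#!/bin/sh
# Hypothetical helper: print the *.o files of a directory in plain byte
# order (C locale), whatever the locale of the build host is.
# Assumes object file names contain no whitespace or newlines.
sorted_objects() {
    for obj in "$1"/*.o; do
        # If nothing matches, the pattern is left unexpanded: skip it.
        [ -e "$obj" ] || continue
        printf '%s\n' "$obj"
    done | LC_ALL=C sort
}
```

Inside the Makefile.am recipe itself, the same effect could be had by replacing "tmp-ar-o/*.o" on the $(LINK) line with "`printf '%s\n' tmp-ar-o/*.o | LC_ALL=C sort`" (backquotes included). This only removes one possible source of host-to-host variation; it is not a confirmed explanation of the differing checksums.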

So, how does gcc behave at link time? How does it handle static
libraries? Or rather, how does the linker behave? Does gcc pass it a
list of object files to link (having first extracted them from the ar
archive)? And why do some binaries linked with static libraries have a
constant checksum, while others never get the same checksum twice in
a row from the same C sources?

I hope you can help me.

Thanks a lot for your help.

Regards,

-- 
Alexandre Beaugy


