On Tue, 2006-01-03 at 18:49 -0600, Johnny Hughes wrote:
> On Fri, 2005-12-30 at 00:00 +0100, Maciej Żenczykowski wrote:
> >
> > e) why aren't identical files between the two trees hardlinked?
> >
> > $ ls -ali os/*/CentOS/RPMS/yum*noarch*
> > 278532 -rw-rw-r-- 1 maze maze 395922 Sep 4 19:48 i386/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> > 1165388 -rw-rw-r-- 1 maze maze 395922 Oct 10 22:20 x86_64/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> >
> > $ md5sum os/*/CentOS/RPMS/yum*noarch*
> > 371d55a19f8e4ca13d22974128ab4671 i386/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> > 371d55a19f8e4ca13d22974128ab4671 x86_64/CentOS/RPMS/yum-2.4.0-1.centos4.noarch.rpm
> >
> > Just an example of two identical files from my mirror, one of which is
> > wasting space even though contents are identical. I expect we have this
> > situation for almost _all_ i386 packages from the x86_64 distribution...
> >
>
> We run a program called hardlink++ on the master mirror that should hard
> link files that are identical. If it is not hardlinking those it
> should.
>
> Are you using -H option on your rsyncing down?
>
> > $ pwd
> > /opt/mirrors/centos/4.2/os/x86_64
> > $ find|grep "i386\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> > $ find|grep "i386\.rpm"|while read i;do cat "$i";done|wc -c
> > 440745010
> > $ find|grep "noarch\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> > $ find|grep "noarch\.rpm"|while read i;do cat "$i";done|wc -c
> > 426816227
> >
> > $ pwd
> > /opt/mirrors/centos/4.2/updates/x86_64
> > $ find|grep "i386\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> > $ find|grep "i386\.rpm"|while read i;do cat "$i";done|wc -c
> > 12819616
> > $ find|grep "noarch\.rpm"|while read i;do diff -qr "$i" "../i386/$i";done
> > diff: ../i386/./RPMS/createrepo-0.4.3-1.noarch.rpm: No such file or directory
> > $ find|grep "noarch\.rpm"|while read i;do cat "$i";done|wc -c
> > 2164495
> >
> > $ ls RPMS/createrepo-0.4.3-1.noarch.rpm -al
> > -rw-rw-r-- 2 maze maze 18284 Sep 5 13:59 RPMS/createrepo-0.4.3-1.noarch.rpm
> >
> > That seems to me to be a 880 MB mirror space savings to be made there...
> > Considering the i386/x86_64 mirror takes up 7.7GB (without iso's) that's
> > quite a bit...
> >
> > I also imagine the noarch files are shared with most of the other
> > architectures... so I'd assume another 400MB per every next arch can be
> > saved...
>
> One thing to please remember is that we develop these files from
> separate locations on separate machines, so they have to be stand alone
> on those machines initially ... we then combine them together on the
> mirror and run hardlink++. That SHOULD hardlink all the files that are
> the same.

OK ... I have done some specific testing, and here is what I found out
about hardlink++:

It only links files that have the same date/time stamp ... which means
that if two files have the same size and MD5 sum but different dates,
they will not get linked.

This is not what I thought it did.
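For illustration only, here is a minimal sketch of linking by content
alone, ignoring timestamps. It is not how hardlink++ works. The function
name link_identical and the two-tree layout are assumptions made up for
this example, and it assumes both trees live on the same filesystem and
that no file name contains a newline.

```shell
# Hypothetical helper (not part of hardlink++): for every regular file under
# tree $1 that also exists at the same relative path under tree $2 with
# byte-identical contents, replace the copy under $2 with a hard link to the
# file under $1. Timestamps are deliberately not compared.
link_identical() {
    local src="$1" dst="$2" rel
    ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
        [ -f "$dst/$rel" ] || continue                 # no counterpart in $dst
        [ "$src/$rel" -ef "$dst/$rel" ] && continue    # already the same inode
        cmp -s "$src/$rel" "$dst/$rel" || continue     # contents differ
        ln -f "$src/$rel" "$dst/$rel"                  # relink the $dst copy
    done
}
```

It would be called as something like "link_identical os/i386 os/x86_64".
Since both names then share one inode, the x86_64 copy takes on the
timestamp of the i386 copy, which is worth keeping in mind for anything
that compares files by date.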
I will try to get the arches I control (i386 / x86_64) better hardlinked
in the future and try to maintain them that way, since hardlink++ is not
doing what I thought it was.

However, there are only so many hours in the day.
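As a rough way to check whether two trees stay hardlinked over time, the
sketch below totals the bytes that are still duplicated between them. The
function name unlinked_duplicate_bytes is hypothetical, and it assumes
that duplicates sit at the same relative path in both trees and that no
file name contains a newline. Unlike the diff / wc -c pipelines quoted
above, it skips files that already share an inode.

```shell
# Hypothetical check: print the number of bytes under tree $2 taken up by
# files that are byte-identical to the same-named file under tree $1 but do
# not share its inode, i.e. space that hard-linking could still reclaim.
unlinked_duplicate_bytes() {
    local src="$1" dst="$2"
    ( cd "$src" && find . -type f ) | {
        total=0
        while IFS= read -r rel; do
            [ -f "$dst/$rel" ] || continue                # no counterpart
            [ "$src/$rel" -ef "$dst/$rel" ] && continue   # already hardlinked
            cmp -s "$src/$rel" "$dst/$rel" || continue    # contents differ
            total=$(( total + $(wc -c < "$dst/$rel") ))   # add file size
        done
        echo "$total"
    }
}
```

Run as, for example, "unlinked_duplicate_bytes os/i386 os/x86_64"; a
result of 0 would mean every identical pair at matching paths already
shares an inode.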