Re: Duplicated files in the pristine FC4t2 installation

On Mon, 2005-05-02 at 16:25 -0400, Peter Jones wrote:
> On Mon, 2005-05-02 at 12:35 -0700, Roland McGrath wrote:
> > > Roland McGrath wrote:
> > > > I think what one clearly wants is for rpm to maintain an installed file
> > > > index keyed by md5sum.  Then you can have a tool that just uses this
> > > > database to identify duplicates (and doesn't take forever), or have rpm do
> > > > so itself when installing new files.
> > > > 
> > > 
> > > Hmm, what about hash collisions, that would be really really BAD
> > 
> > If you are concerned about them you can still compare contents before
> > declaring two files identical.  But using the hashes as the main detector
> > makes it fast, since you only examine the data of files that are 99.999%
> > likely to be identical.
> 
> And in the vast majority of cases, there's a simpler heuristic you can
> use first: is the basename the same?

The easiest way seems to be to just stat all the files to be compared,
put the results into an array of pointers to the info structures, and sort
the array by size [this automagically groups all the zero-sized files,
which won't be linked and are skipped]. Then just go from top to bottom in
the array and check in depth all the files with equal size, i.e. do a
byte-by-byte compare while the md5sum is being calculated. This sidesteps
md5sum collisions entirely. This is how it's done in the slink utility:
the md5sums are printed in the log just FYI and aren't used as a measure
of file equality. The basename heuristic seems less reliable and more
expensive, in both computation time and design, to me.
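
To make the idea concrete, here is a minimal sketch of the size-first
approach in Python. It is not slink's actual code; the names
same_content and find_duplicates are made up for illustration, and it
assumes all paths are regular files that don't change while being read.

```python
import hashlib
import os
from collections import defaultdict


def same_content(path_a, path_b, chunk_size=65536):
    """Compare two files byte by byte.

    Returns (equal, md5_hex). The md5 is computed during the same pass
    and is informational only; equality is decided by the bytes.
    """
    digest = hashlib.md5()
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                return False, None      # first differing chunk: stop early
            if not a:
                return True, digest.hexdigest()  # both hit EOF together
            digest.update(a)


def find_duplicates(paths):
    """Return a list of (md5_hex, [paths]) groups of identical files."""
    by_size = defaultdict(list)
    for path in paths:
        size = os.stat(path).st_size
        if size > 0:                    # zero-sized files are skipped
            by_size[size].append(path)

    groups = []
    for size in sorted(by_size):        # walk the size-sorted candidates
        clusters = []                   # each cluster: [md5_hex, [paths]]
        for path in by_size[size]:
            for cluster in clusters:
                equal, md5_hex = same_content(cluster[1][0], path)
                if equal:
                    cluster[0] = md5_hex
                    cluster[1].append(path)
                    break
            else:
                clusters.append([None, [path]])
        groups.extend((m, p) for m, p in clusters if len(p) > 1)
    return groups
```

Files whose size is unique are never opened at all, which is where most
of the time is saved. Within one size, each file is compared against one
representative per cluster, so the worst case (many same-sized but
different files) is still quadratic in this sketch.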


Jindrich

-- 
Jindrich Novy <jnovy@xxxxxxxxxx>, http://people.redhat.com/jnovy/

The worst evil in the world is refusal to think.

-- 
fedora-devel-list mailing list
fedora-devel-list@xxxxxxxxxx
http://www.redhat.com/mailman/listinfo/fedora-devel-list
