On Mon, 2005-05-02 at 12:35 -0700, Roland McGrath wrote:
> > Roland McGrath wrote:
> > > I think what one clearly wants is for rpm to maintain an installed file
> > > indexed keyed by md5sum.  Then you can have a tool that just uses this
> > > database to identify duplicates (and doesn't take forever), or have rpm do
> > > so itself when installing new files.
> > >
> >
> > Hmm, what about hash collisions, that would be really really BAD
>
> If you are concerned about them you can still compare contents before
> declaring two files identical.  But using the hashes as the main detector
> makes it fast, since you only examine the data of files that are 99.999%
> likely to be identical.

And in the vast majority of cases, there's a simpler heuristic you can
use first: is the basename the same?

But really, this is 160MB of wasted space.  We don't support installing
onto USB, so from glancing at pricewatch, the smallest disk they list
that we support installing onto would appear to be an 18GB SCSI drive
for $23.  There are larger, cheaper drives, too.  So we're talking about
saving just under 1% of the least-desirable supported install target
currently being sold.

Let's just stop?

-- 
        Peter
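
For anyone who wants to see the hash-then-compare idea from the quoted text spelled out, here is a rough Python sketch. It is not how rpm works and does not touch the rpm database; the helper names md5_of and find_duplicates are made up for illustration, and the list of paths is assumed to come from somewhere else. MD5 is only the cheap first pass: files are reported as duplicates only after a byte-for-byte comparison, which addresses the collision worry.

```python
import filecmp
import hashlib
from collections import defaultdict


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths whose contents are identical (hypothetical helper).

    Step 1: bucket files by MD5, so only likely duplicates are examined.
    Step 2: confirm each bucket with a full content comparison, so a hash
            collision can never cause two different files to be merged.
    """
    by_hash = defaultdict(list)
    for path in paths:
        by_hash[md5_of(path)].append(path)

    groups = []
    for candidates in by_hash.values():
        confirmed = []  # lists of paths verified to be byte-identical
        for path in candidates:
            for group in confirmed:
                # shallow=False forces a comparison of the file contents
                if filecmp.cmp(group[0], path, shallow=False):
                    group.append(path)
                    break
            else:
                confirmed.append([path])
        groups.extend(group for group in confirmed if len(group) > 1)
    return groups
```

Calling find_duplicates on a list of installed file paths returns a list of groups, each holding two or more paths with identical contents. A basename check could be added as an even cheaper pre-filter, at the cost of missing duplicates that were installed under different names.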