Hi!

> > > the use of a good hash function. The chance of an accidental
> > > collision is infinitesimally small. For a set of
> > >
> > >   100 files:       0.00000000000003%
> > >   1,000,000 files: 0.000003%
> >
> > I do not think we want to play with probability like this. I mean...
> > imagine 4G files, 1KB each. That's 4TB of disk space, not _completely_
> > unreasonable, and the collision probability is going to be ~100% due
> > to the birthday paradox.
> >
> > You'll still want to back up your 4TB server...
>
> Certainly, but tar isn't going to remember all the inode numbers.
> Even if you solve the storage requirements (not impossible), it would
> have to do (4e9^2)/2 = 8e18 comparisons, which computers don't have
> enough CPU power for just yet.

Storage requirements would be 16GB of RAM... that's small enough.
If you sort, you'll only need about 32*2^32 comparisons, and that's
doable.

I do not claim it is _likely_. You'd need hardlinks, as you noticed.
But a system should work, not "work with high probability", and I
believe we should solve this in the long term.

								Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
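
For illustration, here is a minimal sketch of the birthday-bound
arithmetic both sides are appealing to. The hash widths are assumptions
(the thread never fixes a hash function); 64 and 160 bits are used only
to show how strongly the width drives the collision probability:

    # Sketch of the birthday-bound approximation, assuming a b-bit hash:
    # p(n) ~= 1 - exp(-n*(n-1) / 2^(b+1)) for n items.
    import math

    def collision_probability(n, bits):
        """Approximate chance of at least one collision among n hashes."""
        return -math.expm1(-n * (n - 1) / 2.0 ** (bits + 1))

    # 64 and 160 bits are illustrative assumptions, not from the thread.
    for bits in (64, 160):
        for n in (100, 10**6, 4 * 10**9):
            print(f"{bits}-bit hash, {n:>13,} files: "
                  f"{collision_probability(n, bits):.2e}")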
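
And a minimal sketch of the sort-then-scan pass that makes the
comparison count tractable. SHA-1 here merely stands in for whatever
"good hash function" is meant above, and the digest list is held in
memory, as the 16GB estimate assumes:

    # Sort-then-scan duplicate detection. Sorting n digests costs about
    # n*log2(n) comparisons (~32 * 2^32 ~= 1.4e11 for 4G files) instead
    # of the n^2/2 = 8e18 pairwise comparisons quoted earlier.
    import hashlib

    def find_hash_collisions(paths):
        """Return pairs of files whose contents hash identically."""
        digests = []
        for path in paths:
            with open(path, "rb") as f:
                digests.append((hashlib.sha1(f.read()).digest(), path))
        digests.sort()                   # n log n comparisons
        collisions = []
        for (d1, p1), (d2, p2) in zip(digests, digests[1:]):
            if d1 == d2:                 # equal digests are now adjacent
                collisions.append((p1, p2))
        return collisions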