Seeing duplicate files, with duplicate names, inode number, etc

phil at cryer.us (phil cryer) · Wed, 27 Oct 2010 12:23:14 -0500

We're building out our data cluster by downloading book data from the
Internet Archive. I've come across one directory that looks like this:
http://cluster.biodiversitylibrary.org/n/naturwissenschaft19deut/

Almost all the files appear to be there twice, with the same name,
timestamp and inode! What could be causing this, and how can we fix
it? The issue is space: we appear to be using far more than we
should, and `du -h` and `ls -lsh` both say this directory takes
3.9G when it should really be about half that. If this has happened in
many of the directories, it could explain why we're already using 78T
of our 97T of space.
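
To check whether the doubled entries really account for the extra space,
something like the rough Python sketch below could be run against the
directory on a cluster node. It is only an illustration: the function
name `audit_directory` is made up for this example, and it assumes a
Unix-like system where `st_blocks` is reported in 512-byte units. It
lists any name the directory returns more than once, then compares the
allocated size counted per entry with the size counted once per
(device, inode) pair.

```python
import os
from collections import defaultdict


def audit_directory(path):
    """Report repeated names and compare per-entry vs per-inode usage.

    Hypothetical helper for illustration only. Returns a tuple of
    (repeated_names, per_entry_bytes, unique_inode_bytes).
    """
    inodes_by_name = defaultdict(list)  # entry name -> inode numbers seen
    per_entry_bytes = 0                 # every listed entry counted
    bytes_by_inode = {}                 # (device, inode) -> allocated bytes

    with os.scandir(path) as entries:
        for entry in entries:
            st = entry.stat(follow_symlinks=False)
            inodes_by_name[entry.name].append(st.st_ino)
            # Assumption: st_blocks counts 512-byte units (true on Linux).
            allocated = st.st_blocks * 512
            per_entry_bytes += allocated
            bytes_by_inode[(st.st_dev, st.st_ino)] = allocated

    # Names that the directory listing returned more than once.
    repeated = {
        name: inodes
        for name, inodes in inodes_by_name.items()
        if len(inodes) > 1
    }
    return repeated, per_entry_bytes, sum(bytes_by_inode.values())


if __name__ == "__main__":
    import sys

    repeated, per_entry, unique = audit_directory(sys.argv[1])
    for name, inodes in sorted(repeated.items()):
        print(f"{name}: listed {len(inodes)} times, inodes {inodes}")
    print(f"counted per entry: {per_entry} bytes")
    print(f"counted per inode: {unique} bytes")
```

If the per-entry total comes out at roughly twice the per-inode total,
the listing is returning each file twice and the reported usage may be
inflated rather than real. If the two totals match, the space is
probably being consumed somewhere else.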

P
-- 
http://philcryer.com