We're building our data cluster, downloading book data from the Internet Archive. I've come across a directory that looks like this:

http://cluster.biodiversitylibrary.org/n/naturwissenschaft19deut/

Almost all the files appear to be listed twice, with the same name, timestamp, and inode! What could be causing this, and how can we fix it?

The issue is space: we appear to be using far more than we should. Both `du -h` and `ls -lsh` report that this directory takes 3.9G, when it should really be about half that. If this has happened in many of the directories, it could explain why we're already using 78T of 97T.

P
--
http://philcryer.com