On 11/9/2013 1:47 AM, Anand Avati wrote:
> Thanks for the detailed info. I have not yet looked into your logs, but
> will do so soon. There have been patches on rebalance which do fix
> issues related to ownership. But I am not (yet) sure about bugs which
> caused data loss. One question I have is -
>
> [2013-10-29 23:13:49.611069] I [dht-rebalance.c:647:dht___migrate_file]
> 0-mdfs-dht: /REDACTED/mdfs/KPA/__kpacontentminepix/docs/008/__058:
> attempting to move from mdfs-replicate-1 to mdfs-replicate-6
> [2013-10-29 23:13:49.611582] I [dht-rebalance.c:647:dht___migrate_file]
> 0-mdfs-dht: /REDACTED/mdfs/KPA/__kpacontentminepix/docs/008/__058:
> attempting to move from mdfs-replicate-1 to mdfs-replicate-6
>
> Are these two lines from the same log file or separate log files? If
> they are from the same log, then it might be you
> need http://review.gluster.org/4300 (available in 3.4)

They are from the same log file - the one that I put on my dropbox account and linked in the original message. They are consecutive log entries.

There are three visible problems caused by the failed rebalance, which failed after moving 1.5TB of data.

One is a relatively small number of lost files - 91 that I know about. They are completely gone; I can't find them even on the bricks. I even looked through the .glusterfs directory on all the bricks for files with one link. There weren't any.

The second problem is 32362 files that show up in the fuse mount with ---------T permissions, but give a read error when I try to access them. A few of those files have since become readable (as root) with no changes on my part, but most of them are still unreadable. I have located all of those files via the bricks and saved a copy elsewhere so I can replace the unreadable ones.

The third problem is over 800000 files with 000 permissions.

The only files I've really looked at in depth are the ones that were mentioned in the rebalance log as failed to migrate. There are far too many files on the volume to do much else. Spot-checking hasn't turned up any other problems, though.

Here's a df output showing the bricks on the first server along with the fuse-mounted volumes, followed by the gluster volume info:

http://fpaste.org/52886/13839892/

Let me know if there's any other info I can provide.

Thanks,
Shawn
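
For anyone who wants to repeat the brick-side checks described in this message, here is a minimal sketch in Python. It is not the procedure that was actually used, only an approximation of it. It assumes the script is pointed at a brick directory rather than the fuse mount, and that the brick keeps its gfid hard links under a top-level `.glusterfs` directory. The function name `scan_brick` and the category labels are made up for illustration.

```python
import os
import stat
import sys


def scan_brick(root):
    """Walk a brick directory and group regular files into three lists.

    sticky_only: permission bits are exactly 01000 (ls shows ---------T)
    mode_000:    permission bits are exactly 0000
    single_link: regular files under <root>/.glusterfs with link count 1
    """
    found = {"sticky_only": [], "mode_000": [], "single_link": []}
    for dirpath, dirnames, filenames in os.walk(root):
        rel_parts = os.path.relpath(dirpath, root).split(os.sep)
        # Assumption: the gfid store is the top-level .glusterfs directory.
        in_gfid_store = rel_parts[0] == ".glusterfs"
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue
            if in_gfid_store:
                # Only the link count is checked here, so files are not
                # counted a second time through their gfid hard link.
                if st.st_nlink == 1:
                    found["single_link"].append(path)
                continue
            perms = stat.S_IMODE(st.st_mode)
            if perms == stat.S_ISVTX:
                found["sticky_only"].append(path)
            elif perms == 0:
                found["mode_000"].append(path)
    return found


if __name__ == "__main__" and len(sys.argv) == 2:
    for label, paths in scan_brick(sys.argv[1]).items():
        print(label, len(paths))
```

Run it once per brick with the brick path as the only argument. With 800000+ affected files the lists get large, so writing the paths to a file instead of holding them in memory may be the better choice on a real volume.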