On Thu, Nov 1, 2012 at 11:32 PM, Sam Lang <sam.lang@xxxxxxxxxxx> wrote:
> On Thu 01 Nov 2012 11:22:59 AM CDT, Nathan Howell wrote:
>>
>> We have a small (3 node) Ceph cluster that occasionally has issues. It
>> loses files and directories, truncates them or fills the contents with
>> NULL bytes. So far we haven't been able to build a repro case but it
>> seems to happen when bulk loading data into the cluster, a process
>> that is run each evening by a cron job. We've gone about a month
>> without any issues but had it happen again yesterday during a larger
>> bulk load. The data is backed up outside of ceph and can be reloaded
>> but finding the corrupt files takes quite a while.
>>
>> Has anyone heard of similar issues before? Should I try upgrading to
>> 0.48.2 or a newer kernel?
>
>
> Hi Nathan,
>
> Do the writes succeed? I.e. the programs creating the files don't get
> errors back? Are you seeing any problems with the ceph mds or osd processes
> crashing? Can you describe your I/O workload during these bulk loads? How
> many files, how much data, multiple clients writing, etc.
>
> As far as I know, there haven't been any fixes to 0.48.2 to resolve problems
> like yours. You might try the ceph fuse client to see if you get the same
> behavior. If not, then at least we have narrowed down the problem to the
> ceph kernel client.

Are you using hard links, by any chance? Do you have one or many MDS
systems? What filesystem are you using on your OSDs?
-Greg