I upgraded to 0.54 and now there are some hints in the logs. The
directories referenced in the log entries are now missing:

2012-11-23 07:28:04.802864 mds.0 [ERR] loaded dup inode 1000000662f [2,head] v3851654 at /xxx/20120203, but inode 1000000662f.head v3853093 already exists at ~mds0/stray7/1000000662f
2012-11-23 07:28:04.802889 mds.0 [ERR] loaded dup inode 10000003a4b [2,head] v431518 at /xxx/20120206, but inode 10000003a4b.head v3853192 already exists at ~mds0/stray8/10000003a4b
2012-11-23 07:28:04.802909 mds.0 [ERR] loaded dup inode 1000000149e [2,head] v431522 at /xxx/20120207, but inode 1000000149e.head v3853206 already exists at ~mds0/stray8/1000000149e
2012-11-23 07:28:04.802927 mds.0 [ERR] loaded dup inode 10000000a5f [2,head] v431526 at /xxx/20120208, but inode 10000000a5f.head v3853208 already exists at ~mds0/stray8/10000000a5f

Any ideas? (Two rough scan sketches for these issues follow the quoted
thread below.)

On Thu, Nov 15, 2012 at 11:00 AM, Nathan Howell <nathan.d.howell@xxxxxxxxx> wrote:
> Yes, successfully written files were disappearing. We switched to
> ceph-fuse and haven't seen any files truncated since. Older files
> (written months ago) are still having their entire contents replaced
> with NULL bytes, seemingly at random. I can't yet say for sure this has
> happened since switching over to fuse... but we think it has.
>
> I'm going to test all of the archives over the next few days and
> restore them from S3, so we should be back in a known-good state after
> that. In the event more files end up corrupted, is there any logging
> that I can enable that would help track down the problem?
>
> thanks,
> -n
>
>
> On Sat, Nov 3, 2012 at 9:54 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> On Fri, Nov 2, 2012 at 12:30 AM, Nathan Howell
>> <nathan.d.howell@xxxxxxxxx> wrote:
>> > On Thu, Nov 1, 2012 at 3:32 PM, Sam Lang <sam.lang@xxxxxxxxxxx> wrote:
>> >> Do the writes succeed? I.e., the programs creating the files don't
>> >> get errors back? Are you seeing any problems with the ceph mds or
>> >> osd processes crashing? Can you describe your I/O workload during
>> >> these bulk loads? How many files, how much data, multiple clients
>> >> writing, etc.
>> >>
>> >> As far as I know, there haven't been any fixes to 0.48.2 to resolve
>> >> problems like yours. You might try the ceph fuse client to see if
>> >> you get the same behavior. If not, then at least we have narrowed
>> >> down the problem to the ceph kernel client.
>> >
>> > Yes, the writes succeed. Wednesday's failure looked like this:
>> >
>> > 1) rsync a 100-200 MB tarball directly into ceph from a remote site
>> > 2) untar ~500 files from the tarball in ceph into a new directory in ceph
>> > 3) wait for a while
>> > 4) the .tar file and some log files disappeared, but the untarred
>> >    files were fine
>>
>> Just to be clear, you copied a tarball into Ceph and untarred it all
>> in Ceph, and the extracted contents were fine but the tarball
>> disappeared? So this looks like a case of successfully-written files
>> disappearing?
>> Did you at any point check the tarball from a machine other than the
>> initial client that copied it in?
>>
>> This truncation sounds like maybe Yan's fix will deal with it. But if
>> you've also seen files with the proper size that are empty or
>> corrupted, that sounds like an OSD bug. Sam, are you aware of any
>> btrfs issues that could cause this?
>>
>> Nathan, you've also seen parts of the filesystem hierarchy get lost?
>> That's rather more concerning; under what circumstances have you seen
>> that?
>> -Greg
>>
>> > Total filesystem size is:
>> >
>> > pgmap v2221244: 960 pgs: 960 active+clean; 2418 GB data, 7293 GB
>> > used, 6151 GB / 13972 GB avail
>> >
>> > Generally our load looks like:
>> >
>> > A constant trickle of 1-2 MB files from 3 machines, about 1 GB per
>> > day total. No file is written to by more than 1 machine, but the
>> > files go into shared directories.
>> >
>> > Grid jobs are running constantly and are doing sequential reads from
>> > the filesystem. Compute nodes have the filesystem mounted read-only.
>> > They're primarily located at a remote site (~40ms away) and tend to
>> > average 1-2 megabits/sec.
>> >
>> > Nightly data jobs load in ~10 GB from a few remote sites into <10
>> > large files. These are split up into about 1000 smaller files, but
>> > the originals are also kept. All of this is done on one machine. The
>> > journals and osd drives are write-saturated while this is going on.
>> >
>> >
>> > On Thu, Nov 1, 2012 at 4:02 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> >> Are you using hard links, by any chance?
>> >
>> > No, we are using a handful of soft links though.
>> >
>> >
>> >> Do you have one or many MDS systems?
>> >
>> > ceph mds stat says: e686: 1/1/1 up {0=xxx=up:active}, 2 up:standby
>> >
>> >
>> >> What filesystem are you using on your OSDs?
>> >
>> > btrfs
>> >
>> >
>> > thanks,
>> > -n
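
For the dup-inode errors quoted at the top of this mail, here is a minimal
sketch of pulling those entries out of the MDS log and listing which paths
collide with entries in the stray directories. The default log path and the
regular expression are assumptions based on the four lines quoted above, not
a supported tool; adjust both for your cluster.

#!/usr/bin/env python
# Rough scan of an MDS log for "loaded dup inode" errors, printing the inode
# number, the path it was loaded at, and the stray entry it collides with.
# The default log path and the line format are assumptions based on the
# entries quoted above.
import re
import sys

LOG = sys.argv[1] if len(sys.argv) > 1 else "/var/log/ceph/ceph-mds.0.log"

PATTERN = re.compile(
    r"loaded dup inode (\S+) \[[^\]]+\] v\d+ at (\S+?),"
    r" but inode \S+ v\d+ already exists at (\S+)"
)

dups = []
with open(LOG) as log:
    for line in log:
        match = PATTERN.search(line)
        if match:
            dups.append(match.groups())

for inode, path, stray in dups:
    print("inode %s: linked at %s, duplicate in %s" % (inode, path, stray))
print("%d dup inode entries found" % len(dups))

Pointing it at the log of whichever MDS was active when the errors were
recorded should give a count and the list of affected paths.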
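
And since the corruption described in the quoted thread shows up as files
whose entire contents have been replaced with NULL bytes, a rough sweep for
that specific pattern can narrow down which archives need to be re-checked
against the S3 copies. This is only a sketch: the default mount point is an
assumption, and it only catches the all-zero case, so checksumming against
the known-good copies is still the real verification.

#!/usr/bin/env python
# Walk a directory tree and flag regular files whose entire contents are NUL
# bytes, the corruption pattern described above. It only catches the all-zero
# case; comparing checksums against the S3 copies is still the real check.
# The default root path is an assumption.
import os
import sys

ROOT = sys.argv[1] if len(sys.argv) > 1 else "/mnt/ceph"
CHUNK = 1 << 20  # read 1 MiB at a time so large files don't need to fit in RAM

def all_null(path):
    """Return True for a non-empty file made up entirely of NUL bytes."""
    if os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                return True   # hit EOF without seeing a non-NUL byte
            if chunk.strip(b"\0"):
                return False  # found real data, stop early

for dirpath, dirnames, filenames in os.walk(ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        if os.path.islink(path):
            continue  # skip symlinks (the thread mentions a handful of soft links)
        try:
            if all_null(path):
                print(path)
        except (IOError, OSError):
            # files have been disappearing, so tolerate races during the walk
            sys.stderr.write("could not read %s\n" % path)

Anything it prints can then be prioritised for restore from S3.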