On Fri, Nov 2, 2012 at 7:30 AM, Nathan Howell <nathan.d.howell@xxxxxxxxx> wrote:
> On Thu, Nov 1, 2012 at 3:32 PM, Sam Lang <sam.lang@xxxxxxxxxxx> wrote:
>> Do the writes succeed? I.e. the programs creating the files don't get
>> errors back? Are you seeing any problems with the ceph mds or osd
>> processes crashing? Can you describe your I/O workload during these bulk
>> loads? How many files, how much data, multiple clients writing, etc.
>>
>> As far as I know, there haven't been any fixes to 0.48.2 to resolve
>> problems like yours. You might try the ceph fuse client to see if you get
>> the same behavior. If not, then at least we have narrowed the problem
>> down to the ceph kernel client.
>
> Yes, the writes succeed. Wednesday's failure looked like this:
>
> 1) rsync a 100-200 MB tarball directly into ceph from a remote site
> 2) untar ~500 files from the tarball into a new directory in ceph
> 3) wait for a while
> 4) the .tar file and some log files disappeared, but the untarred files
>    were fine
>
> Total filesystem size is:
>
> pgmap v2221244: 960 pgs: 960 active+clean; 2418 GB data, 7293 GB used,
> 6151 GB / 13972 GB avail
>
> Generally our load looks like:
>
> Constant trickle of 1-2 MB files from 3 machines, about 1 GB per day
> total. No file is written to by more than 1 machine, but the files go
> into shared directories.
>
> Grid jobs are running constantly and are doing sequential reads from
> the filesystem. Compute nodes have the filesystem mounted read-only.
> They're primarily located at a remote site (~40ms away) and tend to
> average 1-2 megabits/sec.
>
> Nightly data jobs load in ~10 GB from a few remote sites into <10
> large files. These are split up into about 1000 smaller files, but the
> originals are also kept. All of this is done on one machine. The
> journals and osd drives are write-saturated while this is going on.
>
>
> On Thu, Nov 1, 2012 at 4:02 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> Are you using hard links, by any chance?
>
> No, though we are using a handful of soft links.
>
>
>> Do you have one or many MDS systems?
>
> ceph mds stat says: e686: 1/1/1 up {0=xxx=up:active}, 2 up:standby
>
>
>> What filesystem are you using on your OSDs?
>
> btrfs

My recent patch "ceph: Fix i_size update race" can probably fix the
truncated file issue.

Yan, Zheng
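
For anyone who wants to watch for the same symptom, below is a minimal
sketch of the failure sequence Nathan describes (untar into cephfs, wait,
re-check). The mount point, tarball name, and poll interval are assumptions
for illustration, not values taken from this thread:

#!/usr/bin/env python
# Hypothetical repro/monitor sketch for the "files disappear after untar"
# report above. Adjust CEPH_DIR and TARBALL for your own setup.
import os
import tarfile
import time

CEPH_DIR = "/mnt/ceph/incoming"                # assumed cephfs mount point
TARBALL = os.path.join(CEPH_DIR, "data.tar")   # placeholder tarball name
DEST_DIR = os.path.join(CEPH_DIR, "untarred")

# Untar into a new directory on cephfs and record every path that should
# exist afterwards (the tarball itself plus each extracted regular file).
with tarfile.open(TARBALL) as tf:
    members = [m.name for m in tf.getmembers() if m.isfile()]
    tf.extractall(DEST_DIR)

expected = [TARBALL] + [os.path.join(DEST_DIR, name) for name in members]

# "Wait for a while", polling to see whether any entry vanishes from the
# client's view of the filesystem.
for _ in range(24):                 # poll every 5 minutes for ~2 hours
    time.sleep(300)
    missing = [p for p in expected if not os.path.exists(p)]
    if missing:
        print("missing after wait:")
        for p in missing:
            print("  " + p)
        break
else:
    print("all %d entries still present" % len(expected))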