There is a directory structure hash -- it's just at the end of the name, and you have to check the xattr I mentioned to find it.  I think that file is actually the one we are talking about removing:

./DIR_9/DIR_5/DIR_4/DIR_D/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long:
user.cephos.lfn3:
default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkLf.4\u156__head_79CED459__46_3189d_0

Notice that the user.cephos.lfn3 attr has the full name, and it *does* have a hash, 79CED459 (you referred to it as a directory hash, I think, but it's actually the hash we used to place the object on this osd in the first place).  In this specific case you shouldn't find any files directly in the DIR_9/DIR_5/DIR_4/DIR_D directory, since it has all 16 subdirectories (so every hash value should map into one of them).

The one in DIR_9/DIR_5/DIR_4/DIR_D/DIR_E is completely fine -- that's the actual object file, don't remove that.  If you look at its attr:

./DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long:
user.cephos.lfn3:
default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkLf.4\u156__head_79CED459__46_ffffffffffffffff_0

the hash is again 79CED459, which means that (assuming DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/DIR_C does *not* exist) that file is in the right place.

The ENOENT return

2016-03-07 16:11:41.828332 7ff30cdad700 10 filestore(/var/lib/ceph/osd/ceph-307) remove 70.459s0_head/79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0 = -2
2016-03-07 21:44:02.197676 7fe96b56f700 10 filestore(/var/lib/ceph/osd/ceph-307) remove 70.459s0_head/79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0 = -2

actually was a symptom in this case, but in general it's not indicative of anything -- the filestore can get ENOENT return values for legitimate reasons.

To reiterate: files that end in something like fa202ec9b4b3b217275a_0_long are *not* necessarily orphans -- you need to check the user.cephos.lfn3 attr (as you did before) for the full-length file name and determine whether the file is in the right place.
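In case it helps, here is a rough, untested sketch of that check as a shell function (check_lfn_orphan is just a made-up name for illustration; it assumes GNU getfattr plus rev/fold are available and that you pass the pg's _head directory).  It reads the user.cephos.lfn3 attr, pulls the placement hash out from after __head_, walks DIR_<nibble> components starting from the last hex digit for as long as they exist, and compares that with where the file actually sits:

# untested sketch -- please sanity-check it against a few known-good files first
check_lfn_orphan() {
    pgroot=$(cd "$1" && pwd)    # e.g. /var/lib/ceph/osd/ceph-307/current/70.459s0_head
    file="$2"                   # a *_long file somewhere under that pg
    # the full object name is stored in the user.cephos.lfn3 xattr
    lfn=$(getfattr --only-values -n user.cephos.lfn3 "$file" 2>/dev/null)
    # the placement hash is the 8 hex chars after __head_
    hash=$(printf '%s' "$lfn" | sed -n 's/.*__head_\([0-9A-F]\{8\}\).*/\1/p')
    if [ -z "$hash" ]; then
        echo "no user.cephos.lfn3 attr or no hash found (not a _long file?): $file"
        return
    fi
    # descend DIR_<nibble> components, last hex digit first, as deep as they exist
    expected="$pgroot"
    for n in $(printf '%s' "$hash" | rev | fold -w1); do
        [ -d "$expected/DIR_$n" ] || break
        expected="$expected/DIR_$n"
    done
    actual=$(cd "$(dirname "$file")" && pwd)
    if [ "$actual" = "$expected" ]; then
        echo "ok (deepest existing dir for hash $hash): $file"
    else
        echo "possible orphan: hash $hash should live in $expected but file is in $actual"
    fi
}

For the file above you would call something like check_lfn_orphan /var/lib/ceph/osd/ceph-307/current/70.459s0_head <path-to-_long-file>, and for 79CED459 it should walk down to DIR_9/DIR_5/DIR_4/DIR_D/DIR_E and stop there (since DIR_C below it doesn't exist).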
-Sam

On Wed, Mar 16, 2016 at 7:49 AM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
> Hi Sam,
>
> In the 70.459 logs from the deep-scrub, there is an error:
>
> $ zgrep "= \-2$" ceph-osd.307.log.1.gz
> 2016-03-07 16:11:41.828332 7ff30cdad700 10 filestore(/var/lib/ceph/osd/ceph-307) remove 70.459s0_head/79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0 = -2
> 2016-03-07 21:44:02.197676 7fe96b56f700 10 filestore(/var/lib/ceph/osd/ceph-307) remove 70.459s0_head/79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0 = -2
>
> I'm taking this as an indication of the error you mentioned.  It looks to me as if this bug leaves two files with "issues" based upon what I see on the filesystem.
>
> First, I have a size-0 file in a directory where I expect only to have directories:
>
> root@ceph03:/var/lib/ceph/osd/ceph-307/current/70.459s0_head/DIR_9/DIR_5/DIR_4/DIR_D# ls -ltr
> total 320
> -rw-r--r-- 1 root root 0 Jan 23 21:49 default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long
> drwxr-xr-x 2 root root 16384 Feb 5 15:13 DIR_6
> drwxr-xr-x 2 root root 16384 Feb 5 17:26 DIR_3
> drwxr-xr-x 2 root root 16384 Feb 10 00:01 DIR_C
> drwxr-xr-x 2 root root 16384 Mar 4 10:50 DIR_7
> drwxr-xr-x 2 root root 16384 Mar 4 16:46 DIR_A
> drwxr-xr-x 2 root root 16384 Mar 5 02:37 DIR_2
> drwxr-xr-x 2 root root 16384 Mar 5 17:39 DIR_4
> drwxr-xr-x 2 root root 16384 Mar 8 16:50 DIR_F
> drwxr-xr-x 2 root root 16384 Mar 15 15:51 DIR_8
> drwxr-xr-x 2 root root 16384 Mar 15 21:18 DIR_D
> drwxr-xr-x 2 root root 16384 Mar 15 22:25 DIR_0
> drwxr-xr-x 2 root root 16384 Mar 15 22:35 DIR_9
> drwxr-xr-x 2 root root 16384 Mar 15 22:56 DIR_E
> drwxr-xr-x 2 root root 16384 Mar 15 23:21 DIR_1
> drwxr-xr-x 2 root root 12288 Mar 16 00:07 DIR_B
> drwxr-xr-x 2 root root 16384 Mar 16 00:34 DIR_5
>
> I assume that this file is an issue as well...and needs to be removed.
>
> Then, in the directory where the file should be, I have the same file:
>
> root@ceph03:/var/lib/ceph/osd/ceph-307/current/70.459s0_head/DIR_9/DIR_5/DIR_4/DIR_D/DIR_E# ls -ltr | grep -v __head_
> total 64840
> -rw-r--r-- 1 root root 1048576 Jan 23 21:49 default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long
>
> In the directory DIR_E here (from above), there is only one file without a __head_ in the pathname -- the file above.  Should I be deleting both of these _long files without the __head_, the one in DIR_E and the one in the directory above .../DIR_E?
>
> Since there is no directory structure HASH in these files, is that the indication that it is an orphan?
>
> Thanks,
> Jeff
>
> On Tue, Mar 15, 2016 at 8:38 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>
>> Ah, actually, I think there will be duplicates only around half the time -- either the old link or the new link could be orphaned depending on which xfs decides to list first.  Only if the old link is orphaned will it match the name of the object once it's recreated.  I should be able to find time to put together a branch in the next week or two if you want to wait.  It's still probably worth trying removing that object in 70.459.
>> -Sam
>>
>> On Tue, Mar 15, 2016 at 6:03 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>> > The bug is entirely independent of hardware issues -- entirely a ceph bug.  xfs doesn't let us specify an ordering when reading a directory, so we have to keep directory sizes small.  That means that when one of those pg collection subfolders has 320 files in it, we split it into up to 16 smaller directories.  Overwriting or removing an ec object requires us to rename the old version out of the way in case we need to roll back (that's the generation number I mentioned above).  For crash safety, this involves first creating a link to the new name, then removing the old one.  Both the old and new link will be in the same subdirectory.  If creating the new link pushes the directory to 320 files then we do a split while both links are present.  If the file in question is using the special long filename handling, then a bug in the resulting link juggling causes us to orphan the old version of the file.  Your cluster seems to have an unusual number of objects with very long names, which is why it is so visible on your cluster.
>> >
>> > There are critical pool sizes where the PGs will all be close to one of those limits.  It's possible you are not close to one of those limits.  It's also possible you are nearing one now.  In any case, the remapping gave the orphaned files an opportunity to cause trouble, but they don't appear due to remapping.
>> > -Sam
>> >
>> > On Tue, Mar 15, 2016 at 5:41 PM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
>> >> One more question.....did we hit the bug because we had hardware issues during the remapping or would it have happened regardless of the hardware issues?  e.g. I'm not planning to add any additional hardware soon, but would the bug pop again on an (unpatched) system not subject to any remapping?
>> >>
>> >> thanks,
>> >> jeff
>> >>
>> >> On Tue, Mar 15, 2016 at 7:27 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>> >>>
>> >>> [back on list]
>> >>>
>> >>> ceph-objectstore-tool has a whole bunch of machinery for modifying an offline objectstore.  It would be the easiest place to put it -- you could add a
>> >>>
>> >>> ceph-objectstore-tool --op filestore-repair-orphan-links [--dry-run] ...
>> >>>
>> >>> command which would mount the filestore in a special mode and iterate over all collections and repair them.  If you want to go that route, we'd be happy to help you get it written.  Once it fixes your cluster, we'd then be able to merge and backport it in case anyone else hits it.
>> >>>
>> >>> You'd probably be fine doing it while the OSD is live...but as a rule I usually prefer to do my osd surgery offline.
>> >>> Journal doesn't matter here, the orphaned files are basically invisible to the filestore (except when doing a collection scan for scrub) since they are in the wrong directory.
>> >>>
>> >>> I don't think the orphans are necessarily going to be 0 size.  There might be a quirk of how radosgw creates these objects that always causes them to be created 0 size and then overwritten with a writefull -- if that's true it might be the case that you would only see 0 size ones.
>> >>> -Sam
>> >>>
>> >>> On Tue, Mar 15, 2016 at 4:02 PM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
>> >>> > Thanks, I can try to write a tool to do this.  Does ceph-objectstore-tool provide a framework?
>> >>> >
>> >>> > Can I safely delete the files while the OSD is alive or should I take it offline?  Any concerns about the journal?
>> >>> >
>> >>> > Are there any other properties of the orphans, e.g. will the orphans always be size 0?
>> >>> >
>> >>> > Thanks!
>> >>> > Jeff
>> >>> >
>> >>> > On Tue, Mar 15, 2016 at 5:35 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>> >>> >>
>> >>> >> Ok, a branch merged to master which should fix this (https://github.com/ceph/ceph/pull/8136).  It'll be backported in due course.  The problem is that that patch won't clean orphaned files that already exist.
>> >>> >>
>> >>> >> Let me explain a bit about what the orphaned files look like.  The problem is files with object names that result in escaped filenames longer than the max filename ceph will create (~250 iirc).  Normally, the name of the file is an escaped and sanitized version of the object name:
>> >>> >>
>> >>> >> ./DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/default.325674.107\u\ushadow\u.KePEE8heghHVnlb1\uEIupG0I5eROwRn\u77__head_C1DCD459__46_ffffffffffffffff_0
>> >>> >>
>> >>> >> corresponds to an object like
>> >>> >>
>> >>> >> c1dcd459/default.325674.107__shadow_.KePEE8heghHVnlb1_EIupG0I5eROwRn_77/head//70
>> >>> >>
>> >>> >> The DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/ path is derived from the hash starting with the last value: cd459 -> DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/
>> >>> >>
>> >>> >> It ends up in DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/ because that's the longest path that exists (DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/DIR_D does not exist -- if DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/ ever gets too full, DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/DIR_D would be created and this file would be moved into it).
>> >>> >>
>> >>> >> When the escaped filename gets too long, we truncate the filename and then append a hash and a number, yielding a name like:
>> >>> >>
>> >>> >> ./DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long
>> >>> >>
>> >>> >> The _long at the end is always present with files like this.  fa202ec9b4b3b217275a is the hash of the filename.
>> >>> >> The 0 indicates that it's the 0th file with this prefix and this hash -- if there are hash collisions with the same prefix, you'll see _1_ and _2_ and so on to distinguish them (very very unlikely).  When the filename has been truncated as with this one, you will find the full file name in the attrs (attr user.cephos.lfn3):
>> >>> >>
>> >>> >> ./DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long:
>> >>> >> user.cephos.lfn3:
>> >>> >> default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkLf.4\u156__head_79CED459__46_ffffffffffffffff_0
>> >>> >>
>> >>> >> Let's look at one of the orphaned files (the one with the same file-name as the previous one, actually):
>> >>> >>
>> >>> >> ./DIR_9/DIR_5/DIR_4/DIR_D/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long:
>> >>> >> user.cephos.lfn3:
>> >>> >> default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkLf.4\u156__head_79CED459__46_3189d_0
>> >>> >>
>> >>> >> This one has the same filename as the previous object, but is an orphan.  What makes it an orphan is that it has hash 79CED459, but is in ./DIR_9/DIR_5/DIR_4/DIR_D even though ./DIR_9/DIR_5/DIR_4/DIR_D/DIR_E exists (object files are always at the farthest directory from the root matching their hash).  All of the orphans will be long-file-name objects (but most long-file-name objects are fine and are neither orphans nor have duplicates -- it's a fairly low occurrence bug).  In your case, I think *all* of the orphans will probably happen to have files with duplicate names in the correct directory -- though they might not if the object had actually been deleted since the bug happened.  When there are duplicates, the full object names will either be the same or differ by the generation number at the end (ffffffffffffffff_0 vs 3189d_0 in this case).
>> >>> >>
>> >>> >> Once the orphaned files are cleaned up, your cluster should be back to normal.  If you want to wait, someone might get time to build a patch for ceph-objectstore-tool to automate this.  You can try removing the orphan we identified in pg 70.459 and re-scrubbing to confirm that that fixes the pg.
>> >>> >> -Sam
>> >>> >>
>> >>> >> On Wed, Mar 9, 2016 at 6:58 AM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
>> >>> >> > Hi, I went back to the mon logs to see if I could elicit any additional information about this PG.
>> >>> >> > Prior to 1/27/16, the deep-scrub on this OSD passes (then I see obsolete rollback objects found):
>> >>> >> >
>> >>> >> > ceph.log.4.gz:2016-01-20 09:43:36.195640 osd.307 10.31.0.67:6848/127170 538 : cluster [INF] 70.459 deep-scrub ok
>> >>> >> > ceph.log.4.gz:2016-01-27 09:51:49.952459 osd.307 10.31.0.67:6848/127170 583 : cluster [INF] 70.459 deep-scrub starts
>> >>> >> > ceph.log.4.gz:2016-01-27 10:10:57.196311 osd.108 10.31.0.69:6816/4283 335 : cluster [ERR] osd.108 pg 70.459s5 found obsolete rollback obj 79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/5 generation < trimmed_to 130605'206504...repaired
>> >>> >> > ceph.log.4.gz:2016-01-27 10:10:57.043942 osd.307 10.31.0.67:6848/127170 584 : cluster [ERR] osd.307 pg 70.459s0 found obsolete rollback obj 79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0 generation < trimmed_to 130605'206504...repaired
>> >>> >> > ceph.log.4.gz:2016-01-27 10:10:58.225017 osd.307 10.31.0.67:6848/127170 585 : cluster [ERR] 70.459s0 shard 4(3) missing cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >>> >> > ceph.log.4.gz:2016-01-27 10:10:58.225068 osd.307 10.31.0.67:6848/127170 586 : cluster [ERR] 70.459s0 shard 10(2) missing cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >>> >> > ceph.log.4.gz:2016-01-27 10:10:58.225088 osd.307 10.31.0.67:6848/127170 587 : cluster [ERR] 70.459s0 shard 26(1) missing cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >>> >> > ceph.log.4.gz:2016-01-27 10:10:58.225127 osd.307 10.31.0.67:6848/127170 588 : cluster [ERR] 70.459s0 shard 132(4) missing cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >>> >> > ceph.log.4.gz:2016-01-27 10:13:52.926032 osd.307 10.31.0.67:6848/127170 589 : cluster [ERR] 70.459s0 deep-scrub stat mismatch, got 21324/21323 objects, 0/0 clones, 21324/21323 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 64313094166/64308899862 bytes,0/0 hit_set_archive bytes.
>> >>> >> > ceph.log.4.gz:2016-01-27 10:13:52.927589 osd.307 10.31.0.67:6848/127170 590 : cluster [ERR] 70.459s0 deep-scrub 1 missing, 0 inconsistent objects
>> >>> >> > ceph.log.4.gz:2016-01-27 10:13:52.931250 osd.307 10.31.0.67:6848/127170 591 : cluster [ERR] 70.459 deep-scrub 5 errors
>> >>> >> > ceph.log.4.gz:2016-01-28 10:32:37.083809 osd.307 10.31.0.67:6848/127170 592 : cluster [INF] 70.459 repair starts
>> >>> >> > ceph.log.4.gz:2016-01-28 10:51:44.608297 osd.307 10.31.0.67:6848/127170 593 : cluster [ERR] osd.307 pg 70.459s0 found obsolete rollback obj 79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0 generation < trimmed_to 130605'206504...repaired
>> >>> >> > ceph.log.4.gz:2016-01-28 10:51:45.802549 osd.307 10.31.0.67:6848/127170 594 : cluster [ERR] 70.459s0 shard 4(3) missing cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >>> >> > ceph.log.4.gz:2016-01-28 10:51:45.802933 osd.307 10.31.0.67:6848/127170 595 : cluster [ERR] 70.459s0 shard 10(2) missing cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >>> >> > ceph.log.4.gz:2016-01-28 10:51:45.802978 osd.307 10.31.0.67:6848/127170 596 : cluster [ERR] 70.459s0 shard 26(1) missing cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >>> >> > ceph.log.4.gz:2016-01-28 10:51:45.803039 osd.307 10.31.0.67:6848/127170 597 : cluster [ERR] 70.459s0 shard 132(4) missing cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >>> >> > ceph.log.4.gz:2016-01-28 10:51:44.781639 osd.108 10.31.0.69:6816/4283 338 : cluster [ERR] osd.108 pg 70.459s5 found obsolete rollback obj 79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/5 generation < trimmed_to 130605'206504...repaired
>> >>> >> > ceph.log.4.gz:2016-01-28 11:01:18.119350 osd.26 10.31.0.103:6812/77378 2312 : cluster [INF] 70.459s1 restarting backfill on osd.305(0) from (0'0,0'0] MAX to 130605'206506
>> >>> >> > ceph.log.4.gz:2016-02-01 13:40:55.096030 osd.307 10.31.0.67:6848/13421 16 : cluster [INF] 70.459s0 restarting backfill on osd.210(1) from (0'0,0'0] MAX to 135195'206996
>> >>> >> > ceph.log.4.gz:2016-02-01 13:41:10.623892 osd.307 10.31.0.67:6848/13421
>> >>> >> > 27 : cluster [INF] 70.459s0 restarting backfill on osd.25(1) from (0'0,0'0] MAX to 135195'206996
>> >>> >> > ...
>> >>> >> >
>> >>> >> > Regards,
>> >>> >> > Jeff
>> >>> >> >
>> >>> >> > --
>> >>> >> > Jeffrey McDonald, PhD
>> >>> >> > Assistant Director for HPC Operations
>> >>> >> > Minnesota Supercomputing Institute
>> >>> >> > University of Minnesota Twin Cities
>> >>> >> > 599 Walter Library email: jeffrey.mcdonald@xxxxxxxxxxx
>> >>> >> > 117 Pleasant St SE phone: +1 612 625-6905
>> >>> >> > Minneapolis, MN 55455 fax: +1 612 624-8861
>> >>> >
>> >>> > --
>> >>> > Jeffrey McDonald, PhD
>> >>> > Assistant Director for HPC Operations
>> >>> > Minnesota Supercomputing Institute
>> >>> > University of Minnesota Twin Cities
>> >>> > 599 Walter Library email: jeffrey.mcdonald@xxxxxxxxxxx
>> >>> > 117 Pleasant St SE phone: +1 612 625-6905
>> >>> > Minneapolis, MN 55455 fax: +1 612 624-8861
>> >>
>> >> --
>> >> Jeffrey McDonald, PhD
>> >> Assistant Director for HPC Operations
>> >> Minnesota Supercomputing Institute
>> >> University of Minnesota Twin Cities
>> >> 599 Walter Library email: jeffrey.mcdonald@xxxxxxxxxxx
>> >> 117 Pleasant St SE phone: +1 612 625-6905
>> >> Minneapolis, MN 55455 fax: +1 612 624-8861
>
> --
> Jeffrey McDonald, PhD
> Assistant Director for HPC Operations
> Minnesota Supercomputing Institute
> University of Minnesota Twin Cities
> 599 Walter Library email: jeffrey.mcdonald@xxxxxxxxxxx
> 117 Pleasant St SE phone: +1 612 625-6905
> Minneapolis, MN 55455 fax: +1 612 624-8861
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com