Re: inconsistent PG -> unfound objects on an erasure coded system

There is a directory structure hash; it's just at the end of the
name, and you'll have to check the xattr I mentioned to find it.

I think that file is actually the one we are talking about removing.

./DIR_9/DIR_5/DIR_4/DIR_D/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long:
user.cephos.lfn3:
default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkLf.4\u156__head_79CED459__46_3189d_0

Notice that the user.cephos.lfn3 attr has the full name, and it
*does* have a hash, 79CED459 (you referred to it as a directory hash, I
think, but it's actually the hash we used to place it on this osd to
begin with).

In this specific case, you shouldn't find any files in the
DIR_9/DIR_5/DIR_4/DIR_D directory itself, since all 16 subdirectories
exist (so every hash value should map into one of them).

The one in DIR_9/DIR_5/DIR_4/DIR_D/DIR_E is completely fine -- that's
the actual object file; don't remove that.  If you look at the attr:

./DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long:
user.cephos.lfn3:
default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkLf.4\u156__head_79CED459__46_ffffffffffffffff_0

The hash is 79CED459, which means that (assuming
DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/DIR_C does *not* exist) it's in the
right place.
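
If it helps, here's a rough, untested Python 3 sketch of that placement
rule: the hash is consumed one hex digit at a time, starting from the last
digit, and the object file should sit in the deepest DIR_ chain that
actually exists.

import os

def expected_dir(pg_root, hash_hex):
    # Walk DIR_<nibble> components derived from the hash, last digit
    # first, and return the deepest directory that actually exists.
    path = pg_root
    for nibble in reversed(hash_hex.upper()):
        candidate = os.path.join(path, "DIR_" + nibble)
        if not os.path.isdir(candidate):
            break
        path = candidate
    return path

# expected_dir("/var/lib/ceph/osd/ceph-307/current/70.459s0_head", "79CED459")
# should come back with .../DIR_9/DIR_5/DIR_4/DIR_D/DIR_E here, assuming
# DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/DIR_C does not exist.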

The ENOENT return

2016-03-07 16:11:41.828332 7ff30cdad700 10
filestore(/var/lib/ceph/osd/ceph-307) remove
70.459s0_head/79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0
= -2
2016-03-07 21:44:02.197676 7fe96b56f700 10
filestore(/var/lib/ceph/osd/ceph-307) remove
70.459s0_head/79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0
= -2

was actually a symptom in this case, but in general it's not
indicative of anything -- the filestore can get ENOENT return values
for legitimate reasons.

To reiterate: files that end in something like
fa202ec9b4b3b217275a_0_long are *not* necessarily orphans -- you need
to check the user.cephos.lfn3 attr (as you did before) for the full-length
file name and determine whether the file is in the right place.
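
A quick, untested Python 3 sketch of that check -- it just reads the
xattr and pulls out the full object name and its hash; the file is only an
orphan candidate if its parent directory isn't the deepest existing DIR_
chain for that hash (as in the sketch above):

import os
import re

def lfn3_name_and_hash(path):
    # Read the full (untruncated) object name stored in the xattr of a
    # *_long file; names of this length fit in a single xattr chunk.
    full_name = os.getxattr(path, "user.cephos.lfn3").decode("utf-8", "replace")
    # Pull the 8-digit placement hash out of the __head_XXXXXXXX__ part.
    m = re.search(r"__head_([0-9A-Fa-f]{8})__", full_name)
    return full_name, (m.group(1).upper() if m else None)

# e.g. for the orphan above:
#   name, h = lfn3_name_and_hash("DIR_9/DIR_5/DIR_4/DIR_D/<the *_long file>")
#   h should come back as 79CED459, and the name should end in _3189d_0.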
-Sam

On Wed, Mar 16, 2016 at 7:49 AM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
> Hi Sam,
>
> In the 70.459 logs from the deep-scrub, there is an error:
>
>  $ zgrep "= \-2$" ceph-osd.307.log.1.gz
> 2016-03-07 16:11:41.828332 7ff30cdad700 10
> filestore(/var/lib/ceph/osd/ceph-307) remove
> 70.459s0_head/79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0
> = -2
> 2016-03-07 21:44:02.197676 7fe96b56f700 10
> filestore(/var/lib/ceph/osd/ceph-307) remove
> 70.459s0_head/79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0
> = -2
>
> I'm taking this as an indication of the error you mentioned.    It looks to
> me as if this bug leaves two files with "issues" based upon what I see on
> the filesystem.
>
> First, I have a size-0 file in a directory where I expect only to have
> directories:
>
> root@ceph03:/var/lib/ceph/osd/ceph-307/current/70.459s0_head/DIR_9/DIR_5/DIR_4/DIR_D#
> ls -ltr
> total 320
> -rw-r--r-- 1 root root     0 Jan 23 21:49
> default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long
> drwxr-xr-x 2 root root 16384 Feb  5 15:13 DIR_6
> drwxr-xr-x 2 root root 16384 Feb  5 17:26 DIR_3
> drwxr-xr-x 2 root root 16384 Feb 10 00:01 DIR_C
> drwxr-xr-x 2 root root 16384 Mar  4 10:50 DIR_7
> drwxr-xr-x 2 root root 16384 Mar  4 16:46 DIR_A
> drwxr-xr-x 2 root root 16384 Mar  5 02:37 DIR_2
> drwxr-xr-x 2 root root 16384 Mar  5 17:39 DIR_4
> drwxr-xr-x 2 root root 16384 Mar  8 16:50 DIR_F
> drwxr-xr-x 2 root root 16384 Mar 15 15:51 DIR_8
> drwxr-xr-x 2 root root 16384 Mar 15 21:18 DIR_D
> drwxr-xr-x 2 root root 16384 Mar 15 22:25 DIR_0
> drwxr-xr-x 2 root root 16384 Mar 15 22:35 DIR_9
> drwxr-xr-x 2 root root 16384 Mar 15 22:56 DIR_E
> drwxr-xr-x 2 root root 16384 Mar 15 23:21 DIR_1
> drwxr-xr-x 2 root root 12288 Mar 16 00:07 DIR_B
> drwxr-xr-x 2 root root 16384 Mar 16 00:34 DIR_5
>
> I assume that this file is an issue as well......and needs to be removed.
>
>
> then, in the directory where the file should be, I have the same file:
>
> root@ceph03:/var/lib/ceph/osd/ceph-307/current/70.459s0_head/DIR_9/DIR_5/DIR_4/DIR_D/DIR_E#
> ls -ltr | grep -v __head_
> total 64840
> -rw-r--r-- 1 root root 1048576 Jan 23 21:49
> default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long
>
> In the directory DIR_E here (from above), there is only one file without a
> __head_ in the pathname -- the file above.  Should I be deleting both of
> these _long files without the __head_ -- the one in DIR_E and the one in the
> directory above .../DIR_E?
>
> Since there is no directory structure HASH in these files, is that the
> indication that it is an orphan?
>
> Thanks,
> Jeff
>
>
>
>
> On Tue, Mar 15, 2016 at 8:38 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>
>> Ah, actually, I think there will be duplicates only around half the
>> time -- either the old link or the new link could be orphaned
>> depending on which xfs decides to list first.  Only if the old link is
>> orphaned will it match the name of the object once it's recreated.  I
>> should be able to find time to put together a branch in the next week
>> or two if you want to wait.  It's still probably worth trying to remove
>> that object in 70.459.
>> -Sam
>>
>> On Tue, Mar 15, 2016 at 6:03 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>> > The bug is entirely independent of hardware issues -- entirely a ceph
>> > bug.  xfs doesn't let us specify an ordering when reading a directory,
>> > so we have to keep directory sizes small.  That means that when one of
>> > those pg collection subfolders has 320 files in it, we split it into
>> > up to 16 smaller directories.  Overwriting or removing an ec object
>> > requires us to rename the old version out of the way in case we need
>> > to roll back (that's the generation number I mentioned above).  For
>> > crash safety, this involves first creating a link to the new name,
>> > then removing the old one.  Both the old and new link will be in the
>> > same subdirectory.  If creating the new link pushes the directory to
>> > 320 files then we do a split while both links are present.  If the
>> > file in question is using the special long filename handling, then a
>> > bug in the resulting link juggling causes us to orphan the old version
>> > of the file.  Your cluster seems to have an unusual number of objects
>> > with very long names, which is why it is so visible on your cluster.
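>> >
>> > To make the split concrete, here's a toy, untested Python sketch (not
>> > the actual FileStore code) of how entries in a full directory at depth
>> > d (the pg collection root being depth 0) get redistributed into
>> > DIR_0..DIR_F by the next hex digit of their placement hash, counted
>> > from the end of the hash:
>> >
>> > def split(entries, depth):
>> >     # entries: list of (filename, placement_hash) pairs in the full dir
>> >     buckets = {}
>> >     for name, h in entries:
>> >         nibble = h.upper()[-(depth + 1)]
>> >         buckets.setdefault("DIR_" + nibble, []).append(name)
>> >     return buckets
>> >
>> > Both the old link and the new link of the object being overwritten are
>> > among those entries at that moment, which is where the long-filename
>> > handling goes wrong and leaves one of them behind.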
>> >
>> > There are critical pool sizes where the PGs will all be close to one
>> > of those limits.  It's possible you are not close to one of those
>> > limits.  It's also possible you are nearing one now.  In any case, the
>> > remapping gave the orphaned files an opportunity to cause trouble, but
>> > remapping is not what creates them.
>> > -Sam
>> >
>> > On Tue, Mar 15, 2016 at 5:41 PM, Jeffrey McDonald <jmcdonal@xxxxxxx>
>> > wrote:
>> >> One more question: did we hit the bug because we had hardware issues
>> >> during the remapping, or would it have happened regardless of the
>> >> hardware issues?   e.g. I'm not planning to add any additional hardware
>> >> soon, but would the bug pop up again on an (unpatched) system not
>> >> subject to any remapping?
>> >>
>> >> thanks,
>> >> jeff
>> >>
>> >> On Tue, Mar 15, 2016 at 7:27 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>> >>>
>> >>> [back on list]
>> >>>
>> >>> ceph-objectstore-tool has a whole bunch of machinery for modifying an
>> >>> offline objectstore.  It would be the easiest place to put it -- you
>> >>> could add a
>> >>>
>> >>> ceph-objectstore-tool --op filestore-repair-orphan-links [--dry-run]
>> >>> ...
>> >>>
>> >>> command which would mount the filestore in a special mode and iterate
>> >>> over all collections and repair them.  If you want to go that route,
>> >>> we'd be happy to help you get it written.  Once it fixes your cluster,
>> >>> we'd then be able to merge and backport it in case anyone else hits
>> >>> it.
>> >>>
>> >>> You'd probably be fine doing it while the OSD is live...but as a rule
>> >>> I usually prefer to do my osd surgery offline.  The journal doesn't matter
>> >>> here; the orphaned files are basically invisible to the filestore
>> >>> (except when doing a collection scan for scrub) since they are in the
>> >>> wrong directory.
>> >>>
>> >>> I don't think the orphans are necessarily going to be 0 size.  There
>> >>> might be a quirk of how radosgw creates these objects that always causes
>> >>> them to be created 0 size and then overwritten with a writefull -- if
>> >>> that's true, it might be the case that you would only see 0-size ones.
>> >>> -Sam
>> >>>
>> >>> On Tue, Mar 15, 2016 at 4:02 PM, Jeffrey McDonald <jmcdonal@xxxxxxx>
>> >>> wrote:
>> >>> > Thanks,  I can try to write a tool to do this.   Does
>> >>> > ceph-objectstore-tool
>> >>> > provide a framework?
>> >>> >
>> >>> > Can I safely delete the files while the OSD is alive or should I
>> >>> > take it
>> >>> > offline?   Any concerns about the journal?
>> >>> >
>> >>> > Are there any other properties of the orphans, e.g. will the orphans
>> >>> > always
>> >>> > be size 0?
>> >>> >
>> >>> > Thanks!
>> >>> > Jeff
>> >>> >
>> >>> > On Tue, Mar 15, 2016 at 5:35 PM, Samuel Just <sjust@xxxxxxxxxx>
>> >>> > wrote:
>> >>> >>
>> >>> >> Ok, a branch merged to master which should fix this
>> >>> >> (https://github.com/ceph/ceph/pull/8136).  It'll be backported in
>> >>> >> due
>> >>> >> course.  The problem is that that patch won't clean orphaned files
>> >>> >> that already exist.
>> >>> >>
>> >>> >> Let me explain a bit about what the orphaned files look like.  The
>> >>> >> problem is files with object names that result in escaped filenames
>> >>> >> longer than the max filename ceph will create (~250 iirc).
>> >>> >> Normally,
>> >>> >> the name of the file is an escaped and sanitized version of the
>> >>> >> object
>> >>> >> name:
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> ./DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/default.325674.107\u\ushadow\u.KePEE8heghHVnlb1\uEIupG0I5eROwRn\u77__head_C1DCD459__46_ffffffffffffffff_0
>> >>> >>
>> >>> >> corresponds to an object like
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> c1dcd459/default.325674.107__shadow_.KePEE8heghHVnlb1_EIupG0I5eROwRn_77/head//70
>> >>> >>
>> >>> >> the DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/ path is derived from the hash
>> >>> >> starting with the last value: cd459 ->
>> >>> >> DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/
>> >>> >>
>> >>> >> It ends up in DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/ because that's the
>> >>> >> longest path that exists (DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/DIR_D does
>> >>> >> not
>> >>> >> exist -- if DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/ ever gets too full,
>> >>> >> DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/DIR_D would be created and this file
>> >>> >> would be moved into it).
>> >>> >>
>> >>> >> When the escaped filename gets too long, we truncate the filename,
>> >>> >> and
>> >>> >> then append a hash and a number yielding a name like:
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> ./DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long
>> >>> >>
>> >>> >> The _long at the end is always present with files like this.
>> >>> >> fa202ec9b4b3b217275a is the hash of the filename.  The 0 indicates
>> >>> >> that it's the 0th file with this prefix and this hash -- if there
>> >>> >> are
>> >>> >> hash collisions with the same prefix, you'll see _1_ and _2_ and so
>> >>> >> on
>> >>> >> to distinguish them (very very unlikely).  When the filename has
>> >>> >> been
>> >>> >> truncated as with this one, you will find the full file name in the
>> >>> >> attrs (attr user.cephos.lfn3):
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> ./DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long:
>> >>> >> user.cephos.lfn3:
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkLf.4\u156__head_79CED459__46_ffffffffffffffff_0
>> >>> >>
>> >>> >> Let's look at one of the orphaned files (the one with the same
>> >>> >> file-name as the previous one, actually):
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> ./DIR_9/DIR_5/DIR_4/DIR_D/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long:
>> >>> >> user.cephos.lfn3:
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkLf.4\u156__head_79CED459__46_3189d_0
>> >>> >>
>> >>> >> This one has the same filename as the previous object, but is an
>> >>> >> orphan.  What makes it an orphan is that it has hash 79CED459, but
>> >>> >> is
>> >>> >> in ./DIR_9/DIR_5/DIR_4/DIR_D even though
>> >>> >> ./DIR_9/DIR_5/DIR_4/DIR_D/DIR_E exists (object files are always at
>> >>> >> the farthest directory from the root matching their hash).  All of
>> >>> >> the
>> >>> >> orphans will be long-file-name objects (but most long-file-name
>> >>> >> objects are fine and are neither orphans nor have duplicates --
>> >>> >> it's a
>> >>> >> fairly low occurrence bug).  In your case, I think *all* of the
>> >>> >> orphans will probably happen to have files with duplicate names in
>> >>> >> the
>> >>> >> correct directory -- though they might not if the object had actually
>> >>> >> been
>> >>> >> deleted since the bug happened.  When there are duplicates, the
>> >>> >> full
>> >>> >> object names will either be the same or differ by the generation
>> >>> >> number at the end (ffffffffffffffff_0 vs 3189d_0 in this case).
>> >>> >>
>> >>> >> Once the orphaned files are cleaned up, your cluster should be back
>> >>> >> to
>> >>> >> normal.  If you want to wait, someone might get time to build a
>> >>> >> patch
>> >>> >> for ceph-objectstore-tool to automate this.  You can try removing
>> >>> >> the
>> >>> >> orphan we identified in pg 70.459 and re-scrubbing to confirm that
>> >>> >> that fixes the pg.
>> >>> >> -Sam
>> >>> >>
>> >>> >> On Wed, Mar 9, 2016 at 6:58 AM, Jeffrey McDonald <jmcdonal@xxxxxxx>
>> >>> >> wrote:
>> >>> >> > Hi, I went back to the mon logs to see if I could elicit any
>> >>> >> > additional
>> >>> >> > information about this PG.
>> >>> >> > Prior to 1/27/16, the deep-scrub on this OSD passes (then I see
>> >>> >> > obsolete
>> >>> >> > rollback objects found):
>> >>> >> >
>> >>> >> > ceph.log.4.gz:2016-01-20 09:43:36.195640 osd.307
>> >>> >> > 10.31.0.67:6848/127170
>> >>> >> > 538
>> >>> >> > : cluster [INF] 70.459 deep-scrub ok
>> >>> >> > ceph.log.4.gz:2016-01-27 09:51:49.952459 osd.307
>> >>> >> > 10.31.0.67:6848/127170
>> >>> >> > 583
>> >>> >> > : cluster [INF] 70.459 deep-scrub starts
>> >>> >> > ceph.log.4.gz:2016-01-27 10:10:57.196311 osd.108
>> >>> >> > 10.31.0.69:6816/4283
>> >>> >> > 335 :
>> >>> >> > cluster [ERR] osd.108 pg 70.459s5 found obsolete rollback obj
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > 79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/5
>> >>> >> > generation < trimmed_to 130605'206504...repaired
>> >>> >> > ceph.log.4.gz:2016-01-27 10:10:57.043942 osd.307
>> >>> >> > 10.31.0.67:6848/127170
>> >>> >> > 584
>> >>> >> > : cluster [ERR] osd.307 pg 70.459s0 found obsolete rollback obj
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > 79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0
>> >>> >> > generation < trimmed_to 130605'206504...repaired
>> >>> >> > ceph.log.4.gz:2016-01-27 10:10:58.225017 osd.307
>> >>> >> > 10.31.0.67:6848/127170
>> >>> >> > 585
>> >>> >> > : cluster [ERR] 70.459s0 shard 4(3) missing
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >>> >> > ceph.log.4.gz:2016-01-27 10:10:58.225068 osd.307
>> >>> >> > 10.31.0.67:6848/127170
>> >>> >> > 586
>> >>> >> > : cluster [ERR] 70.459s0 shard 10(2) missing
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >>> >> > ceph.log.4.gz:2016-01-27 10:10:58.225088 osd.307
>> >>> >> > 10.31.0.67:6848/127170
>> >>> >> > 587
>> >>> >> > : cluster [ERR] 70.459s0 shard 26(1) missing
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >>> >> > ceph.log.4.gz:2016-01-27 10:10:58.225127 osd.307
>> >>> >> > 10.31.0.67:6848/127170
>> >>> >> > 588
>> >>> >> > : cluster [ERR] 70.459s0 shard 132(4) missing
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >>> >> > ceph.log.4.gz:2016-01-27 10:13:52.926032 osd.307
>> >>> >> > 10.31.0.67:6848/127170
>> >>> >> > 589
>> >>> >> > : cluster [ERR] 70.459s0 deep-scrub stat mismatch, got
>> >>> >> > 21324/21323
>> >>> >> > objects,
>> >>> >> > 0/0 clones, 21324/21323 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0
>> >>> >> > whiteouts,
>> >>> >> > 64313094166/64308899862 bytes,0/0 hit_set_archive bytes.
>> >>> >> > ceph.log.4.gz:2016-01-27 10:13:52.927589 osd.307
>> >>> >> > 10.31.0.67:6848/127170
>> >>> >> > 590
>> >>> >> > : cluster [ERR] 70.459s0 deep-scrub 1 missing, 0 inconsistent
>> >>> >> > objects
>> >>> >> > ceph.log.4.gz:2016-01-27 10:13:52.931250 osd.307
>> >>> >> > 10.31.0.67:6848/127170
>> >>> >> > 591
>> >>> >> > : cluster [ERR] 70.459 deep-scrub 5 errors
>> >>> >> > ceph.log.4.gz:2016-01-28 10:32:37.083809 osd.307
>> >>> >> > 10.31.0.67:6848/127170
>> >>> >> > 592
>> >>> >> > : cluster [INF] 70.459 repair starts
>> >>> >> > ceph.log.4.gz:2016-01-28 10:51:44.608297 osd.307
>> >>> >> > 10.31.0.67:6848/127170
>> >>> >> > 593
>> >>> >> > : cluster [ERR] osd.307 pg 70.459s0 found obsolete rollback obj
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > 79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0
>> >>> >> > generation < trimmed_to 130605'206504...repaired
>> >>> >> > ceph.log.4.gz:2016-01-28 10:51:45.802549 osd.307
>> >>> >> > 10.31.0.67:6848/127170
>> >>> >> > 594
>> >>> >> > : cluster [ERR] 70.459s0 shard 4(3) missing
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >>> >> > ceph.log.4.gz:2016-01-28 10:51:45.802933 osd.307
>> >>> >> > 10.31.0.67:6848/127170
>> >>> >> > 595
>> >>> >> > : cluster [ERR] 70.459s0 shard 10(2) missing
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >>> >> > ceph.log.4.gz:2016-01-28 10:51:45.802978 osd.307
>> >>> >> > 10.31.0.67:6848/127170
>> >>> >> > 596
>> >>> >> > : cluster [ERR] 70.459s0 shard 26(1) missing
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >>> >> > ceph.log.4.gz:2016-01-28 10:51:45.803039 osd.307
>> >>> >> > 10.31.0.67:6848/127170
>> >>> >> > 597
>> >>> >> > : cluster [ERR] 70.459s0 shard 132(4) missing
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >>> >> > ceph.log.4.gz:2016-01-28 10:51:44.781639 osd.108
>> >>> >> > 10.31.0.69:6816/4283
>> >>> >> > 338 :
>> >>> >> > cluster [ERR] osd.108 pg 70.459s5 found obsolete rollback obj
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > 79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/5
>> >>> >> > generation < trimmed_to 130605'206504...repaired
>> >>> >> > ceph.log.4.gz:2016-01-28 11:01:18.119350 osd.26
>> >>> >> > 10.31.0.103:6812/77378
>> >>> >> > 2312
>> >>> >> > : cluster [INF] 70.459s1 restarting backfill on osd.305(0) from
>> >>> >> > (0'0,0'0]
>> >>> >> > MAX to 130605'206506
>> >>> >> > ceph.log.4.gz:2016-02-01 13:40:55.096030 osd.307
>> >>> >> > 10.31.0.67:6848/13421
>> >>> >> > 16 :
>> >>> >> > cluster [INF] 70.459s0 restarting backfill on osd.210(1) from
>> >>> >> > (0'0,0'0]
>> >>> >> > MAX
>> >>> >> > to 135195'206996
>> >>> >> > ceph.log.4.gz:2016-02-01 13:41:10.623892 osd.307
>> >>> >> > 10.31.0.67:6848/13421
>> >>> >> > 27 :
>> >>> >> > cluster [INF] 70.459s0 restarting backfill on osd.25(1) from
>> >>> >> > (0'0,0'0]
>> >>> >> > MAX
>> >>> >> > to 135195'206996
>> >>> >> > ...
>> >>> >> >
>> >>> >> >
>> >>> >> > Regards,
>> >>> >> > Jeff
>> >>> >> >
>> >>> >> >
>> >>> >> > --
>> >>> >> >
>> >>> >> > Jeffrey McDonald, PhD
>> >>> >> > Assistant Director for HPC Operations
>> >>> >> > Minnesota Supercomputing Institute
>> >>> >> > University of Minnesota Twin Cities
>> >>> >> > 599 Walter Library           email: jeffrey.mcdonald@xxxxxxxxxxx
>> >>> >> > 117 Pleasant St SE           phone: +1 612 625-6905
>> >>> >> > Minneapolis, MN 55455        fax:   +1 612 624-8861
>> >>> >> >
>> >>> >> >
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> > --
>> >>> >
>> >>> > Jeffrey McDonald, PhD
>> >>> > Assistant Director for HPC Operations
>> >>> > Minnesota Supercomputing Institute
>> >>> > University of Minnesota Twin Cities
>> >>> > 599 Walter Library           email: jeffrey.mcdonald@xxxxxxxxxxx
>> >>> > 117 Pleasant St SE           phone: +1 612 625-6905
>> >>> > Minneapolis, MN 55455        fax:   +1 612 624-8861
>> >>> >
>> >>> >
>> >>
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Jeffrey McDonald, PhD
>> >> Assistant Director for HPC Operations
>> >> Minnesota Supercomputing Institute
>> >> University of Minnesota Twin Cities
>> >> 599 Walter Library           email: jeffrey.mcdonald@xxxxxxxxxxx
>> >> 117 Pleasant St SE           phone: +1 612 625-6905
>> >> Minneapolis, MN 55455        fax:   +1 612 624-8861
>> >>
>> >>
>
>
>
>
> --
>
> Jeffrey McDonald, PhD
> Assistant Director for HPC Operations
> Minnesota Supercomputing Institute
> University of Minnesota Twin Cities
> 599 Walter Library           email: jeffrey.mcdonald@xxxxxxxxxxx
> 117 Pleasant St SE           phone: +1 612 625-6905
> Minneapolis, MN 55455        fax:   +1 612 624-8861
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


