Re: inconsistent PG -> unfound objects on an erasure coded system

Basically, the lookup process is:

try DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/DIR_C/DIR_9/DIR_7...doesn't exist
try DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/DIR_C/DIR_9/...doesn't exist
try DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/DIR_C/...doesn't exist
try DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/...does exist, object must be here

If DIR_E did not exist, then it would check DIR_9/DIR_5/DIR_4/DIR_D
and so on.  The hash is always 32 bits (8 hex digits) -- that's baked
into the rados object distribution algorithms.  When DIR_E hits the
threshold (320 files, iirc), the objects (files) in that directory will
be moved one directory deeper.  An object with hash 79CED459 would then
be in DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/DIR_C/.

Basically, the depth of the tree is dynamic.  The file will be in the
deepest existing path that matches the hash (it might even differ
between replicas; the tree structure is purely internal to the
filestore).
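
In rough Python, that walk looks something like this (a sketch of the
idea only, not the actual filestore code; "root" here would be the pg
collection directory, e.g. 70.459s0_head):

    import os

    def object_dir(root, hash_hex):
        # nibbles are consumed from the end of the 8-hex-digit hash:
        # 79CED459 -> 9, 5, 4, D, E, C, 9, 7
        nibbles = hash_hex[::-1]
        for depth in range(len(nibbles), -1, -1):
            candidate = os.path.join(
                root, *("DIR_%s" % n for n in nibbles[:depth]))
            if os.path.isdir(candidate):
                # deepest existing path matching the hash
                return candidate
        return root

For 79CED459 this tries eight levels deep, then seven, and so on,
until it finds DIR_9/DIR_5/DIR_4/DIR_D/DIR_E.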
-Sam

On Wed, Mar 16, 2016 at 10:46 AM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
> OK, I think I have it now.   I do have one more question: in this case,
> the hash indicates the directory structure, but how do I know from the
> hash how many levels I should go down?    If the hash is a 32-bit hex
> integer, *how do I know how many digits should be included as part of
> the hash for the directory structure*?
>
> e.g. in our example the hash is 79CED459 and the directory is the last
> five digits taken in reverse order -- what happens if there are only 4
> levels of hierarchy?    I only have this one example so far.....is the
> 79C of the hash constant?   Would the hash pick up another hex character
> if the pg splits again?
>
> Thanks,
> Jeff
>
> On Wed, Mar 16, 2016 at 10:24 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>
>> There is a directory structure hash; it's just at the end of the name,
>> and you'll have to check the xattr I mentioned to find it.
>>
>> I think that file is actually the one we are talking about removing.
>>
>>
>> ./DIR_9/DIR_5/DIR_4/DIR_D/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long:
>> user.cephos.lfn3:
>>
>> default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkLf.4\u156__head_79CED459__46_3189d_0
>>
>> Notice that the user.cephos.lfn3 attr has the full name, and it *does*
>> have a hash, 79CED459 (you referred to it as a directory hash I think,
>> but it's actually the hash we used to place it on this osd to begin
>> with).
>>
>> Specifically in this case, you shouldn't find any files in the
>> DIR_9/DIR_5/DIR_4/DIR_D directory, since there are 16 subdirectories
>> (so every hash value should map to one of those).
>>
>> The one in DIR_9/DIR_5/DIR_4/DIR_D/DIR_E is completely fine -- that's
>> the actual object file, don't remove that.  If you look at the attr:
>>
>>
>> ./DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long:
>> user.cephos.lfn3:
>>
>> default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkLf.4\u156__head_79CED459__46_ffffffffffffffff_0
>>
>> The hash is 79CED459, which means that (assuming
>> DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/DIR_C does *not* exist) it's in the
>> right place.
>>
>> The ENOENT return
>>
>> 2016-03-07 16:11:41.828332 7ff30cdad700 10
>> filestore(/var/lib/ceph/osd/ceph-307) remove
>>
>> 70.459s0_head/79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0
>> = -2
>> 2016-03-07 21:44:02.197676 7fe96b56f700 10
>> filestore(/var/lib/ceph/osd/ceph-307) remove
>>
>> 70.459s0_head/79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0
>> = -2
>>
>> actually was a symptom in this case, but, in general, it's not
>> indicative of anything -- the filestore can get ENOENT return values
>> for legitimate reasons.
>>
>> To reiterate: files that end in something like
>> fa202ec9b4b3b217275a_0_long are *not* necessarily orphans -- you need
>> to check the user.cephos.lfn3 attr (as you did before) for the
>> full-length file name and determine whether the file is in the right
>> place.
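>>
>> That check is mechanical enough to script.  A rough sketch (assuming
>> the hash is the 8 hex digits after __head_ in the attr value; this is
>> not the actual filestore code):
>>
>>     import os, re
>>
>>     def deepest_dir(root, hash_hex):
>>         # deepest existing DIR_* path matching the hash, read from
>>         # the last hex digit inward
>>         nibbles = hash_hex[::-1]
>>         for depth in range(len(nibbles), -1, -1):
>>             p = os.path.join(
>>                 root, *("DIR_%s" % n for n in nibbles[:depth]))
>>             if os.path.isdir(p):
>>                 return p
>>
>>     def is_orphan(pg_root, path):
>>         # the full object name (with hash) lives in the xattr for _long files
>>         full = os.getxattr(path, "user.cephos.lfn3").decode()
>>         hash_hex = re.search(r"__head_([0-9A-F]{8})", full).group(1)
>>         return os.path.dirname(path) != deepest_dir(pg_root, hash_hex)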
>> -Sam
>>
>> On Wed, Mar 16, 2016 at 7:49 AM, Jeffrey McDonald <jmcdonal@xxxxxxx>
>> wrote:
>> > Hi Sam,
>> >
>> > In the 70.459 logs from the deep-scrub, there is an error:
>> >
>> >  $ zgrep "= \-2$" ceph-osd.307.log.1.gz
>> > 2016-03-07 16:11:41.828332 7ff30cdad700 10
>> > filestore(/var/lib/ceph/osd/ceph-307) remove
>> >
>> > 70.459s0_head/79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0
>> > = -2
>> > 2016-03-07 21:44:02.197676 7fe96b56f700 10
>> > filestore(/var/lib/ceph/osd/ceph-307) remove
>> >
>> > 70.459s0_head/79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0
>> > = -2
>> >
> I'm taking this as an indication of the error you mentioned.    It
> looks to me as if this bug leaves two files with "issues", based upon
> what I see on the filesystem.
>> >
>> > First, I have a size-0 file in a directory where I expect only to have
>> > directories:
>> >
>> >
>> > root@ceph03:/var/lib/ceph/osd/ceph-307/current/70.459s0_head/DIR_9/DIR_5/DIR_4/DIR_D#
>> > ls -ltr
>> > total 320
>> > -rw-r--r-- 1 root root     0 Jan 23 21:49
>> >
>> > default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long
>> > drwxr-xr-x 2 root root 16384 Feb  5 15:13 DIR_6
>> > drwxr-xr-x 2 root root 16384 Feb  5 17:26 DIR_3
>> > drwxr-xr-x 2 root root 16384 Feb 10 00:01 DIR_C
>> > drwxr-xr-x 2 root root 16384 Mar  4 10:50 DIR_7
>> > drwxr-xr-x 2 root root 16384 Mar  4 16:46 DIR_A
>> > drwxr-xr-x 2 root root 16384 Mar  5 02:37 DIR_2
>> > drwxr-xr-x 2 root root 16384 Mar  5 17:39 DIR_4
>> > drwxr-xr-x 2 root root 16384 Mar  8 16:50 DIR_F
>> > drwxr-xr-x 2 root root 16384 Mar 15 15:51 DIR_8
>> > drwxr-xr-x 2 root root 16384 Mar 15 21:18 DIR_D
>> > drwxr-xr-x 2 root root 16384 Mar 15 22:25 DIR_0
>> > drwxr-xr-x 2 root root 16384 Mar 15 22:35 DIR_9
>> > drwxr-xr-x 2 root root 16384 Mar 15 22:56 DIR_E
>> > drwxr-xr-x 2 root root 16384 Mar 15 23:21 DIR_1
>> > drwxr-xr-x 2 root root 12288 Mar 16 00:07 DIR_B
>> > drwxr-xr-x 2 root root 16384 Mar 16 00:34 DIR_5
>> >
>> > I assume that this file is an issue as well......and needs to be
>> > removed.
>> >
>> >
> Then, in the directory where the file should be, I have the same file:
>> >
>> >
>> > root@ceph03:/var/lib/ceph/osd/ceph-307/current/70.459s0_head/DIR_9/DIR_5/DIR_4/DIR_D/DIR_E#
>> > ls -ltr | grep -v __head_
>> > total 64840
>> > -rw-r--r-- 1 root root 1048576 Jan 23 21:49
>> >
>> > default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long
>> >
> In the directory DIR_E here (from above), there is only one file
> without a __head_ in the pathname -- the file above....Should I be
> deleting both of these _long files without the __head_, the one in
> DIR_E and the one above .../DIR_E?
>> >
>> > Since there is no directory structure HASH in these files, is that the
>> > indication that it is an orphan?
>> >
>> > Thanks,
>> > Jeff
>> >
>> >
>> >
>> >
>> > On Tue, Mar 15, 2016 at 8:38 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>> >>
>> >> Ah, actually, I think there will be duplicates only around half the
>> >> time -- either the old link or the new link could be orphaned
>> >> depending on which xfs decides to list first.  Only if the old link
>> >> is orphaned will it match the name of the object once it's recreated.
>> >> I should be able to find time to put together a branch in the next
>> >> week or two if you want to wait.  It's still probably worth trying to
>> >> remove that object in 70.459.
>> >> -Sam
>> >>
>> >> On Tue, Mar 15, 2016 at 6:03 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>> >> > The bug is entirely independent of hardware issues -- it's entirely
>> >> > a ceph bug.  xfs doesn't let us specify an ordering when reading a
>> >> > directory, so we have to keep directory sizes small.  That means
>> >> > that when one of those pg collection subfolders has 320 files in it,
>> >> > we split it into up to 16 smaller directories.  Overwriting or
>> >> > removing an ec object requires us to rename the old version out of
>> >> > the way in case we need to roll back (that's the generation number I
>> >> > mentioned above).  For crash safety, this involves first creating a
>> >> > link to the new name, then removing the old one.  Both the old and
>> >> > new link will be in the same subdirectory.  If creating the new link
>> >> > pushes the directory to 320 files, then we do a split while both
>> >> > links are present.  If the file in question is using the special
>> >> > long filename handling, then a bug in the resulting link juggling
>> >> > causes us to orphan the old version of the file.  Your cluster seems
>> >> > to have an unusual number of objects with very long names, which is
>> >> > why it is so visible on your cluster.
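>> >> >
>> >> > In outline the split looks something like this (illustrative sketch
>> >> > only -- hash_of() stands in for recovering the object hash from the
>> >> > filename or its lfn3 attr, and the real code also has to juggle the
>> >> > long-filename links, which is where the bug lived):
>> >> >
>> >> >     import os
>> >> >
>> >> >     def maybe_split(dirpath, depth, threshold=320):
>> >> >         files = [f for f in os.listdir(dirpath)
>> >> >                  if not f.startswith("DIR_")]
>> >> >         if len(files) < threshold:
>> >> >             return
>> >> >         for name in files:
>> >> >             # next nibble of the object hash past the current depth
>> >> >             nibble = hash_of(name)[::-1][depth]
>> >> >             sub = os.path.join(dirpath, "DIR_" + nibble)
>> >> >             os.makedirs(sub, exist_ok=True)
>> >> >             os.rename(os.path.join(dirpath, name),
>> >> >                       os.path.join(sub, name))
>> >> >
>> >> > If both the old and new link for a long-filename object are sitting
>> >> > in dirpath when the split runs, mishandling one of them is what
>> >> > orphans the old version.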
>> >> >
>> >> > There are critical pool sizes where the PGs will all be close to
>> >> > one of those limits.  It's possible you are not close to one of
>> >> > those limits.  It's also possible you are nearing one now.  In any
>> >> > case, the remapping gave the orphaned files an opportunity to cause
>> >> > trouble, but they don't appear due to remapping.
>> >> > -Sam
>> >> >
>> >> > On Tue, Mar 15, 2016 at 5:41 PM, Jeffrey McDonald <jmcdonal@xxxxxxx>
>> >> > wrote:
>> >> >> One more question.....did we hit the bug because we had hardware
>> >> >> issues during the remapping, or would it have happened regardless
>> >> >> of the hardware issues?   e.g. I'm not planning to add any
>> >> >> additional hardware soon, but would the bug pop again on an
>> >> >> (unpatched) system not subject to any remapping?
>> >> >>
>> >> >> thanks,
>> >> >> jeff
>> >> >>
>> >> >> On Tue, Mar 15, 2016 at 7:27 PM, Samuel Just <sjust@xxxxxxxxxx>
>> >> >> wrote:
>> >> >>>
>> >> >>> [back on list]
>> >> >>>
>> >> >>> ceph-objectstore-tool has a whole bunch of machinery for modifying
>> >> >>> an offline objectstore.  It would be the easiest place to put it --
>> >> >>> you could add a
>> >> >>>
>> >> >>> ceph-objectstore-tool --op filestore-repair-orphan-links [--dry-run] ...
>> >> >>>
>> >> >>> command which would mount the filestore in a special mode and
>> >> >>> iterate over all collections and repair them.  If you want to go
>> >> >>> that route, we'd be happy to help you get it written.  Once it
>> >> >>> fixes your cluster, we'd then be able to merge and backport it in
>> >> >>> case anyone else hits it.
>> >> >>>
>> >> >>> You'd probably be fine doing it while the OSD is live...but as a
>> >> >>> rule I usually prefer to do my osd surgery offline.  The journal
>> >> >>> doesn't matter here; the orphaned files are basically invisible to
>> >> >>> the filestore (except when doing a collection scan for scrub) since
>> >> >>> they are in the wrong directory.
>> >> >>>
>> >> >>> I don't think the orphans are necessarily going to be 0 size.
>> >> >>> There might be a quirk of how radosgw creates these objects that
>> >> >>> always causes them to be created 0 size and then overwritten with a
>> >> >>> writefull -- if that's true, it might be the case that you would
>> >> >>> only see 0 size ones.
>> >> >>> -Sam
>> >> >>>
>> >> >>> On Tue, Mar 15, 2016 at 4:02 PM, Jeffrey McDonald
>> >> >>> <jmcdonal@xxxxxxx>
>> >> >>> wrote:
>> >> >>> > Thanks, I can try to write a tool to do this.   Does
>> >> >>> > ceph-objectstore-tool provide a framework?
>> >> >>> >
>> >> >>> > Can I safely delete the files while the OSD is alive, or should I
>> >> >>> > take it offline?   Any concerns about the journal?
>> >> >>> >
>> >> >>> > Are there any other properties of the orphans, e.g. will the
>> >> >>> > orphans always be size 0?
>> >> >>> >
>> >> >>> > Thanks!
>> >> >>> > Jeff
>> >> >>> >
>> >> >>> > On Tue, Mar 15, 2016 at 5:35 PM, Samuel Just <sjust@xxxxxxxxxx>
>> >> >>> > wrote:
>> >> >>> >>
>> >> >>> >> Ok, a branch merged to master which should fix this
>> >> >>> >> (https://github.com/ceph/ceph/pull/8136).  It'll be backported
>> >> >>> >> in due course.  The problem is that that patch won't clean
>> >> >>> >> orphaned files that already exist.
>> >> >>> >>
>> >> >>> >> Let me explain a bit about what the orphaned files look like.
>> >> >>> >> The problem is files with object names that result in escaped
>> >> >>> >> filenames longer than the max filename ceph will create (~250
>> >> >>> >> iirc).  Normally, the name of the file is an escaped and
>> >> >>> >> sanitized version of the object name:
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> ./DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/default.325674.107\u\ushadow\u.KePEE8heghHVnlb1\uEIupG0I5eROwRn\u77__head_C1DCD459__46_ffffffffffffffff_0
>> >> >>> >>
>> >> >>> >> corresponds to an object like
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> c1dcd459/default.325674.107__shadow_.KePEE8heghHVnlb1_EIupG0I5eROwRn_77/head//70
>> >> >>> >>
>> >> >>> >> The DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/ path is derived from the
>> >> >>> >> hash, starting with the last value: cd459 ->
>> >> >>> >> DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/
>> >> >>> >>
>> >> >>> >> It ends up in DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/ because that's the
>> >> >>> >> longest path that exists (DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/DIR_D
>> >> >>> >> does not exist -- if DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/ ever gets too
>> >> >>> >> full, DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/DIR_D would be created and
>> >> >>> >> this file would be moved into it).
>> >> >>> >>
>> >> >>> >> When the escaped filename gets too long, we truncate the
>> >> >>> >> filename and then append a hash and a number, yielding a name
>> >> >>> >> like:
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> ./DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long
>> >> >>> >>
>> >> >>> >> The _long at the end is always present with files like this.
>> >> >>> >> fa202ec9b4b3b217275a is the hash of the filename.  The 0
>> >> >>> >> indicates that it's the 0th file with this prefix and this hash
>> >> >>> >> -- if there are hash collisions with the same prefix, you'll see
>> >> >>> >> _1_ and _2_ and so on to distinguish them (very, very unlikely).
>> >> >>> >> When the filename has been truncated, as with this one, you will
>> >> >>> >> find the full file name in the attrs (attr user.cephos.lfn3):
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> ./DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long:
>> >> >>> >> user.cephos.lfn3:
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkLf.4\u156__head_79CED459__46_ffffffffffffffff_0
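>> >> >>> >>
>> >> >>> >> As a sketch of the naming scheme (the truncation length and hash
>> >> >>> >> function here are placeholders, not what the filestore actually
>> >> >>> >> uses):
>> >> >>> >>
>> >> >>> >>     import hashlib
>> >> >>> >>
>> >> >>> >>     NAME_MAX = 250   # approximate cap ("~250 iirc")
>> >> >>> >>
>> >> >>> >>     def disk_name(escaped_name, index=0):
>> >> >>> >>         if len(escaped_name) <= NAME_MAX:
>> >> >>> >>             return escaped_name
>> >> >>> >>         # truncate, then tack on a hash of the full name, a
>> >> >>> >>         # collision index, and the _long marker
>> >> >>> >>         h = hashlib.sha1(escaped_name.encode()).hexdigest()[:20]
>> >> >>> >>         return "%s_%s_%d_long" % (escaped_name[:200], h, index)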
>> >> >>> >>
>> >> >>> >> Let's look at one of the orphaned files (the one with the same
>> >> >>> >> file-name as the previous one, actually):
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> ./DIR_9/DIR_5/DIR_4/DIR_D/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long:
>> >> >>> >> user.cephos.lfn3:
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >>
>> >> >>> >> default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkLf.4\u156__head_79CED459__46_3189d_0
>> >> >>> >>
>> >> >>> >> This one has the same filename as the previous object, but it is
>> >> >>> >> an orphan.  What makes it an orphan is that it has hash 79CED459,
>> >> >>> >> but is in ./DIR_9/DIR_5/DIR_4/DIR_D even though
>> >> >>> >> ./DIR_9/DIR_5/DIR_4/DIR_D/DIR_E exists (object files are always
>> >> >>> >> in the farthest directory from the root matching their hash).
>> >> >>> >> All of the orphans will be long-file-name objects (but most
>> >> >>> >> long-file-name objects are fine and are neither orphans nor have
>> >> >>> >> duplicates -- it's a fairly low-occurrence bug).  In your case, I
>> >> >>> >> think *all* of the orphans will probably happen to have files
>> >> >>> >> with duplicate names in the correct directory -- though they
>> >> >>> >> might not if the object had actually been deleted since the bug
>> >> >>> >> happened.  When there are duplicates, the full object names will
>> >> >>> >> either be the same or differ by the generation number at the end
>> >> >>> >> (ffffffffffffffff_0 vs 3189d_0 in this case).
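>> >> >>> >>
>> >> >>> >> That comparison is easy to script if you need to check a lot of
>> >> >>> >> pairs (a sketch, assuming the attr layout shown above):
>> >> >>> >>
>> >> >>> >>     import os
>> >> >>> >>
>> >> >>> >>     def same_object(path_a, path_b):
>> >> >>> >>         # full names from the lfn3 attrs; duplicates are either
>> >> >>> >>         # identical or differ only in the generation field
>> >> >>> >>         a = os.getxattr(path_a, "user.cephos.lfn3").decode()
>> >> >>> >>         b = os.getxattr(path_b, "user.cephos.lfn3").decode()
>> >> >>> >>         drop_gen = lambda s: s.rsplit("_", 2)[0]
>> >> >>> >>         return a == b or drop_gen(a) == drop_gen(b)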
>> >> >>> >>
>> >> >>> >> Once the orphaned files are cleaned up, your cluster should be
>> >> >>> >> back to normal.  If you want to wait, someone might get time to
>> >> >>> >> build a patch for ceph-objectstore-tool to automate this.  You
>> >> >>> >> can try removing the orphan we identified in pg 70.459 and
>> >> >>> >> re-scrubbing to confirm that that fixes the pg.
>> >> >>> >> -Sam
>> >> >>> >>
>> >> >>> >> On Wed, Mar 9, 2016 at 6:58 AM, Jeffrey McDonald
>> >> >>> >> <jmcdonal@xxxxxxx>
>> >> >>> >> wrote:
>> >> >>> >> > Hi, I went back to the mon logs to see if I could elicit any
>> >> >>> >> > additional information about this PG.
>> >> >>> >> > Prior to 1/27/16, the deep-scrub on this OSD passes (then I see
>> >> >>> >> > obsolete rollback objects found):
>> >> >>> >> >
>> >> >>> >> > ceph.log.4.gz:2016-01-20 09:43:36.195640 osd.307
>> >> >>> >> > 10.31.0.67:6848/127170
>> >> >>> >> > 538
>> >> >>> >> > : cluster [INF] 70.459 deep-scrub ok
>> >> >>> >> > ceph.log.4.gz:2016-01-27 09:51:49.952459 osd.307
>> >> >>> >> > 10.31.0.67:6848/127170
>> >> >>> >> > 583
>> >> >>> >> > : cluster [INF] 70.459 deep-scrub starts
>> >> >>> >> > ceph.log.4.gz:2016-01-27 10:10:57.196311 osd.108
>> >> >>> >> > 10.31.0.69:6816/4283
>> >> >>> >> > 335 :
>> >> >>> >> > cluster [ERR] osd.108 pg 70.459s5 found obsolete rollback obj
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > 79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/5
>> >> >>> >> > generation < trimmed_to 130605'206504...repaired
>> >> >>> >> > ceph.log.4.gz:2016-01-27 10:10:57.043942 osd.307
>> >> >>> >> > 10.31.0.67:6848/127170
>> >> >>> >> > 584
>> >> >>> >> > : cluster [ERR] osd.307 pg 70.459s0 found obsolete rollback
>> >> >>> >> > obj
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > 79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0
>> >> >>> >> > generation < trimmed_to 130605'206504...repaired
>> >> >>> >> > ceph.log.4.gz:2016-01-27 10:10:58.225017 osd.307
>> >> >>> >> > 10.31.0.67:6848/127170
>> >> >>> >> > 585
>> >> >>> >> > : cluster [ERR] 70.459s0 shard 4(3) missing
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >> >>> >> > ceph.log.4.gz:2016-01-27 10:10:58.225068 osd.307
>> >> >>> >> > 10.31.0.67:6848/127170
>> >> >>> >> > 586
>> >> >>> >> > : cluster [ERR] 70.459s0 shard 10(2) missing
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >> >>> >> > ceph.log.4.gz:2016-01-27 10:10:58.225088 osd.307
>> >> >>> >> > 10.31.0.67:6848/127170
>> >> >>> >> > 587
>> >> >>> >> > : cluster [ERR] 70.459s0 shard 26(1) missing
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >> >>> >> > ceph.log.4.gz:2016-01-27 10:10:58.225127 osd.307
>> >> >>> >> > 10.31.0.67:6848/127170
>> >> >>> >> > 588
>> >> >>> >> > : cluster [ERR] 70.459s0 shard 132(4) missing
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >> >>> >> > ceph.log.4.gz:2016-01-27 10:13:52.926032 osd.307
>> >> >>> >> > 10.31.0.67:6848/127170
>> >> >>> >> > 589
>> >> >>> >> > : cluster [ERR] 70.459s0 deep-scrub stat mismatch, got
>> >> >>> >> > 21324/21323
>> >> >>> >> > objects,
>> >> >>> >> > 0/0 clones, 21324/21323 dirty, 0/0 omap, 0/0 hit_set_archive,
>> >> >>> >> > 0/0
>> >> >>> >> > whiteouts,
>> >> >>> >> > 64313094166/64308899862 bytes,0/0 hit_set_archive bytes.
>> >> >>> >> > ceph.log.4.gz:2016-01-27 10:13:52.927589 osd.307
>> >> >>> >> > 10.31.0.67:6848/127170
>> >> >>> >> > 590
>> >> >>> >> > : cluster [ERR] 70.459s0 deep-scrub 1 missing, 0 inconsistent
>> >> >>> >> > objects
>> >> >>> >> > ceph.log.4.gz:2016-01-27 10:13:52.931250 osd.307
>> >> >>> >> > 10.31.0.67:6848/127170
>> >> >>> >> > 591
>> >> >>> >> > : cluster [ERR] 70.459 deep-scrub 5 errors
>> >> >>> >> > ceph.log.4.gz:2016-01-28 10:32:37.083809 osd.307
>> >> >>> >> > 10.31.0.67:6848/127170
>> >> >>> >> > 592
>> >> >>> >> > : cluster [INF] 70.459 repair starts
>> >> >>> >> > ceph.log.4.gz:2016-01-28 10:51:44.608297 osd.307
>> >> >>> >> > 10.31.0.67:6848/127170
>> >> >>> >> > 593
>> >> >>> >> > : cluster [ERR] osd.307 pg 70.459s0 found obsolete rollback
>> >> >>> >> > obj
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > 79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0
>> >> >>> >> > generation < trimmed_to 130605'206504...repaired
>> >> >>> >> > ceph.log.4.gz:2016-01-28 10:51:45.802549 osd.307
>> >> >>> >> > 10.31.0.67:6848/127170
>> >> >>> >> > 594
>> >> >>> >> > : cluster [ERR] 70.459s0 shard 4(3) missing
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >> >>> >> > ceph.log.4.gz:2016-01-28 10:51:45.802933 osd.307
>> >> >>> >> > 10.31.0.67:6848/127170
>> >> >>> >> > 595
>> >> >>> >> > : cluster [ERR] 70.459s0 shard 10(2) missing
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >> >>> >> > ceph.log.4.gz:2016-01-28 10:51:45.802978 osd.307
>> >> >>> >> > 10.31.0.67:6848/127170
>> >> >>> >> > 596
>> >> >>> >> > : cluster [ERR] 70.459s0 shard 26(1) missing
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >> >>> >> > ceph.log.4.gz:2016-01-28 10:51:45.803039 osd.307
>> >> >>> >> > 10.31.0.67:6848/127170
>> >> >>> >> > 597
>> >> >>> >> > : cluster [ERR] 70.459s0 shard 132(4) missing
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
>> >> >>> >> > ceph.log.4.gz:2016-01-28 10:51:44.781639 osd.108
>> >> >>> >> > 10.31.0.69:6816/4283
>> >> >>> >> > 338 :
>> >> >>> >> > cluster [ERR] osd.108 pg 70.459s5 found obsolete rollback obj
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > 79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/5
>> >> >>> >> > generation < trimmed_to 130605'206504...repaired
>> >> >>> >> > ceph.log.4.gz:2016-01-28 11:01:18.119350 osd.26
>> >> >>> >> > 10.31.0.103:6812/77378
>> >> >>> >> > 2312
>> >> >>> >> > : cluster [INF] 70.459s1 restarting backfill on osd.305(0)
>> >> >>> >> > from
>> >> >>> >> > (0'0,0'0]
>> >> >>> >> > MAX to 130605'206506
>> >> >>> >> > ceph.log.4.gz:2016-02-01 13:40:55.096030 osd.307
>> >> >>> >> > 10.31.0.67:6848/13421
>> >> >>> >> > 16 :
>> >> >>> >> > cluster [INF] 70.459s0 restarting backfill on osd.210(1) from
>> >> >>> >> > (0'0,0'0]
>> >> >>> >> > MAX
>> >> >>> >> > to 135195'206996
>> >> >>> >> > ceph.log.4.gz:2016-02-01 13:41:10.623892 osd.307
>> >> >>> >> > 10.31.0.67:6848/13421
>> >> >>> >> > 27 :
>> >> >>> >> > cluster [INF] 70.459s0 restarting backfill on osd.25(1) from
>> >> >>> >> > (0'0,0'0]
>> >> >>> >> > MAX
>> >> >>> >> > to 135195'206996
>> >> >>> >> > ...
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > Regards,
>> >> >>> >> > Jeff
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >> > --
>> >> >>> >> >
>> >> >>> >> > Jeffrey McDonald, PhD
>> >> >>> >> > Assistant Director for HPC Operations
>> >> >>> >> > Minnesota Supercomputing Institute
>> >> >>> >> > University of Minnesota Twin Cities
>> >> >>> >> > 599 Walter Library           email:
>> >> >>> >> > jeffrey.mcdonald@xxxxxxxxxxx
>> >> >>> >> > 117 Pleasant St SE           phone: +1 612 625-6905
>> >> >>> >> > Minneapolis, MN 55455        fax:   +1 612 624-8861
>> >> >>> >> >
>> >> >>> >> >
>> >> >>> >
>> >> >>> >
>> >> >>> >
>> >> >>> >
>> >> >>> > --
>> >> >>> >
>> >> >>> > Jeffrey McDonald, PhD
>> >> >>> > Assistant Director for HPC Operations
>> >> >>> > Minnesota Supercomputing Institute
>> >> >>> > University of Minnesota Twin Cities
>> >> >>> > 599 Walter Library           email: jeffrey.mcdonald@xxxxxxxxxxx
>> >> >>> > 117 Pleasant St SE           phone: +1 612 625-6905
>> >> >>> > Minneapolis, MN 55455        fax:   +1 612 624-8861
>> >> >>> >
>> >> >>> >
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >>
>> >> >> Jeffrey McDonald, PhD
>> >> >> Assistant Director for HPC Operations
>> >> >> Minnesota Supercomputing Institute
>> >> >> University of Minnesota Twin Cities
>> >> >> 599 Walter Library           email: jeffrey.mcdonald@xxxxxxxxxxx
>> >> >> 117 Pleasant St SE           phone: +1 612 625-6905
>> >> >> Minneapolis, MN 55455        fax:   +1 612 624-8861
>> >> >>
>> >> >>
>> >
>> >
>> >
>> >
>> > --
>> >
>> > Jeffrey McDonald, PhD
>> > Assistant Director for HPC Operations
>> > Minnesota Supercomputing Institute
>> > University of Minnesota Twin Cities
>> > 599 Walter Library           email: jeffrey.mcdonald@xxxxxxxxxxx
>> > 117 Pleasant St SE           phone: +1 612 625-6905
>> > Minneapolis, MN 55455        fax:   +1 612 624-8861
>> >
>> >
>
>
>
>
> --
>
> Jeffrey McDonald, PhD
> Assistant Director for HPC Operations
> Minnesota Supercomputing Institute
> University of Minnesota Twin Cities
> 599 Walter Library           email: jeffrey.mcdonald@xxxxxxxxxxx
> 117 Pleasant St SE           phone: +1 612 625-6905
> Minneapolis, MN 55455        fax:   +1 612 624-8861
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


