Re: inconsistent PG -> unfound objects on an erasure coded system

Ok, a branch merged to master which should fix this
(https://github.com/ceph/ceph/pull/8136).  It'll be backported in due
course.  The catch is that the patch won't clean up orphaned files
that already exist.

Let me explain a bit about what the orphaned files look like.  The
problem is files whose object names escape to filenames longer than
the maximum filename length ceph will create (~250 characters, iirc).
Normally, the name of the file is an escaped and sanitized version of
the object name:

./DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/default.325674.107\u\ushadow\u.KePEE8heghHVnlb1\uEIupG0I5eROwRn\u77__head_C1DCD459__46_ffffffffffffffff_0

corresponds to an object like

c1dcd459/default.325674.107__shadow_.KePEE8heghHVnlb1_EIupG0I5eROwRn_77/head//70
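
As an aside: the escaping visible here maps '_' in the object name to
the two characters '\u' in the filename, and '/' to '\s'.  A minimal
Python sketch of that mapping, inferred from these examples rather
than taken from the actual FileStore sanitizer (which handles other
characters too):

def escape_name(object_name):
    # Inferred mapping only: '_' -> '\u', '/' -> '\s'
    return object_name.replace("_", "\\u").replace("/", "\\s")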

The DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/ path is derived from the hash
(C1DCD459), reading its hex digits from the end backwards:
cd459 -> DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/

It ends up in DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/ because that's the
longest path that exists (DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/DIR_D does not
exist -- if DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/ ever gets too full,
DIR_9/DIR_5/DIR_4/DIR_D/DIR_C/DIR_D would be created and this file
would be moved into it).
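
A minimal Python sketch of that hash-to-path mapping (the depth
argument here is hypothetical -- in reality the depth is just however
far the directory tree has been split):

def dir_components(hash_hex, depth):
    # Hex digits of the object hash, taken from the last one
    # backwards, one DIR_<digit> level per character.
    return ["DIR_" + c.upper() for c in reversed(hash_hex)][:depth]

# dir_components("C1DCD459", 5)
# -> ['DIR_9', 'DIR_5', 'DIR_4', 'DIR_D', 'DIR_C']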

When the escaped filename gets too long, we truncate the filename and
then append a hash and a number, yielding a name like:

./DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long

The _long at the end is always present with files like this.
fa202ec9b4b3b217275a is the hash of the filename.  The 0 indicates
that it's the 0th file with this prefix and this hash -- if there are
hash collisions with the same prefix, you'll see _1_ and _2_ and so on
to distinguish them (very, very unlikely).  When the filename has
been truncated, as with this one, you will find the full name in the
attrs (attr user.cephos.lfn3):

./DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long:
user.cephos.lfn3:
default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkLf.4\u156__head_79CED459__46_ffffffffffffffff_0
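
If you want to pull that attr yourself, a minimal sketch (os.getxattr
is Linux-only; getfattr -n user.cephos.lfn3 <file> from the shell
works too -- the path below is abbreviated, substitute the real
filename):

import os

# Abbreviated, hypothetical path -- use the real *_long filename.
path = "./DIR_9/DIR_5/DIR_4/DIR_D/DIR_E/default.724733.17..._long"
full_name = os.getxattr(path, "user.cephos.lfn3").decode()
print(full_name)  # the full, untruncated escaped name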

Let's look at one of the orphaned files (the one with the same
filename as the previous one, actually):

./DIR_9/DIR_5/DIR_4/DIR_D/default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkL_fa202ec9b4b3b217275a_0_long:
user.cephos.lfn3:
default.724733.17\u\ushadow\uprostate\srnaseq\s8e5da6e8-8881-4813-a4e3-327df57fd1b7\sUNCID\u2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304\uUNC14-SN744\u0400\uAC3LWGACXX\u7\uGAGTGG.tar.gz.2~\u1r\uFGidmpEP8GRsJkNLfAh9CokxkLf.4\u156__head_79CED459__46_3189d_0

This one has the same filename as the previous object, but is an
orphan.  What makes it an orphan is that it has hash 79CED459, but is
in ./DIR_9/DIR_5/DIR_4/DIR_D even though
./DIR_9/DIR_5/DIR_4/DIR_D/DIR_E exists (object files always live in
the deepest existing directory matching their hash).  All of the
orphans will be long-file-name objects (but most long-file-name
objects are fine, and are neither orphans nor have duplicates -- it's
a fairly low-occurrence bug).  In your case, I think *all* of the
orphans will probably have files with duplicate names in the correct
directory -- though some might not, if the object had actually been
deleted since the bug happened.  When there are duplicates, the full
object names will either be the same or differ by the generation
number at the end (ffffffffffffffff_0 vs 3189d_0 in this case).
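
To find candidates in bulk, something like the following sketch may
help.  It assumes pg_root is a PG head directory on the OSD
(e.g. current/70.459s0_head), looks only at *_long files, pulls the
hash out of the lfn3 attr, and flags files that are not in the
deepest existing directory matching their hash.  Treat hits as
candidates to verify by hand, not as a delete list:

import os
import re

def orphan_candidates(pg_root):
    for dirpath, dirnames, filenames in os.walk(pg_root):
        # depth = number of DIR_* components below the PG root
        rel = os.path.relpath(dirpath, pg_root)
        depth = 0 if rel == "." else len(rel.split(os.sep))
        for name in filenames:
            if not name.endswith("_long"):
                continue  # orphans are always long-file-name objects
            full = os.getxattr(os.path.join(dirpath, name),
                               "user.cephos.lfn3").decode()
            m = re.search(r"__head_([0-9A-Fa-f]{8})", full)
            if not m or depth >= len(m.group(1)):
                continue
            # the next, deeper directory this hash would belong in
            next_dir = "DIR_" + m.group(1)[::-1][depth].upper()
            if next_dir in dirnames:
                yield os.path.join(dirpath, name)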

Once the orphaned files are cleaned up, your cluster should be back
to normal.  If you want to wait, someone might get time to build a
patch for ceph-objectstore-tool to automate this.  In the meantime,
you can try removing the orphan we identified in pg 70.459 and
re-scrubbing to confirm that this fixes the pg.
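
For that last step, roughly (paths abbreviated and hypothetical;
double-check you are holding the orphan -- DIR_D, not DIR_D/DIR_E --
before removing anything, and run the removal on the OSD host that
holds the shard):

import os
import subprocess

# Abbreviated path to the orphan -- the real filename is the full
# *_long name shown earlier.
os.remove("./DIR_9/DIR_5/DIR_4/DIR_D/default.724733.17..._long")

# Then kick off another scrub of the pg (from an admin/mon node).
subprocess.run(["ceph", "pg", "deep-scrub", "70.459"], check=True)
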
-Sam

On Wed, Mar 9, 2016 at 6:58 AM, Jeffrey McDonald <jmcdonal@xxxxxxx> wrote:
> Hi, I went back to the mon logs to see if I could elicit any additional
> information about this PG.
> Prior to 1/27/16, the deep-scrub on this OSD passes (then I see obsolete
> rollback objects found):
>
> ceph.log.4.gz:2016-01-20 09:43:36.195640 osd.307 10.31.0.67:6848/127170 538
> : cluster [INF] 70.459 deep-scrub ok
> ceph.log.4.gz:2016-01-27 09:51:49.952459 osd.307 10.31.0.67:6848/127170 583
> : cluster [INF] 70.459 deep-scrub starts
> ceph.log.4.gz:2016-01-27 10:10:57.196311 osd.108 10.31.0.69:6816/4283 335 :
> cluster [ERR] osd.108 pg 70.459s5 found obsolete rollback obj
> 79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/5
> generation < trimmed_to 130605'206504...repaired
> ceph.log.4.gz:2016-01-27 10:10:57.043942 osd.307 10.31.0.67:6848/127170 584
> : cluster [ERR] osd.307 pg 70.459s0 found obsolete rollback obj
> 79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0
> generation < trimmed_to 130605'206504...repaired
> ceph.log.4.gz:2016-01-27 10:10:58.225017 osd.307 10.31.0.67:6848/127170 585
> : cluster [ERR] 70.459s0 shard 4(3) missing
> cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
> ceph.log.4.gz:2016-01-27 10:10:58.225068 osd.307 10.31.0.67:6848/127170 586
> : cluster [ERR] 70.459s0 shard 10(2) missing
> cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
> ceph.log.4.gz:2016-01-27 10:10:58.225088 osd.307 10.31.0.67:6848/127170 587
> : cluster [ERR] 70.459s0 shard 26(1) missing
> cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
> ceph.log.4.gz:2016-01-27 10:10:58.225127 osd.307 10.31.0.67:6848/127170 588
> : cluster [ERR] 70.459s0 shard 132(4) missing
> cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
> ceph.log.4.gz:2016-01-27 10:13:52.926032 osd.307 10.31.0.67:6848/127170 589
> : cluster [ERR] 70.459s0 deep-scrub stat mismatch, got 21324/21323 objects,
> 0/0 clones, 21324/21323 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts,
> 64313094166/64308899862 bytes,0/0 hit_set_archive bytes.
> ceph.log.4.gz:2016-01-27 10:13:52.927589 osd.307 10.31.0.67:6848/127170 590
> : cluster [ERR] 70.459s0 deep-scrub 1 missing, 0 inconsistent objects
> ceph.log.4.gz:2016-01-27 10:13:52.931250 osd.307 10.31.0.67:6848/127170 591
> : cluster [ERR] 70.459 deep-scrub 5 errors
> ceph.log.4.gz:2016-01-28 10:32:37.083809 osd.307 10.31.0.67:6848/127170 592
> : cluster [INF] 70.459 repair starts
> ceph.log.4.gz:2016-01-28 10:51:44.608297 osd.307 10.31.0.67:6848/127170 593
> : cluster [ERR] osd.307 pg 70.459s0 found obsolete rollback obj
> 79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/0
> generation < trimmed_to 130605'206504...repaired
> ceph.log.4.gz:2016-01-28 10:51:45.802549 osd.307 10.31.0.67:6848/127170 594
> : cluster [ERR] 70.459s0 shard 4(3) missing
> cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
> ceph.log.4.gz:2016-01-28 10:51:45.802933 osd.307 10.31.0.67:6848/127170 595
> : cluster [ERR] 70.459s0 shard 10(2) missing
> cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
> ceph.log.4.gz:2016-01-28 10:51:45.802978 osd.307 10.31.0.67:6848/127170 596
> : cluster [ERR] 70.459s0 shard 26(1) missing
> cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
> ceph.log.4.gz:2016-01-28 10:51:45.803039 osd.307 10.31.0.67:6848/127170 597
> : cluster [ERR] 70.459s0 shard 132(4) missing
> cffed459/default.325671.93__shadow_wrfout_d01_2005-04-18_00_00_00.2~DyHqLoH7FFV_6fz8MOzmPEVO3Td4bZx.10_82/head//70
> ceph.log.4.gz:2016-01-28 10:51:44.781639 osd.108 10.31.0.69:6816/4283 338 :
> cluster [ERR] osd.108 pg 70.459s5 found obsolete rollback obj
> 79ced459/default.724733.17__shadow_prostate/rnaseq/8e5da6e8-8881-4813-a4e3-327df57fd1b7/UNCID_2409283.304a95c1-2180-4a81-a85a-880427e97d67.140304_UNC14-SN744_0400_AC3LWGACXX_7_GAGTGG.tar.gz.2~_1r_FGidmpEP8GRsJkNLfAh9CokxkLf.4_156/head//70/202909/5
> generation < trimmed_to 130605'206504...repaired
> ceph.log.4.gz:2016-01-28 11:01:18.119350 osd.26 10.31.0.103:6812/77378 2312
> : cluster [INF] 70.459s1 restarting backfill on osd.305(0) from (0'0,0'0]
> MAX to 130605'206506
> ceph.log.4.gz:2016-02-01 13:40:55.096030 osd.307 10.31.0.67:6848/13421 16 :
> cluster [INF] 70.459s0 restarting backfill on osd.210(1) from (0'0,0'0] MAX
> to 135195'206996
> ceph.log.4.gz:2016-02-01 13:41:10.623892 osd.307 10.31.0.67:6848/13421 27 :
> cluster [INF] 70.459s0 restarting backfill on osd.25(1) from (0'0,0'0] MAX
> to 135195'206996
> ...
>
>
> Regards,
> Jeff
>
>
> --
>
> Jeffrey McDonald, PhD
> Assistant Director for HPC Operations
> Minnesota Supercomputing Institute
> University of Minnesota Twin Cities
> 599 Walter Library           email: jeffrey.mcdonald@xxxxxxxxxxx
> 117 Pleasant St SE           phone: +1 612 625-6905
> Minneapolis, MN 55455        fax:   +1 612 624-8861
>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


