I don't have any OSDs that are down, so I think the 1 unfound object
needs to be manually cleared. I ran across a webpage a while ago that
talked about how to clear it, but if you have a reference handy, it
would save me a little time. (I've sketched what I think the procedure
is at the bottom of this mail, after the quoted thread.) I've attached
the outputs of the commands you asked for.

The ceph test network contains 6 OSDs, 3 mons, 3 MDSes, 1 rgw and 1
mgr, on a 64-bit Ubuntu 14.04/16.04 mix. The file system is degraded.
Are there documented procedures for getting it back into operation?

On Tue, Sep 5, 2017 at 6:33 AM, Sage Weil <sweil@xxxxxxxxxx> wrote:
> On Mon, 4 Sep 2017, Two Spirit wrote:
>> Thanks for the info. I'm stumped about what to do right now to get
>> back to an operational cluster -- I'm still trying to find
>> documentation on how to recover.
>>
>> 1) I have not yet modified any CRUSH rules from the defaults. I have
>> one Ubuntu 14.04 OSD in the mix, and I had to set "ceph osd crush
>> tunables legacy" just to get it to work.
>>
>> 2) I have not yet implemented any erasure-coded pool. That is
>> probably one of the next tests I was going to do. I'm still testing
>> with basic replication.
>
> Can you attach 'ceph health detail', 'ceph osd crush dump', and
> 'ceph osd dump'?
>
>> The degraded data redundancy seems to be stuck and is not reducing
>> anymore. If I manually clear [if this is even possible] the 1 pg
>> undersized, should my degraded filesystem come back online?
>
> The problem is likely the 1 unfound object. Are there any OSDs that
> are down, or that failed recently? (Try 'ceph osd tree down' to see
> a simple summary.)
>
> sage
>
>> On Mon, Sep 4, 2017 at 2:05 AM, John Spray <jspray@xxxxxxxxxx> wrote:
>> > On Sun, Sep 3, 2017 at 2:14 PM, Two Spirit <twospirit6905@xxxxxxxxx> wrote:
>> >> Setup: luminous on a 64-bit Ubuntu 14.04/16.04 mix. 5 OSDs, all
>> >> up. 3 or 4 MDSes, 3 mons, cephx. Rebooting all 6 ceph systems did
>> >> not clear the problem, which occurred within 6 hours of the start
>> >> of the test. A similar stress test with 4 OSDs, 1 MDS, 1 mon and
>> >> cephx worked fine.
>> >>
>> >> stress test:
>> >> # cp * /mnt/cephfs
>> >>
>> >> # ceph -s
>> >>   health: HEALTH_WARN
>> >>           1 filesystem is degraded
>> >>           crush map has straw_calc_version=0
>> >>           1/731529 unfound (0.000%)
>> >>           Degraded data redundancy: 22519/1463058 objects
>> >>           degraded (1.539%), 2 pgs unclean, 2 pgs degraded,
>> >>           1 pg undersized
>> >>
>> >>   services:
>> >>     mon: 3 daemons, quorum xxx233,xxx266,xxx272
>> >>     mgr: xxx266(active)
>> >>     mds: cephfs-1/1/1 up {0=xxx233=up:replay}, 3 up:standby
>> >>     osd: 5 osds: 5 up, 5 in
>> >>     rgw: 1 daemon active
>> >
>> > Your MDS is probably stuck in the replay state because it can't
>> > read from one of your degraded PGs. Given that you have all your
>> > OSDs in, but one of your PGs is undersized (i.e. short on OSDs),
>> > I would guess that something is wrong with your choice of CRUSH
>> > rules or EC config.
>> >
>> > John
>> >
>> >> # ceph mds dump
>> >> dumped fsmap epoch 590
>> >> fs_name cephfs
>> >> epoch   589
>> >> flags   c
>> >> created 2017-08-24 14:35:33.735399
>> >> modified        2017-08-24 14:35:33.735400
>> >> tableserver     0
>> >> root    0
>> >> session_timeout 60
>> >> session_autoclose       300
>> >> max_file_size   1099511627776
>> >> last_failure    0
>> >> last_failure_osd_epoch  1573
>> >> compat  compat={},rocompat={},incompat={1=base v0.20,2=client
>> >> writeable ranges,3=default file layouts on dirs,4=dir inode in
>> >> separate object,5=mds uses versioned encoding,6=dirfrag is stored
>> >> in omap,8=file layout v2}
>> >> max_mds 1
>> >> in      0
>> >> up      {0=579217}
>> >> failed
>> >> damaged
>> >> stopped
>> >> data_pools      [5]
>> >> metadata_pool   6
>> >> inline_data     disabled
>> >> balancer
>> >> standby_count_wanted    1
>> >> 579217: x.x.x.233:6804/1176521332 'xxx233' mds.0.589 up:replay seq 2
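For the archives, this is the unfound-object procedure I believe I had
run across, reconstructed from the Ceph docs -- treat it as a sketch
rather than gospel, since both options give up data: 'revert' rolls the
object back to its previous version, and 'delete' forgets it entirely.
The pg id 2.5 below is a placeholder; the real one shows up in 'ceph
health detail'.

# find the pg that owns the unfound object
ceph health detail | grep unfound

# list the unfound object(s) in that pg (2.5 is a placeholder pgid)
ceph pg 2.5 list_missing

# give up on the unfound object: revert to the previous version, or
# delete it if it was a brand-new object with no previous version
ceph pg 2.5 mark_unfound_lost revert|delete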
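To chase John's point about the undersized pg, the commands below
should show which pg is short on OSDs and where CRUSH is trying to
place it (again, 2.5 stands in for the real pgid):

# list pgs stuck in the undersized state
ceph pg dump_stuck undersized

# show which OSDs the pg maps to (up set vs. acting set)
ceph pg map 2.5

# full per-pg state, including recovery and unfound details
ceph pg 2.5 query

# quick summary of any down OSDs, per Sage's suggestion
ceph osd tree down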
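One more note: the "crush map has straw_calc_version=0" warning in
'ceph -s' is presumably a side effect of the 'ceph osd crush tunables
legacy' setting mentioned above. If I'm reading the CRUSH docs right,
it can be cleared as follows, possibly at the cost of some data
movement when the straw bucket weights are recalculated:

# inspect the current CRUSH tunables
ceph osd crush show-tunables

# switch to the fixed straw bucket weight calculation
ceph osd crush set-tunable straw_calc_version 1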
Attachments:
    ceph_health_detail.out
    ceph_osd_crush_dump.out
    ceph_osd_dump.out
    ceph_osd_tree_down.out