Re: which mds server is damaged?

On Thu, Sep 14, 2017 at 2:49 AM, Two Spirit <twospirit6905@xxxxxxxxx> wrote:
> I don't think there is a Troubleshooting MDS section. Nice to have one.

If you google "ceph troubleshooting mds" you'll find that the first
hit is the cephfs troubleshooting page here:
docs.ceph.com/docs/master/cephfs/troubleshooting/

> I'm not sure how to read "cephfs-0/1/1" as to what the 3 digits are.
> It is unclear to me if I have 3 or 4 MDS. I thought I only set up 3,
> but I think it's telling me I actually have 4.
> I can't figure out which server is the damaged MDS. The ceph mds dump
> said "damaged 0".
> Since "ceph -s" is saying I only have 1 damaged MDS, I should be able
> to remove it.
>
> how do I go from mds.0 to a physical hostname?

The thing that's damaged is a logical MDS rank (0), not a physical
MDS daemon.  What this is telling you is that there is some serious
corruption in your metadata pool that prevents that particular rank
from starting.
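
For the other part of your question: the three numbers in
"cephfs-0/1/1" are (if I remember the format correctly) up/in/max_mds
for that filesystem, not a count of MDS daemons.  The place to see
which daemon holds a rank is the fsmap, e.g. (the name "cephfs" below
is just your filesystem's name from the dump):

$ ceph fs dump    # newer equivalent of the deprecated "ceph mds dump"
$ ceph mds stat

When a rank is actually held by a daemon, "ceph mds stat" shows
something like "cephfs-1/1/1 up {0=<daemon name>=up:active}", which
gives you the daemon (and usually host) name.  In your dump the "up"
map is empty ("up {}") and rank 0 is listed under "damaged", so there
is no daemon holding it for a hostname to point at.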

>
> $ ceph mds rmfailed 0 --yes-i-really-mean-it
> removed failed mds.1:0
>
> I thought I asked it to remove mds.0, but it looks like I'm removing mds.1

Two things:
 * You ignored the part where rmfailed tells you "WARNING: this can
make your filesystem inaccessible!".  Please don't just ignore
messages like that; they're there for a reason.  Fortunately, in this
situation rmfailed did not actually change anything, because you
didn't have any "failed" MDS ranks, just a damaged one.
 * The "mds.1:0" syntax is telling you that it's rank 0 in a
filesystem with ID 1 (see below).
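
If you want to double-check the name-to-ID mapping, it's in the fsmap
dump (hedging a little on the exact output format, but on a
Luminous-era cluster it's along these lines):

$ ceph fs dump

The per-filesystem section starts with a header like
"Filesystem 'cephfs' (1)", and the number in parentheses is the
filesystem ID that "mds.1:0" refers to.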

> $ ceph health detail
> MDS_DAMAGE 1 mds daemon damaged
>     fs cephfs mds.0 is damaged

So if I recall correctly, you had a system with some unfound/damaged
PGs.  I would strongly suspect that to be the underlying cause of the
damage to your CephFS metadata, so unless you've got something
irreplaceable in your CephFS filesystem, just blow it away and create
a fresh one.
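
A rough sketch of that, assuming your filesystem is called "cephfs"
and that you're willing to recreate its pools too (the pool names and
PG count below are placeholders, so substitute your own; pool
deletion also requires mon_allow_pool_delete to be enabled):

# stop every ceph-mds daemon first, then:
$ ceph fs rm cephfs --yes-i-really-mean-it
$ ceph osd pool delete cephfs_metadata cephfs_metadata --yes-i-really-really-mean-it
$ ceph osd pool delete cephfs_data cephfs_data --yes-i-really-really-mean-it
$ ceph osd pool create cephfs_metadata 64
$ ceph osd pool create cephfs_data 64
$ ceph fs new cephfs cephfs_metadata cephfs_data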

John

>   services
>     mds: cephfs-0/1/1 up , 3 up:standby, 1 damaged
>
> $ ceph mds dump
> dumped fsmap epoch 632
> fs_name cephfs
> epoch   632
> flags   d
> created 2017-08-24 14:35:33.735399
> modified        2017-08-24 14:35:33.735400
> tableserver     0
> root    0
> session_timeout 60
> session_autoclose       300
> max_file_size   1099511627776
> last_failure    0
> last_failure_osd_epoch  1828
> compat  compat={},rocompat={},incompat={1=base v0.20,2=client
> writeable ranges,3=default file layouts on dirs,4=dir inode in
> separate object,5=mds uses versioned encoding,6=dirfrag is stored in
> omap,8=file layout v2}
> max_mds 1
> in      0
> up      {}
> failed
> damaged 0
> stopped
> data_pools      [5]
> metadata_pool   6
> inline_data     disabled
> balancer
> standby_count_wanted    1


