Re: Help with file system with failed mds daemon

Bryan Banister <bbanister@xxxxxxxxxxxxxxx> · Tue, 22 Aug 2017 19:49:28 +0000

Hi John,

It seems like you're right... Strange that it seemed to work with only one MDS before I shut the cluster down. Here is the `ceph fs get` output for the two file systems:

[root@carf-ceph-osd15 ~]# ceph fs get carf_ceph_kube01
Filesystem 'carf_ceph_kube01' (2)
fs_name carf_ceph_kube01
epoch   22
flags   8
created 2017-08-21 12:10:57.948579
modified        2017-08-21 12:10:57.948579
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
last_failure    0
last_failure_osd_epoch  1218
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
max_mds 1
in      0
up      {}
failed  0
damaged
stopped
data_pools      [23]
metadata_pool   24
inline_data     disabled
balancer
standby_count_wanted    0
[root@carf-ceph-osd15 ~]#
[root@carf-ceph-osd15 ~]# ceph fs get carf_ceph02
Filesystem 'carf_ceph02' (1)
fs_name carf_ceph02
epoch   26
flags   8
created 2017-08-18 14:20:50.152054
modified        2017-08-18 14:20:50.152054
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
last_failure    0
last_failure_osd_epoch  1198
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=file layout v2}
max_mds 1
in      0
up      {0=474299}
failed
damaged
stopped
data_pools      [21]
metadata_pool   22
inline_data     disabled
balancer
standby_count_wanted    0
474299: 7.128.13.69:6800/304042158 'carf-ceph-osd15' mds.0.23 up:active seq 5
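
Comparing the two outputs, the difference is in the `in`, `up` and `failed` lines: `carf_ceph_kube01` has rank 0 "in" but nothing in `up` and rank 0 listed as `failed`, while `carf_ceph02` has rank 0 served by MDS gid 474299. A minimal Python sketch that pulls those fields out of the plain-text output is below. It assumes the whitespace-separated `key value` layout seen in this paste and comma-separated rank lists when there is more than one rank (both are assumptions); `summarize_fs_get` and the sample strings are hypothetical, not part of Ceph.

```python
# Minimal sketch: summarize rank state from plain-text `ceph fs get` output.
# Assumes the "key<whitespace>value" layout seen in this thread (Luminous 12.1.x).
# summarize_fs_get and the sample excerpts are illustrative, not part of Ceph.

def parse_rank_list(value):
    """'0' or '0,1' -> [0, 1]; '' -> [] (comma separator is an assumption)."""
    return [int(x) for x in value.split(",") if x.strip()]

def parse_up_map(value):
    """'{0=474299}' -> {0: 474299}; '{}' -> {} (rank -> MDS gid)."""
    inner = value.strip().strip("{}")
    up = {}
    for item in inner.split(","):
        if item.strip():
            rank, gid = item.split("=")
            up[int(rank)] = int(gid)
    return up

def summarize_fs_get(text):
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if parts:
            # Keys with no value (e.g. "damaged") map to an empty string.
            fields[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    in_ranks = parse_rank_list(fields.get("in", ""))
    up = parse_up_map(fields.get("up", "{}"))
    failed = parse_rank_list(fields.get("failed", ""))
    return {
        "fs_name": fields.get("fs_name"),
        "in": in_ranks,
        "up": up,
        "failed": failed,
        # "Healthy" here: every rank that is "in" has an MDS, and none are failed.
        "healthy": all(r in up for r in in_ranks) and not failed,
    }

# Excerpts of the two outputs above.
KUBE01 = """fs_name carf_ceph_kube01
max_mds 1
in      0
up      {}
failed  0
damaged
"""

CEPH02 = """fs_name carf_ceph02
max_mds 1
in      0
up      {0=474299}
failed
damaged
"""

if __name__ == "__main__":
    for sample in (KUBE01, CEPH02):
        s = summarize_fs_get(sample)
        print(s["fs_name"], "up:", s["up"], "failed:", s["failed"], "healthy:", s["healthy"])
```

Run against the two excerpts, it reports `healthy: False` for carf_ceph_kube01 (rank 0 failed, nothing up) and `healthy: True` for carf_ceph02.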

I also tried specifying the mds_namespace option in the mount command (http://docs.ceph.com/docs/master/cephfs/kernel/), but it doesn't seem to be valid:
[ceph-admin@carf-ceph-osd04 ~]$ sudo mount -t ceph carf-ceph-osd15:6789:/ /mnt/carf_ceph02/ -o mds_namespace=carf_ceph02,name=cephfs.k8test,secretfile=k8test.secret
mount error 22 = Invalid argument

Thanks,
-Bryan

-----Original Message-----
From: John Spray [mailto:jspray@xxxxxxxxxx]
Sent: Tuesday, August 22, 2017 11:18 AM
To: Bryan Banister <bbanister@xxxxxxxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: [ceph-users] Help with file system with failed mds daemon


On Tue, Aug 22, 2017 at 4:58 PM, Bryan Banister
<bbanister@xxxxxxxxxxxxxxx> wrote:
> Hi all,
>
> I'm still new to Ceph and CephFS, and I'm trying out the multi-fs configuration on
> a Luminous test cluster. I shut down the cluster to do an upgrade, and when
> I brought the cluster back up, I had a warning that one of the file
> systems has a failed mds daemon:
>
> 2017-08-21 17:00:00.000081 mon.carf-ceph-osd15 [WRN] overall HEALTH_WARN 1 filesystem is degraded; 1 filesystem is have a failed mds daemon; 1 pools have many more objects per pg than average; application not enabled on 9 pool(s)
>
> I tried restarting the MDS service on the system, and the log doesn't seem to
> indicate any problems:
> 
> 2017-08-21 16:13:40.979449 7fffed8b0700  1 mds.0.20 shutdown: shutting down rank 0
> 2017-08-21 16:13:41.012167 7ffff7fde1c0  0 set uid:gid to 167:167 (ceph:ceph)
> 2017-08-21 16:13:41.012180 7ffff7fde1c0  0 ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc), process (unknown), pid 16656
> 2017-08-21 16:13:41.014105 7ffff7fde1c0  0 pidfile_write: ignore empty --pid-file
> 2017-08-21 16:13:45.541442 7ffff10b7700  1 mds.0.23 handle_mds_map i am now mds.0.23
> 2017-08-21 16:13:45.541449 7ffff10b7700  1 mds.0.23 handle_mds_map state change up:boot --> up:replay
> 2017-08-21 16:13:45.541459 7ffff10b7700  1 mds.0.23 replay_start
> 2017-08-21 16:13:45.541466 7ffff10b7700  1 mds.0.23  recovery set is
> 2017-08-21 16:13:45.541475 7ffff10b7700  1 mds.0.23  waiting for osdmap 1198 (which blacklists prior instance)
> 2017-08-21 16:13:45.565779 7fffea8aa700  0 mds.0.cache creating system inode with ino:0x100
> 2017-08-21 16:13:45.565920 7fffea8aa700  0 mds.0.cache creating system inode with ino:0x1
> 2017-08-21 16:13:45.571747 7fffe98a8700  1 mds.0.23 replay_done
> 2017-08-21 16:13:45.571751 7fffe98a8700  1 mds.0.23 making mds journal writeable
> 2017-08-21 16:13:46.542148 7ffff10b7700  1 mds.0.23 handle_mds_map i am now mds.0.23
> 2017-08-21 16:13:46.542149 7ffff10b7700  1 mds.0.23 handle_mds_map state change up:replay --> up:reconnect
> 2017-08-21 16:13:46.542158 7ffff10b7700  1 mds.0.23 reconnect_start
> 2017-08-21 16:13:46.542161 7ffff10b7700  1 mds.0.23 reopen_log
> 2017-08-21 16:13:46.542171 7ffff10b7700  1 mds.0.23 reconnect_done
> 2017-08-21 16:13:47.543612 7ffff10b7700  1 mds.0.23 handle_mds_map i am now mds.0.23
> 2017-08-21 16:13:47.543616 7ffff10b7700  1 mds.0.23 handle_mds_map state change up:reconnect --> up:rejoin
> 2017-08-21 16:13:47.543623 7ffff10b7700  1 mds.0.23 rejoin_start
> 2017-08-21 16:13:47.543638 7ffff10b7700  1 mds.0.23 rejoin_joint_start
> 2017-08-21 16:13:47.543666 7ffff10b7700  1 mds.0.23 rejoin_done
> 2017-08-21 16:13:48.544768 7ffff10b7700  1 mds.0.23 handle_mds_map i am now mds.0.23
> 2017-08-21 16:13:48.544771 7ffff10b7700  1 mds.0.23 handle_mds_map state change up:rejoin --> up:active
> 2017-08-21 16:13:48.544779 7ffff10b7700  1 mds.0.23 recovery_done -- successful recovery!
> 2017-08-21 16:13:48.544924 7ffff10b7700  1 mds.0.23 active_start
> 2017-08-21 16:13:48.544954 7ffff10b7700  1 mds.0.23 cluster recovered.
>
> This seems like an easy problem to fix.  Any help is greatly appreciated!

I wonder if you have two filesystems but only one MDS?  Ceph will then
think that the second filesystem "has a failed MDS" because there
isn't an MDS online to service it.

John
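
John's explanation can be illustrated with a toy model. It assumes each active MDS daemon holds exactly one rank of one filesystem, and it ignores standbys and Ceph's real placement logic. `assign_ranks` and the daemon name `second-mds` are hypothetical, and which filesystem ends up without a daemon is arbitrary here (it simply follows dict order).

```python
# Toy model (not Ceph's real MDS placement logic): each daemon can hold at most
# one rank of one filesystem; ranks left without a daemon are what Ceph would
# report as a failed MDS. assign_ranks is a hypothetical helper.

def assign_ranks(filesystems, daemons):
    """filesystems: {fs_name: max_mds}; daemons: list of MDS daemon names.
    Returns (assignments, unserved): assignments maps (fs, rank) -> daemon."""
    available = list(daemons)
    assignments = {}
    unserved = []
    for fs_name, max_mds in filesystems.items():  # order is arbitrary in this model
        for rank in range(max_mds):
            if available:
                assignments[(fs_name, rank)] = available.pop(0)
            else:
                unserved.append((fs_name, rank))
    return assignments, unserved

if __name__ == "__main__":
    fs = {"carf_ceph02": 1, "carf_ceph_kube01": 1}
    _, missing = assign_ranks(fs, ["carf-ceph-osd15"])
    print(missing)  # [('carf_ceph_kube01', 0)]
    _, missing2 = assign_ranks(fs, ["carf-ceph-osd15", "second-mds"])
    print(missing2)  # []
```

With one daemon, one of the two single-rank filesystems is left with rank 0 unserved, which matches the "failed mds daemon" warning; in this model a second daemon covers both.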

> 
> -Bryan
> 
> 
> ________________________________
> 
> Note: This email is for the confidential use of the named addressee(s) only
> and may contain proprietary, confidential or privileged information. If you
> are not the intended recipient, you are hereby notified that any review,
> dissemination or copying of this email is strictly prohibited, and to please
> notify the sender immediately and destroy this email and any attachments.
> Email transmission cannot be guaranteed to be secure or error-free. The
> Company, therefore, does not make any guarantees as to the completeness or
> accuracy of this email or any attachments. This email is for informational
> purposes only and does not constitute a recommendation, offer, request or
> solicitation of any kind to buy, sell, subscribe, redeem or perform any type
> of transaction of a financial product.
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 
