Re: Ceph mds is stuck in creating status

On Mon, Oct 15, 2018 at 4:24 PM Kisik Jeong <kisik.jeong@xxxxxxxxxxxx> wrote:
>
> Thank you for your reply, John.
>
> I restarted my Ceph cluster and captured the MDS logs.
>
> I found that the MDS shows slow requests because some OSDs are laggy.
>
> I followed the Ceph MDS troubleshooting steps for 'mds slow request', but there are no operations in flight:
>
> root@hpc1:~/iodc# ceph daemon mds.hpc1 dump_ops_in_flight
> {
>     "ops": [],
>     "num_ops": 0
> }
>
> Is there any other reason that the MDS would show slow requests? Thank you.

Those requests appear to be stuck because they're targeting pools that
don't exist.  Has something strange happened in the history of this
cluster that might have left a filesystem referencing pools that no
longer exist?  Ceph is not supposed to permit removal of pools that are
in use by CephFS, but perhaps something went wrong.
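
(A side note on the empty dump_ops_in_flight output: that command only
lists client requests being handled by the MDS.  The operations that are
stuck here are requests the MDS itself sends to the OSDs, so, assuming
the admin socket is still named mds.hpc1, they should be visible via the
objecter instead:

    ceph daemon mds.hpc1 objecter_requests

The entries in that output should show which objects, and therefore
which pools, the MDS is still waiting on.)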

Check out the "ceph osd dump --format=json-pretty" and "ceph fs dump
--format=json-pretty" outputs and how the pool IDs relate.  According
to those logs, the data pool with ID 1 and the metadata pool with ID 2
do not exist.
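
For example, a quick way to line the two up (just a sketch, assuming jq
is installed; the exact field names can vary a little between releases):

    ceph osd dump --format=json-pretty | jq '.pools[] | {pool, pool_name}'
    ceph fs dump --format=json-pretty | \
        jq '.filesystems[].mdsmap | {metadata_pool, data_pools}'

Any pool ID referenced by the filesystem but absent from the OSD dump is
the one causing the stuck requests.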

John

> -Kisik
>
> On Mon, Oct 15, 2018 at 11:43 PM John Spray <jspray@xxxxxxxxxx> wrote:
>>
>> On Mon, Oct 15, 2018 at 3:34 PM Kisik Jeong <kisik.jeong@xxxxxxxxxxxx> wrote:
>> >
>> > Hello,
>> >
>> > I successfully deployed a Ceph cluster with 16 OSDs and created CephFS before.
>> > But after rebooting due to the MDS slow request problem, when I create CephFS again, the MDS goes into 'creating' status and never changes.
>> > Looking at the Ceph status, I don't see any other problem. Here is the 'ceph -s' output:
>>
>> That's pretty strange.  Usually if an MDS is stuck in "creating", it's
>> because an OSD operation is stuck, but in your case all your PGs are
>> healthy.
>>
>> I would suggest setting "debug mds=20" and "debug objecter=10" on your
>> MDS, restarting it and capturing those logs so that we can see where
>> it got stuck.
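>>
>> (A minimal way to do that, assuming you manage settings through
>> ceph.conf and the MDS name really is hpc1, is to add:
>>
>>   [mds]
>>   debug mds = 20
>>   debug objecter = 10
>>
>> then restart the MDS and collect /var/log/ceph/ceph-mds.hpc1.log,
>> assuming the default log location.)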
>>
>> John
>>
>> > csl@hpc1:~$ ceph -s
>> >   cluster:
>> >     id:     1a32c483-cb2e-4ab3-ac60-02966a8fd327
>> >     health: HEALTH_OK
>> >
>> >   services:
>> >     mon: 1 daemons, quorum hpc1
>> >     mgr: hpc1(active)
>> >     mds: cephfs-1/1/1 up  {0=hpc1=up:creating}
>> >     osd: 16 osds: 16 up, 16 in
>> >
>> >   data:
>> >     pools:   2 pools, 640 pgs
>> >     objects: 7 objects, 124B
>> >     usage:   34.3GiB used, 116TiB / 116TiB avail
>> >     pgs:     640 active+clean
>> >
>> > However, CephFS still works with 8 OSDs.
>> >
>> > If you have any idea about this phenomenon, please let me know. Thank you.
>> >
>> > PS. I attached my ceph.conf contents:
>> >
>> > [global]
>> > fsid = 1a32c483-cb2e-4ab3-ac60-02966a8fd327
>> > mon_initial_members = hpc1
>> > mon_host = 192.168.40.10
>> > auth_cluster_required = cephx
>> > auth_service_required = cephx
>> > auth_client_required = cephx
>> >
>> > public_network = 192.168.40.0/24
>> > cluster_network = 192.168.40.0/24
>> >
>> > [osd]
>> > osd journal size = 1024
>> > osd max object name len = 256
>> > osd max object namespace len = 64
>> > osd mount options f2fs = active_logs=2
>> >
>> > [osd.0]
>> > host = hpc9
>> > public_addr = 192.168.40.18
>> > cluster_addr = 192.168.40.18
>> >
>> > [osd.1]
>> > host = hpc10
>> > public_addr = 192.168.40.19
>> > cluster_addr = 192.168.40.19
>> >
>> > [osd.2]
>> > host = hpc9
>> > public_addr = 192.168.40.18
>> > cluster_addr = 192.168.40.18
>> >
>> > [osd.3]
>> > host = hpc10
>> > public_addr = 192.168.40.19
>> > cluster_addr = 192.168.40.19
>> >
>> > [osd.4]
>> > host = hpc9
>> > public_addr = 192.168.40.18
>> > cluster_addr = 192.168.40.18
>> >
>> > [osd.5]
>> > host = hpc10
>> > public_addr = 192.168.40.19
>> > cluster_addr = 192.168.40.19
>> >
>> > [osd.6]
>> > host = hpc9
>> > public_addr = 192.168.40.18
>> > cluster_addr = 192.168.40.18
>> >
>> > [osd.7]
>> > host = hpc10
>> > public_addr = 192.168.40.19
>> > cluster_addr = 192.168.40.19
>> >
>> > [osd.8]
>> > host = hpc9
>> > public_addr = 192.168.40.18
>> > cluster_addr = 192.168.40.18
>> >
>> > [osd.9]
>> > host = hpc10
>> > public_addr = 192.168.40.19
>> > cluster_addr = 192.168.40.19
>> >
>> > [osd.10]
>> > host = hpc9
>> > public_addr = 192.168.10.18
>> > cluster_addr = 192.168.40.18
>> >
>> > [osd.11]
>> > host = hpc10
>> > public_addr = 192.168.10.19
>> > cluster_addr = 192.168.40.19
>> >
>> > [osd.12]
>> > host = hpc9
>> > public_addr = 192.168.10.18
>> > cluster_addr = 192.168.40.18
>> >
>> > [osd.13]
>> > host = hpc10
>> > public_addr = 192.168.10.19
>> > cluster_addr = 192.168.40.19
>> >
>> > [osd.14]
>> > host = hpc9
>> > public_addr = 192.168.10.18
>> > cluster_addr = 192.168.40.18
>> >
>> > [osd.15]
>> > host = hpc10
>> > public_addr = 192.168.10.19
>> > cluster_addr = 192.168.40.19
>> >
>> > --
>> > Kisik Jeong
>> > Ph.D. Student
>> > Computer Systems Laboratory
>> > Sungkyunkwan University
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
> --
> Kisik Jeong
> Ph.D. Student
> Computer Systems Laboratory
> Sungkyunkwan University
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



