Hello Gregory,

in the meantime, I managed to break it further :( I tried getting rid
of active+remapped pgs and got some undersized instead.. not sure
whether this can be related.. anyways here's the status:

ceph -s
    cluster ff21618e-5aea-4cfe-83b6-a0d2d5b4052a
     health HEALTH_WARN
            3 pgs degraded
            2 pgs stale
            3 pgs stuck degraded
            1 pgs stuck inactive
            2 pgs stuck stale
            242 pgs stuck unclean
            3 pgs stuck undersized
            3 pgs undersized
            recovery 65/3374343 objects degraded (0.002%)
            recovery 186187/3374343 objects misplaced (5.518%)
            mds0: Behind on trimming (155/30)
     monmap e3: 3 mons at {remrprv1a=10.0.0.1:6789/0,remrprv1b=10.0.0.2:6789/0,remrprv1c=10.0.0.3:6789/0}
            election epoch 522, quorum 0,1,2 remrprv1a,remrprv1b,remrprv1c
     mdsmap e342: 1/1/1 up {0=remrprv1c=up:active}, 2 up:standby
     osdmap e4385: 21 osds: 21 up, 21 in; 238 remapped pgs
      pgmap v18679192: 1856 pgs, 7 pools, 4223 GB data, 1103 kobjects
            12947 GB used, 22591 GB / 35538 GB avail
            65/3374343 objects degraded (0.002%)
            186187/3374343 objects misplaced (5.518%)
                1612 active+clean
                 238 active+remapped
                   3 active+undersized+degraded
                   2 stale+active+clean
                   1 creating
  client io 0 B/s rd, 40830 B/s wr, 17 op/s

> What's the full output of "ceph -s"? Have you looked at the MDS admin
> socket at all -- what state does it say it's in?

[root@remrprv1c ceph]# ceph --admin-daemon /var/run/ceph/ceph-mds.remrprv1c.asok dump_ops_in_flight
{
    "ops": [
        {
            "description": "client_request(client.3052096:83 getattr Fs #10000000288 2016-02-03 10:10:46.361591 RETRY=1)",
            "initiated_at": "2016-02-03 10:23:25.791790",
            "age": 3963.093615,
            "duration": 9.519091,
            "type_data": [
                "failed to rdlock, waiting",
                "client.3052096:83",
                "client_request",
                {
                    "client": "client.3052096",
                    "tid": 83
                },
                [
                    {
                        "time": "2016-02-03 10:23:25.791790",
                        "event": "initiated"
                    },
                    {
                        "time": "2016-02-03 10:23:35.310881",
                        "event": "failed to rdlock, waiting"
                    }
                ]
            ]
        }
    ],
    "num_ops": 1
}

Seems there's some lock stuck here.. Killing the stuck client (it's
postgres trying to access a cephfs file) doesn't help..

> -Greg
>
> > My question here is:
> >
> > 1) is there some known issue with hammer 0.94.5 or kernel 4.1.15
> > which could lead to cephfs hangs?
> >
> > 2) what can I do to debug what is the cause of this hang?
> >
> > 3) is there a way to recover this without hard resetting
> > the node with the hung cephfs mount?
> >
> > If I can provide more information, please let me know.
> >
> > I'd really appreciate any help
> >
> > with best regards
> >
> > nik

--
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@xxxxxxxxxxx
-------------------------------------
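PS: in case it helps, the stuck PGs can be drilled into with the stock
CLI (just a sketch; the PG id in the last command is a placeholder,
substitute a real one from the dump_stuck output):

    ceph health detail               # per-PG breakdown of the HEALTH_WARN items
    ceph pg dump_stuck unclean       # the 242 stuck unclean PGs
    ceph pg dump_stuck undersized    # the 3 undersized ones
    ceph pg 1.2f3 query              # full peering/recovery state of one PG (placeholder id)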
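Also, if I read the "Behind on trimming (155/30)" warning right, it
means the MDS journal holds 155 segments against a trim target of 30
(mds_log_max_segments). Checking the current value and raising it at
runtime gives the MDS some headroom to catch up -- note injectargs is
not persisted across restarts, and 200 below is just an example value:

    # current trim target on the active MDS
    ceph --admin-daemon /var/run/ceph/ceph-mds.remrprv1c.asok config get mds_log_max_segments
    # raise it temporarily so trimming can catch up
    ceph tell mds.0 injectargs '--mds-log-max-segments 200'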
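And to tie the blocked op to a client: "session ls" on the MDS admin
socket should list the session for client.3052096 seen above, and on
the hung node the kernel client exposes its pending requests via
debugfs (assuming debugfs is mounted at /sys/kernel/debug):

    # on the MDS node: list client sessions
    ceph --admin-daemon /var/run/ceph/ceph-mds.remrprv1c.asok session ls
    # on the hung client node (kernel mount):
    cat /sys/kernel/debug/ceph/*/mdsc   # requests stuck waiting on the MDS
    cat /sys/kernel/debug/ceph/*/osdc   # requests stuck waiting on OSDs

Newer releases also have a "session evict <id>" command on the MDS
admin socket; I'm not sure it's available in hammer 0.94.5.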
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com