Re: Kernel RBD hang on OSD Failure

I've just checked 1072 and 872; they both look the same: a single op for the object in question, in retry+read state, which appears to be retrying forever.
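
For reference, that check is just the per-OSD admin socket query shown below for osd.853, filtered by the client tid from the osdc listing (rough sketch; it has to run on the host that carries each OSD, since the admin socket is local):

$ ceph daemon osd.1072 ops | grep -B2 -A6 '"tid": 11024940'
$ ceph daemon osd.872 ops | grep -B2 -A6 '"tid": 11241684'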


On Thu, Dec 17, 2015 at 10:05 AM, Tom Christensen <pavera@xxxxxxxxx> wrote:
I had already nuked the previous hang, but we have another one:

osdc output:

70385   osd853  5.fb666328      rbd_data.36f804a163d632a.00000000000370ff               read
11024940        osd1072 5.f438406c      rbd_id.volume-44c74bb5-14f8-4279-b44f-8e867248531b              call
11241684        osd872  5.175f624d      rbd_id.volume-3e068bc7-75eb-4504-b109-df851a787f89              call
11689088        osd685  5.1fc9acd5      rbd_header.36f804a163d632a      442390'5605610926112768 watch


# ceph osd map rbd rbd_data.36f804a163d632a.00000000000370ff

osdmap e1309560 pool 'rbd' (5) object 'rbd_data.36f804a163d632a.00000000000370ff' -> pg 5.fb666328 (5.6328) -> up ([853,247,265], p853) acting ([853,247,265], p853)
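
For completeness, the PG those replicas belong to can also be sanity-checked from the monitor side (quick sketch; the pgid 5.6328 is taken from the mapping above):

# ceph pg map 5.6328
# ceph pg 5.6328 query | grep -A2 '"state"'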


As to the output of osd.853 ops, objecter_requests, dump_ops_in_flight, and dump_historic_ops: I see a single request in the output of osd.853 ops:

{
        "ops": [
            {
                "description": "osd_op(client.58016244.1:70385 rbd_data.36f804a163d632a.00000000000370ff [read 7274496~4096] 5.fb666328 RETRY=1 retry+read e1309006)",
                "initiated_at": "2015-12-17 10:03:35.360401",
                "age": 0.000503,
                "duration": 0.000233,
                "type_data": [
                    "reached pg",
                    {
                        "client": "client.58016244",
                        "tid": 70385
                    },
                    [
                        {
                            "time": "2015-12-17 10:03:35.360401",
                            "event": "initiated"
                        },
                        {
                            "time": "2015-12-17 10:03:35.360635",
                            "event": "reached_pg"
                        }
                    ]
                ]
            }
        ],
        "num_ops": 1
    }


The other commands either return nothing (ops_in_flight, objecter_requests) or, in the case of historic ops, return 20 ops (that's what it's set to keep), but none of them are this request or reference this object.  It seems this read is just retrying forever?
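
If it would help, I can bump the historic op count and poll the in-flight ops to see whether this tid keeps cycling; roughly like this (the option name is from memory, so treat it as an assumption, and the daemon command has to run on osd.853's host):

# ceph tell osd.853 injectargs '--osd_op_history_size 200'
# while sleep 5; do ceph daemon osd.853 ops | grep -c retry+read; done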



On Sat, Dec 12, 2015 at 12:10 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
On Sat, Dec 12, 2015 at 6:37 PM, Tom Christensen <pavera@xxxxxxxxx> wrote:
> We had a kernel map get hung up again last night/this morning.  The rbd is
> mapped but unresponsive, if I try to unmap it I get the following error:
> rbd: sysfs write failed
> rbd: unmap failed: (16) Device or resource busy
>
> Now that this has happened attempting to map another RBD fails, using lsblk
> fails as well, both of these tasks just hang forever.
>
> We have 1480 OSDs in the cluster so posting the osdmap seems excessive,
> however here is the beginning (didn't change in 5 runs):
> root@wrk-slc-01-02:~# cat
> /sys/kernel/debug/ceph/f3b7f409-e061-4e39-b4d0-ae380e29ae7e.client55440310/osdmap
> epoch 1284256
> flags
> pool 0 pg_num 2048 (2047) read_tier -1 write_tier -1
> pool 1 pg_num 512 (511) read_tier -1 write_tier -1
> pool 3 pg_num 2048 (2047) read_tier -1 write_tier -1
> pool 4 pg_num 512 (511) read_tier -1 write_tier -1
> pool 5 pg_num 32768 (32767) read_tier -1 write_tier -1
>
> Here is osdc output, it is not changed after 5 runs:
>
> root@wrk-slc-01-02:~# cat
> /sys/kernel/debug/ceph/f3b7f409-e061-4e39-b4d0-ae380e29ae7e.client55440310/osdc
> 93835   osd1206 5.6841959c      rbd_data.34df3ac703ced61.0000000000001dff
> read
> 9065810 osd1382 5.a50fa0ea      rbd_header.34df3ac703ced61
> 474103'5506530325561344 watch
> root@wrk-slc-01-02:~# cat
> /sys/kernel/debug/ceph/f3b7f409-e061-4e39-b4d0-ae380e29ae7e.client55440310/osdc
> 93835   osd1206 5.6841959c      rbd_data.34df3ac703ced61.0000000000001dff
> read
> 9067286 osd1382 5.a50fa0ea      rbd_header.34df3ac703ced61
> 474103'5506530325561344 watch
> root@wrk-slc-01-02:~# cat
> /sys/kernel/debug/ceph/f3b7f409-e061-4e39-b4d0-ae380e29ae7e.client55440310/osdc
> 93835   osd1206 5.6841959c      rbd_data.34df3ac703ced61.0000000000001dff
> read
> 9067831 osd1382 5.a50fa0ea      rbd_header.34df3ac703ced61
> 474103'5506530325561344 watch
> root@wrk-slc-01-02:~# ls /dev/rbd/rbd
> none  volume-daac5f12-e39b-4d64-a4fa-86c810aeb72d
> volume-daac5f12-e39b-4d64-a4fa-86c810aeb72d-part1
> root@wrk-slc-01-02:~# rbd info volume-daac5f12-e39b-4d64-a4fa-86c810aeb72d
> rbd image 'volume-daac5f12-e39b-4d64-a4fa-86c810aeb72d':
>         size 61439 MB in 7680 objects
>         order 23 (8192 kB objects)
>         block_name_prefix: rbd_data.34df3ac703ced61
>         format: 2
>         features: layering
>         flags:
>         parent:
> rbd/volume-93d9a102-260e-4500-b87d-9696c7fc2b67@snapshot-9ba998b6-ca57-40dd-8895-265023132e99
>         overlap: 61439 MB
>
> ceph status indicates the current osdmap epoch
> osdmap e1284866: 1480 osds: 1480 up, 1480 in
> pgmap v10231386: 37888 pgs, 5 pools, 745 TB data, 293 Mobjects
>
> root@wrk-slc-01-02:~# uname -r
> 3.19.0-25-generic
>
> So, the kernel driver is some 600 epochs behind current.  This does seem to
> be load related as we've been running 4 different kernels in our clients in
> our test environment and have not been able to recreate it there in a little
> over a week, however our production environment has had 2 of these hangs in
> the last 4 days.  Unfortunately I wasn't able to grab data from the first
> one.

If you haven't already nuked it, what's the output of:

$ ceph osd map <pool name of pool with id 5> rbd_data.34df3ac703ced61.0000000000001dff
$ ceph osd map <pool name of pool with id 5> rbd_header.34df3ac703ced61

$ ceph daemon osd.1206 ops
$ ceph daemon osd.1206 objecter_requests
$ ceph daemon osd.1206 dump_ops_in_flight
$ ceph daemon osd.1206 dump_historic_ops

and repeat for osd.1382.
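
Something like this should grab all four dumps in one go (rough sketch; it has to be run on the host that owns the OSD, since the admin socket is local):

$ for cmd in ops objecter_requests dump_ops_in_flight dump_historic_ops; do echo "== osd.1206 $cmd =="; ceph daemon osd.1206 $cmd; done

and the same loop with osd.1382 on its host.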

Thanks,

                Ilya


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
