Frequent slow requests

Hi,

On a small cluster (3 nodes) I frequently have slow requests. When dumping the inflight ops from the hanging OSD, it seems it doesn't get a 'response' for one of the subops. The events always look like:

                "events": [
                    {
                        "time": "2018-06-14 07:10:07.256196",
                        "event": "initiated"
                    },
                    {
                        "time": "2018-06-14 07:10:07.256671",
                        "event": "queued_for_pg"
                    },
                    {
                        "time": "2018-06-14 07:10:07.256745",
                        "event": "reached_pg"
                    },
                    {
                        "time": "2018-06-14 07:10:07.256826",
                        "event": "started"
                    },
                    {
                        "time": "2018-06-14 07:10:07.256924",
                        "event": "waiting for subops from 18,20"
                    },
                    {
                        "time": "2018-06-14 07:10:07.263769",
                        "event": "op_commit"
                    },
                    {
                        "time": "2018-06-14 07:10:07.263775",
                        "event": "op_applied"
                    },
                    {
                        "time": "2018-06-14 07:10:07.269989",
                        "event": "sub_op_commit_rec from 18"
                    }
                 ]

The OSD IDs are not always the same. Looking at osd.20, the OSD process is running and accepts requests ('ceph tell osd.20 bench' runs fine). When I restart the OSD process, the request completes. I could not find any pattern in which OSD is to blame (it is always a different one) or in which server hosts it; that also differs.
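To check which replica is missing without reading every dump by hand, something like the following rough sketch could be used. It assumes only the event strings shown above ("waiting for subops from ..." and "sub_op_commit_rec from ..."); the function name missing_subop_replies is made up for illustration, and how the events list is extracted from the full dump is left out because the surrounding JSON layout is not shown here.

```python
import re


def missing_subop_replies(events):
    """Return the sorted OSD ids that were waited on but never acknowledged.

    `events` is a list of dicts with an "event" key, as in the dump above.
    """
    waiting = set()
    replied = set()
    for entry in events:
        text = entry["event"]
        # e.g. "waiting for subops from 18,20"
        m = re.match(r"waiting for subops from ([\d,]+)$", text)
        if m:
            waiting.update(int(x) for x in m.group(1).split(",") if x)
            continue
        # e.g. "sub_op_commit_rec from 18"
        m = re.match(r"sub_op_commit_rec from (\d+)$", text)
        if m:
            replied.add(int(m.group(1)))
    return sorted(waiting - replied)


# Event strings copied from the dump above (timestamps omitted).
events = [
    {"event": "initiated"},
    {"event": "queued_for_pg"},
    {"event": "reached_pg"},
    {"event": "started"},
    {"event": "waiting for subops from 18,20"},
    {"event": "op_commit"},
    {"event": "op_applied"},
    {"event": "sub_op_commit_rec from 18"},
]

print(missing_subop_replies(events))  # [20]
```

Running this over the events of each stuck op and counting the results per OSD would show whether any OSD or host is over-represented.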

The cluster runs Ceph 7.5 with 'ceph version 12.2.5 (cad919881333ac92274171586c827e01f554a70a) luminous (stable)'. It's just a test cluster with very little activity. What could cause a replica OSD to not reply?

Regards,

Frank de Bot
