osd_pg_create causing slow requests in Nautilus

This afternoon we ran into a problem on our test cluster, which is running Nautilus (14.2.2).  Any time PGs move on the cluster (after marking an OSD down, setting its primary-affinity to 0, or using the balancer), a large number of the OSDs peg the CPU cores they're running on for a while, which causes slow requests.  As far as I can tell, this is related to slow peering caused by osd_pg_create() taking a long time.

This was seen on quite a few OSDs while waiting for peering to complete:

# ceph daemon osd.3 ops
{
    "ops": [
        {
            "description": "osd_pg_create(e179061 287.7a:177739 287.9a:177739 287.e2:177739 287.e7:177739 287.f6:177739 287.187:177739 287.1aa:177739 287.216:177739 287.306:177739 287.3e6:177739)",
            "initiated_at": "2019-08-27 14:34:46.556413",
            "age": 318.25234538000001,
            "duration": 318.25241895300002,
            "type_data": {
                "flag_point": "started",
                "events": [
                    {
                        "time": "2019-08-27 14:34:46.556413",
                        "event": "initiated"
                    },
                    {
                        "time": "2019-08-27 14:34:46.556413",
                        "event": "header_read"
                    },
                    {
                        "time": "2019-08-27 14:34:46.556299",
                        "event": "throttled"
                    },
                    {
                        "time": "2019-08-27 14:34:46.556456",
                        "event": "all_read"
                    },
                    {
                        "time": "2019-08-27 14:35:12.456901",
                        "event": "dispatched"
                    },
                    {
                        "time": "2019-08-27 14:35:12.456903",
                        "event": "wait for new map"
                    },
                    {
                        "time": "2019-08-27 14:40:01.292346",
                        "event": "started"
                    }
                ]
            }
        },
...snip...
        {
            "description": "osd_pg_create(e179066 287.7a:177739 287.9a:177739 287.e2:177739 287.e7:177739 287.f6:177739 287.187:177739 287.1aa:177739 287.216:177739 287.306:177739 287.3e6:177739)",
            "initiated_at": "2019-08-27 14:35:09.908567",
            "age": 294.900191001,
            "duration": 294.90068416899999,
            "type_data": {
                "flag_point": "delayed",
                "events": [
                    {
                        "time": "2019-08-27 14:35:09.908567",
                        "event": "initiated"
                    },
                    {
                        "time": "2019-08-27 14:35:09.908567",
                        "event": "header_read"
                    },
                    {
                        "time": "2019-08-27 14:35:09.908520",
                        "event": "throttled"
                    },
                    {
                        "time": "2019-08-27 14:35:09.908617",
                        "event": "all_read"
                    },
                    {
                        "time": "2019-08-27 14:35:12.456921",
                        "event": "dispatched"
                    },
                    {
                        "time": "2019-08-27 14:35:12.456923",
                        "event": "wait for new map"
                    }
                ]
            }
        }
    ],
    "num_ops": 6
}


That "wait for new map" event made us think something was getting hung up on the monitors, so we restarted them all, but that didn't help.
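
For anyone who wants to measure where these ops stall, here is a minimal Python sketch that diffs the event timestamps of a single op from the "ceph daemon osd.N ops" output. The helper names (event_gaps, slowest_step) and the SAMPLE_OP dict are made up for illustration; the sample just copies the events of the first op shown above. It assumes the timestamp format seen in this dump and sorts events by time, since "throttled" is listed after "header_read" despite having an earlier timestamp.

```python
from datetime import datetime

# Timestamp layout as it appears in the ops dump above (assumed stable).
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Hypothetical sample: events copied from the first osd_pg_create op above.
SAMPLE_OP = {
    "description": "osd_pg_create(e179061 ...)",
    "type_data": {
        "flag_point": "started",
        "events": [
            {"time": "2019-08-27 14:34:46.556413", "event": "initiated"},
            {"time": "2019-08-27 14:34:46.556413", "event": "header_read"},
            {"time": "2019-08-27 14:34:46.556299", "event": "throttled"},
            {"time": "2019-08-27 14:34:46.556456", "event": "all_read"},
            {"time": "2019-08-27 14:35:12.456901", "event": "dispatched"},
            {"time": "2019-08-27 14:35:12.456903", "event": "wait for new map"},
            {"time": "2019-08-27 14:40:01.292346", "event": "started"},
        ],
    },
}


def event_gaps(op):
    """Return (event_name, seconds_since_previous_event) pairs for one op."""
    events = op["type_data"]["events"]
    # Sort by timestamp; the sort is stable, so ties keep their dump order.
    stamped = sorted(
        ((datetime.strptime(e["time"], TIME_FORMAT), e["event"]) for e in events),
        key=lambda pair: pair[0],
    )
    gaps = []
    for (prev_time, _), (cur_time, name) in zip(stamped, stamped[1:]):
        gaps.append((name, (cur_time - prev_time).total_seconds()))
    return gaps


def slowest_step(op):
    """Return the (event_name, seconds) pair with the largest gap before it."""
    return max(event_gaps(op), key=lambda gap: gap[1])


if __name__ == "__main__":
    for name, seconds in event_gaps(SAMPLE_OP):
        print(f"{seconds:12.6f}s before '{name}'")
    name, seconds = slowest_step(SAMPLE_OP)
    print(f"longest gap: {seconds:.3f}s before '{name}'")
```

On the sample op, the longest gap is the roughly 289 seconds between "wait for new map" (14:35:12) and "started" (14:40:01), with another 26 seconds or so spent before "dispatched". To run it against a live OSD, the full JSON from the ops command would need to be loaded with json.loads() and each entry of its "ops" list passed to event_gaps().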

I'll keep investigating, but so far my Google searches aren't turning anything up, so I wanted to ask whether anyone else is running into this.

Thanks,
Bryan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


