slow ops on one OSD make all my buckets unavailable

Hi,

My Harbor registry uses Ceph object storage to store its images, but a few
moments ago I couldn't pull or push images from Harbor. Ceph was in a
warning health state at the same time.
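
For reference, I believe the same warning is also visible in the usual status
output, e.g.:

    ceph -s              # overall status, shows HEALTH_WARN with a slow-ops summary
    ceph health detail   # names the affected daemon, e.g. "osd.24 has slow ops"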

The cluster just had a warning message saying that osd.24 had slow ops.
I checked ceph-osd.24.log, which showed the following:

2020-07-28 19:01:40.599 7f907a39c700 -1 osd.24 4324 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.166289.0:34144787 17.4f 17:f29d8b20:::.dir.313c8244-fe4d-4d46-bf9b-0e33e46be041.157033.2:head [call rgw.guard_bucket_resharding,call rgw.bucket_complete_op] snapc 0=[] ondisk+write+known_if_redirected e4324)
2020-07-28 19:01:41.558 7f907a39c700 -1 osd.24 4324 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.166289.0:34144787 17.4f 17:f29d8b20:::.dir.313c8244-fe4d-4d46-bf9b-0e33e46be041.157033.2:head [call rgw.guard_bucket_resharding,call rgw.bucket_complete_op] snapc 0=[] ondisk+write+known_if_redirected e4324)
2020-07-28 19:01:42.579 7f907a39c700 -1 osd.24 4324 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.166289.0:34144787 17.4f 17:f29d8b20:::.dir.313c8244-fe4d-4d46-bf9b-0e33e46be041.157033.2:head [call rgw.guard_bucket_resharding,call rgw.bucket_complete_op] snapc 0=[] ondisk+write+known_if_redirected e4324)
2020-07-28 19:01:43.566 7f907a39c700 -1 osd.24 4324 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.166289.0:34144787 17.4f 17:f29d8b20:::.dir.313c8244-fe4d-4d46-bf9b-0e33e46be041.157033.2:head [call rgw.guard_bucket_resharding,call rgw.bucket_complete_op] snapc 0=[] ondisk+write+known_if_redirected e4324)
2020-07-28 19:01:44.588 7f907a39c700 -1 osd.24 4324 get_health_metrics reporting 2 slow ops, oldest is osd_op(client.166289.0:34144787 17.4f 17:f29d8b20:::.dir.313c8244-fe4d-4d46-bf9b-0e33e46be041.157033.2:head [call rgw.guard_bucket_resharding,call rgw.bucket_complete_op] snapc 0=[] ondisk+write+known_if_redirected e4324)
2020-07-28 19:01:45.627 7f907a39c700 -1 osd.24 4324 get_health_metrics reporting 2 slow ops, oldest is osd_op(client.166289.0:34144787 17.4f 17:f29d8b20:::.dir.313c8244-fe4d-4d46-bf9b-0e33e46be041.157033.2:head [call rgw.guard_bucket_resharding,call rgw.bucket_complete_op] snapc 0=[] ondisk+write+known_if_redirected e4324)
2020-07-28 19:01:46.674 7f907a39c700 -1 osd.24 4324 get_health_metrics reporting 2 slow ops, oldest is osd_op(client.166289.0:34144787 17.4f 17:f29d8b20:::.dir.313c8244-fe4d-4d46-bf9b-0e33e46be041.157033.2:head [call rgw.guard_bucket_resharding,call rgw.bucket_complete_op] snapc 0=[] ondisk+write+known_if_redirected e4324)
2020-07-28 19:01:47.701 7f907a39c700 -1 osd.24 4324 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.166289.0:34144852 17.4f 17:f29d8b20:::.dir.313c8244-fe4d-4d46-bf9b-0e33e46be041.157033.2:head [call rgw.bucket_list] snapc 0=[] ondisk+read+known_if_redirected e4324)
2020-07-28 19:01:48.729 7f907a39c700 -1 osd.24 4324 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.166289.0:34144852 17.4f 17:f29d8b20:::.dir.313c8244-fe4d-4d46-bf9b-0e33e46be041.157033.2:head [call rgw.bucket_list] snapc 0=[] ondisk+read+known_if_redirected e4324)
2020-07-28 19:01:49.729 7f907a39c700 -1 osd.24 4324 get_health_metrics reporting 2 slow ops, oldest is osd_op(client.166289.0:34144852 17.4f 17:f29d8b20:::.dir.313c8244-fe4d-4d46-bf9b-0e33e46be041.157033.2:head [call rgw.bucket_list] snapc 0=[] ondisk+read+known_if_redirected e4324)
2020-07-28 19:01:50.889 7f907a39c700 -1 osd.24 4324 get_health_metrics reporting 2 slow ops, oldest is osd_op(client.166289.0:34144852 17.4f 17:f29d8b20:::.dir.313c8244-fe4d-4d46-bf9b-0e33e46be041.157033.2:head [call rgw.bucket_list] snapc 0=[] ondisk+read+known_if_redirected e4324)
......

2020-07-28 21:03:35.053 7f907a39c700 -1 osd.24 4324 get_health_metrics reporting 46 slow ops, oldest is osd_op(client.166298.0:34904067 17.4f 17:f29d8b20:::.dir.313c8244-fe4d-4d46-bf9b-0e33e46be041.157033.2:head [call rgw.bucket_list] snapc 0=[] ondisk+read+known_if_redirected e4324)
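
In case it is useful: while the ops were stuck, I believe more detail on them
could have been pulled from the admin socket on the node that hosts osd.24
with something like the commands below (dump_blocked_ops and
dump_historic_slow_ops may not exist on very old releases):

    ceph daemon osd.24 dump_ops_in_flight       # every in-flight op with its age and current state
    ceph daemon osd.24 dump_blocked_ops         # only the ops currently counted as slow/blocked
    ceph daemon osd.24 dump_historic_slow_ops   # recently completed ops that exceeded the slow threshold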


After I restarted osd.24, the cluster became healthy again, and so did Harbor.
What confuses me is why Harbor couldn't get data from its own bucket, while
the log indicates that the blocked client was operating on a different bucket.
I wouldn't expect a few slow ops on one OSD to affect all buckets.
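
The blocked object looks like a bucket index shard (the
.dir.313c8244-...157033.2 object) and the stuck calls are
rgw.guard_bucket_resharding / rgw.bucket_list. If it helps, I believe the
state of that bucket's index (size, pending resharding) can be checked with
something like the following; <bucket> is just a placeholder, since the
actual bucket name isn't in the log:

    radosgw-admin bucket limit check               # flags buckets whose index shards hold too many objects
    radosgw-admin reshard list                     # resharding jobs queued or in progress
    radosgw-admin bucket stats --bucket=<bucket>   # object count and shard count for the suspect bucket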

Any ideas are appreciated. Thanks.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


