OSD goes down immediately and uses 100% CPU


Servers: 6 (7 OSDs each), 42 OSDs in total
OS: CentOS 7
Ceph: 10.2.5

Hi everyone,

The cluster is used for VM image storage and object storage.
One bucket contains more than 20 million objects.

I have a problem: the cluster is blocking operations.

Suddenly the cluster started blocking operations, and the VMs could no longer read their disks.
After a few hours, osd.1 went down.

There are no disk failure messages in dmesg,
and "smartctl -a /dev/sde" reports no errors.

I tried to restart osd.1, but it goes down again soon afterwards.
Right after osd.1 is restarted, the VMs can access their disks again.
But osd.1 constantly uses 100% CPU; the cluster then marks osd.1 down, and the OSD dies from the suicide timeout.

I found that the osdmap epoch of osd.1 differs from that of the other OSDs.
So I think this is why osd.1 died.


Questions:
(1) Why does the osdmap epoch of osd.1 differ from those of the other OSDs?
 I checked oldest_map and newest_map on all OSDs with "ceph daemon osd.X status".
 All OSDs report the same epochs except osd.1.

(2) Why does osd.1 use 100% CPU?

 Even after the cluster marked osd.1 down, osd.1 stays busy.
 When I execute "ceph tell osd.1 injectargs --debug-ms 5/1", osd.1 doesn't respond.
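For anyone repeating the comparison from question (1), here is a minimal Python sketch of that check. It assumes the output of "ceph daemon osd.X status" has already been collected on each host (the admin socket is only reachable locally) and that the JSON contains the oldest_map and newest_map fields mentioned above. The function name find_epoch_outliers and the sample epochs are hypothetical, chosen for illustration only.

```python
import json
from collections import Counter


def find_epoch_outliers(status_by_osd):
    """Return the OSDs whose (oldest_map, newest_map) pair differs from the majority.

    status_by_osd maps an OSD name such as "osd.1" to the raw JSON text that
    `ceph daemon osd.X status` printed on that OSD's host (assumed format).
    Only the oldest_map and newest_map fields are read.
    """
    epochs = {}
    for osd, raw in status_by_osd.items():
        status = json.loads(raw)
        epochs[osd] = (status["oldest_map"], status["newest_map"])

    if not epochs:
        return {}

    # Treat the epoch range reported by the most OSDs as the cluster's view.
    majority, _count = Counter(epochs.values()).most_common(1)[0]

    # Any OSD that disagrees with the majority is reported with its own range.
    return {osd: pair for osd, pair in epochs.items() if pair != majority}


if __name__ == "__main__":
    # Hypothetical sample values for illustration only; not taken from the post.
    sample = {
        "osd.0": '{"whoami": 0, "oldest_map": 1000, "newest_map": 1500}',
        "osd.1": '{"whoami": 1, "oldest_map": 900, "newest_map": 1200}',
        "osd.2": '{"whoami": 2, "oldest_map": 1000, "newest_map": 1500}',
    }
    print(find_epoch_outliers(sample))  # {'osd.1': (900, 1200)}
```

Using the majority pair as the reference is only a heuristic: it flags the lagging OSD, but says nothing about why that OSD fell behind.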


Thank you.
--
Makito
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
