Re: Fwd: HW failure cause client IO drops

M Ranga Swami Reddy <swamireddy@xxxxxxxxx> · Tue, 16 Apr 2019 14:13:32 +0530

The OSD processes/daemons kept running, so Ceph did not mark those OSDs down or out.
But the battery failure led to high temperature, which increased CPU utilization,
which in turn increased the OSD response time, so other OSDs failed to respond in time,
causing extremely slow or no I/O.
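
A rough sketch of the situation described above, assuming the usual heartbeat model: an OSD is only reported as failed once it has stopped answering peer heartbeats for longer than a grace period, so a daemon that still answers heartbeats but serves requests very slowly stays "up" while client I/O stalls. The function name, the state labels, and the parameters are hypothetical; the defaults of 20 s (in the spirit of `osd_heartbeat_grace`) and 30 s (in the spirit of the slow-request warning threshold) are assumptions and should be checked against the cluster's actual configuration.

```python
def osd_state(heartbeat_reply_age_s: float,
              op_latency_s: float,
              grace_s: float = 20.0,      # assumed heartbeat grace period
              slow_op_s: float = 30.0) -> str:  # assumed slow-request threshold
    """Toy classification of an OSD as seen by its peers and clients.

    heartbeat_reply_age_s: seconds since the OSD last answered a heartbeat.
    op_latency_s: how long client operations on this OSD currently take.
    """
    if heartbeat_reply_age_s > grace_s:
        # Peers would report this OSD as failed; it can be marked down.
        return "down"
    if op_latency_s > slow_op_s:
        # Still answering heartbeats, so it stays up, but requests pile up.
        return "up_but_slow"
    return "up"


# An overheated node: heartbeats still answered, requests take over a minute.
print(osd_state(heartbeat_reply_age_s=5.0, op_latency_s=90.0))
```

In this toy model the overheated OSD never crosses the heartbeat threshold, which matches the observation that the OSDs were not marked down or out.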

On Tue, Apr 16, 2019 at 12:23 PM Eugen Block <eblock@xxxxxx> wrote:
Good morning,

the OSDs are usually marked out after 10 minutes, that's when
rebalancing starts. But the I/O should not drop during that time, this
could be related to your pool configuration. If you have a replicated
pool of size 3 and also set min_size to 3 the I/O would pause if a
node or OSD fails. So more information about the cluster would help,
can you share that?

ceph osd tree
ceph osd pool ls detail

Were all pools affected or just specific pools?
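
To make the size/min_size point concrete, here is a minimal Python sketch of the rule as described above: I/O on a replicated pool continues only while the number of surviving replicas is at least min_size. The function name `pg_accepts_io` and its parameters are illustrative assumptions, not part of any Ceph API, and the model ignores peering and CRUSH placement.

```python
def pg_accepts_io(size: int, min_size: int, failed_replicas: int) -> bool:
    """Simplified model: I/O continues while surviving replicas >= min_size."""
    if not 1 <= min_size <= size:
        raise ValueError("min_size must be between 1 and size")
    if not 0 <= failed_replicas <= size:
        raise ValueError("failed_replicas must be between 0 and size")
    surviving = size - failed_replicas
    return surviving >= min_size


# size=3, min_size=3: a single failed OSD is enough to pause I/O.
print(pg_accepts_io(size=3, min_size=3, failed_replicas=1))  # False
# size=3, min_size=2: one failure is tolerated, I/O continues.
print(pg_accepts_io(size=3, min_size=2, failed_replicas=1))  # True
```

The configured size and min_size of each pool appear in the output of `ceph osd pool ls detail`.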

Regards,

Eugen

Quoting M Ranga Swami Reddy <swamireddy@xxxxxxxxx>:

> Hello - Recently we had an issue with a storage node's battery failure,
> which caused Ceph client I/O to drop to '0' bytes. This means the Ceph cluster
> couldn't perform I/O operations until the node was taken out. This
> is not expected from Ceph: when some HW fails, the respective OSDs should
> be marked out/down and I/O should continue as is.
>
> Please let me know if anyone has seen similar behavior, and whether this
> issue has been resolved.
>
> Thanks
> Swami

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
