Adding more details about the newest chain-reaction fault:

1. One OSD node hit osd_op_tp and osd_tp thread timeouts (the root cause is unknown).
2. Then many "osd no reply" reports appeared, from other nodes and from the first faulty node itself.
3. Then a large number of OSDs were wrongly marked down, flapping started, and heavy peering began.
4. Then the first faulty node hung and ssh login was no longer possible, at the point where I had set "osd nodown" and was trying to mark the OSDs on that node down to recover.
5. Then the peering PGs kept changing.
6. Then more nodes hung, began to disappear from the monitor data, and could not be reached over ssh.
7. All my VMs hung.

All of this happened just after I ran "ceph osd in". The commands I used and the checks behind the numbers below are sketched after the quoted mail.

lin zhou <hnuzhoulin2@xxxxxxxxx> wrote on Sunday, June 23, 2019 at 7:33 AM:
>
> Recently our ceph cluster has been very unstable; even replacing a failed
> disk may trigger a chain reaction and cause large numbers of OSDs to be
> wrongly marked down.
> I am not sure if it is because we have nearly 300 PGs on each SAS OSD
> and slightly more than 300 PGs on each SSD OSD.
>
> From the logs, it all starts with osd_op_tp timing out, then "osd no
> reply", then a large number of wrong mark-downs.
>
> 1. 45 machines, each with 16 SAS and 8 SSD OSDs, all with file journals
> in the OSD data dir.
> 2. We use RBD in this cluster.
> 3. 300+ compute nodes holding VMs.
> 4. Each OSD node currently has about a hundred thousand threads and
> fifty thousand established network connections.
> 5. Dell R730xd, and Dell says there are no hardware error logs.
>
> So does anyone else face the same instability, or run 300+ PGs per OSD?
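For anyone who wants to check their own logs for the same pattern: the
timeouts in step 1 show up as heartbeat_map complaints in the OSD logs.
A rough way to spot and count them (assuming the default /var/log/ceph
layout; the exact message wording may differ between releases):

    # show heartbeat_map complaints for the two thread pools
    grep 'heartbeat_map' /var/log/ceph/ceph-osd.*.log | grep -E 'osd_op_tp|osd_tp'

    # count timed-out messages per OSD log file
    grep -c 'had timed out' /var/log/ceph/ceph-osd.*.log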
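To be concrete about step 4, these are the commands I mean (the OSD ids
are only placeholders for the OSDs on the faulty node):

    # stop the monitors from marking any more OSDs down while things flap
    ceph osd set nodown

    # then try to mark the faulty node's OSDs down by hand
    for id in 10 11 12 13; do
        ceph osd down $id
    done

    # remove the flag again once the cluster has settled
    ceph osd unset nodown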
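On the PG-count question in the quoted mail: the per-OSD numbers come
from the PGS column of "ceph osd df" (column layout varies a little
between releases, and the tree variant needs Luminous or later):

    # the PGS column is the number of PGs mapped to each OSD
    ceph osd df

    # same data grouped by the CRUSH tree
    ceph osd df tree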
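The thread and connection counts in point 4 of the quoted mail were
measured with plain Linux tools, roughly like this:

    # total threads on the node
    ps -eLf | wc -l

    # established TCP connections
    ss -tn state established | wc -l

    # thread count per ceph-osd process
    for p in $(pgrep ceph-osd); do
        echo "pid $p: $(grep Threads /proc/$p/status)"
    done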