OSD blocked every request until a peer came online

Hi,

I'm running a 3-node cluster with 6 OSDs in each node. I'm using two types of pools, size 3 and size 2, with min_size 1 and the node as the failure domain. I stopped every OSD on a single node to do some maintenance, which left the cluster in a degraded but operational state. For reasons still unknown, one of the OSDs that was still running restarted, and a few PGs became down+peering. I guess this is normal: those PGs were in a size 2 pool and their replica must have been on the node under maintenance, so the restarting OSD couldn't find a peer to check its contents against and wasn't elected as primary. The interesting thing, though, was that every request hitting that OSD was blocked, even for PGs which had peers on the third, fully operational node. In the OSD's logs I saw blocked requests rising in number and delay. After I brought the node back from maintenance, the restarted OSD found its peers and everything went back to normal.
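The guess above about which PGs lose their only peer can be sketched as a toy model. This is a minimal illustration, not Ceph's peering logic: the PG ids, node names, placement map, and the function pgs_without_live_peer are all hypothetical, and it only checks whether every other replica of a PG sits on a node that is down.

```python
# Toy model of the guess above; every PG id, node name and placement
# here is hypothetical and this is NOT how Ceph itself decides peering.

def pgs_without_live_peer(placement, restarted_node, down_nodes):
    """Return ids of PGs on restarted_node whose other replicas are all on down nodes."""
    stuck = []
    for pg_id, nodes in sorted(placement.items()):
        if restarted_node not in nodes:
            continue  # this PG has no replica on the restarted OSD's node
        peers = [n for n in nodes if n != restarted_node]
        # A PG counts as stuck only if it has peers and none of them is reachable.
        if peers and all(p in down_nodes for p in peers):
            stuck.append(pg_id)
    return stuck


# node1 is in maintenance, an OSD on node2 restarted, node3 is healthy.
placement = {
    "1.a": ["node1", "node2", "node3"],  # size 3 pool: still has a peer on node3
    "2.b": ["node1", "node2"],           # size 2 pool: only peer is on node1
    "2.c": ["node2", "node3"],           # size 2 pool: peer on node3 is up
    "2.d": ["node1", "node3"],           # not hosted on node2 at all
}

stuck = pgs_without_live_peer(placement, "node2", {"node1"})
print(stuck)
```

Under these assumptions only the size 2 PG shared with the maintenance node has no live peer; the other PGs on the restarted node still have a replica on the healthy node, which is why requests to them being blocked was unexpected.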

My question is: is it normal that an OSD blocks every request if a few of its PGs are down+peering? I thought only requests that try to hit the down PGs would be blocked. By the way, I'm running 0.87.

Best regards,
Mate