pgs not active

Hi all,
We had a situation where a drive failed at the same time as a node. This left files in CephFS unreadable and 'ceph status' showing the error message "pgs not active".

Our cluster uses either 3 replicas or the roughly equivalent EC profile (k=2, m=2). Eventually all the PGs became active and no data was lost.

So the question is why were the PGs not active?

I'm guessing these PGs were not active because the number of surviving replicas was less than min_size? The min_size for the 3-replica pool is 2, and with both a node and a drive down, some PGs would have been missing two replicas. That makes sense. (For the EC pool the min_size is 3; same story, but a little sadder.)
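For reference, here's roughly how I checked (the pool names below are placeholders, not our actual ones):

    # list PGs that are stuck inactive
    ceph pg dump_stuck inactive

    # check min_size on the replicated and EC data pools
    ceph osd pool get cephfs_data min_size
    ceph osd pool get cephfs_data_ec min_size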

It would be great to get those PGs active sooner. Besides buying faster hardware, any suggestions?
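One idea I've seen floated (and I'm not sure it's safe) is temporarily lowering min_size during a double failure so the surviving replica can serve I/O, then restoring it once recovery finishes. Again, the pool name is a placeholder:

    # risky: allows I/O with only a single surviving replica
    ceph osd pool set cephfs_data min_size 1
    # ... wait for recovery to complete, then restore
    ceph osd pool set cephfs_data min_size 2

(For the EC pool, my understanding is that min_size can't be set below k, so the best case there would be min_size 2.)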

I also have a couple questions:

If fewer replicas than min_size are available, is reading from those PGs allowed? Reading seems like a safe operation, but the min_size definition says "Sets the minimum number of replicas required for I/O", which implies it covers reads, not just writes.

Does Ceph prioritize repairing PGs that a client is trying to access? Most of the files on our cluster are not heavily accessed, so if a client tried to access a specific file, it shouldn't take long to repair that file's broken PGs if the other repairs were paused.
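If that's not automatic, maybe something like force-recovery could be scripted? A rough sketch, mapping a file's first object to its PG and bumping that PG's recovery priority (the inode-based object name and PG ID here are made-up examples):

    # CephFS data objects are named <inode-hex>.<block-index>;
    # map the file's first object to the PG that holds it
    ceph osd map cephfs_data 10000000000.00000000

    # prioritize recovery of that PG ahead of the rest
    ceph pg force-recovery 1.2a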

Thanks for your help!
C.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


