On 4/12/22 09:27, Dan van der Ster wrote:
Hi Stefan, Thanks for the report. A 9-hour fsck is the longest I've heard of yet -- and on NVMe, that's quite surprising!
I believe Mark Schouten had to wait 3 days (!) before the fsck finished, although this might have been before the optimizations in this area were made.
Which firmware are you running on those Samsungs? For a different reason, Mark and we have been comparing the performance of that drive between his lab and our data centre. We have no obvious perf issues running EDA5702Q; Mark has some issue with the Quincy RC running FW EDA53W0Q. I'm not sure if it's related, but worth checking...
We have mainly EDA5402Q running, and we ran EDA5202Q before that without issues. One recently replaced OSD came with EDA5702Q.
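For reference, a quick way to read the firmware revision off an NVMe drive is smartctl or nvme-cli (the device name below is just a placeholder for whatever the OSD host exposes):

    # print controller identity info, including the Firmware Version line
    smartctl -i /dev/nvme0n1
    # or list all NVMe devices on the host with their FW Rev column
    nvme list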
In any case, I'm also surprised you decided to drain the boxes before the fsck. Wouldn't 9 hours of down OSDs, with noout set, be less invasive?
Yes, less invasive, but riskier. Note that even an "online" fsck does not mean that the OSDs are ONLINE: they aren't. So if a disk in some other failure domain decides to die, it has an availability impact (min_size=2). Besides that, we believe that the slow ops we sometimes see have their origin in the past (consolidating all CephFS metadata on 3 NVMe nodes and then back to all nodes again), so by re-provisioning the OSDs we hope to get rid of those as well.
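For completeness, the noout-based approach would look roughly like this on a host with systemd-managed OSDs (the OSD id 42 and the data path are placeholders; the exact service name depends on how the OSDs were deployed):

    # prevent the cluster from marking the down OSD out and rebalancing
    ceph osd set noout
    # stop the OSD daemon so its store can be checked offline
    systemctl stop ceph-osd@42
    # run the BlueStore consistency check (a deep fsck also reads object data and verifies checksums)
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-42
    # bring the OSD back and clear the flag
    systemctl start ceph-osd@42
    ceph osd unset noout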
Gr. Stefan