Re: Successful Upgrade from 14.2.18 to 15.2.16

On 4/12/22 09:27, Dan van der Ster wrote:
> Hi Stefan,
>
> Thanks for the report. 9 hours fsck is the longest I've heard about
> yet -- and on NVMe, that's quite surprising!

I believe Mark Schouten had to wait 3 days before the fsck would finish, although that might have been before the optimizations in this area were made.
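
In case it helps anyone who wants to time the conversion on a single OSD before doing a whole box, something like the rough sketch below is how one could drive the offline fsck with ceph-bluestore-tool and measure it. The OSD id and data path are placeholders, and it assumes the OSD daemon is already stopped:

import subprocess
import time

OSD_ID = 12  # placeholder OSD id
OSD_PATH = f"/var/lib/ceph/osd/ceph-{OSD_ID}"  # assumes the default layout

start = time.monotonic()
# "fsck" only checks; "repair" would also fix what it finds.
subprocess.run(
    ["ceph-bluestore-tool", "fsck", "--path", OSD_PATH],
    check=True,
)
print(f"fsck of osd.{OSD_ID} took {time.monotonic() - start:.0f}s")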


> Which firmware are you running on those Samsungs? For a different
> reason Mark and we have been comparing performance of that drive
> between what's in his lab vs what we have in our data centre. We have
> no obvious perf issues running EDA5702Q; Mark has some issue with the
> Quincy RC running FW EDA53W0Q. I'm not sure if it's related, but worth
> checking...

We mainly have EDA5402Q running. We were running EDA5202Q before without issues. One recently replaced OSD came with EDA5702Q.
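
For reference, a quick way one could collect the firmware revision per NVMe is to parse smartctl output; a minimal sketch, assuming smartmontools is installed and with a placeholder device list:

import subprocess

DEVICES = ["/dev/nvme0n1", "/dev/nvme1n1"]  # placeholder devices

for dev in DEVICES:
    info = subprocess.run(
        ["smartctl", "-i", dev], capture_output=True, text=True, check=True
    ).stdout
    for line in info.splitlines():
        # smartctl's info section includes the firmware revision line.
        if line.startswith("Firmware Version:"):
            print(dev, line.split(":", 1)[1].strip())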


> In any case, I'm also surprised you decided to drain the boxes before
> fsck. Wouldn't 9 hours of down OSDs, with noout set, be less
> invasive?

Yes, less invasive, but more risk. Note that even an "online" fsck does not mean that the OSDs are ONLINE: they aren't. So if a disk in some other failure domain decides to die, it has an availability impact (min_size=2). Besides that, we believe that the slow ops we sometimes see have their origin in the past (consolidating all CephFS metadata on 3 NVMe nodes and then back to all nodes again). So by re-provisioning the OSDs we hope to get rid of those as well.
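
For anyone planning the same kind of offline window instead, a rough sketch of the flag handling around it; this just wraps the usual ceph CLI calls, and the pool name below is a placeholder:

import subprocess

def ceph(*args):
    subprocess.run(["ceph", *args], check=True)

# Keep the cluster from rebalancing while the OSDs are down for fsck.
ceph("osd", "set", "noout")
try:
    # ... stop the OSDs, run ceph-bluestore-tool fsck/repair, start them again ...
    pass
finally:
    # Always clear the flag again, even if the fsck step fails.
    ceph("osd", "unset", "noout")

# Sanity-check the pool guarantee mentioned above (placeholder pool name).
min_size = subprocess.run(
    ["ceph", "osd", "pool", "get", "cephfs_metadata", "min_size"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(min_size)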

Gr. Stefan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


