Hi Bob,

deep scrub on HDDs has, at least in newer versions, a negligible effect on performance even with default settings (op_queue = wpq and cut-off = high). You might be affected by a combination of two issues: a change of the OSD metadata that happens with bcache devices on reboot, and cache promotion on deep scrub.

About the bcache OSD metadata change: you may be hitting the same problem discussed in the thread "Ceph Bluestore tweaks for Bcache", namely that an OSD on bcache gets created with rotational=1, but after a reboot it turns into rotational=0, with serious performance implications.

Cache promotion: you really want to avoid data being promoted to the cache device on scrub. I believe the only way to achieve this is not to promote on first read; I'm not sure whether and how this can be configured in bcache. If scrub reads go through the cache, they will not only have high latency, they will also evict the interesting data from the cache. Deep scrub is a read-once operation and should in any case come directly from the HDD, because that is where you want to check for read errors.

The combination of the two above is probably horrible for performance.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Broccoli Bob <brockolibob@xxxxxxxxx>
Sent: 07 February 2023 19:16:28
To: Anthony D'Atri
Cc: ceph-users@xxxxxxx
Subject: Re: Deep scrub debug option

Hi Anthony,

Thank you for posting that link. I can see there that the description for that option is: 'Inject an expensive sleep during deep scrub IO to make it easier to induce preemption'. Does this mean that it is supposed to be used in conjunction with the 'osd scrub max pre-emption' option? I am also still unclear whether this option is strictly meant for developers to debug things, or whether it is OK to run on a production cluster.

> Are your OSDs HDDs? Using EC?

I am using bcache devices for OSDs, so each OSD is an HDD fronted by a cache SSD.
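The two symptoms Frank describes above can be checked from the command line. A sketch, assuming OSD id 0 and a bcache0 device — adjust both to your own layout:

```shell
# What does BlueStore think the device is? On bcache this may flip
# from rotational=1 to rotational=0 after a reboot, as described in
# the "Ceph Bluestore tweaks for Bcache" thread.
ceph osd metadata 0 | grep -i rotational

# bcache bypasses the cache for sequential I/O above this cutoff;
# deep-scrub reads are largely sequential, so a low cutoff helps
# keep scrub traffic away from the SSD.
cat /sys/block/bcache0/bcache/sequential_cutoff

# Current cache mode (writethrough/writeback/writearound/none).
cat /sys/block/bcache0/bcache/cache_mode
```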
All pools are replicated, no EC.

> How many deep scrubs do you have running in parallel? Assuming more than one, you could increase osd_deep_scrub_interval to spread them out over time.

I have only 1 deep scrub running per OSD (osd max scrubs = 1). I have already increased the deep scrub interval, which does help to spread them out, but when they do eventually happen there is still a noticeable performance impact.

On Tuesday, 7 February 2023 at 14:54:53 GMT, Anthony D'Atri <aad@xxxxxxxxxxxxxx> wrote:

Documented here:
https://github.com/ceph/ceph/blob/9754cafc029e1da83f5ddd4332b69066fe6b3ffb/src/common/options/global.yaml.in#L3202

Introduced back here with a bunch of other scrub tweaks:
https://github.com/ceph/ceph/pull/18971/files

Are your OSDs HDDs? Using EC?

How many deep scrubs do you have running in parallel? Assuming more than one, you could increase osd_deep_scrub_interval to spread them out over time.

> On Feb 7, 2023, at 05:58, Broccoli Bob <brockolibob@xxxxxxxxx> wrote:
>
> I have been running a Ceph cluster for a while, and one of the main things that impacts performance is deep scrubbing.
> I would like to limit this as much as possible and have tried the below options to do this:
>
> osd scrub sleep = 1                # time to sleep before scrubbing the next group of chunks
> osd scrub chunk max = 1            # maximum number of chunks to scrub during a single operation
> osd scrub chunk min = 1            # minimum number of chunks to scrub during a single operation
> osd scrub max pre-emption = 30     # maximum number of times Ceph preempts a deep scrub due to a client operation blocking it
> osd client op priority = 63        # the priority of client operations
> osd requested scrub priority = 1   # the priority of administrator-requested scrubs
> osd scrub priority = 1             # the priority of scheduled scrubs
>
> These options combined do slow down scrubbing and make it easier on client I/O, but there is still a performance impact when scrubs happen. One other option I have found is 'osd debug deep scrub sleep'.
>
> Setting this option to even 0.1 has an immediate and large effect (I have seen nodes that were scrubbing at 50 MB/s drop to below 5 MB/s). The effect is larger than all of the above options combined and is exactly what I want, but there is no documentation for this option and I don't know the full impact of configuring it. It also spams the logs with messages saying that the deep scrub is sleeping for x seconds. So I would like to know:
>
> Is it safe to have the debug deep scrub option turned on? If so, is there a way to stop it from spamming the logs?
> If it is not safe, are there any other options available to us to limit the impact of deep scrubbing?
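For reference, the options listed above can also be applied at runtime through the config database instead of ceph.conf. A sketch — the underscore spellings are my assumption of the current option names, and the values mirror those in the list:

```shell
# Runtime equivalents of the ceph.conf scrub-throttling entries
ceph config set osd osd_scrub_sleep 1
ceph config set osd osd_scrub_chunk_max 1
ceph config set osd osd_scrub_chunk_min 1
ceph config set osd osd_scrub_max_preemptions 30

# Spread deep scrubs over a longer window (value is in seconds;
# 1209600 = 14 days, chosen here only as an illustration)
ceph config set osd osd_deep_scrub_interval 1209600
```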
> Thank you and Regards
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx