On Thu, 29 Feb 2024 12:41:53 -0800
Tony Luck <tony.luck@xxxxxxxxx> wrote:

> > Obviously can't talk about who was involved in this feature
> > in its definition, but I have strong confidence it will get implemented
> > for reasons I can point at on a public list.
> > a) There will be scrubbing on devices.
> > b) It will need control (evidence for this is the BIOS controls mentioned below
> >    for equivalent main memory).
> > c) Hotplug means that control must be done by OS driver (or via very fiddly
> >    pre hotplug hacks that I think we can all agree should not be necessary
> >    and aren't even an option on all platforms)
> > d) No one likes custom solutions.
> > This isn't a fancy feature with a high level of complexity which helps.

Hi Tony,

>
> But how will users know what are appropriate scrubbing
> parameters for these devices?
>
> Car analogy: Fuel injection systems on internal combustion engines
> have tweakable controls. But no auto manufacturer wires them up to
> a user accessible dashboard control.

Good analogy - I believe performance tuning 3rd parties will change
them for you. So the controls are used, albeit not by every user.

>
> Back to computers:
>
> I'd expect the OEMs that produce memory devices to set appropriate
> scrubbing rates based on their internal knowledge of the components
> used in construction.

Absolutely agree that they will set a default / baseline value, but the
reality is that 'everyone' (for the first few OEMs I googled) exposes
tuning controls in their shipping BIOS menus to configure this, because
there are users who want to change it.

I'd expect them to clamp the minimum scrub frequency to something that
avoids them getting hardware returned en masse for reliability, and the
maximum at whatever ensures the perf is good enough that they sell
hardware in the first place. I'd also expect a BIOS menu to allow cloud
hosts etc to turn off exposing RAS2 or similar.
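As a rough sketch of the clamping I mean above (illustrative only: the
structure, field names, units and limits are made up for this example and
are not taken from RAS2, CXL or any existing driver), expressed as a
full-pass cycle time so that a minimum scrub frequency becomes a maximum
period:

```c
#include <stdint.h>

/* Hypothetical platform limits; not from any spec or driver. */
struct scrub_limits {
	/* Shortest full-pass period allowed (caps the performance cost). */
	uint32_t min_cycle_hours;
	/* Longest full-pass period allowed (keeps reliability acceptable). */
	uint32_t max_cycle_hours;
};

/*
 * Clamp a user-requested scrub cycle time into the window the platform
 * vendor is assumed to permit. Requests outside the window are pulled
 * to the nearest limit rather than rejected.
 */
static uint32_t scrub_clamp_cycle(const struct scrub_limits *lim,
				  uint32_t req_hours)
{
	if (req_hours < lim->min_cycle_hours)
		return lim->min_cycle_hours;
	if (req_hours > lim->max_cycle_hours)
		return lim->max_cycle_hours;
	return req_hours;
}
```

Whether an out-of-range request should be clamped silently or rejected
with an error is a design choice; the sketch clamps purely to keep the
example short.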
>
> What is the use case where some user would need to override these
> parameters and scrub at a faster/slower rate than that set by the
> manufacturer?

It's a performance vs reliability trade-off. If your larger scale
architecture (many servers) requires a few nodes to be super stable, you
will pay pretty much any cost to keep them running. If a single node
failure makes little or no difference, you'll happily crank this down
(same with refresh) in order to save some power / get a small performance
lift. Or if you care about latency tails more than reliability, you'll
turn this off.

For comedy value, some BIOS guides point out that leaving scrub on may
affect performance benchmarking. Obviously not a good data point, but
a hint at the sort of market that cares. Same market that buys cheaper
RAM knowing they are going to have more system crashes.

There is probably a description gap. That might be a paperwork question
as part of system specification. What is the relationship between scrub
rate and error rate under particular styles of workload (because you get
a free scrub whenever you access the memory)? The DIMMs themselves could
in theory provide inputs, but the workload dependence makes this hard.
Probably fall back on a test and tune loop over very long runs. For
instance, single bit error rates can be used to detect when you are
getting below a level people are happy with.

With the fancier units that can be supported, you can play more reliable
memory games by scanning subsets of the memory more frequently.

Though it was about a kernel daemon doing scrub, Jiaqi's RFC document here
https://lore.kernel.org/all/20221103155029.2451105-1-jiaqiyan@xxxxxxxxxx/
provided justification for on demand scrub - some interesting stuff in the
bit on hardware patrol scrubbing. I see you commented on the thread and on
the complexity of hardware solutions.

- Cheap memory makes this all more important.
- Need for configuration of how fast and when depending on system state.
- Lack of flexibility of what is scanned (RAS2 provides some by
  association with NUMA node + option to request particular ranges,
  CXL provides per end point controls).

There are some gaps on hardware scrubbers, but offloading this problem
is definitely attractive.

So my understanding is there is demand to tune this but it won't be
exposed on every system.

Jonathan

>
> -Tony