On Wed, Dec 09, 2020 at 03:51:24PM +0100, Ulf Hansson wrote:
>
> Even if the discarded blocks are flushed at some wisely selected
> point, when the device is idle, that doesn't guarantee that the
> internal garbage collection runs inside the device. In the end that
> depends on the FW implementation of the card - and I assume it's
> likely triggered based on some internal idle time and the amount of
> "garbage" there is to deal with.

At least from a file system perspective, I don't care when the
internal garbage collection actually runs inside the device.  What I
do care about is that (a) a read of a discarded sector returns zeros
after it has been discarded (or the storage device needs to tell me I
can't count on that), and (b) that, for write endurance reasons, the
garbage collection will *eventually* happen.

If the list of erase blocks or flash pages that are not in use is
tracked in such a way that they are actually garbage collected before
the device actually needs free blocks, it really doesn't matter if it
happens right away, or hours later.  (If the device is 90% free,
because it was just formatted and we did a pre-discard at format time,
then it could happen hours or days later.)

But if the device's FTL is so incompetent that it loses track of
which erase blocks / flash pages do need to be GC'ed, such that it
impacts device lifetime... well then, that's sad, and it would be nice
to find out about this without having to do an expensive,
time-consuming certification process.  (OTOH, all the big companies
are doing hardware certifications anyway, because you can't fully
trust the storage vendors, and how many storage vendors are really
going to admit, or make it easy to determine,
"the FTL is so cost-optimized that it's cr*p"?
:-)

Having a way to tell the storage device that it would be better to
suspend GC, or to accelerate GC, because we know the device is about
to become much less likely to perform writes, would certainly be a
good and useful thing to do, although I see that as mostly being
useful for improving I/O performance, especially for low-end flash ---
I suspect that high-end SSDs, which are designed so that they can
handle continuous write streams without much performance degradation,
have enough oomph in their internal CPU that they can do GC in real
time while the device is under a continuous random write workload,
with only minimal performance impact.

> *) Use the runtime PM framework to detect an idle period and then
> trigger background operations. The problem is, that we don't really
> know how long we will be idle, meaning that we don't know if it's
> really a wise decision to trigger the background operations in the
> end.
>
> **) Invent a new type of generic block request, as to let userspace
> trigger this.

I think you really want to give userspace the ability to trigger this.
Whether it's via a generic block request, or an ioctl, I'll leave that
to the people who maintain the driver and/or block layer.  That's
because userspace will have knowledge of things like, "the screen is
off", or "the phone is on the wireless charger", and/or the user has
said, "OK, Google, goodnight" to trigger the night-time home
automation commands.

We can of course try to make some automatic determinations based on
the runtime PM framework, but that doesn't necessarily tell us the
likelihood that the system will become busy in the future; OTOH, maybe
that doesn't matter, if the storage needs only a very tiny amount of
time after it's told, "stop GC", to finish up what it's doing so it
can respond to I/O requests at full speed?

- Ted