On Mon, 2025-01-27 at 12:37 -0800, Bart Van Assche wrote:
> On 1/24/25 1:11 PM, Viacheslav Dubeyko wrote:
> > On Fri, 2025-01-24 at 12:44 -0800, Bart Van Assche wrote:
> > > On 1/23/25 12:33 PM, Viacheslav Dubeyko wrote:
> > > > I would like to discuss a generalized data "temperature"
> > > > estimation framework.
> > >
> > > Is data available that shows the effectiveness of this approach and
> > > that compares this approach with existing approaches?
> >
> > Yes, I did the benchmarking. I can see the quantitative estimation of
> > files' temperature.
>
> What has been measured in these benchmarks?
>

How temperature can be used depends on the file system. So, the goal of my benchmarking was to observe the temperature values as files are updated. I integrated the temperature estimation framework into the SSDFS file system and stored the temperature values in the system log to verify that the math works. Temperature itself is only a quantitative estimate, and each file system can use it in whatever way suits it. If we want to compare benchmarking results, then we are really comparing the techniques of different file systems. Potentially, the temperature estimation framework can be integrated into any file system, but we need to work out how a particular file system can benefit from it. So, as far as I can see, benchmarking is a slightly tricky point here.

> > Which existing approaches would you like to compare?
>
> F2FS has a built-in algorithm for assigning data temperatures.
>

Maybe it is time to generalize this approach too? The generalized framework could contain several algorithms. If I understand correctly, the F2FS approach is based on statically assigning different temperatures to different file extensions. So, when we process a file with a particular extension, we assume that the file is hot or cold. Am I correct here?
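To make sure we mean the same thing, this is roughly what I have in mind by static, extension-based assignment. It is a hypothetical C sketch, not the actual F2FS code: the enum, the function names, and the extension lists are illustrative assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical temperature hints; names are illustrative only. */
enum temp_hint { TEMP_WARM, TEMP_HOT, TEMP_COLD };

/* Illustrative extension lists (assumption, not the real defaults). */
static const char *const hot_ext[] = { "db" };
static const char *const cold_ext[] = { "jpg", "mp4", "mp3", "apk", "so" };

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Return 1 if name ends in ".<ext>" for some ext in list (exact match). */
static int has_extension(const char *name, const char *const *list, size_t n)
{
	const char *dot = strrchr(name, '.');

	/* No dot, a leading dot only, or an empty extension: no match. */
	if (!dot || dot == name || dot[1] == '\0')
		return 0;
	for (size_t i = 0; i < n; i++)
		if (strcmp(dot + 1, list[i]) == 0)
			return 1;
	return 0;
}

/*
 * Static classification: the hint depends only on the file name,
 * not on how often the file is actually updated.
 */
enum temp_hint classify_by_extension(const char *name)
{
	if (has_extension(name, hot_ext, ARRAY_LEN(hot_ext)))
		return TEMP_HOT;
	if (has_extension(name, cold_ext, ARRAY_LEN(cold_ext)))
		return TEMP_COLD;
	return TEMP_WARM;
}
```

With such a scheme, `app.db` is always classified as hot and `photo.jpg` as cold, regardless of how often either file is actually rewritten.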
If I am correct, then the goal of the suggested approach is to move from a static assumption about the nature of the data to a quantitative estimate, so that data can be classified more fairly. But this doesn't mean that the F2FS way and the suggested approach have to compete. Technically speaking, the two approaches could be complementary.

> > And what could we imply by effectiveness of the approach? Do you have
> > a vision how we can estimate the effectiveness? :)
>
> Isn't the goal of providing data temperature information to the device
> to reduce write amplification (W.A.)? I think that W.A. data would be
> useful but I'm not sure whether such data is easy to extract from a
> storage device.
>

Yes, we can consider it as one of the goals, alongside improving performance, decreasing the GC burden, and collaborating effectively with the storage device. Reducing write amplification is an important goal, and it is possible to try to estimate it without extracting data from the storage device (but how accurate would such an estimate be?). But, again, the problem here is that we can estimate the efficiency of file system(s), but not of the temperature estimation framework itself. Maybe we could consider integrating the suggested framework into F2FS? Then we could finally compare apples with apples. What do you think?

Thanks,
Slava.