RE: [LSF/MM/BPF TOPIC] Generalized data temperature estimation framework


 



On Mon, 2025-01-27 at 12:37 -0800, Bart Van Assche wrote:
> On 1/24/25 1:11 PM, Viacheslav Dubeyko wrote:
> > On Fri, 2025-01-24 at 12:44 -0800, Bart Van Assche wrote:
> > > On 1/23/25 12:33 PM, Viacheslav Dubeyko wrote:
> > > > I would like to discuss a generalized data "temperature"
> > > > estimation framework.
> > > 
> > > Is data available that shows the effectiveness of this approach and
> > > that compares this approach with existing approaches?
> > 
> > Yes, I did the benchmarking. I can see the quantitative estimation of
> > files' temperature.
> 
> What has been measured in these benchmarks?
> 

How the temperature can be used depends on the file system. So, the goal of my
benchmarking was to see how the temperature values change under file updates. I
integrated the temperature estimation framework into the SSDFS file system and
stored the temperature values in the system log to check that the math works.
The temperature itself is only a quantitative estimate that a file system can
use in any way it needs.

If we would like to compare benchmarking results, then we would really be
comparing the techniques of different file systems. Potentially, the temperature
estimation framework can be integrated into any file system, but it still needs
to be worked out how a particular file system can benefit from it.

So, as far as I can see, benchmarking is a slightly tricky point here.

> > Which existing approaches would you like to compare?
> 
> F2FS has a built-in algorithm for assigning data temperatures.
> 

Maybe it is time to generalize this approach too? The generalized framework
could contain several algorithms.

If I understood correctly, the F2FS approach is based on statically assigning
different temperatures to different file extensions. So, if we are processing a
file with a particular extension, then we assume that this file is hot or cold.
Am I correct here?
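
To make the question concrete, here is a minimal sketch of static,
extension-based classification as I understand it. The extension lists, the
WARM default, and all names below are hypothetical and chosen only for
illustration; they are not taken from the F2FS code.

```python
from enum import Enum
from pathlib import PurePosixPath


class Temperature(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"


# Hypothetical extension lists, for illustration only (not the F2FS tables).
HOT_EXTENSIONS = {".db", ".journal"}
COLD_EXTENSIONS = {".mp4", ".jpg", ".zip"}


def classify_by_extension(filename: str) -> Temperature:
    """Assign a temperature from the file extension alone."""
    ext = PurePosixPath(filename).suffix.lower()
    if ext in HOT_EXTENSIONS:
        return Temperature.HOT
    if ext in COLD_EXTENSIONS:
        return Temperature.COLD
    # Assumption: anything that is not listed is treated as warm.
    return Temperature.WARM
```

The point of the sketch is that the decision never looks at how the file is
actually updated; it depends only on the name.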

If I am correct, then the goal of the suggested approach is to switch from a
static assumption about the nature of data to estimating it on a quantitative
basis, in order to classify data more fairly. But it doesn't mean that the F2FS
way and the suggested approach should compete. Technically speaking, both
approaches could be complementary.
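
One way the two approaches could complement each other is sketched below: a
static hint is used until enough updates have been observed, and after that the
measured update rate decides. This is only an illustration under my own
assumptions. The update-rate metric, the thresholds, and all names are
hypothetical and are not the actual math of the suggested framework.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileStats:
    updates: int           # updates observed in the window
    window_seconds: float  # length of the observation window


def measured_temperature(stats: FileStats,
                         hot_rate: float = 1.0,
                         cold_rate: float = 0.01) -> str:
    """Classify by updates per second (hypothetical thresholds)."""
    rate = stats.updates / stats.window_seconds
    if rate >= hot_rate:
        return "hot"
    if rate <= cold_rate:
        return "cold"
    return "warm"


def classify(static_hint: str,
             stats: Optional[FileStats],
             min_window: float = 60.0) -> str:
    """Use the static hint until the observation window is long enough."""
    if stats is None or stats.window_seconds < min_window:
        return static_hint
    return measured_temperature(stats)
```

For example, a file that the static rule marks as cold but that is updated
twice per second over five minutes ends up classified as hot.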

> > And what could we imply by effectiveness of the approach? Do you have
> > a vision how we can estimate the effectiveness? :)
> 
> Isn't the goal of providing data temperature information to the device
> to reduce write amplification (W.A.)? I think that W.A. data would be
> useful but I'm not sure whether such data is easy to extract from a
> storage device.
> 

Yes, we can consider it one of the goals, along with improving performance,
decreasing the GC burden, and collaborating effectively with the storage device.
Reducing write amplification is an important goal, and it is possible to try to
estimate it without extracting data from the storage device (but how accurate
would such an estimate be?). But, again, the problem here is that we can
estimate the efficiency of file system(s) but not of the temperature estimation
framework itself. Maybe we can consider integrating the suggested framework
into F2FS? Then we could finally compare apples with apples. What do you
think?
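
As a rough illustration of estimating write amplification on the file system
side only, the sketch below divides all bytes the file system writes by the
bytes the user asked to write. The counter names are hypothetical, and the
device's internal GC is invisible here, which is exactly why the accuracy is
questionable.

```python
def fs_write_amplification(user_bytes: int,
                           gc_bytes: int,
                           metadata_bytes: int = 0) -> float:
    """File-system-level write amplification factor.

    user_bytes:     bytes written on behalf of user requests
    gc_bytes:       bytes moved by the file system's own GC
    metadata_bytes: extra metadata bytes written (optional)
    """
    if user_bytes <= 0:
        raise ValueError("user_bytes must be positive")
    return (user_bytes + gc_bytes + metadata_bytes) / user_bytes
```

Running the same workload on the same file system with and without the
temperature-based placement, and comparing this factor, would be one way to
compare apples with apples.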

Thanks,
Slava.




