Re: [RFC PATCH] Introduce generalized data temperature estimation framework

Hans Holmberg <hans@xxxxxxxxxxxxx> · Wed, 29 Jan 2025 11:23:00 +0100

On Tue, Jan 28, 2025 at 11:31 PM Viacheslav Dubeyko
<Slava.Dubeyko@xxxxxxx> wrote:
>
> On Tue, 2025-01-28 at 09:45 +0100, Hans Holmberg wrote:
> > On Mon, Jan 27, 2025 at 9:59 PM Viacheslav Dubeyko
> > <Slava.Dubeyko@xxxxxxx> wrote:
> > >
> > > On Mon, 2025-01-27 at 15:19 +0100, Hans Holmberg wrote:
> > > > On Fri, Jan 24, 2025 at 10:03 PM Viacheslav Dubeyko
> > > > <Slava.Dubeyko@xxxxxxx> wrote:
> > > > >
> > > > >
> > >
> > > > So what I am asking myself is if this framework is added, who would
> > > > benefit? Without any benchmark results it's a bit hard to tell :)
> > > >
> > >
> > > Which benefits would you like to see? I assume we would like: (1) prolong device
> > > lifetime, (2) improve performance, (3) decrease GC burden. Do you mean these
> > > benefits?
> >
> > Yep, decreased write amplification essentially.
> >
>
> The important point here is that the suggested framework offers only a means to
> estimate temperature. Only a file system's own technique can decrease or increase
> write amplification. So, we need to compare apples with apples. As far as I
> know, F2FS has an algorithm for estimating and employing temperature. Do you mean
> F2FS, or how do you see the way of estimating the decrease in write amplification?
> Every file system should have its own way to employ temperature.

If you could show that this framework can decrease write amplification
in ssdfs, f2fs or any other file system, I think that would be a good start.

Compare using your generated temperatures vs not using the temperature info.
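
A rough sketch of that comparison, in case it helps. The function name
and the idea of averaging a few repeated runs per configuration are my
assumptions here, not something the patch provides:

```python
from statistics import mean

def waf_reduction_percent(baseline_wafs, temperature_wafs):
    """Relative drop in mean write amplification, in percent.

    baseline_wafs:    WAF values from runs that ignore temperature info
    temperature_wafs: WAF values from runs of the same workload that use it
    (both lists are hypothetical inputs, measured however you see fit)
    """
    baseline = mean(baseline_wafs)
    with_temperature = mean(temperature_wafs)
    # Positive result: the temperature-aware runs wrote less to the device.
    return (baseline - with_temperature) / baseline * 100.0
```

A positive number for a realistic workload would already make the case.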

>
> > >
> > > As far as I can see, different file systems can use temperature in different
> > > ways, and this slightly complicates the benchmarking. So, how can we define
> > > the effectiveness here and how can we measure it? Do you have a vision here? I
> > > am happy to do more benchmarking.
> > >
> > > My point is that the calculated file temperature gives a quantitative way to
> > > distribute even user data among several temperature groups ("baskets"). And
> > > these baskets/segments/anything-else give a way to properly group data. File
> > > systems can employ the temperature in various ways, but it can definitely help
> > > to elaborate a proper data placement policy. As a result, GC burden can be
> > > decreased, performance can be improved, and device lifetime can be prolonged. So,
> > > how can we benchmark these points? And which approaches make sense to compare?
> > >
> >
> > To start off, it would be nice to demonstrate that write amplification
> > decreases for some workload when the temperature is taken into
> > account. It would be great if the workload would be an actual
> > application workload or a synthetic one mimicking some real-world-like
> > use case.
> > Run the same workload twice, measure write amplification and compare results.
> >
>
> Another trouble here. What is the way to measure write amplification, from your
> point of view? Which benchmarking tool or framework do you suggest for write
> amplification estimation?

FDP drives expose this information. You can retrieve the stats using
nvme-cli.
If you are using zoned storage, you can add write amp metrics inside
the file system or just measure the number of blocks written to the
device using iostat.
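
For the zoned storage case, a minimal sketch of the block-counting
approach. It reads the same counter iostat reports, the "sectors
written" field of /proc/diskstats, before and after the workload. The
function names and the app_bytes parameter (bytes the application
itself wrote) are my own, purely for illustration:

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units

def sectors_written(diskstats_text, device):
    """Return the 'sectors written' counter for `device`.

    diskstats_text is the content of /proc/diskstats; after major, minor
    and device name, the 7th statistic (index 9) is sectors written.
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[2] == device:
            return int(fields[9])
    raise ValueError(f"device {device!r} not found in diskstats")

def write_amplification(before_text, after_text, device, app_bytes):
    """Bytes written to the device during the run / bytes the app wrote."""
    delta = sectors_written(after_text, device) - sectors_written(before_text, device)
    return delta * SECTOR_BYTES / app_bytes
```

Snapshot /proc/diskstats before and after the run and pass in both
snapshots. This only captures writes issued by the host, so it is
meaningful on zoned devices, where the device itself does not do GC.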

> > > > Also, is there a good reason for only supporting buffered io? Direct
> > > > IO could benefit in the same way, right?
> > > >
> > >
> > > I think that Direct IO could benefit too. The question here is how to account
> > > for dirty memory pages and updated memory pages. Currently, I am using
> > > folio_account_dirtied() and folio_clear_dirty_for_io() to implement the
> > > temperature calculation. As far as I can see, Direct IO requires other
> > > methods of doing this. The rest of the logic can be the same.
> >
> > It's probably a good idea to cover direct IO as well then as this is
> > intended to be a generalized framework.
>
> Covering Direct IO is a good point. But even a page cache based approach makes
> sense because LFS and GC based file systems need to manage data in an efficient
> way. By the way, do you have a vision of which methods can be used for the case of
> Direct IO to account for dirty and updated memory pages?
>

Temperature feedback could instead be provided by file systems that
would actually care about using the information.