Re: [PATCH][SMB3] allow controlling length of time directory entries are cached with dir leases

On Mon, 4 Sept 2023 at 13:44, ronnie sahlberg <ronniesahlberg@xxxxxxxxx> wrote:
>
> On Sat, 2 Sept 2023 at 08:47, Steve French <smfrench@xxxxxxxxx> wrote:
> >
> > I also noticed that Windows apparently lets you control the size of
> > the directory entry cache (the file info cached for directories). See
> > below:
> >
> > DirectoryCacheEntrySizeMax
> > HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\DirectoryCacheEntrySizeMax
> > The default is 64 KB. This is the maximum size of directory cache entries.
> >
> > Should we add a tuneable similar to this (per mount? per system?)
>
> Probably not because most of the time these settings do not really
> work, and often, when you increase them you make things worse.
>
> I think you want to implement something like this: when a cached
> directory is re-opened, the timeout is reset so that hot directories
> remain in the cache longer. Which is what you want. You want hot data
> in the cache.
>
> On the other hand, you want cold data to expire as quickly as
> possible, because cache that is held up by cold data can not
> be used to store hot data until the cold data has expired from the
> cache. So you want this timer as short as possible:
> the shortest possible timeout that does not also expire hot data.
>
> Shorter timeouts and quicker expunging of the cache == better performance.
> It might not sound intuitive but I can show it with a simple example.
>
> Assume your cache has 10 slots to store a directory.
> Assume you have 1000 cold directories that are accessed relatively infrequently.
> Assume you have 1 directory that is hot and is accessed 10 times more
> frequently than a cold directory.
>
> We now changed the timeout to 60 seconds. This means any cold
> directory that enters the cache will sit in the cache and block that
> entry for 60 seconds until it is cleared and something else can use
> that cache slot.
> This is akin to a model where every 60 seconds you have a lottery where
> 10 directories will win entry to the cache.
> What is the probability that a hot directory wins the lottery and
> becomes cached?
> In this example it is 1% because access to the hot directory is
> only 1% of all accesses.
> All the cold directories have just a 0.1% chance of becoming cached
> but since there are so many of them they will still dominate.
> (The problem we have to solve now is to get the hot directory into the
> cache as fast as possible and to get it to remain in cache for as long
> as possible.)
>
> On average we will thus have to wait 50 iterations until the hot
> directory even enters the cache for the first time; that is, it will
> take 50 minutes on average before the hot directory is even cached.
> If we had left the original 30 second timeout it would "only" have
> taken 25 minutes on average to get this directory into the cache.
> This kind of suggests that even 30 seconds would be way too big for
> this example and maybe we should use 10 seconds, or less.
>
> You want hot directories in the cache for as long as possible. A good
> way to do this is to make them sticky: if they are frequently accessed
> while cached, they do not expire as easily from the cache.
> On the other hand, you want any cold directory to expire from the
> cache as fast as possible, since every time you hold a cold directory
> in cache, that part of the cache becomes useless and wasted. You do
> this by, for example, setting this timeout as low as possible, so
> they are kicked out as soon as possible.
>
>


A slightly more complex implementation might be that when you add a
directory to the cache you set an initial, relatively short timeout,
maybe 3 seconds, before it expires.
Then every time a directory is accessed while in the cache, you reset
the timer and double the expiry time, up to a maximum of
30 seconds/60 seconds/...

That way a hot directory will relatively quickly become sticky while a
cold directory is kicked out
after just 3 seconds or so or whatever the initial short timeout is set to.
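
To make the idea concrete, here is a rough user-space sketch in Python.
It is not the cifs.ko code and not a proposed patch; the names
(AdaptiveDirCache, CachedDir, access) are made up for illustration, and
the 3 second / 30 second values are just the numbers floated above. It
also assumes that a full cache simply refuses new entries until
something expires, which matches the "slot is blocked" model from the
earlier example.

```python
from dataclasses import dataclass


@dataclass
class CachedDir:
    timeout: float      # current lifetime granted to this entry, in seconds
    expires_at: float   # absolute time at which the entry goes stale


class AdaptiveDirCache:
    """Toy model: short initial timeout, doubled on every hit up to a cap."""

    def __init__(self, capacity=10, initial_timeout=3.0, max_timeout=30.0):
        # All three defaults are illustrative assumptions, not tuned values.
        self.capacity = capacity
        self.initial_timeout = initial_timeout
        self.max_timeout = max_timeout
        self.entries = {}  # path -> CachedDir

    def _expire(self, now):
        # Drop every entry whose timeout has elapsed.
        stale = [p for p, e in self.entries.items() if e.expires_at <= now]
        for p in stale:
            del self.entries[p]

    def access(self, path, now):
        """Return True on a cache hit, False on a miss."""
        self._expire(now)
        entry = self.entries.get(path)
        if entry is not None:
            # Hit: double the lifetime (capped) and restart the timer.
            entry.timeout = min(entry.timeout * 2, self.max_timeout)
            entry.expires_at = now + entry.timeout
            return True
        if len(self.entries) < self.capacity:
            # Miss with a free slot: admit with the short initial timeout.
            self.entries[path] = CachedDir(self.initial_timeout,
                                           now + self.initial_timeout)
        return False
```

The caller passes the clock value in as "now" so the behaviour can be
replayed deterministically; a directory touched once falls out after
the initial timeout, while one that keeps getting hit works its way up
to the maximum.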


We would probably need a good set of data and real tests to tweak these
values and understand what good defaults would be.
It would probably be a good idea to have a bunch of dbench scripts that
create and perform some I/O resembling common application workloads on
sets of a few tens of thousands of directories, and measure cache-hit
rate as well as wall-clock time to run each test. Maybe also compare
test runs against a target with very low round-trip time versus one with
very high round-trip time, and vary the ratio of hot vs cold directories.

To be realistic I think the total set of directories in the test would
need to be orders of magnitude
larger than the number of directories that can fit in the cache.
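
Before anyone writes the real dbench scripts, a toy simulation along
these lines might give a first feel for how the knobs interact. This is
only a sketch under invented assumptions: the function name simulate,
the access rate, the weighted-random workload and all default numbers
are made up here, and it measures nothing about a real server or real
round-trip times. Passing the same value for both timeouts models a
fixed timeout that is merely reset on re-open.

```python
import random


def simulate(initial_timeout, max_timeout, *, slots=10, cold_dirs=1000,
             hot_dirs=1, hot_weight=10, accesses=200_000,
             accesses_per_second=10.0, seed=0):
    """Return (overall_hit_rate, hot_hit_rate) for one synthetic run."""
    rng = random.Random(seed)
    dirs = list(range(hot_dirs + cold_dirs))   # ids below hot_dirs are hot
    weights = [hot_weight] * hot_dirs + [1] * cold_dirs
    trace = rng.choices(dirs, weights=weights, k=accesses)

    cache = {}  # directory id -> (timeout, expires_at)
    hits = hot_hits = hot_total = 0
    for i, d in enumerate(trace):
        now = i / accesses_per_second
        # Expire stale entries; the cache is tiny, so a scan is fine.
        for k in [k for k, (_, exp) in cache.items() if exp <= now]:
            del cache[k]
        is_hot = d < hot_dirs
        hot_total += is_hot
        if d in cache:
            # Hit: double the timeout up to the cap and restart the timer.
            timeout = min(cache[d][0] * 2, max_timeout)
            cache[d] = (timeout, now + timeout)
            hits += 1
            hot_hits += is_hot
        elif len(cache) < slots:
            # Miss with a free slot: admit with the initial timeout.
            cache[d] = (initial_timeout, now + initial_timeout)
        # Miss with a full cache: the directory is simply not cached.
    return hits / accesses, hot_hits / max(hot_total, 1)


if __name__ == "__main__":
    for label, args in [("fixed 60s", (60.0, 60.0)),
                        ("fixed 30s", (30.0, 30.0)),
                        ("3s doubling to 30s", (3.0, 30.0))]:
        overall, hot = simulate(*args)
        print(f"{label}: overall hit rate {overall:.3f}, hot hit rate {hot:.3f}")
```

The numbers it prints depend entirely on the made-up workload, so they
would only be useful for comparing settings against each other, not as
a substitute for the real measurements described above.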

>
>
>
>
>
> >
> > On Fri, Sep 1, 2023 at 5:20 PM Steve French <smfrench@xxxxxxxxx> wrote:
> > >
> > > On Fri, Sep 1, 2023 at 11:31 AM ronnie sahlberg
> > > <ronniesahlberg@xxxxxxxxx> wrote:
> > > >
> > > > Maybe just re-set the timestamp every time the cached directory is reopened,
> > > > that way a hot directory will remain in cache indefinitely but one
> > > > that is cold will
> > > > quickly time out and make space for something else to be cached.
> > >
> > >
> > > Makes sense
> > >
> > >
> > > --
> > > Thanks,
> > >
> > > Steve
> >
> >
> >
> > --
> > Thanks,
> >
> > Steve



