On Mon, 4 Sept 2023 at 13:44, ronnie sahlberg <ronniesahlberg@xxxxxxxxx> wrote:
>
> On Sat, 2 Sept 2023 at 08:47, Steve French <smfrench@xxxxxxxxx> wrote:
> >
> > I also noticed that Windows apparently lets you control the size of
> > the directory entry cache (the file info cached for directories). See
> > below:
> >
> > DirectoryCacheEntrySizeMax
> > HKLM\System\CurrentControlSet\Services\LanmanWorkstation\Parameters\DirectoryCacheEntrySizeMax
> > The default is 64 KB. This is the maximum size of directory cache entries.
> >
> > Should we add a tunable similar to this (per mount? per system?)
>
> Probably not, because most of the time these settings do not really
> work, and often, when you increase them, you make things worse.
>
> I think you want to implement something like this: when a cached
> directory is re-opened, the timeout is reset, so that
> hot directories will remain in the cache longer. Which is what you
> want. You want hot data in the cache.
>
> On the other hand, you want cold data to expire as quickly as
> possible, because cache that is held up by cold data cannot
> be used to store hot data until the cold data has expired from the
> cache. So you want this timer as short as possible:
> the shortest possible timeout that does not also expire hot data.
>
> Shorter timeouts and quicker expunging of the cache == better performance.
> It might not sound intuitive, but I can show it with a simple example.
>
> Assume your cache has 10 slots, each able to store one directory.
> Assume you have 1000 cold directories that are accessed relatively infrequently.
> Assume you have 1 directory that is hot and is accessed 10 times more
> frequently than a cold directory.
>
> We now changed the timeout to 60 seconds. This means any cold
> directory that enters the cache will sit in the cache and block that
> entry for 60 seconds
> until it is cleared and something else can use that cache slot.
> This is akin to a model where every 60 seconds you have a lottery in which
> 10 directories win entry to the cache.
> What is the probability that a hot directory wins the lottery and
> becomes cached?
> In this example it is 1%, because accesses to the hot directory are
> only 1% of all accesses.
> Each cold directory has just a 0.1% chance of becoming cached,
> but since there are so many of them they will still dominate.
> (The problem we have to solve now is to get the hot directory into the
> cache as fast as possible and to get it to remain in cache for as long
> as possible.)
>
> On average, then, we will have to wait 50 iterations until the hot
> directory even enters the cache for the first time, i.e. it will on
> average take 50 minutes
> before the hot directory is even cached.
> If we had left the original 30 second timeout it would "only" have
> taken 25 minutes on average to get this directory into the cache.
> This kind of suggests that even 30 seconds would be way too big for
> this example and maybe we should use 10 seconds, or less.
>
> You want hot directories in the cache for as long as possible. A good
> way to do this is to make them sticky: if they are frequently
> accessed while cached, they do not expire as easily from the cache.
> On the other hand, any cold directory you want to expire from the
> cache as fast as possible, since every time you hold a cold directory
> in cache, that part of the cache becomes
> useless and wasted. You do this by, for example, setting this timeout
> as low as possible, so they are kicked out as soon as possible.
>

A slightly more complex implementation might be that when you add a
directory to the cache you set an initial, relatively short timeout,
maybe 3 seconds, before it expires. Then every time a directory is
accessed while in the cache, you reset the timeout to double the
previous expiry time, up to a maximum of 30 seconds/60 seconds/...
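
Roughly something like the following. This is only a toy Python sketch
of the idea, not kernel code: the names DirCache, CachedDir, access and
expire are made up for illustration, the 3 second and 30 second values
are just the example numbers from above, and it ignores the limit on the
number of cache slots entirely.

```python
from dataclasses import dataclass

INITIAL_TIMEOUT = 3.0   # seconds; the "maybe 3 seconds" starting value (assumption)
MAX_TIMEOUT = 30.0      # seconds; one of the suggested caps (assumption)


@dataclass
class CachedDir:
    """Hypothetical cache entry; not a real cifs.ko structure."""
    path: str
    timeout: float      # lifetime currently granted to this entry
    expires_at: float   # absolute time at which the entry goes stale


class DirCache:
    """Toy directory cache whose per-entry timeout doubles on every hit."""

    def __init__(self, initial=INITIAL_TIMEOUT, maximum=MAX_TIMEOUT):
        self.initial = initial
        self.maximum = maximum
        self.entries = {}

    def access(self, path, now):
        """Open `path` at time `now`; return True on a cache hit."""
        entry = self.entries.get(path)
        if entry is not None and now < entry.expires_at:
            # Hit: double the lifetime (capped) and restart the timer.
            entry.timeout = min(entry.timeout * 2, self.maximum)
            entry.expires_at = now + entry.timeout
            return True
        # Miss or already expired: (re)insert with the short initial timeout.
        self.entries[path] = CachedDir(path, self.initial, now + self.initial)
        return False

    def expire(self, now):
        """Drop every entry whose timer has run out."""
        self.entries = {p: e for p, e in self.entries.items()
                        if now < e.expires_at}
```

With these numbers, a directory that is opened once per second goes
3 -> 6 -> 12 -> 24 -> 30 seconds of lifetime within four re-opens, while
a directory that is opened once and never again is gone after 3 seconds.
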
That way a hot directory will relatively quickly become sticky, while a
cold directory is kicked out after just 3 seconds or so, or whatever the
initial short timeout is set to.

We would probably need a good set of data and real tests to tweak these
values and understand what good defaults would be. It would probably be
a good idea to have a bunch of dbench scripts that create and perform
some I/O resembling common application workloads on sets of a few tens
of thousands of directories, and measure cache-hit rate as well as
wall-clock time to run each test. Maybe also compare test runs against a
target with very low round-trip times versus one with very high
round-trip times, and try different ratios of hot vs cold directories.
To be realistic, I think the total set of directories in the test would
need to be orders of magnitude larger than the number of directories
that can fit in the cache.

>
> >
> > On Fri, Sep 1, 2023 at 5:20 PM Steve French <smfrench@xxxxxxxxx> wrote:
> > >
> > > On Fri, Sep 1, 2023 at 11:31 AM ronnie sahlberg
> > > <ronniesahlberg@xxxxxxxxx> wrote:
> > > >
> > > > Maybe just re-set the timestamp every time the cached directory is reopened,
> > > > that way a hot directory will remain in cache indefinitely but one
> > > > that is cold will
> > > > quickly time out and make space for something else to be cached.
> > >
> > >
> > > Makes sense
> > >
> > >
> > > --
> > > Thanks,
> > >
> > > Steve
> >
> >
> >
> > --
> > Thanks,
> >
> > Steve