Hi!
"Index in memory" topic:
It seems that, once read operations start, a large part of an index gets loaded into memory quite quickly. Many IDs fit on one 8 KB page in PostgreSQL (as far as I understand, a hash index entry stores only the hash code plus a tuple pointer, so on the order of a few hundred entries fit on one page), so the pages that are touched get cached fast.
So this behaviour of PostgreSQL may still work well in our case of very long "random" IDs.
It is not necessary, and not always possible, to keep the whole index in memory: for example, if the index is larger than the available memory, keeping the whole index in memory is simply not an option.
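As a rough way to check this in practice, the pg_buffercache contrib extension can show how many 8 KB pages of an index are currently in shared_buffers. A minimal sketch (the index name idx_mytable_id is just a placeholder for illustration):

  CREATE EXTENSION IF NOT EXISTS pg_buffercache;

  -- Count how many 8 KB buffers of the (hypothetical) index are cached
  -- in shared_buffers at the moment.
  SELECT count(*) AS cached_pages
  FROM pg_buffercache b
  JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
  WHERE c.relname = 'idx_mytable_id';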
( Regarding in-memory DB functionality: I do not know whether an "in-memory" index / DB would work in such situations, where the index does not fit in memory. We would like to keep most of the index in memory, but not necessarily the whole index in all cases, e.g. when there is not enough memory available. )
So, again, maybe PostgreSQL works well in our case.
Regarding double buffering: I do not know how much double buffering would slow down operations.
It might also be possible to bypass the kernel page cache on our DB server to avoid double buffering, although we may still keep it in use.
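As a sketch of what we might do instead of (or in addition to) that: size shared_buffers so that most of the index fits in PostgreSQL's own cache, and pre-load the index after a restart with the pg_prewarm contrib extension. The numbers and the index name idx_mytable_id are only illustrative assumptions:

  -- postgresql.conf (assuming e.g. ~32 GB RAM and an index of a few GB):
  --   shared_buffers       = 8GB
  --   effective_cache_size = 24GB

  CREATE EXTENSION IF NOT EXISTS pg_prewarm;

  -- Load the (hypothetical) index into shared_buffers right after a restart.
  SELECT pg_prewarm('idx_mytable_id');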
BR Sam
On Wednesday, September 19, 2018 5:50 PM, Sam R. <samruohola@xxxxxxxxx> wrote:
Thanks for the comments!
>> The data in db table columns is not needed to be kept in memory, only the index. (hash index.)
Jeff Janes wrote:
> This sounds like speculation. Do you have hard evidence that this is actually the case?
In our case the "ID" is a randomly generated number. (A large ID.)
It is not a "sequential" number, but random.
At generation time it is a very large random number; the random ID may not even be generated by our application itself.
We use a hash index over the ID.
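As a minimal sketch of the setup (table, column, and index names are placeholders, not our real schema):

  -- Hypothetical table; the ID is a large random value, stored as text here.
  CREATE TABLE mytable (
      id   text NOT NULL,
      data text
  );

  -- Hash index over the ID; we only ever do equality lookups on it.
  CREATE INDEX idx_mytable_id ON mytable USING hash (id);

  -- A typical random lookup through the hash index:
  SELECT data FROM mytable WHERE id = '<some generated ID>';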
At the moment, in "pure theory", we will read randomly through the hash index.
So, no one can know in advance which part of the data (from the table) should be kept in memory.
Side note: of course there may be (even many) use cases where the same data is read again and again. Still, I am thinking now from a very theoretical point of view (which we may still apply in practice).
In general:
I am not certain how PostgreSQL or hash indexes work in detail, so my claim / wish of keeping only the index in memory may be faulty. (This is one reason for these discussions.)
BR Sam