On Mon, Jul 13, 2020 at 8:05 PM Jeff Janes <jeff.janes@xxxxxxxxx> wrote:
On Mon, Jul 13, 2020 at 10:23 AM Henrique Montenegro <typoon@xxxxxxxxx> wrote:insert into users_no_dups (
created_ts,
user_id,
name,
url
) (
select
created_ts,
user_id,
name,
url
from
users
) on conflict do nothingOnce the size of the only index exceeds shared_buffers by a bit (the amount of "a bit" depends on your RAM, kernel version, settings for dirty_background_ratio, dirty_expire_centisecs, and probably other things, and is not easy to predict) the performance falls off a cliff when inserting values in a random order. Every insert dirties a random index leaf page, which quickly gets evicted from shared_buffers to make room for other random leaf pages to be read in, and then turns into flush calls when the kernel freaks out about the amount and age of dirty pages held in memory.
That is interesting to know. I will do some research on those things.
What happens if you add an "ORDER BY user_id" to your above select?
I don't know. I will give it a try right now.
shared_buffers = 8GB
RAM: 256GBOr, crank up shared_buffers by a lot. Like, beyond the size of the growing index, or up to 240GB if the index ever becomes larger than that. And make the time between checkpoints longer. If the dirty buffers are retained in shared_buffers longer, chances of them getting dirtied repeatedly between writes is much higher than if you just toss them to the kernel and hope for the best.
I cranked it up to 160GB to see how it goes.
Cheers,Jeff
I created the partitions as well as mentioned before. I was able to partition the table based on the user_id (found some logic to it). I was transferring the data from the original table (about 280 million records; 320GB) to the new partitioned table and things were going well with write speeds between 30MB/s and 50MB/s. After reading 270GB of the 320GB (in 4 and a half hours) and writing it to the new partitioned table, write speed went down to 7KB/s. It is so frustrating.
I will keep the partitions and try your suggestions to see how it goes.
I apologize for the long time between replies, it is just that testing this stuff takes 4+ hours each run.
If there are any other suggestions of things for me to look meanwhile as well, please keep them coming.
Thanks!
Henrique