I have a question about the files in .../data/postgresql/11/main/base, specifically in relation to very large tables and how they are written.
I have been attempting to restore a relatively large database with pg_restore and it has been running for more than a week. (I also have another thread about the same restore related to network vs. local disk I/O)
I ran the pg_restore in verbose mode using multiple jobs so I can tell what has finished and what has not. The database had 66 tables and most of them have been restored. Two of the tables were quite large (billions of rows translating to between 1 and 2TB of data on disk for those two tables) and those two tables are pretty much the only things remaining that has not been reported as finished by pg_restore.
As the process has been going for a week, I have been tracking the machine (a dedicated piece of hardware, non-virtualized) and have been noticing a progressive slowdown (as tracked by iostat). There is nothing running on the machine besides postgresql and the server is only doing this restore, nothing else. It is now, on average, running at less than 25% of the speed that it was running four days ago (as measured by rate of I/O).
I started to dig into what was happening on the machine and I noticed the following:
iotop reports that two postgres processes (I assume each processing one of the two tables that needs to be processed) are doing all the I/O. That makes sense
Total DISK READ : 1473.81 K/s | Total DISK WRITE : 617.30 K/s
Actual DISK READ: 1473.81 K/s | Actual DISK WRITE: 0.00 B/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
6601 be/4 postgres 586.44 K/s 7.72 K/s 0.00 % 97.39 % postgres: 11/main: postg~s thebruteff [local] COPY
6600 be/4 postgres 887.37 K/s 601.87 K/s 0.00 % 93.42 % postgres: 11/main: postg~s thebruteff [local] COPY
666 be/3 root 0.00 B/s 7.72 K/s 0.00 % 5.73 % [jbd2/sda1-8]
1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init
2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
4 be/0 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kworker/0:0H]
So, the next thing I though I would do is an "lsof" command for each of those two processes to see what they were writing. That was a bit of a surpise:
# lsof -p 6600 | wc -l;
840
# lsof -p 6601 | wc -l;
906
Is that normal? That there be so many open file pointers? ~900 open file pointers for each of the processes?
The next I did was go to see the actual data files, to see how many there are. In my case they are in postgresql/11/main/base/24576 and there are 2076 files there. That made sense. However, I found that when I list them by modification date I see something interesting:
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.7
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.8
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.9
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.10
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.11
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.12
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.13
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.14
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.16
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.15
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.17
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.18
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.19
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.21
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.22
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.23
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.24
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.25
-rw------- 1 postgres postgres 1073741824 Oct 8 13:05 27083.26
-rw------- 1 postgres postgres 19062784 Oct 8 13:05 27082_fsm
-rw------- 1 postgres postgres 544489472 Oct 8 13:05 27077.34
-rw------- 1 postgres postgres 169705472 Oct 8 13:05 27082.72
-rw------- 1 postgres postgres 978321408 Oct 8 13:05 27083.27
-rw------- 1 postgres postgres 342925312 Oct 8 13:05 27076.88
If you notice, the file size is capped at 1 GB and as the giant table has grown it has added more files in this directory. However, the mysterious thing to me is that it keeps modifying those files constantly - even the ones that are completely full. So for the two large tables it has been restoring all week, the modification time for the ever growing list of files is being updating constantly.
Could it be that thats why I am seeing a slowdown over the course of the week - that for some reason as the number of files for the table has grown, the system is spending more and more time seeking around the disk to touch all those files for some reason?
Does anyone who understands the details of postgresql's interaction with the file system have an explanation for why all those files which are full are being touched constantly?