回复：Orphan files filling root partition after crash

"赵宇鹏(宇彭)" <zhaoyupeng.zyp@xxxxxxxxxxxxxxx> · Tue, 26 Mar 2024 14:23:38 +0800

Hello, it looks like I've run into the same issue as you. I exhausted the diskspace while executing DDL operations, and then after crash recovery, I found
there were orphaned files.
I believe the reason is that due to the lack of space, some of the WAL logs were
not persisted, such as the abort-type WAL logs. During the WAL replay phase,
the absence of abort-type WAL logs resulted in missing the corresponding unlink
operations.
I can replicate a similar scenario. For example, 16394 is an orphaned file that
was generated by the DDL but was not rolled back.

1. CREATE TABLE test(id int);
2. INSERT INTO test SELECT generate_series(1, 100000000);
3. ALTER TABLE test ALTER COLUMN id TYPE bigint;
   sudo chmod 000 pg_wal
4. sudo chmod 777 pg_wal
   pg_waldump xxx
5. pg_ctl -D /data start
6. ll -h /data/base/5

1.0G Mar 26 11:47 16387.1
1.0G Mar 26 11:47 16387.2
385M Mar 26 13:38 16387.3
888K Mar 26 13:38 16387_fsm
112K Mar 26 11:51 16387_vm
1.0G Mar 26 13:47 16394
179M Mar 26 13:47 16394.1
320K Mar 26 13:47 16394_fsm

postgres=# select pg_size_pretty(pg_database_size('postgres'));
 pg_size_pretty
----------------
 4668 MB
(1 row)
postgres=# select pg_size_pretty(pg_total_relation_size('test'));
 pg_size_pretty
----------------
 3458 MB
(1 row)
postgres=# select count(*) from pg_class where pg_relation_filenode(oid) = 16394;
 count
-------
     0
(1 row)

------------------------------------------------------------------
发件人：Dimitrios Apostolou <jimis@xxxxxxx>
发送时间：2024年3月26日(星期二) 11:15
收件人："pgsql-general"<pgsql-general@xxxxxxxxxxxxxxxxxxxx>
主　题：Orphan files filling root partition after crash

Hello list,

yesterday I was doing:

ALTER TABLE x ADD UNIQUE ... , ADD PRIMARY KEY ...;

The table is almost a billion rows long but lies in its own TABLESPACE
that has plenty of space.  But apparently the ALTER TABLE command is
writing a lot to the default tablespace (not the temp_tablespace, that is
already moved to a different partition).

That quickly filled up the 50GB free space in my root partition:

20:18:04.222 UTC [94144] PANIC:  could not write to file "pg_wal/xlogtemp.94144": No space left on device
[...]
20:19:11.578 UTC [94140] LOG:  WAL writer process (PID 94144) was terminated by signal 6: Aborted
20:19:11.578 UTC [94140] LOG:  terminating any other active server processes

After postgresql crashed and restarted, the disk space in the root
partition was still not freed! I believe this is because of "orphaned
files" as discussed in mailing list thread [1].

[1] https://www.postgresql.org/message-id/CAN-RpxDBA7HbTsJPq4t4VznmRFJkssP2SNEMuG%3DoNJ%2B%3DsxLQew%40mail.gmail.com

I ended up doing some risky actions to remediate the problem: Find the
filenames that have no identically named "oid" in pg_class, and delete
(move to backup) the biggest ones while the database is stopped.
Fortunately the database started up fine after that!

So what is the moral of the story? How to guard against this?

Why did the database write so much to the default tablespace, even when
both the table and the temp tablespace are elsewhere?  Also should one
always keep the default tablespace away from the wal partition? (I guess
it would have helped to avoid the crash, but the ALTER TABLE command would
have still run out of space, so I'm not sure if the orphan files would
have been avoided)?

Needless to say, I would have hoped the database cleaned-up after itself
even after an uncontrolled crash, or that it provided some tools for the
job. (I tried VACUUM FULL on the table, but the orphaned files did not go
away).

My postgresql version is 16.2 installed on Ubuntu.

Thank you,
Dimitris