PostgreSQL occasionally unable to rename WAL files (NTFS)

Guy Burgess <guy@xxxxxxxxxxxxx> · Thu, 11 Feb 2021 13:21:12 +1300



    Hello,

    Running PostgreSQL 13.1 on Windows Server 2019, I am occasionally getting the following log entries:

        2021-02-11 12:34:10.149 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000D3": Permission denied
        2021-02-11 12:40:31.377 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000D3": Permission denied
        2021-02-11 12:46:06.294 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000D3": Permission denied
        2021-02-11 12:46:16.502 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000DA": Permission denied
        2021-02-11 12:50:20.917 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000D3": Permission denied
        2021-02-11 12:50:31.098 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000DA": Permission denied

    What appears to be happening is that the affected WAL files (usually only two or three at a time) somehow "lose" their NTFS permissions, so the PostgreSQL process can't rename them, even though that same process created them. Even running icacls as administrator gives "Access is denied" on those files. A further oddity is that the affected files do eventually disappear after a while.
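
    To check the "only two or three files at a time" observation against a longer stretch of the server log, a small script can tally the rename failures per segment file. This is a minimal Python sketch, not part of PostgreSQL; the names `failing_segments`, `RENAME_FAILURE` and `SAMPLE_LOG` are hypothetical, and the regex assumes the message text looks exactly like the entries quoted above.

```python
import re
from collections import Counter

# Matches the message part of the LOG entries quoted above (assumed format).
# \s+ also copes with entries that were wrapped across lines, e.g. by a mail client.
RENAME_FAILURE = re.compile(
    r'could not rename\s+file\s+"pg_wal/(?P<segment>[0-9A-F]{24})": Permission denied'
)


def failing_segments(log_text: str) -> Counter:
    """Return a Counter mapping WAL segment file name -> number of rename failures."""
    return Counter(m.group("segment") for m in RENAME_FAILURE.finditer(log_text))


# The six entries from this post, one per line.
SAMPLE_LOG = """\
2021-02-11 12:34:10.149 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000D3": Permission denied
2021-02-11 12:40:31.377 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000D3": Permission denied
2021-02-11 12:46:06.294 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000D3": Permission denied
2021-02-11 12:46:16.502 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000DA": Permission denied
2021-02-11 12:50:20.917 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000D3": Permission denied
2021-02-11 12:50:31.098 NZDT [6072] LOG:  could not rename file "pg_wal/0000000100000099000000DA": Permission denied
"""

if __name__ == "__main__":
    # Print each failing segment with its failure count, most frequent first.
    for segment, failures in failing_segments(SAMPLE_LOG).most_common():
        print(segment, failures)
```

    Run against the six entries above, it reports 0000000100000099000000D3 four times and 0000000100000099000000DA twice.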

    
    The NTFS permissions on the pg_wal directory are correct, and most WAL files are unaffected. Chkdsk reports no problems, and the database is otherwise working fine. I have tried disabling the antivirus software in case it was interfering, but that made no difference.

    
    I found another recent report of similar behaviour here:
https://stackoverflow.com/questions/65405479/postgresql-13-log-could-not-rename-file-pg-wal-0000000100000001000000c6

    
    WAL config as follows:
    
        wal_level = replica
        fsync = on
        synchronous_commit = on
        wal_sync_method = fsync
        full_page_writes = on
        wal_compression = off
        wal_log_hints = off
        wal_init_zero = on
        wal_recycle = on
        wal_buffers = -1
        wal_writer_delay = 200ms
        wal_writer_flush_after = 1MB
        wal_skip_threshold = 2MB
        commit_delay = 0
        commit_siblings = 5
        checkpoint_timeout = 5min
        max_wal_size = 2GB
        min_wal_size = 256MB
        checkpoint_completion_target = 0.7
        checkpoint_flush_after = 0
        checkpoint_warning = 30s
        archive_mode = off
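
    With wal_recycle = on, PostgreSQL reuses old WAL segments by renaming them instead of creating new ones, and while WAL disk usage stays below min_wal_size, old files are recycled at a checkpoint rather than removed. To get a feel for how many files these settings involve, here is a rough Python sketch that converts the size settings into segment counts. It assumes the default 16 MB wal_segment_size (the post does not state it), and the helper names `to_bytes` and `segment_count` are hypothetical.

```python
import re

# PostgreSQL size units are multiples of 1024.
UNITS = {"kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

# Assumption: the cluster uses the default 16 MB WAL segment size
# (wal_segment_size is fixed at initdb time and is not shown in the post).
DEFAULT_SEGMENT_BYTES = 16 * 1024**2


def to_bytes(setting: str) -> int:
    """Convert a size setting such as '256MB' or '2GB' to bytes."""
    m = re.fullmatch(r"\s*(\d+)\s*(kB|MB|GB|TB)\s*", setting)
    if m is None:
        raise ValueError(f"unrecognised size setting: {setting!r}")
    return int(m.group(1)) * UNITS[m.group(2)]


def segment_count(setting: str, segment_bytes: int = DEFAULT_SEGMENT_BYTES) -> int:
    """Number of whole WAL segment files that fit in the given size."""
    return to_bytes(setting) // segment_bytes


if __name__ == "__main__":
    for name, value in [("min_wal_size", "256MB"), ("max_wal_size", "2GB")]:
        print(f"{name} = {value} -> {segment_count(value)} segments")
```

    Under that assumption, min_wal_size = 256MB corresponds to 16 segment files and max_wal_size = 2GB to 128.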

      
    I'm thinking of disabling wal_recycle as a first step to see if that makes any difference, but thought I'd seek some comments first.

    
    I'm not sure how much of a problem this is, since the database is otherwise running fine, but any thoughts would be appreciated.

    
    Thanks & regards,

    
    Guy