Re: pg_basebackup, requested WAL has already been removed

Lonni J Friedman <netllama@xxxxxxxxx> · Fri, 10 May 2013 10:02:54 -0700

That's a good point.  Then i dunno, perhaps it is a bug, but I'd be
surprised if this wasn't working, as its not really a corner case that
could be missed in testing, as long as all the options were exercised.
 Hopefully someone else can weigh in.

On Fri, May 10, 2013 at 10:00 AM, Sergey Koposov <koposov@xxxxxxxxxxxxx> wrote:
>
> On Fri, 10 May 2013, Lonni J Friedman wrote:
>
>> Its definitely not a bug.  You need to set/increase wal_keep_segments
>> to a value that ensures that they aren't recycled faster than the time
>> required to complete the base backup (plus some buffer).
>
>
> But I thought that wal_keep_segments is not needed for the streaming regime
> ( "--xlog-method=stream")  And the documentation
> http://www.postgresql.org/docs/9.2/static/app-pgbasebackup.html
> only mentions wal_keep_segments when talking about --xlog-method=fetch.
>
>
>>
>> On Fri, May 10, 2013 at 9:48 AM, Sergey Koposov
>
> <koposov@xxxxxxxxxxxxx> wrote: >> Hi,
>>>
>>>
>>> I've recently started to use pg_basebackup --xlog-method=stream to backup
>>> my
>>> multi-Tb database.
>>> Before I did the backup when there was not much activity in the DB and it
>>> went perfectly fine, but today, I've started the backup and it failed
>>> twice
>>> almost at the same time as the
>>> CREATE INDEX  (and another time CLUSTER) commands were finished.
>>>
>>> Here:
>>>
>>> postgres@cappc118:/mnt/backup/wsdb_130510$ pg_basebackup
>>> --xlog-method=stream --progress --verbose --pg
>>> transaction log start point: 23AE/BD003E70
>>> pg_basebackup: starting background WAL receiver
>>> pg_basebackup: unexpected termination of replication stream: FATAL:
>>> requested WAL segment 00000001000023B1000000FE has already been removed
>>> 4819820/16816887078 kB (4%), 0/1 tablespace
>>> (/mnt/backup/wsdb_130510/base/1)
>>>
>>> And the logs from around that time contained:
>>>
>>> some_user:wsdb:2013-05-10 14:35:41 BST:10587LOG:  duration: 40128.163 ms
>>> statement: CREATE INDEX usno_cle
>>> an_q3c_idx ON usno_clean (q3c_ang2ipix(ra,dec));
>>> ::2013-05-10 14:35:43 BST:25529LOG:  checkpoints are occurring too
>>> frequently (8 seconds apart)
>>> ::2013-05-10 14:35:43 BST:25529HINT:  Consider increasing the
>>> configuration
>>> parameter "checkpoint_segmen
>>> ts".
>>> ::2013-05-10 14:35:51 BST:25529LOG:  checkpoints are occurring too
>>> frequently (8 seconds apart)
>>> ::2013-05-10 14:35:51 BST:25529HINT:  Consider increasing the
>>> configuration
>>> parameter "checkpoint_segmen
>>> ts".
>>> postgres:[unknown]:2013-05-10 14:35:55 BST:8177FATAL:  requested WAL
>>> segment
>>> 00000001000023B1000000FE has already been removed
>>> some_user:wsdb:2013-05-10 14:36:59 BST:10599LOG:  duration: 78378.194 ms
>>> statement: CLUSTER usno_clean_q3c_idx ON usno_clean;
>>>
>>> One the previous occasion when it happened the CREATE INDEX() was being
>>> executed:
>>>
>>> some_user:wsdb:2013-05-10 09:17:20 BST:3300LOG:  duration: 67.680 ms
>>> statement: SELECT name FROM  (SELECT pg_catalog.lower(name) AS name FROM
>>> pg_catalog.pg_settings   UNION ALL SELECT 'session authorization'   UNION
>>> ALL SELECT 'all') ss  WHERE substring(name,1,4)='rand'
>>>         LIMIT 1000
>>> ::2013-05-10 09:22:47 BST:25529LOG:  checkpoints are occurring too
>>> frequently (18 seconds apart)
>>> ::2013-05-10 09:22:47 BST:25529HINT:  Consider increasing the
>>> configuration
>>> parameter "checkpoint_segments".
>>> postgres:[unknown]:2013-05-10 09:22:49 BST:27659FATAL:  requested WAL
>>> segment 000000010000239900000040 has already been removed
>>> some_user:wsdb:2013-05-10 09:22:57 BST:3236LOG:  duration: 542955.262 ms
>>> statement: CREATE INDEX xmatch_temp_usnoid_idx ON xmatch_temp (usno_id);
>>>
>>> The .configuration
>>> PG 9.2.4, Debian 7.0, amd64
>>>
>>> shared_buffers = 10GB
>>> work_mem = 1GB
>>> maintenance_work_mem = 1GB
>>> effective_io_concurrency = 5
>>> synchronous_commit = off
>>> checkpoint_segments = 32
>>> max_wal_senders = 2
>>> effective_cache_size = 30GB
>>> autovacuum_max_workers = 3
>>> wal_level=archive
>>> archive_mode = off
>>>
>>> Does it look like a bug or am I missing something ?
>>>
>>> Thanks,
>>>         Sergey
>>>
>>>
>>> --
>>> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-general
>>
>>
>>
>>
>> --
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> L. Friedman                                    netllama@xxxxxxxxx
>> LlamaLand                       https://netllama.linux-sxs.org
>>
>

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
L. Friedman                                    netllama@xxxxxxxxx
LlamaLand                       https://netllama.linux-sxs.org

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general