> From: Shaun Thomas <sthomas@xxxxxxxxxxxxxxxx>
> To: Glyn Astill <glynastill@xxxxxxxxxxx>
> Cc: PostgreSQL General <pgsql-general@xxxxxxxxxxxxxx>
> Sent: Wednesday, 6 March 2013, 14:35
> Subject: Re: [GENERAL] Why does slony use a cursor? Anyone know?
>
> On 03/06/2013 04:49 AM, Glyn Astill wrote:
>
>> What version of slony are you on? The specifics of what you mention
>> don't sound quite right, but it sounds very much like bug 167 which
>> was fixed in 2.1.2 if I remember correctly.
>
> We're on 2.1.2. Presumably, anyway. I didn't encounter the problem in
> stage when I set up a testbed. But it also might not be related. The problem I
> can tell from the logs, is that it was closing the cursor pretty much right as
> soon as it got the results. 75 seconds to set up a cursor of that size and then
> an hour to sync all the data isn't a problem. 75 seconds for every 500 rows
> *is*.
>
> The stage test I did didn't do that when I deleted 20M rows from a 50M row
> table, but I also only set it up with a single replication set. My next test
> will be to test with two or three replication sets that all get big deletes like
> that. My guess is that it can't adequately swap between them on SYNC events,
> so it has to rebuild the cursor every time.
>
Yeah, you'd expect the reason for using the cursor would be to pull those 500 rows into memory, process them, and then get the next 500, and so on. I've not seen any such lags on our systems, though that doesn't mean it isn't happening with much milder symptoms.
You say it happened on your production setup but not when you tried to reproduce it in your test environment. Is there anything useful in the slony logs to suggest things weren't quite right at the time? I'm guessing your slons were running and generating syncs.
I'd definitely ask on the slony lists about this; either something isn't right with your setup, or it's something they can resolve.
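
For reference, the pattern I'd expect (open the cursor once, then pull 500 rows at a time until it's drained) looks roughly like the sketch below. This is not slony's code, just an illustration: it uses Python's sqlite3 as a stand-in for PostgreSQL, and the `log` table, `process_in_batches` and `demo` names are made up for the example. In PostgreSQL terms the equivalent would be a single DECLARE ... CURSOR followed by repeated FETCH 500 calls and one CLOSE at the end.

```python
import sqlite3


def process_in_batches(conn, query, batch_size=500):
    """Open one cursor and drain it batch by batch.

    Returns the size of each batch fetched, so the behaviour is easy to check.
    """
    cur = conn.cursor()
    cur.execute(query)  # the cursor is set up once, not once per batch
    sizes = []
    while True:
        rows = cur.fetchmany(batch_size)  # pull the next batch into memory
        if not rows:
            break  # cursor is exhausted
        sizes.append(len(rows))  # stand-in for applying the rows downstream
    cur.close()  # closed only after everything has been read
    return sizes


def demo():
    # Hypothetical table with 1200 rows, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO log (id) VALUES (?)",
                     ((i,) for i in range(1200)))
    return process_in_batches(conn, "SELECT id FROM log ORDER BY id")
```

The point is that the expensive setup step happens once up front. What you describe sounds like the opposite: the cursor being closed and rebuilt after every batch.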