The trigger (which supplies data to the channel), the background worker registration, and the background worker's main function are all in one .so file. The core background-process logic is in a separate helper .so file. The background worker's main function loads the helper .so file with dlopen and executes the core logic from it.
Issue faced:
It works fine on Windows, but on Linux, where I tried it with PostgreSQL 13.6, there is a problem. It runs as intended for a while, but after about 2-3 hours the PostgreSQL postmaster's CPU utilization shoots up and the postmaster seems to get stuck. The spike is very sudden (it takes less than 5 minutes, and utilization is normal until it starts). I am not able to establish a connection to PostgreSQL from the psql client. The following message keeps repeating in the log: WARNING: worker took too long to start; canceled.
I tried commenting out various areas of my code, adding sleeps at different places in the core processing loop, and even disabling the trigger, but the issue still occurs.
Software and libraries used:
go version go1.17.5 linux/amd64
PostgreSQL 13.6 on Ubuntu
lib/pq: https://github.com/lib/pq
My core processing loop looks like this:
    maxReconn := time.Minute
    listener := pq.NewListener(<connectionstring>, minReconn, maxReconn, EventCallBackFn) // lib/pq is used here
    defer listener.UnlistenAll()
    if err = listener.Listen("mystream"); err != nil {
        panic(err)
    }
    var itemsProcessedSinceLastSleep int = 0
    for {
        select {
        case signal := <-signalChan:
            PgLog(PG_LOG, "Exiting loop due to termination signal : %d", signal)
            return 1
        case pgstatus := <-pmStatusChan:
            PgLog(PG_LOG, "Exiting loop as postmaster is not running : %d ", pgstatus)
            return 1
        case data := <-listener.Notify:
            itemsProcessedSinceLastSleep++
            if itemsProcessedSinceLastSleep >= 1000 {
                time.Sleep(time.Millisecond * 10)
                itemsProcessedSinceLastSleep = 0
            }
            ProcessChangeReceivedAtStream(data) // This performs the data processing
        case <-time.After(10 * time.Second):
            time.Sleep(100 * time.Millisecond)
            var cEpoch = time.Now().Unix()
            if cEpoch-lastConnChkTime > 1800 {
                lastConnChkTime = cEpoch
                if err := listener.Ping(); err != nil {
                    PgLog(PG_LOG, "Seems to be a problem with connection")
                }
            }
        default:
            time.Sleep(time.Millisecond * 100)
        }
    }