On Fri, Apr 10, 2015 at 1:00 PM, Keith Fiske <keith@xxxxxxxxxx> wrote:
On Thu, Apr 9, 2015 at 11:56 PM, Craig Ringer <craig@xxxxxxxxxxxxxxx> wrote:On 9 April 2015 at 05:35, Keith Fiske <keith@xxxxxxxxxx> wrote:https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_apply.c#L2401I'm working on a background worker (BGW) for my pg_partman extension. I've gotten the basics of it working for my first round, but there's two features I'm missing that I'd like to add before release:1) Only allow one instance of this BGW to runLoad your extension in shared_preload_libraries, so that _PG_init runs in the postmaster. Register a static background worker then.If you need one worker per database (because it needs to access the DB) this won't work for you, though. What we do in BDR is have a single static background worker that's launched by the postmaster, which then launches and terminates per-database workers that do the "real work".Because of a limitation in the bgworker API in releases 9.4 and older, the static worker has to connect to a database if it wants to access shared catalogs like pg_database. This limitation has been lifted in 9.5 though, along with the need to use the database name instead of its oid to connect (which left bgworkers unable to handle RENAME DATABASE).(We still really need a hook on CREATE DATABASE too)2) Create a bgw_terminate_partman() function to stop it more intuitively than doing a pg_cancel_backend() on the PIDIf you want it to be able to be started/stopped dynamically, you should probably use RequestAddinShmemSpace to allocate a small shared memory block. Use that to register the PGPROC for the current worker when the worker starts, and add a boolean field you can use to ask it to terminate its self. You'll also need a LWLock to protect access to the segment, so you don't have races between a worker starting and the user asking to cancel it, etc.Unfortunately the BackgroundWorkerHandle struct is opaque, so you cannot store it in shared memory when it's returned by RegisterDynamicBackgroundWorker() and use it to later check the worker's status or ask it to exit. You have to use regular backend manipulation functions and PGPROC instead.Personally, I suggest that you leave the worker as a static worker, and leave it always running when the extension is active. If it isn't doing anything, have it sleep on its latch, then set its latch from other processes when something interesting happens. (You can put the process latch from PGPROC into your shmem segment so you can set it from elsewhere, or allocate a new latch).This is my first venture into writing C code for postgres, so I'm not familiar with a lot of the internals yet. I read http://www.postgresql.org/docs/9.4/static/bgworker.html and I see it mentioning how you can check the status of a BGW launched dynamically and the function to terminate one, but I'm not clear how how you can get the information on a currently running BGW to do these things.You can't. It's a pretty significant limitation in the current API. There's no way to enumerate bgworkers via the bgworker API, only via PGPROC.The BDR project has an extension with much more in-depth use of background workers, but it's probably *too* complicated. We have a static bgworker that launches and terminates dynamic bgworkers (per-database) that in turn launch and terminate more dynamic background workers (per-connection to peer databases).If you're interested, all the code is mirrored on github:and the relevant parts are:https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr.h
... but there's a *lot* of code there.--Craig,Thanks for the response! Definitely cleared up a lot of questions I had regarding how to interact with currently running BGWs. Glad to know I can at least stop banging my head against the desk about that. I've still got a lot to learn as far as how to interact with shared memory, but now that I know that's the path I have to go down, I'm fine with that.My current plan now after your response this this:
- Statically launch master BGW with shared_preload_libraries
- Use dynamically launched BGW for each database that pg_partman will run on in the cluster. My previous idea of restricting it to one BGW would likely have stopped it from ever working on more than one database in a cluster.
- Will see if I can create a function that polls the cluster for currently existing databases that actually have pg_partman installed. This should eliminate the need for a GUC naming the databases to run for. Should allow handling if a database is renamed as well. This way, as soon as the extension is created on a database, it should hopefully "just work" and start managing it.9.4 is my targeted release to support for a while, so I'll just have to deal with the shortcomings you mentioned. Does the above sound like it could work then?
So I've made some progress and thought I'd found a way around requiring the use of shared memory (for now).
https://gist.github.com/keithf4/3dbda40f7f941ada7ffe
https://gist.github.com/keithf4/3dbda40f7f941ada7ffe
The idea is there will be a main BGW that is always running in a loop. At the interval time period, the main loop will parse a GUC variable to get a list of databases and launch a dynamic BGW for each one to run the partition maintenance if pg_partman is installed.
I looked up the method that the shared_preload_libraries GUC uses to parse a csv list of values starting on line 238 with making a copy of the GUC variable to parse out
https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L238
https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L238
Everything within the pg_partman_bgw_main() function can read the value fine, all the way to the point where it assigns the dbname variable here
https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L251
https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L251
However, once I try and pass this variable to the dynamic BGW function, it seems to lose access to it. An example from my postgres logs:
2015-04-21 17:18:57 EDT [] [6866]: [6-1] user=,db=,e=00000 LOG: starting background worker process "pg_partman master background worker"
2015-04-21 17:18:57 EDT [] [6866]: [7-1] user=,db=,e=00000 LOG: database system is ready to accept connections
2015-04-21 17:18:57 EDT [] [6875]: [1-1] user=,db=,e=00000 LOG: pg_partman master background worker master process initialized with role keith on database template1
2015-04-21 17:18:57 EDT [] [6872]: [1-1] user=,db=,e=00000 LOG: autovacuum launcher started
2015-04-21 17:19:17 EDT [] [6875]: [2-1] user=,db=,e=00000 DEBUG: Dynamic bgw launch begun for keith
2015-04-21 17:19:17 EDT [] [6866]: [8-1] user=,db=,e=00000 LOG: registering background worker "pg_partman dynamic background worker (dbname=keith)"
2015-04-21 17:19:17 EDT [] [6866]: [9-1] user=,db=,e=00000 LOG: starting background worker process "pg_partman dynamic background worker (dbname=keith)"
2015-04-21 17:19:17 EDT [] [6877]: [1-1] user=,db=,e=00000 DEBUG: Before run_maint initialize connection for db @òö
2015-04-21 17:19:17 EDT [] [6877]: [2-1] user=,db=,e=3D000 FATAL: database " @òö " does not exist
2015-04-21 17:19:17 EDT [] [6866]: [10-1] user=,db=,e=00000 LOG: worker process: pg_partman dynamic background worker (dbname=keith) (PID 6877) exited with exit code 1
2015-04-21 17:19:17 EDT [] [6866]: [11-1] user=,db=,e=00000 LOG: unregistering background worker "pg_partman dynamic background worker (dbname=keith)"
2015-04-21 17:18:57 EDT [] [6866]: [6-1] user=,db=,e=00000 LOG: starting background worker process "pg_partman master background worker"
2015-04-21 17:18:57 EDT [] [6866]: [7-1] user=,db=,e=00000 LOG: database system is ready to accept connections
2015-04-21 17:18:57 EDT [] [6875]: [1-1] user=,db=,e=00000 LOG: pg_partman master background worker master process initialized with role keith on database template1
2015-04-21 17:18:57 EDT [] [6872]: [1-1] user=,db=,e=00000 LOG: autovacuum launcher started
2015-04-21 17:19:17 EDT [] [6875]: [2-1] user=,db=,e=00000 DEBUG: Dynamic bgw launch begun for keith
2015-04-21 17:19:17 EDT [] [6866]: [8-1] user=,db=,e=00000 LOG: registering background worker "pg_partman dynamic background worker (dbname=keith)"
2015-04-21 17:19:17 EDT [] [6866]: [9-1] user=,db=,e=00000 LOG: starting background worker process "pg_partman dynamic background worker (dbname=keith)"
2015-04-21 17:19:17 EDT [] [6877]: [1-1] user=,db=,e=00000 DEBUG: Before run_maint initialize connection for db @òö
2015-04-21 17:19:17 EDT [] [6877]: [2-1] user=,db=,e=3D000 FATAL: database " @òö " does not exist
2015-04-21 17:19:17 EDT [] [6866]: [10-1] user=,db=,e=00000 LOG: worker process: pg_partman dynamic background worker (dbname=keith) (PID 6877) exited with exit code 1
2015-04-21 17:19:17 EDT [] [6866]: [11-1] user=,db=,e=00000 LOG: unregistering background worker "pg_partman dynamic background worker (dbname=keith)"
You can see the first line of the pg_partman_bgw_run_maint() function just sees the variable as junk
https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L312
https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L312
Doing some debugging, I've narrowed down the issue back to where the pstrdup() makes a copy of the GUC variable. If I just have a single, valid database set for pg_partman_bgw.dbname like this in postgresql.conf
pg_partman_bgw.dbname = 'keith'
And I then replace this line
https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L262
pg_partman_bgw.dbname = 'keith'
And I then replace this line
https://gist.github.com/keithf4/3dbda40f7f941ada7ffe#file-pg_partman_bgw-c-L262
with
worker.bgw_main_arg = CStringGetDatum(pg_partman_bgw_dbname);
worker.bgw_main_arg = CStringGetDatum(pg_partman_bgw_dbname);
everything works fine
2015-04-21 17:40:18 EDT [] [7113]: [8-1] user=,db=,e=00000 LOG: registering background worker "pg_partman dynamic background worker (dbname=keith)"
2015-04-21 17:40:18 EDT [] [7113]: [9-1] user=,db=,e=00000 LOG: starting background worker process "pg_partman dynamic background worker (dbname=keith)"
2015-04-21 17:40:18 EDT [] [7127]: [1-1] user=,db=,e=00000 DEBUG: Before run_maint initialize connection for db keith
2015-04-21 17:40:18 EDT [] [7127]: [2-1] user=,db=,e=00000 DEBUG: After run_maint initialize connection for db keith
2015-04-21 17:40:18 EDT [] [7127]: [3-1] user=,db=,e=00000 DEBUG: Checking if pg_partman extension is installed in database: keith
2015-04-21 17:40:18 EDT [] [7127]: [4-1] user=,db=,e=00000 LOG: pg_partman dynamic background worker (dbname=keith) dynamic background worker initialized with role keith on database keith
2015-04-21 17:40:18 EDT [] [7127]: [5-1] user=,db=,e=00000 LOG: pg_partman dynamic background worker (dbname=keith): SELECT partman.run_maintenance(p_analyze := true, p_jobmon := false) called by role keith on database keith
2015-04-21 17:40:18 EDT [] [7113]: [10-1] user=,db=,e=00000 LOG: worker process: pg_partman dynamic background worker (dbname=keith) (PID 7127) exited with exit code 0
2015-04-21 17:40:18 EDT [] [7113]: [11-1] user=,db=,e=00000 LOG: unregistering background worker "pg_partman dynamic background worker (dbname=keith)"
Obviously I want to be able to support more than a single database so I need a method of splitting the GUC variable. I even tried using a simpler method a coworker showed me using strtok_r() instead of this List method, but had the same result. All the examples in other tools I've seen for the bgw_main_arg value have been Int32 values (including your BDR tool). Is there any way to pass a string as the argument bgw_main_arg?