
Re: Background worker assistance & review

On Thu, Apr 9, 2015 at 11:56 PM, Craig Ringer <craig@xxxxxxxxxxxxxxx> wrote:


On 9 April 2015 at 05:35, Keith Fiske <keith@xxxxxxxxxx> wrote:
I'm working on a background worker (BGW) for my pg_partman extension. I've gotten the basics of it working for my first round, but there are two features I'm missing that I'd like to add before release:

1) Only allow one instance of this BGW to run

Load your extension via shared_preload_libraries so that _PG_init runs in the postmaster, and register a static background worker from there.
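For reference, the preload itself is a one-line postgresql.conf setting. The library name below is only a placeholder, and a change to this setting takes effect only after a server restart.

```
# postgresql.conf ('pg_partman_bgw' is a placeholder library name)
shared_preload_libraries = 'pg_partman_bgw'
```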

If you need one worker per database (because it needs to access the DB) this won't work for you, though. What we do in BDR is have a single static background worker that's launched by the postmaster, which then launches and terminates per-database workers that do the "real work".

Because of a limitation in the bgworker API in releases 9.4 and older, the static worker has to connect to a database if it wants to access shared catalogs like pg_database. This limitation has been lifted in 9.5 though, along with the need to use the database name instead of its oid to connect (which left bgworkers unable to handle RENAME DATABASE).
 
(We still really need a hook on CREATE DATABASE too)

2) Create a bgw_terminate_partman() function to stop it more intuitively than doing a pg_cancel_backend() on the PID

If you want it to be able to be started and stopped dynamically, you should probably use RequestAddinShmemSpace to allocate a small shared memory block. Use that to register the PGPROC for the current worker when the worker starts, and add a boolean field you can use to ask it to terminate itself. You'll also need an LWLock to protect access to the segment, so you don't have races between a worker starting and the user asking to cancel it, etc.
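Here is a rough, standalone C model of that handshake. It shows the ordering and is not the real API. A plain struct stands in for the block you'd get from RequestAddinShmemSpace()/ShmemInitStruct(), and stub lock functions mark where LWLockAcquire()/LWLockRelease() would go. A pid stands in for the PGPROC pointer. All of the names (PartmanShared, partman_worker_attach() and so on) are made up for illustration. It is single-threaded, so it demonstrates the protocol, not the concurrency.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

/* Stand-in for the small block allocated with RequestAddinShmemSpace()
 * and attached with ShmemInitStruct() (hypothetical layout). */
typedef struct PartmanShared
{
    bool  lock_held;            /* models the LWLock protecting this struct */
    pid_t worker_pid;           /* 0 = no worker; real code would keep a PGPROC * */
    bool  terminate_requested;  /* set by the terminate function, read by the worker */
} PartmanShared;

static void shared_lock(PartmanShared *s)
{
    /* In the extension: LWLockAcquire(lock, LW_EXCLUSIVE) */
    assert(!s->lock_held);
    s->lock_held = true;
}

static void shared_unlock(PartmanShared *s)
{
    /* In the extension: LWLockRelease(lock) */
    assert(s->lock_held);
    s->lock_held = false;
}

/* Worker startup: returns false if a worker is already registered,
 * so at most one instance runs. */
static bool partman_worker_attach(PartmanShared *s, pid_t my_pid)
{
    bool ok;

    shared_lock(s);
    ok = (s->worker_pid == 0);
    if (ok)
    {
        s->worker_pid = my_pid;
        s->terminate_requested = false;
    }
    shared_unlock(s);
    return ok;
}

/* Body of a hypothetical bgw_terminate_partman(); false if none running. */
static bool partman_request_terminate(PartmanShared *s)
{
    bool found;

    shared_lock(s);
    found = (s->worker_pid != 0);
    if (found)
        s->terminate_requested = true;  /* real code would also SetLatch() */
    shared_unlock(s);
    return found;
}

/* Checked by the worker every time it wakes up. */
static bool partman_should_exit(PartmanShared *s)
{
    bool stop;

    shared_lock(s);
    stop = s->terminate_requested;
    shared_unlock(s);
    return stop;
}

/* Called by the worker just before it exits. */
static void partman_worker_detach(PartmanShared *s)
{
    shared_lock(s);
    s->worker_pid = 0;
    s->terminate_requested = false;
    shared_unlock(s);
}
```

In this model, the worker calls partman_worker_attach() at startup and exits at once if it returns false. It checks partman_should_exit() each time it wakes and calls partman_worker_detach() on the way out. A bgw_terminate_partman() SQL function would wrap partman_request_terminate(). Doing the check and the update under one lock is what closes the start-versus-cancel race.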

Unfortunately the BackgroundWorkerHandle struct is opaque, so you cannot store it in shared memory when it's returned by RegisterDynamicBackgroundWorker() and use it to later check the worker's status or ask it to exit. You have to use regular backend manipulation functions and PGPROC instead.

Personally, I suggest that you leave the worker as a static worker, and leave it always running when the extension is active. If it isn't doing anything, have it sleep on its latch, then set its latch from other processes when something interesting happens. (You can put the process latch from PGPROC into your shmem segment so you can set it from elsewhere, or allocate a new latch).
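The usual shape of that loop is sketched below, again as a standalone model rather than real extension code. sim_wait_latch() is a non-blocking stand-in for WaitLatch() with WL_LATCH_SET | WL_TIMEOUT. It is driven by a simulated clock so it can run without a server, and the maintenance call is only a counter. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Wake reasons, loosely modelled on WaitLatch()'s WL_LATCH_SET / WL_TIMEOUT. */
enum { WAKE_NONE = 0, WAKE_LATCH = 1, WAKE_TIMEOUT = 2 };

typedef struct SimLatch { bool is_set; } SimLatch;

static void sim_set_latch(SimLatch *l)   { l->is_set = true; }   /* ~ SetLatch() */
static void sim_reset_latch(SimLatch *l) { l->is_set = false; }  /* ~ ResetLatch() */

/* Hypothetical worker state: maintenance interval and last run time. */
typedef struct WorkerState
{
    long interval_ms;
    long last_run_ms;
    int  runs;                  /* times "run_maintenance()" was called */
} WorkerState;

/* Reports why the worker would wake at simulated time now_ms,
 * or WAKE_NONE if it would still be asleep. */
static int sim_wait_latch(const SimLatch *l, const WorkerState *w, long now_ms)
{
    int rc = WAKE_NONE;

    if (l->is_set)
        rc |= WAKE_LATCH;
    if (now_ms - w->last_run_ms >= w->interval_ms)
        rc |= WAKE_TIMEOUT;
    return rc;
}

/* One pass of the main loop. Mirrors the usual pattern of resetting the
 * latch before doing the work, so that in a real worker a SetLatch()
 * arriving during maintenance is not lost. */
static void worker_step(SimLatch *l, WorkerState *w, long now_ms)
{
    if (sim_wait_latch(l, w, now_ms) == WAKE_NONE)
        return;                 /* real worker would still be sleeping */
    sim_reset_latch(l);
    w->runs++;                  /* real worker would call run_maintenance() via SPI */
    w->last_run_ms = now_ms;
}
```

A real loop would also need to handle WL_POSTMASTER_DEATH and the SIGTERM flag. Apart from that, the worker costs nothing while idle and reacts immediately when another backend sets its latch.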

This is my first venture into writing C code for postgres, so I'm not familiar with a lot of the internals yet. I read http://www.postgresql.org/docs/9.4/static/bgworker.html and I see it mentions how you can check the status of a BGW launched dynamically and the function to terminate one, but I'm not clear on how you can get the information on a currently running BGW to do these things.

You can't. It's a pretty significant limitation in the current API. There's no way to enumerate bgworkers via the bgworker API, only via PGPROC.
 
I used the worker_spi example for a lot of this, so if there's any additional guidance for a better way to do what I've done, I'd appreciate it. All I really have it doing now is calling the run_maintenance() function at a defined interval and don't need it doing more than that yet.

The BDR project has an extension with much more in-depth use of background workers, but it's probably *too* complicated. We have a static bgworker that launches and terminates dynamic bgworkers (per-database) that in turn launch and terminate more dynamic background workers (per-connection to peer databases).

If you're interested, all the code is mirrored on GitHub:


and the relevant parts are:

https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr_apply.c#L2401
https://github.com/2ndQuadrant/bdr/blob/bdr-plugin/next/bdr.h

... but there's a *lot* of code there.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Craig,

Thanks for the response! Definitely cleared up a lot of questions I had regarding how to interact with currently running BGWs. Glad to know I can at least stop banging my head against the desk about that. I've still got a lot to learn as far as how to interact with shared memory, but now that I know that's the path I have to go down, I'm fine with that.

My current plan after your response is this:

- Statically launch master BGW with shared_preload_libraries
- Use a dynamically launched BGW for each database in the cluster that pg_partman will run on. My previous idea of restricting it to one BGW would likely have stopped it from ever working on more than one database in a cluster.
- Will see if I can create a function that polls the cluster for currently existing databases that actually have pg_partman installed. This should eliminate the need for a GUC naming the databases to run for, and should also allow handling a database being renamed. This way, as soon as the extension is created in a database, it should hopefully "just work" and start managing it.
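The polling step in the last point boils down to reconciling "databases that should have a worker" against "workers that are running". The sketch below assumes the master worker has already gathered each database's oid and whether pg_partman is installed. On 9.4, reading pg_database means the master must first connect to some database, and pg_extension is per-database, so each database has to be checked from a connection to it. The struct and function names are hypothetical. Keying by oid means a rename alone does not look like a new database, although on 9.4 the per-database worker still has to connect by name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PLAN_MAX 16

typedef unsigned int Oid;       /* same definition PostgreSQL uses */

/* What the master worker learned about one database on its last poll. */
typedef struct DbInfo
{
    Oid  oid;
    bool has_partman;           /* pg_partman present in that database's pg_extension */
} DbInfo;

/* Hypothetical output: which per-database workers to start and stop. */
typedef struct Plan
{
    Oid    start[PLAN_MAX];
    size_t nstart;
    Oid    stop[PLAN_MAX];
    size_t nstop;
} Plan;

/* True if oid appears in list[0..n-1]. */
static bool oid_in(const Oid *list, size_t n, Oid oid)
{
    for (size_t i = 0; i < n; i++)
        if (list[i] == oid)
            return true;
    return false;
}

/* True if the database still exists and still has pg_partman installed. */
static bool db_wants_worker(const DbInfo *dbs, size_t ndbs, Oid oid)
{
    for (size_t i = 0; i < ndbs; i++)
        if (dbs[i].oid == oid)
            return dbs[i].has_partman;
    return false;               /* database was dropped */
}

/* Compare polled databases with running workers, keyed by oid.
 * Entries beyond PLAN_MAX are simply left for the next poll. */
static void partman_reconcile(const DbInfo *dbs, size_t ndbs,
                              const Oid *running, size_t nrunning,
                              Plan *plan)
{
    plan->nstart = 0;
    plan->nstop = 0;

    /* Extension present but no worker yet: start one. */
    for (size_t i = 0; i < ndbs; i++)
        if (dbs[i].has_partman && !oid_in(running, nrunning, dbs[i].oid)
            && plan->nstart < PLAN_MAX)
            plan->start[plan->nstart++] = dbs[i].oid;

    /* Worker running but database dropped or extension removed: stop it. */
    for (size_t i = 0; i < nrunning; i++)
        if (!db_wants_worker(dbs, ndbs, running[i]) && plan->nstop < PLAN_MAX)
            plan->stop[plan->nstop++] = running[i];
}
```

Each entry in start would become a RegisterDynamicBackgroundWorker() call made by the master, which then sleeps on its latch until the next poll.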

9.4 is the release I'm targeting for support for a while, so I'll just have to deal with the shortcomings you mentioned. Does the above sound like it could work, then?

--
Keith Fiske
Database Administrator
OmniTI Computer Consulting, Inc.
http://www.keithf4.com

