Hi Dave, lots of nice inputs but let's try to summarize your user's needs.
On Fri, Aug 24, 2018 at 3:18 PM, David Gauthier <davegauthierpg@xxxxxxxxx> wrote:
[...]He's writing a script/program that runs on a workstation and needs to write data to a DB. This process also sends work to a batch system on a server farm external to the workstation that will create multiple, parallel jobs/processes that also have to write to the DB as well. The workstation may have many of these jobs running at the same time. And there are 58 workstation which all have/use locally mounted disks for this work.
Do the workstations work independently or do they need to share information? If they share information, does it need to be ACID or is "eventually consistent" enough? Also, what is the size of the database? A local database is viable as long as it does not overload the workstation but it you have many instances of the job per workstation and 58 of them, I would rely on an optimized central DB.
Are the jobs mainly INSERT or a mix of INSERT and SELECT/UPDATE?
Things to consider: amount of RAM required, CPU, disk I/O... The cost factor should be considered.
At first blush, this is easy. Just create a DB on a server and have all those clients work with it. But he's also adamant about having the DB on the same server(s) that ran the script AND on the locally mounted disk. He said he doesn't want the overhead, dependencies and worries of anything like an external DB with a DBA, etc... . He also wants this to be fast.
If he insists on having a local copy of the database, suggest PG10 with logical replication - ideally the bidirectional replication. Once he realizes that every INSERT causes 57 replications and this will most likely kill the network, he may reconsider the requirement... Queries to a central DB is likely to cause less stress and will be easier to handle.
Now, if speed is critical, you may also consider an in-memory DB like Redis. There are schemes running in multi-master.
My first thought was SQLite. Apparently, they now have some sort of multiple, concurrent write ability. But there's no way those batch jobs on remote machines are going to be able to get at the locally mounted disk on the workstation. So I dismissed that idea. Then I thought about having 58 PG installs, one per workstation, each serving all the jobs pertaining to that workstation. That could work. But 58 DB instances ? If he didn't like the ideal of one DBA, 58 can't be good. Still, the DB would be on the workstation which seems to be what he wants.
A distributed database is likely to cause failures at some point if not handled properly - and it sounds like there won't be any DBA at all. I would be extremely cautious in this case.