Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

tutiluren@xxxxxxxxxxxx · Wed, 23 Sep 2020 00:28:14 +0200 (CEST)

Sep 21, 2020, 7:53 PM by jd@xxxxxxxxxxxxxxxxx:
I have to agree that pg_dump is largely a step child backup program. It has consistently been found over the years to be lacking in a number of areas. Unfortunately, working on pg_dump isn't sexy and it is difficult to get volunteers or even paid resources to do such a thing. The real solution for pg_dump is a complete refactor which includes pg_dumpall and it is not a small undertaking. It should be noted that it is also a less and less used program. On our team it is normally used for only very specific needs (grabbing a schema) and we use binary backups or logical replication to receive specific data.
Huh? Are you saying that there is another, superior way to back up PostgreSQL databases other than pg_dump? I re-read the manual on it just now, but didn't see a single word about it being "legacy" or "deprecated" or even that there's any other way to do it. What do you mean?

3. The ability to embed PG to run in an automatic, quiet manner as part of something else. I know about SQLite, but it's extremely limited to the point of being virtually useless IMO, which is why I cannot use that for anything nontrivial. I want my familiar PostgreSQL, only not require it to be manually and separately installed on the machine where it is to run as part of some "application". If I could just "embed" it, this would allow me to create a single EXE which I can simply put on a different machine to run my entire "system" which otherwise takes *tons* of tedious, error-prone manual labor to install, set up and maintain. Of course, this is probably much easier said than done, but I don't understand why PG's architecture necessarily dictates that PG must be a stand-alone, separate thing. Or rather, why some "glue" cannot enable it to be used just like SQLite from a *practical* perspective, even if it still is a "server-client model" underneath the hood. (Which doesn't matter at all to me, nor should it matter to anyone else.)

This is really using the wrong tool for the job type of issue. PG was never designed for such a scenario.
I hate the "wrong tool for the job" argument. It assumes that everyone has infinite time, energy and brain capacity to learn endless redundant tools just to "use the right tool for the job" rather than "use what you actually know". I know PG. I don't know SQLite. They are very different. So obviously, I want to use PG.

What exactly makes PG unsuitable for this? I don't get it. But at the same time, I also realize that it's not going to happen at this point. The entire concept of a desktop computer appears to be phased out as we speak...

4. There is no built-in means to have PG manage (or even suggest) indexes on its own. Trying to figure out what indexes to create/delete/fine-tune, and determine all the extremely complex rules for this art (yes, I just called index management an *art*, because it is!), is just utterly hopeless to me. It never gets any easier. Not even after many years. It's the by far worst part of databases to me (combined with point five). Having to use third-party solutions ensures that it isn't done in practice, at least for me. I don't trust, nor do I want to deal with, external software and extensions in my databases. I still have nightmares from PostGIS, which I only keep around, angrily, out of absolute necessity. I fundamentally don't like third-party add-ons to things, but want the core product to properly support things. Besides, this (adding/managing indexes) is not even some niche/obscure use-case, but something which is crucial for basically any nontrivial database of any kind!

I think you are looking at this from a very windows centric way. Open Source has its origins from the Unix paradigm where each tool was designed to solve one type of problem and you used multiple tools to create a "solution". Though we have strayed from that on some items due to the evolving nature of software needs, that is still at our core and for good reason. Having tools, flags etc... to do such things (including your point #3) creates complexity best left to "vendors" not the software project.
While I understand what you mean, and even agree in theory, in practice, this always results in crappy third-party solutions which I don't want to deal with. PostGIS, for example, forces me to use "postgis" for its schema instead of "PostGIS" just because they arrogantly didn't construct their internal queries properly. "Little" things like that.

The practical end result of this is that I've always gone back to using the untouched default configuration file (except for the logging-related options), which, especially in the past on FreeBSD, *severely* crippled my PG database to not even come close to taking advantage of the full power of the hardware. Instead, it felt like I was using maybe 1% of the machine's power, even with a proper database design and indexes and all of that stuff, simply because the default config was so "conservative" and it couldn't be just set to "use whatever resources are available".

Not to be unkind but this does seem lazy. There are literally hundreds of "how to make postgres go fast", "how to optimize postgres" if you take 15 minutes to Google. It is true that the project (outside of the wiki) doesn't have much information in the official documentation but that doesn't mean that the information is not available.
Hundreds of crappy, outdated, confusing, badly written "web tutorialz" are worth nothing. A couple of clear, unambiguous documentation paragraphs are worth their (metaphorical) weight in gold.

Claiming that "the information is out there" is just hand-waving. It's shifting the burden to the user to actively hunt for information, and very likely be misled by all the garbage articles out there. I learned some horrible practices early on from "web tutz" and it took me many years to unlearn that stuff.

I know that writing documentation isn't fun, but it's necessary. Also, my overall point was to not even have to deal with the specifics, but just be able to tell PG with a single config option that it's allowed to "use most of the machine's resources".

I wish so much for PG to have a mode where it self-tunes itself as needed, over time, based on the actual workload, or at least allowed some kind of abstract "performance mode" such as: "you are allowed to use significant system resources, PG", or: "You are one of my most important applications. Just use as much power as you currently need, but at least save about 10% for the rest of the system, will you?" Maybe this is also harder than it sounds to accomplish, but for somebody like me who has zero funding, I cannot hire some professional to sit down with me and fine-tune my system for $899/hour.

See my comment about Google. The information is out there and easy to find.
I guess I'm the worst idiot in the world, then, who can't DuckDuckGo (Google is evil) it even after 15 years.

Seriously, I didn't type my feedback "for fun". It may be difficult for very intelligent people to understand (as often is the case, because you operate on a whole different level), but the performance-related PostgreSQL configuration options are a *nightmare* to me and many others. I spent *forever* reading about them and couldn't make any sense of it all. Each time I tried, I would give up, frustrated and angry, with no real clue what "magic numbers" it wanted.

It's quite baffling to me how this can be so difficult for you all to understand. Even if we disregard the sheer intelligence factor, it's clear that users of PG don't have the same intimate knowledge of PG's internals as the PG developers, nor could possibly be expected to.

As mentioned, I kept going back to the default configuration over and over again. Anyone who doesn't is either a genius or pretends/thinks that they understand it. (Or I'm extremely dumb.)

Very often, I get the feeling that things like that are the way they are on purpose. Work security and whatnot. But it's very frustrating for people like me who can't afford to buy help and don't have the enormous brain capacity necessary to comprehend the complex relations between the numerous performance-related config options. It really is that difficult.

 Discord and Slack
Those modern services don't even let me load them. It's the same thing with everything these days: "verify with phone", "we've detected suspicious activity", "fake error message", etc.