Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

Adrian Klaver <adrian.klaver@xxxxxxxxxxx> · Tue, 22 Sep 2020 15:47:56 -0700

On 9/22/20 3:28 PM, tutiluren@xxxxxxxxxxxx wrote:

Sep 21, 2020, 7:53 PM by jd@xxxxxxxxxxxxxxxxx:

    I have to agree that pg_dump is largely a step child backup program.
    It has consistently been found over the years to be lacking in a
    number of areas. Unfortunately, working on pg_dump isn't sexy and it
    is difficult to get volunteers or even paid resources to do such a
    thing. The real solution for pg_dump is a complete refactor which
    includes pg_dumpall and it is not a small undertaking. It should be
    noted that it is also a less and less used program. On our team it
    is normally used for only very specific needs (grabbing a schema)
    and we use binary backups or logical replication to receive specific
    data.

Huh? Are you saying that there is another, superior way to back up 
PostgreSQL databases other than pg_dump? I re-read the manual on it just 
now, but didn't see a single word about it being "legacy" or 
"deprecated" or even that there's any other way to do it. What do you mean?

It is not deprecated or legacy and it is still used by many. The issue 
is that people try to do 'partial' dumps without reading the docs on 
what those switches actually do and the potential conflicts therein. For 
large installations there it is often better to use some form of 
replication(binary and/or logical) to maintain a continuous backup. 
pg_dump can take a long time on very big databases, so it may not keep up.

    This is really using the wrong tool for the job type of issue. PG
    was never designed for such a scenario.

I hate the "wrong tool for the job" argument. It assumes that everyone 
has infinite time, energy and brain capacity to learn endless redundant 
tools just to "use the right tool for the job" rather than "use what you 
actually know". I know PG. I don't know SQLite. They are very different. 
So obviously, I want to use PG.

This comes down to what your definition of embedded is?  A matter of 
determining whether we are talking apples or oranges.

What exactly makes PG unsuitable for this? I don't get it. But at the 
same time, I also realize that it's not going to happen at this point. 
The entire concept of a desktop computer appears to be phased out as we 
speak...

While I understand what you mean, and even agree in theory, in practice, 
this always results in crappy third-party solutions which I don't want 
to deal with. PostGIS, for example, forces me to use "postgis" for its 
schema instead of "PostGIS" just because they arrogantly didn't 
construct their internal queries properly. "Little" things like that.

Huh? A schema is just a name space, why does it matter how the extension 
chooses to define it? I mean you could have number of permutations of 
postgis.

        The practical end result of this is that I've always gone back
        to using the untouched default configuration file (except for
        the logging-related options), which, especially in the past on
        FreeBSD, *severely* crippled my PG database to not even come
        close to taking advantage of the full power of the hardware.
        Instead, it felt like I was using maybe 1% of the machine's
        power, even with a proper database design and indexes and all of
        that stuff, simply because the default config was so
        "conservative" and it couldn't be just set to "use whatever
        resources are available".

    Not to be unkind but this does seem lazy. There are literally
    hundreds of "how to make postgres go fast", "how to optimize
    postgres" if you take 15 minutes to Google. It is true that the
    project (outside of the wiki) doesn't have much information in the
    official documentation but that doesn't mean that the information is
    not available.

Hundreds of crappy, outdated, confusing, badly written "web tutorialz" 
are worth nothing. A couple of clear, unambiguous documentation 
paragraphs are worth their (metaphorical) weight in gold.

The problem is the 'unambiguous' part. There are so many ways Postgres 
is used, writing a configuration doc that suited everyone would have so 
many if/and/or/buts that it would define ambiguity. If you want that 
information post an email here with the information on your server specs 
and proposed usage.

Claiming that "the information is out there" is just hand-waving. It's 
shifting the burden to the user to actively hunt for information, and 
very likely be misled by all the garbage articles out there. I learned 
some horrible practices early on from "web tutz" and it took me many 
years to unlearn that stuff.

I know that writing documentation isn't fun, but it's necessary. Also, 
my overall point was to not even have to deal with the specifics, but 
just be able to tell PG with a single config option that it's allowed to 
"use most of the machine's resources".

That would entail building an AI into  the code that would deal with all 
the possible OS(versions), Postgres(versions), hardware permutations.

        I wish so much for PG to have a mode where it self-tunes itself
        as needed, over time, based on the actual workload, or at least
        allowed some kind of abstract "performance mode" such as: "you
        are allowed to use significant system resources, PG", or: "You
        are one of my most important applications. Just use as much
        power as you currently need, but at least save about 10% for the
        rest of the system, will you?" Maybe this is also harder than it
        sounds to accomplish, but for somebody like me who has zero
        funding, I cannot hire some professional to sit down with me and
        fine-tune my system for $899/hour.

    See my comment about Google. The information is out there and easy
    to find.

I guess I'm the worst idiot in the world, then, who can't DuckDuckGo 
(Google is evil) it even after 15 years.

Seriously, I didn't type my feedback "for fun". It may be difficult for 
very intelligent people to understand (as often is the case, because you 
operate on a whole different level), but the performance-related 
PostgreSQL configuration options are a *nightmare* to me and many 
others. I spent *forever* reading about them and couldn't make any sense 
of it all. Each time I tried, I would give up, frustrated and angry, 
with no real clue what "magic numbers" it wanted.

It's quite baffling to me how this can be so difficult for you all to 
understand. Even if we disregard the sheer intelligence factor, it's 
clear that users of PG don't have the same intimate knowledge of PG's 
internals as the PG developers, nor could possibly be expected to.

As mentioned, I kept going back to the default configuration over and 
over again. Anyone who doesn't is either a genius or pretends/thinks 
that they understand it. (Or I'm extremely dumb.)

Does your server runs to your satisfaction with the default settings?
If not what issues do you see?

Very often, I get the feeling that things like that are the way they are 
on purpose. Work security and whatnot. But it's very frustrating for 
people like me who can't afford to buy help and don't have the enormous 
brain capacity necessary to comprehend the complex relations between the 
numerous performance-related config options. It really is that difficult.

    Discord and Slack

Those modern services don't even let me load them. It's the same thing 
with everything these days: "verify with phone", "we've detected 
suspicious activity", "fake error message", etc.

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx