Search Postgresql Archives

Re: Can I get some PostgreSQL developer feedback on these five general issues I have with PostgreSQL and its ecosystem?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/22/20 3:28 PM, tutiluren@xxxxxxxxxxxx wrote:

Sep 21, 2020, 7:53 PM by jd@xxxxxxxxxxxxxxxxx:

    I have to agree that pg_dump is largely a step child backup program.
    It has consistently been found over the years to be lacking in a
    number of areas. Unfortunately, working on pg_dump isn't sexy and it
    is difficult to get volunteers or even paid resources to do such a
    thing. The real solution for pg_dump is a complete refactor which
    includes pg_dumpall and it is not a small undertaking. It should be
    noted that it is also a less and less used program. On our team it
    is normally used for only very specific needs (grabbing a schema)
    and we use binary backups or logical replication to receive specific
    data.

Huh? Are you saying that there is another, superior way to back up PostgreSQL databases other than pg_dump? I re-read the manual on it just now, but didn't see a single word about it being "legacy" or "deprecated" or even that there's any other way to do it. What do you mean?

It is not deprecated or legacy and it is still used by many. The issue is that people try to do 'partial' dumps without reading the docs on what those switches actually do and the potential conflicts therein. For large installations there it is often better to use some form of replication(binary and/or logical) to maintain a continuous backup. pg_dump can take a long time on very big databases, so it may not keep up.



    This is really using the wrong tool for the job type of issue. PG
    was never designed for such a scenario.

I hate the "wrong tool for the job" argument. It assumes that everyone has infinite time, energy and brain capacity to learn endless redundant tools just to "use the right tool for the job" rather than "use what you actually know". I know PG. I don't know SQLite. They are very different. So obviously, I want to use PG.

This comes down to what your definition of embedded is? A matter of determining whether we are talking apples or oranges.



What exactly makes PG unsuitable for this? I don't get it. But at the same time, I also realize that it's not going to happen at this point. The entire concept of a desktop computer appears to be phased out as we speak...


While I understand what you mean, and even agree in theory, in practice, this always results in crappy third-party solutions which I don't want to deal with. PostGIS, for example, forces me to use "postgis" for its schema instead of "PostGIS" just because they arrogantly didn't construct their internal queries properly. "Little" things like that.

Huh? A schema is just a name space, why does it matter how the extension chooses to define it? I mean you could have number of permutations of postgis.


        The practical end result of this is that I've always gone back
        to using the untouched default configuration file (except for
        the logging-related options), which, especially in the past on
        FreeBSD, *severely* crippled my PG database to not even come
        close to taking advantage of the full power of the hardware.
        Instead, it felt like I was using maybe 1% of the machine's
        power, even with a proper database design and indexes and all of
        that stuff, simply because the default config was so
        "conservative" and it couldn't be just set to "use whatever
        resources are available".


    Not to be unkind but this does seem lazy. There are literally
    hundreds of "how to make postgres go fast", "how to optimize
    postgres" if you take 15 minutes to Google. It is true that the
    project (outside of the wiki) doesn't have much information in the
    official documentation but that doesn't mean that the information is
    not available.

Hundreds of crappy, outdated, confusing, badly written "web tutorialz" are worth nothing. A couple of clear, unambiguous documentation paragraphs are worth their (metaphorical) weight in gold.

The problem is the 'unambiguous' part. There are so many ways Postgres is used, writing a configuration doc that suited everyone would have so many if/and/or/buts that it would define ambiguity. If you want that information post an email here with the information on your server specs and proposed usage.


Claiming that "the information is out there" is just hand-waving. It's shifting the burden to the user to actively hunt for information, and very likely be misled by all the garbage articles out there. I learned some horrible practices early on from "web tutz" and it took me many years to unlearn that stuff.

I know that writing documentation isn't fun, but it's necessary. Also, my overall point was to not even have to deal with the specifics, but just be able to tell PG with a single config option that it's allowed to "use most of the machine's resources".

That would entail building an AI into the code that would deal with all the possible OS(versions), Postgres(versions), hardware permutations.


        I wish so much for PG to have a mode where it self-tunes itself
        as needed, over time, based on the actual workload, or at least
        allowed some kind of abstract "performance mode" such as: "you
        are allowed to use significant system resources, PG", or: "You
        are one of my most important applications. Just use as much
        power as you currently need, but at least save about 10% for the
        rest of the system, will you?" Maybe this is also harder than it
        sounds to accomplish, but for somebody like me who has zero
        funding, I cannot hire some professional to sit down with me and
        fine-tune my system for $899/hour.


    See my comment about Google. The information is out there and easy
    to find.

I guess I'm the worst idiot in the world, then, who can't DuckDuckGo (Google is evil) it even after 15 years.

Seriously, I didn't type my feedback "for fun". It may be difficult for very intelligent people to understand (as often is the case, because you operate on a whole different level), but the performance-related PostgreSQL configuration options are a *nightmare* to me and many others. I spent *forever* reading about them and couldn't make any sense of it all. Each time I tried, I would give up, frustrated and angry, with no real clue what "magic numbers" it wanted.

It's quite baffling to me how this can be so difficult for you all to understand. Even if we disregard the sheer intelligence factor, it's clear that users of PG don't have the same intimate knowledge of PG's internals as the PG developers, nor could possibly be expected to.

As mentioned, I kept going back to the default configuration over and over again. Anyone who doesn't is either a genius or pretends/thinks that they understand it. (Or I'm extremely dumb.)

Does your server runs to your satisfaction with the default settings?
If not what issues do you see?


Very often, I get the feeling that things like that are the way they are on purpose. Work security and whatnot. But it's very frustrating for people like me who can't afford to buy help and don't have the enormous brain capacity necessary to comprehend the complex relations between the numerous performance-related config options. It really is that difficult.

    Discord and Slack

Those modern services don't even let me load them. It's the same thing with everything these days: "verify with phone", "we've detected suspicious activity", "fake error message", etc.


--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux