Re: Enhancement request for pg_dump

Sergei Agalakov <Sergei.Agalakov@xxxxxxxxxxx> · Sun, 17 Apr 2016 18:26:54 -0600

I can hardly see how sorting the grants by user would have a
measurable impact on pg_dump performance in a real database.
One can imagine a database with tens of thousands of objects, tens of
thousands of users, and almost no data, but it would be quite unusual.
Anyway, if the sorting is enabled by a command-line parameter
and isn't pg_dump's default behavior, then this argument doesn't apply.
After all, pg_dump isn't a tool for _just_ reliable backups. It can be
used for migration, for schema cloning, to set up a standby...
pg_dump has many flags that are absolutely unnecessary for a
full database backup. They too "... might also overcomplicate it, making
it more difficult to maintain reliably", but they do exist, and they serve
a purpose.
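For illustration only, here is a minimal sketch of the kind of deterministic ordering being discussed. It is written in Python and applied as post-processing to two plain-text dumps before diffing them; it is not anything pg_dump does today. It assumes every GRANT/REVOKE statement sits on a single line. Because pg_dump separates objects with comment headers, only each object's own block of privilege statements gets reordered. The names `normalize_grants` and `diff_dumps` are hypothetical.

```python
import difflib


def normalize_grants(dump_text: str) -> list[str]:
    """Return the dump's lines with each run of consecutive one-line
    GRANT/REVOKE statements sorted, so their order no longer depends
    on the order in which the server happened to return them."""
    out: list[str] = []
    run: list[str] = []
    for line in dump_text.splitlines():
        if line.startswith(("GRANT ", "REVOKE ")):
            run.append(line)  # collect a privilege statement
            continue
        out.extend(sorted(run))  # flush the pending run in sorted order
        run = []
        out.append(line)
    out.extend(sorted(run))  # flush a run that ends the file
    return out


def diff_dumps(dump_a: str, dump_b: str) -> list[str]:
    """Unified diff of two plain-text dumps after grant normalization."""
    return list(difflib.unified_diff(
        normalize_grants(dump_a), normalize_grants(dump_b),
        fromfile="a.sql", tofile="b.sql", lineterm=""))


if __name__ == "__main__":
    a = "GRANT SELECT ON TABLE t TO bob;\nGRANT SELECT ON TABLE t TO alice;\n"
    b = "GRANT SELECT ON TABLE t TO alice;\nGRANT SELECT ON TABLE t TO bob;\n"
    print(diff_dumps(a, b))  # prints [] because only the order differed
```

The request is to have pg_dump itself emit this order behind an opt-in flag. The script only shows what that order buys when comparing two databases: differences in grant order disappear, and real differences in grants remain.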

I don't understand why people have started building theories about
our development process. Did I request a tool to magically synchronize
DEV and PROD? No, I asked about a tool to _find_ unexpected
differences between databases. If you have never encountered a situation
where the databases in dozens of environments diverged because somebody
did something manually, good for you; you are a lucky guy.
I have.

Sergei

On Sun, 17 Apr 2016 14:10:50 -0600
Sergei Agalakov <Sergei(dot)Agalakov(at)getmyle(dot)com> wrote:

> I don't see how these questions are related to the proposed pg_dump
> improvement.
> I suggest improving pg_dump so that it can be used instead of
> third-party tools like DBSteward and SQLWorkbench/J etc.
> to compare two different databases or existing dumps, and to identify
> the differences. The use cases will be exactly
> the same as for the third-party tools. The advantage will be
> that pg_dump is very reliable, always available, and supports all the
> latest PostgreSQL features.
> Do you imply that there shouldn't be any reasons to compare different
> databases to find the differences between them?

Nobody has weighed in on this, but I have a theory ...

I (personally) worry that adding features like you suggest to pg_dump
would interfere with its ability to perform a complete dump of a large
database in a _rapid_ manner. Using pg_dump as a backup tool has an
inherent desire for the tool to be as fast and low-impact on the
operation of the database as possible.

Features that would force pg_dump to care about ordering that isn't
necessary to its core functionality of providing a reliable backup
are liable to slow it down. They might also overcomplicate it, making
it more difficult to maintain reliably.

When you consider that possibility, and the fact that pg_dump isn't
_supposed_ to be a tool to help you with schema maintenance, it's easy
to see why someone would look for a different approach to the problem.

And I feel that's what all the answers have attempted to do: suggest
ways to get what you want without asking for them to be implemented in a
tool that isn't really the right place for them anyway. While your
arguments toward making this change are valid, I'm not sure that
they are compelling enough to justify adding a feature where it
doesn't really belong.

Another side to this is that your request suggests that your
development process is suboptimal. Of course, I can't be 100% sure
since you haven't explained your process ... but my experience is
that people who feel the need to automagically sync prod and dev
databases have a suboptimal development process. Thus, the suggestions
are also biased toward helping you improve your process instead of
adjusting a tool to better support a suboptimal process.

Of course, if the people actually doing the work on the code disagree
with me, then they'll make the change. I'm just expressing an opinion.

> Sergei
>
> > > On Apr 17, 2016, at 12:41 PM, Sergei Agalakov <Sergei(dot)Agalakov(at)getmyle(dot)com> wrote:
> > >
> > > I know about DBSteward. I don't want to bring in PHP infrastructure just to be able to compare two dumps,
> > > and to deal with potential bugs in third-party tools. pg_dump, on the other hand, is always here, and is always trusted.
> > > SQLWorkbench/J can also compare two schemas, and requires only Java. Again, I trust pg_dump more.
> > > http://www.sql-workbench.net/
> > >
> > > Maybe pg_dump was never INTENDED to generate dump files with a deterministic order of statements,
> > > but it CAN do it with minor changes, and be more useful to administrators. Why rely on third-party tools
> > > for tasks that can be done with the native, trusted tools?
> > >
> > > Sergei
> > Does it matter if they differ if you cannot recreate the correct one exactly from source-controlled DDL? Or know how they are supposed to differ if this is a migration point?
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

--
Bill Moran
