Search Postgresql Archives

Re: How can I pushdown of functions used in targetlist with FDW ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Greetings,

* Tom Lane (tgl@xxxxxxxxxxxxx) wrote:
> Stephen Frost <sfrost@xxxxxxxxxxx> writes:
> > * Tom Lane (tgl@xxxxxxxxxxxxx) wrote:
> >> There is, AFAIK, no provision for that.  There's not a lot of
> >> reason to consider adding it either, because there's no reason
> >> to suppose that the remote node can run such a function any
> >> faster than the local node.
> 
> > Also, I don't think I agree with this particular position- it's entirely
> > possible that the remote server is much less loaded/busy than the local
> > one and therefore it'd be beneficial, overall, to run that function on
> > the remote system and not the local one.
> 
> Or the reverse.  We have no way of estimating such effects.

Yeah, it'd be nice if there was an overall solution, but I don't know
that we have to solve that to support this.

> > The function might even have
> > side-effects or use other objects in the system to run.
> 
> In such a case, we *can't* be pushing down.  The entire foundational
> principle of this behavior is that we must be certain that a pushed-down
> operation has exactly the same semantics on either node.  Otherwise,
> the planner's choices are not optimizations but query behavior changes,
> and we can't have that.  If you've got a function like that to run
> on the remote end, you have to embed it in a remote view -- we simply
> don't have another option.

This is where things get complicated because I would argue that we *can*
push down and in some cases we *have* to, but we definitely need to sort
out when to do which and I think that's where the whole SQL/MED stuff
and FOREIGN FUNCTIONs come into play.

In other words, I see options like:

a) Function is a FOREIGN FUNCTION, therefore we must push it down
b) Function is only available locally and therefore we must run it
   locally
c) Function is available in *both* places and expected to have the same
   behavior

'a' and 'b' are pretty clear.  In an ideal world, we'd figure out where
the *best* place to run 'c' is and then run it there.  Perhaps there's a
way to set up a cost model for it, but at least at the moment, we only
have one cost for functions and it'd almost certainly be the same value
for local as for remote in the 'c' case, which means we need something
else.  For my 2c, I'd be inclined to use a heuristic of "if it's
actually possible for us to push it down, we should do so."  Of course
there'll be cases where it's terrible to do so and so it'd be good if
there was some way for users to tell us which to do (maybe we have a GUC
for it?  The enable_* mechanism is an unfortunately large hammer but it
might cover enough cases (enable_funcpushdown?)).

> > The function
> > may also return a much smaller result than pulling back the raw data (of
> > course, the opposite could also possibly be true).
> 
> Yeah, this is a legitimate point, but again we have no very good way
> of estimating which is better.

Yeah, I agree with that.

> In general, there's an awful lot of postgres_fdw's behavior that depends
> on the assumption that the remote and local servers are pretty
> interchangeable, not least that we take the remote's cost numbers at
> face value when preparing cost numbers for a foreign scan.  The only
> consideration that we really can reliably optimize push-down choices with
> is trying to reduce the volume of data transmitted, and we do that without
> consideration for whether the operations pushed across might take more or
> less time when run on the other server.  Maybe someday that could be
> improved, but it seems like a nontrivial research project involving a
> lot more moving parts than just this point.

I agree with all of that- but it seems like we've got people asking for
it, and I can understand why they are, and therefore I'd be inclined to
provide a way for users to get that behavior and I'm even inclined to
say that pushing down should be the 'default' if everything else is
equal and it's possible to do so.  Figuring out a way to decide which is
likely to be better based on stats and other information could then be a
research project to improve on the simple heuristic.

Thanks,

Stephen

Attachment: signature.asc
Description: PGP signature


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux