2016-09-12 15:16 GMT-03:00 Merlin Moncure <mmoncure@xxxxxxxxx>:
On Mon, Sep 12, 2016 at 9:03 AM, Vinicius Segalin <vinisegalin@xxxxxxxxx> wrote:
> Hi everyone,
>
> I'm trying to find a way to predict query runtime (I don't need to be
> extremely precise). I've been reading some papers about it, and people are
> using machine learning to do so. For the feature vector, they use what the
> DBMS's query planner provides, such as operators and their costs. The thing is
> that I haven't found any work using PostgreSQL, so I'm struggling to adapt
> it.
> My question is whether anyone is aware of work that uses machine learning and
> PostgreSQL to predict query runtime, or of some other method for doing
> this.
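In PostgreSQL, the planner information described above (operators and their costs) is available through EXPLAIN (FORMAT JSON). Below is a minimal Python sketch, not taken from any of the papers mentioned, that turns one such plan into a feature vector of per-operator counts and costs. The function names and the sample plan are made up for illustration; the JSON keys ("Plan", "Node Type", "Total Cost", "Plan Rows", "Plans") are the ones EXPLAIN emits.

```python
import json
from collections import Counter


def walk(plan):
    """Yield a plan node and, recursively, all of its child nodes."""
    yield plan
    for child in plan.get("Plans", []):
        yield from walk(child)


def plan_features(explain_json):
    """Build a flat feature dict from EXPLAIN (FORMAT JSON) output."""
    root = json.loads(explain_json)[0]["Plan"]
    counts = Counter()
    costs = Counter()
    for node in walk(root):
        op = node["Node Type"]
        counts[op] += 1
        # "Total Cost" is cumulative: it already includes the node's children.
        costs[op] += node["Total Cost"]
    features = {}
    for op in counts:
        features["count:" + op] = counts[op]
        features["cost:" + op] = costs[op]
    features["total_cost"] = root["Total Cost"]
    features["plan_rows"] = root["Plan Rows"]
    return features


# Hypothetical plan, hand-written for illustration (not real measurements).
SAMPLE_PLAN = json.dumps([{
    "Plan": {
        "Node Type": "Aggregate", "Total Cost": 2890.0, "Plan Rows": 1,
        "Plans": [
            {"Node Type": "Seq Scan", "Total Cost": 2640.0, "Plan Rows": 100000}
        ],
    }
}])

if __name__ == "__main__":
    print(plan_features(SAMPLE_PLAN))
```

Which features actually help is an open question here; operator counts and cumulative costs are only one plausible starting point.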
Well, postgres estimates the query runtime in the form of an expected
'cost', where the cost is an arbitrary measure based on the time
complexity of the query plan. It shouldn't be too difficult to correlate
the estimated cost with the actual runtime.
That's what I thought too. At least it makes sense, I guess. But sometimes logic doesn't hold up in practice, so I think only trying it will tell.
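One way to try it, as a rough Python sketch: run each query with EXPLAIN (ANALYZE, FORMAT JSON), read the planner's "Total Cost" and the measured "Execution Time" (in milliseconds) from the output, and compute a correlation coefficient over the pairs. The helper names and the three sample outputs are hypothetical and hand-written; the log transform is an assumption, chosen because costs and runtimes tend to span several orders of magnitude.

```python
import json
import math


def cost_and_runtime(explain_json):
    """Return (estimated total cost, measured execution time in ms)."""
    doc = json.loads(explain_json)[0]
    return doc["Plan"]["Total Cost"], doc["Execution Time"]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def make_sample(total_cost, execution_ms):
    """Build a made-up EXPLAIN ANALYZE document (illustration only)."""
    return json.dumps([{
        "Plan": {"Node Type": "Seq Scan", "Total Cost": total_cost},
        "Execution Time": execution_ms,
    }])


# Hypothetical values, not real measurements.
SAMPLES = [make_sample(8.3, 0.05), make_sample(2890.0, 35.2),
           make_sample(51000.0, 410.0)]

if __name__ == "__main__":
    pairs = [cost_and_runtime(s) for s in SAMPLES]
    log_cost = [math.log(c) for c, _ in pairs]
    log_time = [math.log(t) for _, t in pairs]
    print("correlation of log(cost) and log(runtime):",
          pearson(log_cost, log_time))
```

With real data, a rank correlation might be a safer first check than Pearson, since nothing guarantees the relationship is linear even after taking logs.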
A statistical analysis of that
correlation would be incredibly useful work although generating sample
datasets would be a major challenge.
merlin
Indeed. I'm using TPC-B with pgbench to get some test data (since I don't have real data yet), but I'm having a hard time writing queries whose runtimes differ enough from one another to train my ML algorithm.
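One possible way to get more spread, sketched in Python under some assumptions: generate range queries over pgbench_accounts whose ranges cover very different fractions of the table, so the work per query varies from a handful of rows to the whole table. The function name, the chosen fractions, and the query shape are made up for illustration; the table and column names (pgbench_accounts, aid, abalance) and the 100,000 rows per unit of scale factor are pgbench's defaults.

```python
import random

ROWS_PER_SCALE = 100_000  # pgbench_accounts rows per unit of scale factor


def range_queries(scale, fractions, seed=0):
    """Return one EXPLAIN ANALYZE query per requested table fraction."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = ROWS_PER_SCALE * scale
    queries = []
    for frac in fractions:
        width = max(1, int(total * frac))
        low = rng.randint(1, total - width + 1)  # aid values start at 1
        high = low + width - 1
        queries.append(
            "EXPLAIN (ANALYZE, FORMAT JSON) "
            "SELECT count(*), avg(abalance) FROM pgbench_accounts "
            f"WHERE aid BETWEEN {low} AND {high};"
        )
    return queries


if __name__ == "__main__":
    # Fractions from 0.001% of the table up to the full table.
    for query in range_queries(scale=10, fractions=[1e-5, 1e-3, 0.1, 1.0]):
        print(query)
```

Range width is only one knob; joins against pgbench_branches or pgbench_tellers, or sorts on the unindexed abalance column, would likely exercise other plan operators as well.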