2016-09-12 18:22 GMT-03:00 Istvan Soos <istvan.soos@xxxxxxxxx>:
Hi Vinicius,
At Heap we have non-trivial complexity in our analytical queries, and
some of them can take a long time to complete. We did analyze features
like the query planner's output, our query properties (type,
parameters, complexity) and tried to automatically identify factors
that contribute the most into the total query time. It turns out that
you don't need to use machine learning for the basics, but at this
point we were not aiming for predictions yet.
And how did you do that? Manually analyzing some queries?
But I think, as you said, it wouldn't apply for predictions, but instead for making long queries run faster, right?
As a spoiler: queries take long time because they do a lot of IO.
Features like reachback depth and duration (e.g. what period is the
analytical query about) can contribute a lot to the amount of IO,
thus, the query time. I have a blog post in my queue about our
analysis, would gladly bump its priority if there is interest in such
details.
If it's not too much work, I would like to have some details on your process. It looks it's not exactly what I'm trying to do, but would certainly help me with my work.
I'm also curious: if you had a great way to predict the time/cost of
the queries, how would you use it?
I'm working on something for my master degree (it's the idea, and I really hope I can make it possible) where I'll help the user choosing the resources for the database that will give him the best performance (or at least the performance he thinks it's good enough). So the idea would be to train each machine (with different resources) and then be able to predict for an specific query what the performance would be.
Thank you all for the answers so far. I hope we can clear my mind about this issue.