Re: Growth planning

> On 4 Oct 2021, at 18:22, Israel Brewster <ijbrewster@xxxxxxxxxx> wrote:

(…)

> the script owner is talking about wanting to process and pull in “all the historical data we have access to”, which would go back several years, not to mention the probable desire to keep things running into the foreseeable future.

(…)

> - The largest SELECT workflow currently is a script that pulls all available data for ONE channel of each station (currently; I suspect that will change to all channels in the near future), and runs some post-processing machine learning algorithms on it. This script (written in R, if that makes a difference) currently takes around half an hour to run, and is run once every four hours. I would estimate about 50% of the run time is data retrieval and the rest doing its own thing. I am only responsible for integrating this script with the database; what it does with the data (and therefore how long that takes, as well as what data is needed) is up to my colleague. I have this script running on the same machine as the DB to minimize data transfer times.

I suspect that a large portion of that time is spent downloading the data into the R script. Would it help to rewrite it in PL/R and do (part of) the ML calculations on the DB side?
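
For illustration, a minimal PL/R sketch could look like the below. The table channel_data(station, channel, ts, value) and the function name are hypothetical, and it assumes the plr extension is installed; the ML step is stood in for by a plain mean.

  CREATE EXTENSION IF NOT EXISTS plr;

  CREATE OR REPLACE FUNCTION summarize_channel(p_station text, p_channel text)
  RETURNS float8 AS $$
      # pg.spi.exec() runs the query inside the backend and hands the
      # result back as an R data frame, so the raw rows never leave
      # the server.
      rows <- pg.spi.exec(sprintf(
          "SELECT value FROM channel_data WHERE station = '%s' AND channel = '%s'",
          p_station, p_channel))
      # Placeholder for the real ML step; a plain mean stands in here.
      mean(rows$value)
  $$ LANGUAGE plr;

  -- Hypothetical station/channel names, just to show the call shape:
  SELECT summarize_channel('AV01', 'BHZ');

For real use you would likely prefer pg.spi.prepare()/pg.spi.execp() with bound parameters over sprintf(), but the shape is the same: only the (much smaller) result of the calculation crosses the wire instead of the full data set.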

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.
