Re: Parallel Scaling of a pgplsql problem

Replacing current_timestamp with transaction_timestamp() and statement_timestamp() did not help!

My timestamp values are still the same. I can't believe this is not possible in PG. In Oracle you can run 'select sysdate from dual', insert that value, and see which SQL statement or function is taking a long time during development. All I want to do is find where my PL/pgSQL code is spending its 11 seconds. The overall pseudo code is as follows:

while end_date >= start_date loop
    select some_columns into vars1 from tbl_1 where emp_id = 'first emp';
    select some_columns into vars2 from tbl_1 where emp_id = 'second emp';
    -- do some computation with vars1 and vars2
    select calc_comp(emp_1, emp_2) into v_profit;
end loop;

For each select and each call to the function, I want to record the system time (or something equivalent) so I can tell where the 11 seconds are being spent. How do I do that?
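For reference, the function that does advance within a transaction is clock_timestamp(). A minimal sketch of the instrumentation described above, assuming a debug_log(log_text, insertdatetime) table like the one used later in this thread (timing_demo and pg_sleep stand in for the real calc_comp work):

```sql
-- Sketch only, not the actual calc_comp code.
-- clock_timestamp() returns the wall-clock time at the moment of each call
-- (unlike current_timestamp, which is frozen at transaction start), so each
-- debug row gets a distinct timestamp.
CREATE TABLE IF NOT EXISTS debug_log (log_text text, insertdatetime timestamptz);

CREATE OR REPLACE FUNCTION timing_demo() RETURNS interval AS $$
DECLARE
    t_start timestamptz := clock_timestamp();
BEGIN
    INSERT INTO debug_log VALUES ('before slow step', clock_timestamp());
    PERFORM pg_sleep(0.5);   -- stands in for a slow SELECT or calc_comp() call
    INSERT INTO debug_log VALUES ('after slow step', clock_timestamp());
    RETURN clock_timestamp() - t_start;  -- elapsed time within one transaction
END;
$$ LANGUAGE plpgsql;
```

With this pattern, the difference between consecutive insertdatetime values shows how long each step took.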

Thanks, Venki
        


From: Pavel Stehule <pavel.stehule@xxxxxxxxx>
To: Venki Ramachandran <venki_ramachandran@xxxxxxxxx>
Cc: Samuel Gendler <sgendler@xxxxxxxxxxxxxxxx>; "pgsql-performance@xxxxxxxxxxxxxx" <pgsql-performance@xxxxxxxxxxxxxx>
Sent: Wednesday, April 25, 2012 2:26 PM
Subject: Re: Parallel Scaling of a pgplsql problem

Hello

http://www.postgresql.org/docs/8.1/static/functions-datetime.html

CURRENT_TIME and CURRENT_TIMESTAMP deliver values with time zone;
LOCALTIME and LOCALTIMESTAMP deliver values without time zone.

CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME, and LOCALTIMESTAMP can
optionally take a precision parameter, which causes the result to be
rounded to that many fractional digits in the seconds field. Without a
precision parameter, the result is given to the full available
precision.

Some examples:

SELECT CURRENT_TIME;
Result: 14:39:53.662522-05

SELECT CURRENT_DATE;
Result: 2001-12-23

SELECT CURRENT_TIMESTAMP;
Result: 2001-12-23 14:39:53.662522-05

SELECT CURRENT_TIMESTAMP(2);
Result: 2001-12-23 14:39:53.66-05

SELECT LOCALTIMESTAMP;
Result: 2001-12-23 14:39:53.662522

Since these functions return the start time of the current
transaction, their values do not change during the transaction. This
is considered a feature: the intent is to allow a single transaction
to have a consistent notion of the "current" time, so that multiple
modifications within the same transaction bear the same time stamp.

    Note: Other database systems might advance these values more frequently.

PostgreSQL also provides functions that return the start time of the
current statement, as well as the actual current time at the instant
the function is called. Among the non-SQL-standard time functions are:

transaction_timestamp()
statement_timestamp()
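The same documentation page also covers clock_timestamp() (available since 8.2), which returns the actual current time at the instant it is called rather than a value frozen for the transaction. A quick illustration, run inside a single transaction:

```sql
BEGIN;
SELECT transaction_timestamp() AS txn_ts, clock_timestamp() AS wall_ts;
SELECT pg_sleep(2);
-- txn_ts is identical to the first query's value;
-- wall_ts has advanced by roughly 2 seconds
SELECT transaction_timestamp() AS txn_ts, clock_timestamp() AS wall_ts;
COMMIT;
```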

Regards

Pavel Stehule

2012/4/25 Venki Ramachandran <venki_ramachandran@xxxxxxxxx>:
> Another question (probably a silly mistake) while debugging this problem:
> I put in insert statements into the pgplsql code to collect the
> current_timestamp and see where the code was spending most of it time....The
> output is as follows:
>
>                              log_text                              |        insertdatetime
> ------------------------------------------------------------------+-------------------------------
>  INPUT VARIABLES,Src,emp_id_1,emp_id_100,StartDate,2012-01-01,EndDate,2012-02-01,Tou,DOnPk,CRR Type,ptp-obligations,MinPositivePirce,1.00 | 2012-04-25 20:56:42.691965+00
>  Starting get_DAM_Value function | 2012-04-25 20:56:42.691965+00
>  StartDate,2012-01-01 01:00:00,EndDate,2012-02-02 00:00:00 | 2012-04-25 20:56:42.691965+00
>  StartHr,vSrcprice,vSinkprice,vDiff,vAvgValue,vTotalDAMValue,vPkTotal,vOffPkTotal | 2012-04-25 20:56:42.691965+00
>  2012-01-01 01:00:00,18.52,16.15,-2.370000,-2.370000,0.000000,0.000000,-2.370000 | 2012-04-25 20:56:42.691965+00
>  2012-01-01 02:00:00,17.22,15.60,-1.620000,-1.620000,0.000000,0.000000,-3.990000 | 2012-04-25 20:56:42.691965+00
>  2012-01-01 03:00:00,18.22,17.55,-0.670000,-0.670000,0.000000,0.000000,-4.660000 | 2012-04-25 20:56:42.691965+00
>  2012-01-01 04:00:00,18.53,18.13,-0.400000,-0.400000,0.000000,0.000000,-5.060000 | 2012-04-25 20:56:42.691965+00
>  ...
>  2012-02-02 00:00:00,2,17.13,17.13,0.000000,0.000000,0.000000,-7940.250000,-5216.290000 | 2012-04-25 20:56:42.691965+00
>  2012-02-02 00:00:00,3,16.54,16.54,0.000000,0.000000,0.000000,-7940.250000,-5216.290000 | 2012-04-25 20:56:42.691965+00
>  2012-02-02 00:00:00,4,16.27,16.27,0.000000,0.000000,0.000000,-7940.250000,-5216.290000 | 2012-04-25 20:56:42.691965+00
>  TotalPEAKMVal,-7940.250000,NumIntervals,2034,AvgRTMVAlue,-3.903761 | 2012-04-25 20:56:42.691965+00
>  OUTPUT VARIABLES,AvgPurValue,-2.84,AvgSaleValue,-3.90,Profit,-1.06,ProfitPercentage,-106.00 | 2012-04-25 20:56:42.691965+00
> (3832 rows)
>
> Why doesn't the current_timestamp value change within the PL/pgSQL code? For
> every select used in the computation, I insert a row into my debug_log dummy
> table. All 3832 rows have the same current_timestamp value, which I was
> inserting with the following statement:
> insert into debug_log values ('some log text', current_timestamp);
>
> But when I run this procedure from the psql command line, it takes 11
> seconds. (I used a bigger date range than the one behind the 40 ms figure in
> my first post, hence the 11 seconds.)
>
> dev=> select
> calc_comp('emp_id_1','emp_id_100','DAM-RTM',to_date('2012-01-01','yyyy-mm-dd'),to_date('2012-02-01','yyyy-mm-dd'),'DOnPk','ptp-obligations',1.00,
> 0);
>          crr_valuation
> -------------------------------
>  0||-2.84|-3.90|-1.06|-106.00|
> (1 row)
>
> Time: 11685.828 ms.
>
> The last input value (1/0) passed to the above function controls whether log
> lines are inserted: 0 = insert, 1 = do not insert. Toggling the flag did not
> change the overall timing. Shouldn't writing 3832 rows to a table take at
> least some milliseconds?
> Why is my current_timestamp value not changing? I was expecting the
> difference between the last row's timestamp and the first row's timestamp to
> account for my 11.685 seconds. What is wrong here?
>
>
> -Venki
>
> ________________________________
> From: Samuel Gendler <sgendler@xxxxxxxxxxxxxxxx>
>
> To: Venki Ramachandran <venki_ramachandran@xxxxxxxxx>
> Cc: "pgsql-performance@xxxxxxxxxxxxxx" <pgsql-performance@xxxxxxxxxxxxxx>
> Sent: Wednesday, April 25, 2012 12:36 PM
>
> Subject: Re: Parallel Scaling of a pgplsql problem
>
>
>
> On Wed, Apr 25, 2012 at 11:52 AM, Venki Ramachandran
> <venki_ramachandran@xxxxxxxxx> wrote:
>
> Hi all:
> Can someone please guide me on how to solve this problem? If this is the
> wrong forum, please let me know which one to post it in. I am new to
> Postgres (about 3 months into it).
>
> I have a PostgreSQL 9.0 database on an AWS server (x-large) and a PL/pgSQL
> program that does some computation. Given a date range and one pair of
> personnel (two employees in a company), it calculates some values over that
> period. It takes about 40 ms (milliseconds) to complete and give me the
> answer. All good so far.
>
> Now I have to run the same PL/pgSQL function on all possible combinations of
> employees, and with 542 employees that is roughly 300,000 unique pairs.
>
> So (300000 * 40) / (1000 * 60 * 60) = 3.33 hours, and I have to rank the
> pairs and show them on a screen. No user wants to wait 3 hours; they can
> probably wait 10 minutes (even that is too long for a UI application). How do
> I solve this scaling problem? Can I have multiple parallel sessions, each
> with multiple processes that handle one pair each at 40 ms, and then collate
> the results? Does Postgres or PL/pgSQL have any parallel computing
> capability?
>
>
> The question is: how much of that 40 ms is spent performing the calculation,
> how much is spent querying, how much is function-call overhead, and how much
> is round trip between the client and server with the query and results?
> Depending upon the breakdown, it is entirely possible that the actual
> per-record multiplier can be kept down to a couple of milliseconds if you
> restructure things to query data in bulk and only call a single function to
> do the work. If you get it down to 4 ms, that's a 20-minute query. Get it
> down to 1 ms and you're looking at only 5 minutes for what would appear to be
> a fairly compute-intensive report over a relatively large dataset.
>
>
>
>


