Re: using dbt2 postgresql 8.4 - rampup time issue

Mark Wong <markwkm@xxxxxxxxx> · Tue, 6 Jul 2010 17:35:43 -0700

On Mon, Jul 5, 2010 at 10:24 AM, MUHAMMAD ASIF <anaeem.it@xxxxxxxxxxx> wrote:
>> A clarification of terms may help to start. The "terminals per
>> warehouse" in the scripts correlates to the number of terminals emulated.
>> An emulated terminal is tied to a warehouse's district. In other
>> words, the number of terminals translates to the number of districts
>> in a warehouse across the entire database. To increase the terminals
>> per warehouse implies you have scaled the database differently, which
>> I'm assuming is not the case here.
>>
>
> Scale the database … Can you please elaborate? To increase "terminals
> per warehouse" I added only one option (i.e. "-t" for dbt2-run-workload)
> to the normal dbt2 test, i.e.
>
>         ./dbt2-pgsql-create-db
>         ./dbt2-pgsql-build-db -d $DBDATA -g -r -w $WAREHOUSES
>         ./dbt2-run-workload -a pgsql -c $DB_CONNECTIONS -d $REGRESS_DURATION_SEC -w $WAREHOUSES -o $OUTPUT_DIR -t $TERMINAL_PER_WAREHOUSE
>         ./dbt2-pgsql-stop-db
>
> Is this change enough, or am I missing something?

This isn't a trivial question, even though at face value I understand
that you want to see how postgres performs on 64-bit linux.  This kit
is complex enough that the answer is "it depends".  If you want to
increase the workload following specification guidelines, then I think
you need to understand the specification referenced above better.
Making the best use of this kit does involve a fair amount of
understanding of the TPC-C specification.  If you just want to
increase the load on the database system, there are several ways to do
it.  You can use the '-n' flag for dbt2-run-workload so that all
database transactions are run immediately after each other.  If you
build the database to a larger scale factor (using TPC terminology) by
increasing the warehouses, then the scripts will scale the workload
appropriately.  Tweaking the -t flag would be a more advanced method
that requires a better understanding of the specification.

Perhaps some more familiarity with the TPC-C specification would help here:

http://www.tpc.org/tpcc/spec/tpcc_current.pdf

Clause 4.1 discusses the scaling rules for sizing the database.
Unfortunately that clause may not directly clarify things for you.
The other thing to understand is that the dbt2 scripts allow you to
break the specification guidelines in some ways, and not in others.  I
don't know how to better explain it.  The database was built one way,
and you told the scripts to run the programs in a way that asked for
data that doesn't exist.
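
As a rough illustration of that mismatch, here is a minimal sketch of a
guard that could be run before starting the workload.  The helper name
check_terminals is made up for this example and is not part of dbt2;
the limit of 10 is the number of districts per warehouse in TPC-C.

```shell
#!/bin/sh
# TPC-C defines 10 districts per warehouse.
DISTRICTS_PER_WAREHOUSE=10

# check_terminals is a hypothetical helper, not part of dbt2.
# Returns 0 if the requested terminals per warehouse fit within the
# districts that exist in the database, 1 otherwise.
check_terminals() {
    tpw=$1
    if [ "$tpw" -gt "$DISTRICTS_PER_WAREHOUSE" ]; then
        echo "terminals per warehouse ($tpw) exceeds $DISTRICTS_PER_WAREHOUSE districts" >&2
        return 1
    fi
    return 0
}

check_terminals 10 && echo "10: ok"
check_terminals 40 || echo "40: out of range"
```

With a check like this, a run with -t 40 would be refused up front
instead of asking for districts that were never loaded.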

>> > 1.
>> > Settings :
>> >     DATABASE CONNECTIONS: 50
>> >     TERMINALS PER WAREHOUSE: 10
>> >     SCALE FACTOR (WAREHOUSES): 200
>> >     DURATION OF TEST (in sec): 7200
>> > Result :
>> >                              Response Time (s)
>> >      Transaction      %    Average :    90th %        Total        Rollbacks      %
>> >     ------------  -----  ---------------------  -----------  ---------------  -----
>> >         Delivery   3.96      0.285 :     0.023        26883                0   0.00
>> >        New Order  45.26      0.360 :     0.010       307335             3082   1.01
>> >     Order Status   3.98      0.238 :     0.003        27059                0   0.00
>> >          Payment  42.82      0.233 :     0.003       290802                0   0.00
>> >      Stock Level   3.97      0.245 :     0.002        26970                0   0.00
>> >     ------------  -----  ---------------------  -----------  ---------------  -----
>> >
>> >     2508.36 new-order transactions per minute (NOTPM)
>> >     120.1 minute duration
>> >     0 total unknown errors
>> >     2000 second(s) ramping up
>> >
>> > 2.
>> > Settings :
>> >     DATABASE CONNECTIONS: 50
>> >     TERMINALS PER WAREHOUSE: 40
>> >     SCALE FACTOR (WAREHOUSES): 200
>> >     DURATION OF TEST (in sec): 7200
>> > Result :
>> >                              Response Time (s)
>> >      Transaction      %    Average :    90th %        Total        Rollbacks      %
>> >     ------------  -----  ---------------------  -----------  ---------------  -----
>> >         Delivery   3.95      8.123 :     4.605        43672                0   0.00
>> >        New Order  45.19     12.205 :     2.563       499356             4933   1.00
>> >     Order Status   4.00      7.385 :     3.314        44175                0   0.00
>> >          Payment  42.89      7.221 :     1.920       473912                0   0.00
>> >      Stock Level   3.97      7.093 :     1.887        43868                0   0.00
>> >     ------------  -----  ---------------------  -----------  ---------------  -----
>> >
>> >     7009.40 new-order transactions per minute (NOTPM)
>> >     69.8 minute duration
>> >     0 total unknown errors
>> >     8016 second(s) ramping up
>> >
>
> 8016 (actual rampup time) + ( 69.8 * 60 ) = 12204
> 5010 (estimated rampup time) + 7200 (estimated steady state time) = 12210

I can see where you're pulling numbers from, but I'm having trouble
understanding what correlation you are trying to make.

>> > 3.
>> > Settings :
>> >     DATABASE CONNECTIONS: 50
>> >     TERMINALS PER WAREHOUSE: 40
>> >     SCALE FACTOR (WAREHOUSES): 200
>> >     DURATION OF TEST (in sec): 7200
>> > Result :
>> >                              Response Time (s)
>> >      Transaction      %    Average :    90th %        Total        Rollbacks      %
>> >     ------------  -----  ---------------------  -----------  ---------------  -----
>> >         Delivery   3.98      9.095 :    16.103        15234                0   0.00
>> >        New Order  45.33      7.896 :    14.794       173539             1661   0.97
>> >     Order Status   3.96      8.165 :    13.989        15156                0   0.00
>> >          Payment  42.76      7.295 :    12.470       163726                0   0.00
>> >      Stock Level   3.97      7.198 :    12.520        15198                0   0.00
>> >     ------------  -----  ---------------------  -----------  ---------------  -----
>> >
>> >     10432.09 new-order transactions per minute (NOTPM)
>> >     16.3 minute duration
>> >     0 total unknown errors
>> >     11227 second(s) ramping up
>
> 11227 (actual rampup time) + ( 16.3 * 60 ) = 12205
> 5010 (estimated rampup time) + 7200 (estimated steady state time) = 12210

Ditto.

>> >
>> > These results show that the dbt2 test actually did not run for 2 hours;
>> > the duration starts varying as the "TERMINALS PER WAREHOUSE" value
>> > increases, i.e. 1st Run (120.1 minute duration), 2nd Run (69.8 minute
>> > duration) and 3rd Run (16.3 minute duration).
>>
>> The ramp up times are actually as expected (explained below). What
>> you are witnessing is more likely that the driver is crashing because
>> the values are out of range from the scale of the database. You have
>> effectively told the driver that there are more than 10 districts per
>> warehouse, and have likely not built the database that way. I'm
>> actually surprised the driver actually ramped up completely.
>>
>
> I run the dbt2 test with the following configuration i.e.
>
>     WAREHOUSES=100
>     DB_CONNECTIONS=20
>     REGRESS_DURATION=7200 #SECONDS
>     TERMINAL_PER_WAREHOUSE=32
>
>     Or
>
>     WAREHOUSES=100
>     DB_CONNECTIONS=20
>     REGRESS_DURATION=7200 #SECONDS
>     TERMINAL_PER_WAREHOUSE=40
>
>     Or
>
>     WAREHOUSES=100
>     DB_CONNECTIONS=20
>     REGRESS_DURATION=7200 #SECONDS
>     TERMINAL_PER_WAREHOUSE=56
>
> I always end up with the same estimated rampup time, i.e.
>
>     estimated rampup time: Sleeping 5010 seconds
>     estimated steady state time: Sleeping 7200 seconds
>
> It means it expects that the rampup will complete in 5010 seconds, and
> it waits for 501 (Stage 1. Starting up client) + 5010 (estimated
> rampup time) + 7200 (estimated steady state time) seconds to complete the
> test and then kills dbt2-driver and dbt2-client and generates the report etc.

Sorry, I used "estimate" to mean "in a perfect world, it will be
exactly this time".  The reality is that it will be no sooner than the
calculated values.
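
To make the calculation concrete, here is a small sketch of the rule:
ramp up time is the product of the number of warehouses, the terminals
per warehouse, and the sleep time between the creation of each
terminal.  The function name ramp_up_seconds is invented for this
example and is not taken from dbt2-run-workload, and the 1 second sleep
per terminal is only an assumption inferred from the quoted results
(200 warehouses with 10 terminals each ramped up in 2000 seconds).

```shell
#!/bin/sh
# ramp_up_seconds is a hypothetical helper, not taken from dbt2.
# Arguments: warehouses, terminals per warehouse, and whole seconds
# slept between the creation of each terminal.
ramp_up_seconds() {
    echo $(( $1 * $2 * $3 ))
}

# Assumption: 1 second of sleep per terminal, inferred from run 1 above.
ramp_up_seconds 200 10 1    # prints 2000
ramp_up_seconds 200 40 1    # prints 8000
```

The second value is close to the 8016 seconds reported for the run
with 40 terminals per warehouse, which is consistent with the ramp up
taking no less than the calculated time.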

> Rampup time is increasing with the increase in TERMINAL_PER_WAREHOUSE but on
> the other end dbt2 estimated time (501+5010+7200 seconds) is not increasing
> and rampup time ends up consuming steady state time. (No crash was
> observed in any dbt2 or postgresql related process.)
> To sync up the dbt2-run-workload script with rampup time, it now checks
> mix.log.

Again, I don't think this will be clear without understanding the
scaling rules in the TPC-C specification.  I can reiterate that
TERMINAL_PER_WAREHOUSE tells the scripts how to run the test, not how
to build the database.  Perhaps that is part of the confusion?

>> > To fix and sync with the rampup time, I have made a minor change in the
>> > dbt2-run-workload script i.e.
>> >
>> >     --- dbt2-run-workload      2010-07-02 08:18:06.000000000 -0400
>> >     +++ dbt2-run-workload   2010-07-02 08:20:11.000000000 -0400
>> >     @@ -625,7 +625,11 @@
>> >      done
>> >
>> >      echo -n "estimated rampup time: "
>> >     -do_sleep $SLEEP_RAMPUP
>> >     +#do_sleep $SLEEP_RAMPUP
>> >     +while ! grep START ${DRIVER_OUTPUT_DIR}/*/mix.log ; do
>> >     +       sleep 1
>> >     +done
>> >     +date
>> >      echo "estimated rampup time has elapsed"
>> >
>> >      # Clear the readprofile data after the driver ramps up.
>> >
>> > What is rampup time? And what do you think about the patch? Can you
>> > please guide me? Thanks.
>>
>> The ramp up time is supposed to be the multiplication of the terminals
>> per warehouse, the number of warehouses with the sleep time between
>> the creation of each terminal. The only problem with your patch is
>> that the latest scripts (in the source code repo) break out the
>> client load into multiple instances of the driver program. Thus there
>> is a log file per instance of the driver so your patch won't work as
>> is. Well, and there is that the ramp up calculation doesn't appear to
>> be broken. ;)
>
> It seems that a driver handles up to 500 warehouses and there will be more
> drivers if the number of warehouses is greater than this, i.e.
>     W_CHUNK=500  #(default)

It's kludgey, I can't offer any better excuse for the lack of clarity
here, but I think you have the general idea.  I don't think what I
intended to do here was done very well.
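
For what it's worth, a variant of the quoted patch that copes with one
log file per driver instance might look like the sketch below.  The
function names are invented for this example, and the directory layout
(one mix.log per subdirectory of the driver output directory) is
assumed from the glob in the quoted patch rather than checked against
the current scripts.

```shell
#!/bin/sh
# all_drivers_started is a hypothetical helper, not part of dbt2.
# It succeeds only when every log file passed as an argument exists
# and contains the START marker.
all_drivers_started() {
    [ "$#" -gt 0 ] || return 1
    for log in "$@"; do
        [ -f "$log" ] || return 1
        grep -q START "$log" || return 1
    done
    return 0
}

# wait_for_rampup polls once per second until every driver instance
# under the given output directory has logged START.
# Assumption: every instance has already created its mix.log; an
# instance with no log file yet would not be noticed by the glob.
wait_for_rampup() {
    until all_drivers_started "$1"/*/mix.log; do
        sleep 1
    done
}
```

Calling wait_for_rampup "$DRIVER_OUTPUT_DIR" in place of the single
grep would keep waiting until the slowest instance has ramped up,
rather than returning as soon as any one log contains START.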

> I have some other questions too.
> How can I get the maximum TPM value for postgresql? Which dbt2 parameters
> should I play with?

Unfortunately I must give you a non-answer here.  The kit is designed
to be used as a tool to stress the system to characterize the system
for development, so I can't answer how to get the maximum TPM value.
Getting the maximum TPM value isn't an indicator of how well the
system is performing because there are many ways to inflate that value
without stressing the system in a meaningful way.  The TPM values are
only helpful as a gauge for measuring changes to the system.

Regards,
Mark

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance