On Sun, Dec 21, 2008 at 10:56 PM, Gregory Stark <stark@xxxxxxxxxxxxxxxx> wrote:
> Mark Wong <markwkm@xxxxxxxxx> writes:
>
>> On Dec 20, 2008, at 5:33 PM, Gregory Stark wrote:
>>
>>> "Mark Wong" <markwkm@xxxxxxxxx> writes:
>>>
>>>> To recap, dbt2 is a fair-use derivative of the TPC-C benchmark.  We
>>>> are using a 1000 warehouse database, which amounts to about 100GB of
>>>> raw text data.
>>>
>>> Really? Do you get conforming results with 1,000 warehouses? What's the 95th
>>> percentile response time?
>>
>> No, the results are not conforming.  You and others have pointed that out
>> already.  The 95th percentile response times are calculated on each page of
>> the previous links.
>
> Where exactly? Maybe I'm blind but I don't see them.

Here's an example:

http://207.173.203.223/~markwkm/community6/dbt2/baseline.1000.1/report/

The links on the blog entries should be pointing to their respective
reports.  I spot-checked a few and it seems I got some right.  I probably
didn't make it clear that you needed to click on the results to see the
reports.

>> I find your questions a little odd for the input I'm asking for.  Are you
>> under the impression we are trying to publish benchmarking results?
>> Perhaps this is a simple misunderstanding?
>
> Hm, perhaps. The "conventional" way to run TPC-C is to run it with larger and
> larger scale factors until you find out the largest scale factor you can get a
> conformant result at. In other words the scale factor is an output, not an
> input variable.
>
> You're using TPC-C just as an example workload and looking to see how to
> maximize the TPM for a given scale factor. I guess there's nothing wrong with
> that as long as everyone realizes it's not a TPC-C benchmark.

Perhaps, but we're not trying to run a TPC-C benchmark.  We're trying to
illustrate how performance changes with an understood OLTP workload.  The
purpose is to show how the system behaves, more so than what the maximum
transaction rate is.
We advertise the kit and the work as being for self-learning; we have
never tried to pass dbt-2 off as a benchmarking kit.

> Except that if the 95th percentile response times are well above a second I
> have to wonder whether the situation reflects an actual production OLTP system
> well. It implies there are so many concurrent sessions that any given query is
> being context switched out for seconds at a time.
>
> I have to imagine that a real production system would consider the system
> overloaded as soon as queries start taking significantly longer than they take
> on an unloaded system. People monitor the service wait times and queue depths
> for i/o systems closely and having several seconds of wait time is a highly
> abnormal situation.

We attempt to illustrate the response times in the reports.  For example,
there is a histogram (drawn as a scatter plot) illustrating the number of
transactions vs. the response time for each transaction.  This is for the
New Order transaction:

http://207.173.203.223/~markwkm/community6/dbt2/baseline.1000.1/report/dist_n.png

We also plot the response time for a transaction vs. the elapsed time
(also as a scatter plot).  Again, this is for the New Order transaction:

http://207.173.203.223/~markwkm/community6/dbt2/baseline.1000.1/report/rt_n.png

> I'm not sure how bad that is for the benchmarks. The only effect that comes to
> mind is that it might exaggerate the effects of some i/o intensive operations
> that under normal conditions might not cause any noticeable impact like wal
> log file switches or even checkpoints.

I'm not sure I'm following.  Is this something that can be shown by any
stats collection or profiling?  This vaguely reminds me of the
significant spikes in system time (and dips everywhere else) during
operating system fsyncs at checkpoint time, which we've always observed
when running this in the past.
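As an aside, for anyone wanting to reproduce the per-transaction metrics
discussed above outside the dbt-2 reports, the 95th percentile and the
response-time histogram are both straightforward to compute from a list of
per-transaction response times.  This is a minimal sketch, not dbt-2's
actual reporting code; the sample data and fixed-width binning are
assumptions for illustration:

```python
# Sketch: 95th-percentile (nearest-rank) and a fixed-width histogram
# over per-transaction response times, in seconds.  Hypothetical data;
# dbt-2's real reports are generated from its own mix logs.

def percentile(samples, pct):
    """Nearest-rank percentile over a list of response times."""
    ordered = sorted(samples)
    rank = max(0, int(round(pct / 100.0 * len(ordered))) - 1)
    return ordered[rank]

def histogram(samples, bin_width=0.1):
    """Count transactions per response-time bin of bin_width seconds."""
    bins = {}
    for rt in samples:
        bucket = int(rt / bin_width)
        bins[bucket] = bins.get(bucket, 0) + 1
    return bins

# Example response times for one transaction type (made up):
response_times = [0.12, 0.35, 0.08, 1.40, 0.22, 0.95, 0.18, 2.10, 0.30, 0.11]
p95 = percentile(response_times, 95)   # worst-case tail latency of the run
counts = histogram(response_times)     # data behind a dist_*.png-style plot
```

Plotting `counts` (bin vs. count) gives the same shape as the dist_n.png
histogram, and plotting response time against transaction start time gives
the rt_n.png-style scatter.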
> If you have a good i/o controller it might confuse your results a bit when
> you're comparing random and sequential i/o because the controller might be
> able to sort requests by physical position better than in a typical oltp
> environment where the wait queues are too short to effectively do that.

Thanks for the input.

Regards,
Mark

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance