Re: about multiprocessingmassdata

Merlin Moncure <mmoncure@xxxxxxxxx> · Mon, 9 Apr 2012 17:37:54 -0500

On Thu, Apr 5, 2012 at 9:47 AM, Tomas Vondra <tv@xxxxxxxx> wrote:
> On 5.4.2012 15:44, superman0920 wrote:
>> Sure, i will post that at tomorrow.
>>
>> Today I install PG and MySQL at a  Server. I insert 850000 rows record
>> to each db.
>> I execute "select count(*) from poi_all_new" at two db.
>> MySQL takes 0.9s
>> PG takes 364s
>
> First of all, keep the list (pgsql-performance@xxxxxxxxxxxxxx) on the
> CC. You keep responding to me directly, therefore others can't respond
> to your messages (and help you).
>
> Are you sure the comparison was fair, i.e. both machines containing the
> same amount of data (not number of rows, amount of data), configured
> properly etc.? Have you used the same table structure (how did you
> represent geometry data type in MySQL)?
>
> For example I bet you're using MyISAM. In that case, it's comparing
> apples to oranges (or maybe cats, so different it is). MyISAM does not
> do any MVCC stuff (visibility checking, ...) and simply reads the number
> of rows from a catalogue. PostgreSQL actually has to scan the whole
> table - that's a big difference. This is probably the only place where
> MySQL (with MyISAM beats PostgreSQL). But once you switch to a proper
> storage manager (e.g. InnoDB) it'll have to scan the data just like
> PostgreSQL - try that.
>
> Anyway, this benchmark is rubbish because you're not going to do this
> query often - use queries that actually make sense for the application.
>
> Nevertheless, it seems there's something seriously wrong with your
> machine or the environment (OS), probably I/O.
>
> I've done a quick test - I've created the table (without the 'geometry'
> column because I don't have postgis installed), filled it with one
> million of rows and executed 'select count(*)'. See this:
>
>    http://pastebin.com/42cAcCqu
>
> This is what I get:
>
> ======================================================================
> test=# SELECT pg_size_pretty(pg_relation_size('test_table'));
>  pg_size_pretty
> ----------------
>  1302 MB
> (1 row)
>
> test=#
> test=# \timing on
> Timing is on.
> test=#
> test=# SELECT count(*) from test_table;
>  count
> ---------
>  1000000
> (1 row)
>
> Time: 2026,695 ms
> ======================================================================
>
> so it's running the 'count(*)' in two seconds. If I run it again, I get
> this:
>
> ======================================================================
> test=# SELECT count(*) from test_table;
>  count
> ---------
>  1000000
> (1 row)
>
> Time: 270,020 ms
> ======================================================================
>
> Yes, that's 0,27 seconds. And this is *only* my workstation - Core i5 (4
> cores), 8GB of RAM, nothing special.
>
> These results obviously depend on the data being available in page
> cache. If that's not the case, PostgreSQL needs to read them from the
> drive (and then it's basically i/o bound) - I can get about 250 MB/s
> from my drives, so I get this:
>
> ======================================================================
> test=# SELECT count(*) from test_table;
>  count
> ---------
>  1000000
> (1 row)
>
> Time: 5088,739 ms
> ======================================================================
>
> If you have slower drives, the dependency is about linear (half the
> speed -> twice the time). So either your drives are very slow, or
> there's something rotten.
>
> I still haven's seen iostat / vmstat output ... that'd tell us much more
> about the causes.

geometry column can potentially quite wide.  one thing we need to see
is the table has any indexes -- in particular gist/gin on the
geometry.

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance