Re: MongoDB 3.2 beating Postgres 9.5.1?

Oleg Bartunov <obartunov@xxxxxxxxx> · Sun, 13 Mar 2016 21:30:17 +0300

On Mar 11, 2016 4:40 PM, "Paul Jones" <pbj@xxxxxxxxxx> wrote:

>

> I have been running the EDB benchmark that compares Postgres and MongoDB.

> I believe EDB ran it against PG 9.4 and Mongo 2.6.  I am running it

> against PG 9.5.1 and Mongo 3.2 with WiredTiger storage using 10,000,000

> JSON records generated by the benchmark.  It looks like Mongo is winning,

> and apparently because of its cache management.
Dmitry was working on the same benchmarks. I think edb benchmark is broken by design. Better,  use ycsb benchmarks. I hope, Dmitry will share his 

results.
>

> The first queries on both run in ~30 min.  And, once PG fills its cache,

> it whips Mongo on repeats of the *same* query (vmstat shows no disk

> reads for PG).

>

> However, when different query on the same table is issued to both,

> vmstat shows that PG has to read the *entire* table again, and it takes

> ~30 min.  Mongo does a lot of reads initially but after about 5 minutes,

> it stops reading and completes the query, most likely because it is

> using its cache very effectively.

>

> Host:   Virtual Machine

>         4 CPUs

>         16 Gb RAM

>         200 Gb Disk

>         RHEL 6.6

>

> PG:     9.5.1 compiled from source

>         shared_buffers = 7GB

>         effectve_cache_size = 12GB

>

> Mongo:  3.2 installed with RPM from Mongo

>

> In PG, I created the table by:

>

> CREATE TABLE json_tables

> (

>         data    JSONB

> );

>

> After loading, it creates the index:

>

> CREATE INDEX json_tables_idx ON json_tables USING GIN (data jsonb_path_ops);

>

> After a lot of experimentation, I discovered that the benchmark was not

> using PG's index, so I modified the four queries to be:

>

> SELECT data FROM json_tables WHERE data @> '{"brand": "ACME"}';

> SELECT data FROM json_tables WHERE data @> '{"name": "Phone Service Basic Plan"}';

> SELECT data FROM json_tables WHERE data @> '{"name": "AC3 Case Red"}';

> SELECT data FROM json_tables WHERE data @> '{"type": "service"}';

>

> Here are two consecutive explain analyze for PG, for the same query.

> No functional difference in the plans that I can tell, but the effect

> of PG's cache on the second is dramatic.

>

> If anyone has ideas on how I can get PG to more effectively use the cache

> for subsequent queries, I would love to hear them.

>

> -------

>

> benchmark=# explain analyze select data from json_tables where data @> '{"name": "AC3 Case Red"}';

>

>                                                                QUERY PLAN

>

>

> -----------------------------------------------------------------------------------------------------------------------------------------

>

> Bitmap Heap Scan on json_tables  (cost=113.50..37914.64 rows=10000 width=1261)

> (actual time=2157.118..1259550.327 rows=909091 loops=1)

>    Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)

>    Rows Removed by Index Recheck: 4360296

>    Heap Blocks: exact=37031 lossy=872059

>    ->  Bitmap Index Scan on json_tables_idx  (cost=0.00..111.00 rows=10000 width =0) (actual time=2141.250..2141.250 rows=909091 loops=1)

>          Index Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)

> Planning time: 291.932 ms

> Execution time: 1259886.920 ms

> (8 rows)

>

> Time: 1261191.844 ms

>

> benchmark=# explain analyze select data from json_tables where data @> '{"name": "AC3 Case Red"}';

>                                                               QUERY PLAN

>

> ---------------------------------------------------------------------------------------------------------------------------------------

>

> Bitmap Heap Scan on json_tables  (cost=113.50..37914.64 rows=10000 width=1261) (actual time=779.261..29815.262 rows=909091 loops=1)

>    Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)

>    Rows Removed by Index Recheck: 4360296

>    Heap Blocks: exact=37031 lossy=872059

>    ->  Bitmap Index Scan on json_tables_idx  (cost=0.00..111.00 rows=10000 width =0) (actual time=769.081..769.081 rows=909091 loops=1)

>          Index Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)

> Planning time: 33.967 ms

> Execution time: 29869.381 ms

>

> (8 rows)

>

> Time: 29987.122 ms

>

>

> --

> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)

> To make changes to your subscription:

> http://www.postgresql.org/mailpref/pgsql-general