Search Postgresql Archives

Re: MongoDB 3.2 beating Postgres 9.5.1?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Of course, I do not create GIN index.
Maybe the problem is related to checkpoint and WAL.
I don't know how to make the comparison with MongoDB fair enough.

                                                                                                                                                       


(C)Bitnine, Kisung Kim, Ph.D
https://sites.google.com/site/kisungresearch/
E-mail : kskim@xxxxxxxxxxx
Office phone : 070-4800-5890, 408-606-8602
US Mobile phone : 408-805-2192


2016-07-19 11:23 GMT+09:00 Sameer Kumar <sameer.kumar@xxxxxxxxxx>:


On Fri, 11 Mar 2016, 9:39 p.m. Paul Jones, <pbj@xxxxxxxxxx> wrote:
I have been running the EDB benchmark that compares Postgres and MongoDB.
I believe EDB ran it against PG 9.4 and Mongo 2.6.  I am running it
against PG 9.5.1 and Mongo 3.2 with WiredTiger storage using 10,000,000
JSON records generated by the benchmark.  It looks like Mongo is winning,
and apparently because of its cache management.

The first queries on both run in ~30 min.  And, once PG fills its cache,
it whips Mongo on repeats of the *same* query (vmstat shows no disk
reads for PG).

However, when different query on the same table is issued to both,
vmstat shows that PG has to read the *entire* table again, and it takes
~30 min.  Mongo does a lot of reads initially but after about 5 minutes,
it stops reading and completes the query, most likely because it is
using its cache very effectively.

Host:   Virtual Machine
        4 CPUs
        16 Gb RAM
        200 Gb Disk
        RHEL 6.6

PG:     9.5.1 compiled from source
        shared_buffers = 7GB
        effectve_cache_size = 12GB

Mongo:  3.2 installed with RPM from Mongo

In PG, I created the table by:

CREATE TABLE json_tables
(
        data    JSONB
);

After loading, it creates the index:

CREATE INDEX json_tables_idx ON json_tables USING GIN (data jsonb_path_ops);

This would create one GIN index which is going to be a bit larger than usual btree /n-tree index on a specific JSON field. And would be slower too. I suggest that you create an index on the specific _expression_ using JSON operators. In my opinion that index would be much more nearer to mongoDB indexes.



After a lot of experimentation, I discovered that the benchmark was not
using PG's index, so I modified the four queries to be:

SELECT data FROM json_tables WHERE data @> '{"brand": "ACME"}';
SELECT data FROM json_tables WHERE data @> '{"name": "Phone Service Basic Plan"}';
SELECT data FROM json_tables WHERE data @> '{"name": "AC3 Case Red"}';
SELECT data FROM json_tables WHERE data @> '{"type": "service"}';

Here are two consecutive explain analyze for PG, for the same query.
No functional difference in the plans that I can tell, but the effect
of PG's cache on the second is dramatic.

If anyone has ideas on how I can get PG to more effectively use the cache
for subsequent queries, I would love to hear them.

-------

benchmark=# explain analyze select data from json_tables where data @> '{"name": "AC3 Case Red"}';

                                                               QUERY PLAN


-----------------------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on json_tables  (cost=113.50..37914.64 rows=10000 width=1261)
(actual time=2157.118..1259550.327 rows=909091 loops=1)
   Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
   Rows Removed by Index Recheck: 4360296
   Heap Blocks: exact=37031 lossy=872059
   ->  Bitmap Index Scan on json_tables_idx  (cost=0.00..111.00 rows=10000 width =0) (actual time=2141.250..2141.250 rows=909091 loops=1)
         Index Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
Planning time: 291.932 ms
Execution time: 1259886.920 ms
(8 rows)

Time: 1261191.844 ms

benchmark=# explain analyze select data from json_tables where data @> '{"name": "AC3 Case Red"}';
                                                              QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on json_tables  (cost=113.50..37914.64 rows=10000 width=1261) (actual time=779.261..29815.262 rows=909091 loops=1)
   Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
   Rows Removed by Index Recheck: 4360296
   Heap Blocks: exact=37031 lossy=872059
   ->  Bitmap Index Scan on json_tables_idx  (cost=0.00..111.00 rows=10000 width =0) (actual time=769.081..769.081 rows=909091 loops=1)
         Index Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
Planning time: 33.967 ms
Execution time: 29869.381 ms

(8 rows)

Time: 29987.122 ms


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
--
Best Regards
Sameer Kumar | DB Solution Architect 
ASHNIK PTE. LTD.

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux