Re: H800 + md1200 Performance problem

Merlin Moncure <mmoncure@xxxxxxxxx> · Tue, 3 Apr 2012 08:11:32 -0500

On Tue, Apr 3, 2012 at 7:20 AM, Cesar Martin <cmartinp@xxxxxxxxx> wrote:
> Hello there,
>
> I am having performance problem with new DELL server. Actually I have this
> two servers
>
> Server A (old - production)
> -----------------
> 2xCPU Six-Core AMD Opteron 2439 SE
> 64GB RAM
> Raid controller Perc6 512MB cache NV
>   - 2 HD 146GB SAS 15Krpm RAID1 (SO Centos 5.4 y pg_xlog) (XFS no barriers)
>   - 6 HD 300GB SAS 15Krpm RAID10 (DB Postgres 8.3.9) (XFS no barriers)
>
> Server B (new)
> ------------------
> 2xCPU 16 Core AMD Opteron 6282 SE
> 64GB RAM
> Raid controller H700 1GB cache NV
>   - 2HD 74GB SAS 15Krpm RAID1 stripe 16k (SO Centos 6.2)
>   - 4HD 146GB SAS 15Krpm RAID10 stripe 16k XFS (pg_xlog) (ext4 bs 4096, no
> barriers)
> Raid controller H800 1GB cache nv
>   - MD1200 12HD 300GB SAS 15Krpm RAID10 stripe 256k (DB Postgres 8.3.18)
> (ext4 bs 4096, stride 64, stripe-width 384, no barriers)
>
> Postgres DB is the same in both servers. This DB has 170GB size with some
> tables partitioned by date with a trigger. In both shared_buffers,
> checkpoint_segments... settings are similar because RAM is similar.
>
> I supposed that, new server had to be faster than old, because have more
> disk in RAID10 and two RAID controllers with more cache memory, but really
> I'm not obtaining the expected results
>
> For example this query:
>
> EXPLAIN ANALYZE SELECT c.id AS c__id, c.fk_news_id AS c__fk_news_id,
> c.fk_news_group_id AS c__fk_news_group_id, c.fk_company_id AS
> c__fk_company_id, c.import_date AS c__import_date, c.highlight AS
> c__highlight, c.status AS c__status, c.ord AS c__ord, c.news_date AS
> c__news_date, c.fk_media_id AS c__fk_media_id, c.title AS c__title,
> c.search_title_idx AS c__search_title_idx, c.stored AS c__stored, c.tono AS
> c__tono, c.media_type AS c__media_type, c.fk_editions_news_id AS
> c__fk_editions_news_id, c.dossier_selected AS c__dossier_selected,
> c.update_stats AS c__update_stats, c.url_news AS c__url_news, c.url_image AS
> c__url_image, m.id AS m__id, m.name AS m__name, m.media_type AS
> m__media_type, m.media_code AS m__media_code, m.fk_data_source_id AS
> m__fk_data_source_id, m.language_iso AS m__language_iso, m.country_iso AS
> m__country_iso, m.region_iso AS m__region_iso, m.subregion_iso AS
> m__subregion_iso, m.media_code_temp AS m__media_code_temp, m.url AS m__url,
> m.current_rank AS m__current_rank, m.typologyid AS m__typologyid,
> m.fk_platform_id AS m__fk_platform_id, m.page_views_per_day AS
> m__page_views_per_day, m.audience AS m__audience, m.last_stats_update AS
> m__last_stats_update, n.id AS n__id, n.fk_media_id AS n__fk_media_id,
> n.fk_news_media_id AS n__fk_news_media_id, n.fk_data_source_id AS
> n__fk_data_source_id, n.news_code AS n__news_code, n.title AS n__title,
> n.searchfull_idx AS n__searchfull_idx, n.news_date AS n__news_date,
> n.economical_value AS n__economical_value, n.audience AS n__audience,
> n.media_type AS n__media_type, n.url_news AS n__url_news, n.url_news_old AS
> n__url_news_old, n.url_image AS n__url_image, n.typologyid AS n__typologyid,
> n.author AS n__author, n.fk_platform_id AS n__fk_platform_id, n2.id AS
> n2__id, n2.name AS n2__name, n3.id AS n3__id, n3.name AS n3__name, f.id AS
> f__id, f.name AS f__name, n4.id AS n4__id, n4.opentext AS n4__opentext, i.id
> AS i__id, i.name AS i__name, i.ord AS i__ord, i2.id AS i2__id, i2.name AS
> i2__name FROM company_news_internet c LEFT JOIN media_internet m ON
> c.fk_media_id = m.id AND m.media_type = 4 LEFT JOIN news_internet n ON
> c.fk_news_id = n.id AND n.media_type = 4 LEFT JOIN news_media_internet n2 ON
> n.fk_news_media_id = n2.id AND n2.media_type = 4 LEFT JOIN
> news_group_internet n3 ON c.fk_news_group_id = n3.id AND n3.media_type = 4
> LEFT JOIN feed_internet f ON n3.fk_feed_id = f.id LEFT JOIN
> news_text_internet n4 ON c.fk_news_id = n4.fk_news_id AND n4.media_type = 4
> LEFT JOIN internet_typology i ON n.typologyid = i.id LEFT JOIN
> internet_media_platform i2 ON n.fk_platform_id = i2.id WHERE
> (c.fk_company_id = '16073' AND c.status <> '-3' AND n3.fk_feed_id = '30693'
> AND n3.status = '1' AND f.fk_company_id = '16073') AND n.typologyid IN ('6',
> '7', '1', '2', '3', '5', '4') AND c.id > '49764393' AND c.news_date >=
> '2012-04-02'::timestamp - INTERVAL '4 months' AND n.news_date >=
> '2012-04-02'::timestamp - INTERVAL '4 months' AND c.fk_news_group_id IN
> ('43475') AND (c.media_type = 4) ORDER BY c.news_date DESC, c.id DESC LIMIT
> 200
>
> Takes about 20 second in server A but in new server B takes 150 seconds...
> In EXPLAIN I have noticed that sequential scan on table news_internet_201112
> takes 2s:
>       ->  Seq Scan on news_internet_201112 n  (cost=0.00..119749.12
> rows=1406528 width=535) (actual time=0.046..2186.379 rows=1844831 loops=1)
>           Filter: ((news_date >= '2011-12-02 00:00:00'::timestamp without
> time zone) AND (media_type = 4) AND (typologyid = ANY
> ('{6,7,1,2,3,5,4}'::integer[])))
>
> While in Server B, takes 11s:
>       ->  Seq Scan on news_internet_201112 n  (cost=0.00..119520.12
> rows=1405093 width=482) (actual time=0.177..11783.621 rows=1844831 loops=1)
>           Filter: ((news_date >= '2011-12-02 00:00:00'::timestamp without
> time zone) AND (media_type = 4) AND (typologyid = ANY
> ('{6,7,1,2,3,5,4}'::integer[])))
>
> Is notorious that, while in server A, execution time vary only few second
> when I execute the same query repeated times, in server B, execution time
> fluctuates between 30 and 150 second despite the server dont have any
> client.
>
> In other example, when I query entire table, running twice the same query:
> Server 1
> ------------
> EXPLAIN ANALYZE SELECT * from company_news_internet_201111 ;
>                                                                  QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on company_news_internet_201111  (cost=0.00..457010.37
> rows=6731337 width=318) (actual time=0.042..19665.155 rows=6731337 loops=1)
>  Total runtime: 20391.555 ms
> -
> EXPLAIN ANALYZE SELECT * from company_news_internet_201111 ;
>                                                                  QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on company_news_internet_201111  (cost=0.00..457010.37
> rows=6731337 width=318) (actual time=0.012..2171.181 rows=6731337 loops=1)
>  Total runtime: 2831.028 ms
>
> Server 2
> ------------
> EXPLAIN ANALYZE SELECT * from company_news_internet_201111 ;
>                                                                  QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on company_news_internet_201111  (cost=0.00..369577.79
> rows=6765779 width=323) (actual time=0.110..10010.443 rows=6765779 loops=1)
>  Total runtime: 11552.818 ms
> -
> EXPLAIN ANALYZE SELECT * from company_news_internet_201111 ;
>                                                                  QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------------------------------------
>  Seq Scan on company_news_internet_201111  (cost=0.00..369577.79
> rows=6765779 width=323) (actual time=0.023..8173.801 rows=6765779 loops=1)
>  Total runtime: 12939.717 ms
>
> It seems that Server B don cache the table¿?¿?
>
> I'm lost, I had tested different file systems, like XFS, stripe sizes... but
> I not have had results
>
> Any ideas that could be happen?
>
> Thanks a lot!!

That's a significant regression.  Probable hardware issue -- have you
run performance tests on it such as bonnie++?  dd?  What's iowait
during the scan?

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance