Re: ERROR: too many dynamic shared memory segments

Hi Thomas,
doubling max_connections made the problem go away for now! Yay!
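In case it's useful to anyone else hitting this, the change itself is a one-liner (a sketch only: the value 200 is illustrative, not our actual setting, and max_connections only takes effect after a restart):

```sql
-- Illustrative only: raise max_connections (requires a server restart).
-- On servers older than 9.4, edit max_connections in postgresql.conf instead.
ALTER SYSTEM SET max_connections = 200;
-- After restarting, verify the new value:
SHOW max_connections;
```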

As for the crash: I dug up the initial log, and it looks like a segmentation fault...

2017-11-23 07:26:53 CET:192.168.10.83(35238):user@db:[30003]: ERROR:  too many dynamic shared memory segments
2017-11-23 07:26:53 CET:192.168.10.83(35238):user@db:[30003]: STATEMENT:  SELECT ....
2017-11-23 07:26:55 CET::@:[29398]: LOG:  server process (PID 26992) was terminated by signal 11: Segmentation fault
2017-11-23 07:26:55 CET::@:[29398]: DETAIL:  Failed process was running: SELECT .....
2017-11-23 07:26:55 CET::@:[29398]: LOG:  terminating any other active server processes
2017-11-23 07:26:55 CET:192.168.10.83(35242):user@db:[30009]: WARNING:  terminating connection because of crash of another server process
2017-11-23 07:26:55 CET:192.168.10.83(35242):user@db:[30009]: DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-11-23 07:26:55 CET:192.168.10.83(35242):user@db:[30009]: HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2017-11-23 07:26:55 CET:192.168.10.83(35268):user@db:[30097]: WARNING:  terminating connection because of crash of another server process
2017-11-23 07:26:55 CET:192.168.10.83(35268):user@db:[30097]: DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-11-23 07:26:55 CET:192.168.10.83(35268):user@db:[30097]: HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2017-11-23 07:26:55 CET:192.168.10.83(46614):user@db:[31080]: WARNING:  terminating connection because of crash of another server process
2017-11-23 07:26:55 CET:192.168.10.83(46614):user@db:[31080]: DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2017-11-23 07:26:55 CET:192.168.10.83(46614):user@db: HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2017-11-23 07:26:55 CET:192.168.10.83(35238):user@db:[30003]: WARNING:  terminating connection because of crash of another server process
2017-11-23 07:26:55 CET:192.168.10.83(35238):user@db:[30003]: DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory
2017-11-23 07:26:55 CET:192.168.10.83(35300):user@db:[30152]: FATAL:  the database system is in recovery mode

--
regards,
pozdrawiam,
Jakub Glapa

On Mon, Nov 27, 2017 at 7:53 PM, Thomas Munro <thomas.munro@xxxxxxxxxxxxxxxx> wrote:
On Tue, Nov 28, 2017 at 1:13 AM, Jakub Glapa <jakub.glapa@xxxxxxxxx> wrote:
> The queries are somewhat special.
> We are still using the old-style (list) partitioning, but we abuse it a
> bit when querying.
> When querying a set of partitions, instead of going through the parent table,
> we stitch the required tables together with UNION ALL (this requires less
> locking and was more performant in our benchmarks; the new native
> partitioning might improve this, but we haven't researched that yet).
>
> The queries are of the form:
>
> SELECT col1, col2, col3 FROM
> (SELECT *
>        FROM par1
>        WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <= '2017-11-26 23:59:59.999'
>  UNION ALL
>  SELECT *
>        FROM par2
>        WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <= '2017-11-26 23:59:59.999'
>  UNION ALL
>  SELECT *
>        FROM par3
>        WHERE rangestart >= '2017-01-01 00:00:00.0' AND rangeend <= '2017-11-26 23:59:59.999'
>  UNION ALL
>  ...
> ) unionalias
> WHERE ....
>
>
>
> and indeed the query planner shows the usage of Bitmap Heap Scan
> ....
>   ->  Subquery Scan on "*SELECT* 2"  (cost=3068.58..19793.94 rows=1 width=1284)
>         ->  Gather  (cost=3068.58..19793.93 rows=1 width=5014)
>               Workers Planned: 2
>               ->  Parallel Bitmap Heap Scan on par_6  (cost=2068.58..18793.83 rows=1 width=5014)
>                     Recheck Cond: <<CONDITION>>
>                     Filter: <<CONDITION>>
>                     ->  BitmapAnd  (cost=2068.58..2068.58 rows=30955 width=0)
>                           ->  BitmapOr  (cost=999.30..999.30 rows=42989 width=0)
>                                 ->  Bitmap Index Scan on par_6_datasource  (cost=0.00..990.21 rows=42922 width=0)
>                                       Index Cond: ((datasource)::text = 'one'::text)
>                                 ->  Bitmap Index Scan on par_6_datasource  (cost=0.00..4.30 rows=1 width=0)
>                                       Index Cond: ((datasource)::text = 'two'::text)
>                                 ->  Bitmap Index Scan on par_6_datasource  (cost=0.00..4.79 rows=67 width=0)
>                                       Index Cond: ((datasource)::text = 'three'::text)
>                           ->  Bitmap Index Scan on par_6_rangestart  (cost=0.00..1069.02 rows=47564 width=0)
>                                 Index Cond: (rangestart >= '2017-01-01 00:00:00+01'::timestamp with time zone)
>   ->  Subquery Scan on "*SELECT* 3"  (cost=761.33..7944.99 rows=1 width=1086)
>         ->  Bitmap Heap Scan on par_7  (cost=761.33..7944.98 rows=1 width=4816)
>               Recheck Cond: <<CONDITION>>
>               Filter: <<CONDITION>>
>               ->  BitmapAnd  (cost=761.33..761.33 rows=7045 width=0)
>                     ->  Bitmap Index Scan on par_7_rangestart  (cost=0.00..380.35 rows=14942 width=0)
>                           Index Cond: (rangestart >= '2017-01-01 00:00:00+01'::timestamp with time zone)
>                     ->  BitmapOr  (cost=380.72..380.72 rows=12248 width=0)
>                           ->  Bitmap Index Scan on par_7_datasource  (cost=0.00..372.00 rows=12228 width=0)
>                                 Index Cond: ((datasource)::text = 'one'::text)
>                           ->  Bitmap Index Scan on par_7_datasource  (cost=0.00..4.36 rows=10 width=0)
>                                 Index Cond: ((datasource)::text = 'two'::text)
>                           ->  Bitmap Index Scan on par_7_datasource  (cost=0.00..4.36 rows=10 width=0)
>                                 Index Cond: ((datasource)::text = 'three'::text)
>
> ....
>
>
> In this particular query there were over _100_ partitions connected with the
> UNION ALL operator.

Ah, so you have many Gather nodes under Append?  That's one way to eat
arbitrarily many DSM slots.  We allow for 64 + 2 * max_backends.  Does
it help if you increase max_connections?  I am concerned about the
crash failure mode you mentioned in the first email, though: we should
always be able to handle that condition gracefully.
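For a rough sense of the numbers (an estimate only, since max_backends also counts autovacuum and other background workers, not just max_connections), you can compute the slot budget from inside the server:

```sql
-- Back-of-envelope DSM slot budget (64 + 2 * max_backends),
-- approximating max_backends by max_connections:
SELECT current_setting('max_connections')::int          AS max_conn,
       64 + 2 * current_setting('max_connections')::int AS approx_dsm_slots;
```

With the default max_connections = 100 that is roughly 264 slots, so a single plan containing ~100 Gather nodes can plausibly exhaust them.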

--
Thomas Munro
http://www.enterprisedb.com

