On Sun, Apr 14, 2019 at 9:06 PM Gunther <raj@xxxxxxxx> wrote:
Thanks for looking at my problem, Tom Lane and Jeff Janes. Sorry for not having given enough detail.
The version is 10.2 latest. The database was originally built with 10.1 and then just started with 10.2.
Do you mean 11.2? The latest release in the 10 series is 10.7. If you do mean 10.2, there is a fix for a memory-leak bug since then that might plausibly be relevant (bdc7f686d1b8f423cb).
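(As an aside, if there is any doubt about which release the server is actually running, you can confirm it directly; this is just a generic psql one-liner, adjust the connection options to your setup:

    psql -X -c 'SELECT version();'

That prints the full version string of the running server rather than whatever the client happens to be built against.)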
I said "crash" and that is wrong. Not a signal nor core dump. It is the ERROR: out of memory. Only the query crashes. Although I don't know if may be the backend server might have left a core dump?
I don't think there would be a core dump on just an ERROR, and it's probably not worthwhile to trick it into generating one.
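(If you ever did want one for a real crash: on a typical Linux install the backend's core file would land in the data directory, and cores are usually disabled unless the limit is raised in the shell that starts the postmaster. A rough sketch, assuming $PGDATA points at your data directory:

    ulimit -c unlimited        # before starting the postmaster
    ls -l "$PGDATA"/core*      # look for any core files after a crash

But again, an ERROR alone won't produce one.)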
The short version is:
Grand total: 1437014672 bytes in 168424 blocks; 11879744 free (3423 chunks); 1425134928 used
2019-04-14 16:38:26.355 UTC [11061] ERROR: out of memory
2019-04-14 16:38:26.355 UTC [11061] DETAIL: Failed on request of size 8272 in memory context "ExecutorState".
I don't know why an 8GB system with a lot of evictable cache would hit OOM when something using 1.5GB asks for 8272 more bytes. But that is a question of how the kernel works rather than how PostgreSQL works. I also think the log you quote above belongs to a different event than the vmstat trace in your first email.
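What usually governs that on Linux is the kernel's overcommit accounting (and any cgroup memory limit the postmaster runs under), not how much page cache could in principle be reclaimed. A sketch of what I would check, assuming a Linux host with /proc mounted:

    cat /proc/sys/vm/overcommit_memory      # 2 means strict accounting; allocations can fail well before RAM is exhausted
    cat /proc/sys/vm/overcommit_ratio       # only matters when overcommit_memory is 2
    grep -i commit /proc/meminfo            # compare CommitLimit against Committed_AS

If Committed_AS was near CommitLimit at the time of the failure, strict overcommit would explain malloc refusing a small request despite free-looking memory.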
ExecutorState: 1416621920 total in 168098 blocks; 8494152 free (3102 chunks); 1408127768 used
  HashTableContext: 8192 total in 1 blocks; 7752 free (0 chunks); 440 used
    HashBatchContext: 57432 total in 3 blocks; 16072 free (6 chunks); 41360 used
This does not seem to match your query plan. Why would a plan with no Hash Joins have a HashBatchContext? I think this log must be from a different query than the one that generated the plan you posted. Perhaps one was before you lowered work_mem and one was after?
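(One way to be sure which statement a given ERROR belongs to is to match the backend PID from the log prefix, [11061] here, against the STATEMENT line logged with it, assuming log_min_error_statement is at its default; the file name below is just a placeholder for wherever your server log actually goes:

    grep '\[11061\]' postgresql.log    # pulls the ERROR, DETAIL, and STATEMENT lines from that backend together

That makes it easy to see which query text goes with which out-of-memory event.)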
Cheers,
Jeff