On Mon, Feb 4, 2019 at 6:52 PM Jakub Glapa <jakub.glapa@xxxxxxxxx> wrote:
> I see the error showing up every night on 2 different servers. But it's a
> bit of a heisenbug because if I go there now it won't be reproducible.

Huh. Ok, well, that's a lot more frequent than I thought. Is it always the same query? Any chance you can get the plan? Are there more things going on on the server, like perhaps concurrent parallel queries?

> It was suggested by Justin Pryzby that I recompile pg src with his patch
> that would cause a coredump.

Small correction to Justin's suggestion: don't abort() after elog(ERROR, ...), it'll never be reached.

> But I don't feel comfortable doing this, especially if I would have to run
> this with prod data.
> My question is: can I do anything like increasing the logging level or
> enabling some additional options?
> It's a production server, but I'm willing to sacrifice a bit of its
> performance if that would help.

If you're able to run a throwaway copy of your production database on another system that you don't have to worry about crashing, you could just replace ERROR with PANIC and run a high-speed loop of the query that crashed in production, or something. This might at least tell us whether that condition is reached via something dereferencing a dsa_pointer, or via something manipulating the segment lists while allocating/freeing.

In my own 100% unsuccessful attempts to reproduce this I was mostly running the same query (based on my guess at what ingredients are needed), but perhaps it requires a particular allocation pattern that will require more randomness to reach... hmm.

--
Thomas Munro
http://www.enterprisedb.com