On Thursday 08 May 2014 16:56:25 Adrian Klaver wrote: > On 05/08/2014 03:11 PM, Vincent de Phily wrote: > > On Thursday 08 May 2014 06:30:39 Adrian Klaver wrote: > >> On 05/08/2014 04:09 AM, Vincent de Phily wrote: > > The histogram shows a large amount of small values, progressively becoming > > rarer for bigger values, up to value 5000 which is very frequent again > > (depending on the day, between 0.5 and 5% of queries return the maximum > > number of rows), then a handfull freak values of ~9k, 11k, 15k, 33k, 46k, > > 107k, etc which cause my problems. > > So just to be clear this the value for rlen and that value and log > message only appear in the loop above? Yes, and the problematic query is only executed in this loop. For clarity's sake I had left out bits of the code concering the notification. In case it changes anything, this is the uncut (but still anonimized) function: def enqueue_loop(q): global DO_LOOP poll = select.poll() poll.register(DB_HANDLER.fileno(), select.POLLIN) curs = DB_HANDLER.cursor() query = """UPDATE foo SET processing = 't' WHERE id IN (SELECT id FROM foo WHERE processing = 'f' ORDER BY id ASC LIMIT %d FOR UPDATE) RETURNING *""" % (conf_getint('DEFAULT', 'push_count', 5000),) while DO_LOOP: # Fetch data and pop notifications. curs.execute(query) results = curs.fetchall() rlen = len(results) nlen = len(DB_HANDLER.notifies) while DB_HANDLER.notifies: DB_HANDLER.notifies.pop() if rlen > 0 or nlen > 0: LOG.debug("Fetched %d rows, poped %d notifs, %d batches in queue", rlen, nlen, q.qsize()) # If we get nothing, wait for notifications. Try again anyway once a # minute to make sure that connection is ok. if rlen == 0: LOG.debug("Waiting for a notification") poll.poll(60000) continue # Got a batch of rows, enqueue them. results.sort() # Sort is needed here (compares the 1st element of the # tuple) because we do not have the guarantee that # "UPDATE ... (SELECT ORDER BY ..) RETURNING *" returns # ordered rows. q.put(results) Before you grow suspicious of that conf_getint, the config is loaded once at program startup, and the overlarge results hapen together with normal results without having left the loop. Just in case I'm now logging the query string anyway; who knows... > >>> if rlen == 0: > >>> # [...] wait for notification... > >>> continue > >>> > >>> # [...] Enqueue batch and let other threads process it. > >>> # [...] Those threads will eventually delete the processed > >>> rows > >>> from > >>> # the foo table. > >>> > >>> The problem is that sometimes (once every few days at about 2-300K > >>> queries > >>> per day) I get many more rows than the max 5000 I asked for (I've seen > >>> up > >>> to 25k). And I'm getting timeouts and other problems as a result. > > And there is no instance of the UPDATE query that is unconstrained and a > code path to that query? > > Or as Sim suggested another copy of this code without the LIMIT? Again, no. I'm now logging the query string just in case, but I can't imagine that the value of the limit would change at runtime unless I have somehow managed to corrupt the memory of a pure python program. > Also what happens in the rlen == 0 case? See the unabridged code above. We wait up to 1 minute for a notification to arrive, and then we restart the loop. -- Vincent de Phily