Re: Receiving many more rows than expected

On Thursday 08 May 2014 06:30:39 Adrian Klaver wrote:
> On 05/08/2014 04:09 AM, Vincent de Phily wrote:
> > Hello,
> > 
> > I'm processing records in batches using PG 9.1.12, python 2.7, and psycopg
> > 2.5.2 :
> Comments in the code below:
> > def enqueue_loop(q):
> >      curs = DB_HANDLER.cursor()
> >      query = """UPDATE foo SET processing = 't' WHERE id IN
> >             (SELECT id FROM foo WHERE processing = 'f' ORDER BY id ASC LIMIT
> >              5000 FOR UPDATE)  RETURNING *"""
> 
>         # Where is this query actually run?

Sorry, I edited my code down too much; the query is actually executed at the 
first line of the loop. I've added it back below.

> >      while DO_LOOP: #the whole program eventually stops if this is false
> 
>         # What causes DO_LOOP to go false?

Either when receiving a signal from the OS (registered with 
"signal.signal(signal.SIGINT, stop_main)") or when the topmost try-catch-
reinitialize-retry loop has caught a quick succession of exceptions.

DO_LOOP is tested in a few places where we can make a clean exit. A cronjob 
will restart the process if it is not running or is misbehaving.
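For reference, the shutdown logic looks roughly like this (a simplified 
sketch, not the real code; the 60-second window and the failure threshold are 
made up, and enqueue_loop/queue are the ones from the code quoted above):

    import signal
    import time

    DO_LOOP = True

    def stop_main(signum, frame):
        # Signal handler: flip the flag so the loops exit at their next clean check.
        global DO_LOOP
        DO_LOOP = False

    signal.signal(signal.SIGINT, stop_main)

    def run(queue):
        # Topmost try-catch-reinitialize-retry loop: retry after an exception,
        # but give up (and let the cronjob restart the process) if exceptions
        # arrive in quick succession.
        recent_errors = []
        while DO_LOOP:
            try:
                enqueue_loop(queue)
            except Exception:
                now = time.time()
                recent_errors = [t for t in recent_errors if now - t < 60] + [now]
                if len(recent_errors) >= 5:
                    break
                # (reinitialize the DB connection here before retrying)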

> >          curs.execute(query)
> >          results = curs.fetchall()
> >          rlen = len(results)
> >          
> >          if rlen > 0:
> >              LOG.debug("Fetched %d rows", rlen)
> 
>             # What do you see in LOG for rlen values?

The histogram shows a large number of small values, becoming progressively 
rarer as the values get bigger, up to 5000 which is very frequent again 
(depending on the day, between 0.5 and 5% of queries return the maximum number 
of rows), and then a handful of freak values of ~9k, 11k, 15k, 33k, 46k, 107k, 
etc. which cause my problems.
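(If it helps, the histogram is just a tally of the "Fetched %d rows" debug 
lines; something along these lines, with a made-up log file name:)

    import re
    from collections import Counter

    counts = Counter()
    with open("enqueue.log") as logfile:  # hypothetical log file path
        for line in logfile:
            match = re.search(r"Fetched (\d+) rows", line)
            if match:
                counts[int(match.group(1))] += 1

    # Print "rows_returned  occurrences" pairs, smallest batch size first.
    for rlen in sorted(counts):
        print("%6d %d" % (rlen, counts[rlen]))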

> >          if rlen == 0:
> >              # [...] wait for notification...
> >              continue
> >          
> >          # [...] Enqueue batch and let other threads process it.
> >          # [...] Those threads will eventually delete the processed rows
> >          #       from the foo table.
> > 
> > The problem is that sometimes (once every few days, at about 200-300K
> > queries per day) I get many more rows than the max 5000 I asked for (I've
> > seen up to 25k), and I'm getting timeouts and other problems as a result.
> > 
> > The id column is your typical primary key integer with a unique index.
> > I've
> > checked the problematic cases and there are no id gaps or duplicate rows.
> > There are multiple threads in the program, but only the main thread is
> > running enqueue_loop(). I'm not sure if this is a server or a driver
> > issue.
> > 
> > 
> > Any idea? Thanks.

-- 
Vincent de Phily


