
Re: Out of Memory and Configuration Problems (Big Computer)


 



I have now hit a new query that produces out-of-memory errors in a similar way to the last ones. Can anyone please suggest why I might be getting this error, and how I can go about diagnosing or fixing it?

The error I get is:

ERROR: out of memory
SQL state: 53200
Detail: Failed on request of size 114.
Context: SQL statement "ANALYZE VERBOSE nlpg.match_data" PL/pgSQL function "mk_tbls_4e" line 8 at SQL statement (see log below)

The offending function is called using:

-- Modify NLPG matchdata (addr_str tssearch lookup field)
BEGIN;
SELECT nlpg.mk_tbls_4e();
COMMIT;

The function is:

CREATE OR REPLACE FUNCTION nlpg.mk_tbls_4e() RETURNS BOOLEAN AS $$
BEGIN
    ALTER TABLE nlpg.match_data ADD COLUMN tssearch_addr_str tsvector;
UPDATE nlpg.match_data SET tssearch_addr_str = to_tsvector(meta_addr_str(addr_str));
    DROP INDEX IF EXISTS nlpg.index_match_data_tssearch_addr_str;
CREATE INDEX index_match_data_tssearch_addr_str ON nlpg.match_data USING gin(tssearch_addr_str);
    ANALYZE VERBOSE nlpg.match_data;
    RETURN true;
END;
$$ LANGUAGE 'plpgsql';

Since the query failed on line 8, "ANALYZE VERBOSE nlpg.match_data", I hope you won't need to know much more about the inner workings of meta_addr_str. However, here they are (featuring the normalise function from earlier conversations):

CREATE OR REPLACE FUNCTION metaphoneExt(word text) RETURNS text AS $$
BEGIN
    IF is_alnum(word) THEN
        RETURN word;
    ELSE
        RETURN metaphone(word,100);
    END IF;
END;
$$ LANGUAGE plpgsql;

-- Return a normalised metaphone-encoded string containing all the valid words for text searching
DROP FUNCTION IF EXISTS meta_addr_str(addr_str text) ;
CREATE OR REPLACE FUNCTION meta_addr_str(addr_str text) RETURNS text AS $$
DECLARE
    meta_addr_str text;
    meta_word text;
BEGIN
    meta_addr_str = '';
    FOR meta_word IN
        SELECT * FROM
        (
            SELECT
            metaphoneExt(
            regexp_split_to_table(
            regexp_replace(
            normalise(
                $1
            )
            ,'[^\\w]', ' ', 'g')
            , E'\\\s+')
            ) AS meta
        ) AS x
        WHERE meta IS NOT NULL and length(trim(meta))>0
    LOOP
        meta_addr_str = meta_addr_str || ' ' || COALESCE(meta_word,'');
    END LOOP;

    RETURN meta_addr_str;
END;
$$ LANGUAGE 'plpgsql';
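For anyone who wants to exercise the tokenising logic outside the database: a minimal pure-Python sketch of what meta_addr_str does. Here metaphone_ext is stubbed out as the identity function (the real metaphoneExt calls the fuzzystrmatch metaphone), so only the strip/split/filter/join plumbing is shown:

```python
import re

def metaphone_ext(word: str) -> str:
    # Identity stub for illustration only -- the SQL metaphoneExt()
    # returns the word unchanged if it is alphanumeric, otherwise
    # runs it through metaphone().
    return word

def meta_addr_str(addr_str: str) -> str:
    # Mirror of the PL/pgSQL function: replace non-word characters with
    # spaces, split on whitespace, encode each token, and concatenate,
    # skipping empty results.
    cleaned = re.sub(r'[^\w]', ' ', addr_str)
    words = (metaphone_ext(w) for w in re.split(r'\s+', cleaned))
    return ''.join(' ' + w for w in words if w and w.strip())

print(meta_addr_str('12, High Street!'))  # -> ' 12 High Street'
```

The leading space mirrors the PL/pgSQL loop, which prepends the delimiter before every token.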

Finally, here is the end of the log file where the error occurs:

...(more of the same above)...
2010-06-02 03:09:32 BSTHINT: Consider increasing the configuration parameter "checkpoint_segments".
2010-06-02 03:09:41 BSTLOG: checkpoints are occurring too frequently (9 seconds apart)
2010-06-02 03:09:41 BSTHINT: Consider increasing the configuration parameter "checkpoint_segments".
2010-06-02 03:09:49 BSTLOG: checkpoints are occurring too frequently (8 seconds apart)
2010-06-02 03:09:49 BSTHINT: Consider increasing the configuration parameter "checkpoint_segments".
2010-06-02 03:09:56 BSTLOG: checkpoints are occurring too frequently (7 seconds apart)
2010-06-02 03:09:56 BSTHINT: Consider increasing the configuration parameter "checkpoint_segments".
2010-06-02 03:10:03 BSTLOG: checkpoints are occurring too frequently (7 seconds apart)
2010-06-02 03:10:03 BSTHINT: Consider increasing the configuration parameter "checkpoint_segments".
2010-06-02 03:10:09 BSTLOG: checkpoints are occurring too frequently (6 seconds apart)
2010-06-02 03:10:09 BSTHINT: Consider increasing the configuration parameter "checkpoint_segments".
TopMemoryContext: 66200 total in 8 blocks; 4144 free (13 chunks); 62056 used
  PL/PgSQL function context: 8192 total in 1 blocks; 6480 free (4 chunks); 1712 used
  TopTransactionContext: 516096 total in 6 blocks; 183384 free (26 chunks); 332712 used
    Combo CIDs: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
    ExecutorState: 8192 total in 1 blocks; 2424 free (4 chunks); 5768 used
      ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
    SPI Exec: 24576 total in 2 blocks; 24544 free (12 chunks); 32 used
    SPI Proc: 8192 total in 1 blocks; 7264 free (2 chunks); 928 used
  Type information cache: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
  Tsearch dictionary cache: 8192 total in 1 blocks; 5384 free (0 chunks); 2808 used
  Tsearch parser cache: 8192 total in 1 blocks; 4872 free (0 chunks); 3320 used
  Tsearch configuration cache: 8192 total in 1 blocks; 5384 free (0 chunks); 2808 used
  PL/PgSQL function context: 8192 total in 1 blocks; 7128 free (4 chunks); 1064 used
  PL/PgSQL function context: 24576 total in 2 blocks; 19616 free (10 chunks); 4960 used
  PL/PgSQL function context: 8192 total in 1 blocks; 6544 free (4 chunks); 1648 used
  CFuncHash: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
  Rendezvous variable hash: 8192 total in 1 blocks; 3848 free (0 chunks); 4344 used
  PLpgSQL function cache: 24328 total in 2 blocks; 5904 free (0 chunks); 18424 used
  Operator class cache: 8192 total in 1 blocks; 3848 free (0 chunks); 4344 used
  Operator lookup cache: 24576 total in 2 blocks; 14072 free (6 chunks); 10504 used
  MessageContext: 24576 total in 2 blocks; 20408 free (6 chunks); 4168 used
  smgr relation table: 8192 total in 1 blocks; 760 free (0 chunks); 7432 used
  TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used
  Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
  PortalMemory: 8192 total in 1 blocks; 8040 free (1 chunks); 152 used
    PortalHeapMemory: 1024 total in 1 blocks; 920 free (0 chunks); 104 used
      Analyze: 1689879704 total in 217 blocks; 20451272 free (34962 chunks); 1669428432 used
        Analyze Column: 310378496 total in 63 blocks; 2168 free (61 chunks); 310376328 used
        Analyzed lexemes table: 516096 total in 6 blocks; 110216 free (21 chunks); 405880 used
      Vacuum: 8192 total in 1 blocks; 7872 free (0 chunks); 320 used
ExecutorState: 8192 total in 1 blocks; 5984 free (1 chunks); 2208 used
        ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  Relcache by OID: 8192 total in 1 blocks; 2856 free (0 chunks); 5336 used
CacheMemoryContext: 667696 total in 20 blocks; 120608 free (43 chunks); 547088 used
    CachedPlan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
    CachedPlanSource: 1024 total in 1 blocks; 720 free (0 chunks); 304 used
    SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
index_match_data_tssearch_addr_str: 1024 total in 1 blocks; 136 free (0 chunks); 888 used
    CachedPlan: 1024 total in 1 blocks; 568 free (0 chunks); 456 used
    CachedPlanSource: 1024 total in 1 blocks; 384 free (0 chunks); 640 used
    SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
    CachedPlan: 1024 total in 1 blocks; 672 free (0 chunks); 352 used
    CachedPlanSource: 1024 total in 1 blocks; 552 free (0 chunks); 472 used
    SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
    CachedPlan: 7168 total in 3 blocks; 2864 free (0 chunks); 4304 used
    CachedPlanSource: 1024 total in 1 blocks; 80 free (0 chunks); 944 used
    SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
    index_match_data_tssearch_street: 1024 total in 1 blocks; 136 free (0 chunks); 888 used
    index_match_data_tssearch_name: 1024 total in 1 blocks; 136 free (0 chunks); 888 used
    index_match_data_usrn: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    index_match_data_pid: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    index_match_data_pc: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    index_match_data_mid: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    match_data_pkey1: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    CachedPlan: 1024 total in 1 blocks; 544 free (0 chunks); 480 used
    CachedPlanSource: 1024 total in 1 blocks; 360 free (0 chunks); 664 used
    SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
    CachedPlan: 1024 total in 1 blocks; 584 free (0 chunks); 440 used
    CachedPlanSource: 1024 total in 1 blocks; 144 free (0 chunks); 880 used
    SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
pg_toast_2619_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    CachedPlan: 1024 total in 1 blocks; 808 free (0 chunks); 216 used
    CachedPlanSource: 1024 total in 1 blocks; 720 free (0 chunks); 304 used
    SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
pg_depend_depender_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    CachedPlan: 1024 total in 1 blocks; 584 free (0 chunks); 440 used
    CachedPlanSource: 1024 total in 1 blocks; 400 free (0 chunks); 624 used
    SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
    CachedPlan: 1024 total in 1 blocks; 672 free (0 chunks); 352 used
    CachedPlanSource: 1024 total in 1 blocks; 552 free (0 chunks); 472 used
    SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
    CachedPlan: 1024 total in 1 blocks; 576 free (0 chunks); 448 used
    CachedPlanSource: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
    SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
    simple: 1024 total in 1 blocks; 984 free (0 chunks); 40 used
    CachedPlan: 1024 total in 1 blocks; 576 free (0 chunks); 448 used
    CachedPlanSource: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
    SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
    CachedPlan: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
CachedPlanSource: 3072 total in 2 blocks; 1912 free (1 chunks); 1160 used
    SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
    CachedPlan: 1024 total in 1 blocks; 376 free (0 chunks); 648 used
    CachedPlanSource: 1024 total in 1 blocks; 120 free (0 chunks); 904 used
    SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
    CachedPlan: 1024 total in 1 blocks; 432 free (0 chunks); 592 used
    CachedPlanSource: 1024 total in 1 blocks; 160 free (0 chunks); 864 used
    SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
    CachedPlan: 3072 total in 2 blocks; 744 free (1 chunks); 2328 used
    CachedPlanSource: 3072 total in 2 blocks; 80 free (0 chunks); 2992 used
    SPI Plan: 1024 total in 1 blocks; 912 free (0 chunks); 112 used
    CachedPlan: 1024 total in 1 blocks; 560 free (0 chunks); 464 used
    CachedPlanSource: 1024 total in 1 blocks; 336 free (0 chunks); 688 used
    SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
    CachedPlan: 7168 total in 3 blocks; 3104 free (0 chunks); 4064 used
    CachedPlanSource: 1024 total in 1 blocks; 104 free (0 chunks); 920 used
    SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
    pg_depend_reference_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    CachedPlan: 1024 total in 1 blocks; 560 free (0 chunks); 464 used
    CachedPlanSource: 1024 total in 1 blocks; 440 free (0 chunks); 584 used
    SPI Plan: 1024 total in 1 blocks; 928 free (0 chunks); 96 used
    pg_database_datname_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_index_indrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_enum_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_class_relname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_foreign_server_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_statistic_relid_att_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_cast_source_target_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_language_name_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_authid_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_amop_fam_strat_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
    pg_index_indexrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_ts_template_tmplname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_ts_config_map_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_opclass_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_auth_members_member_role_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_ts_dict_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_conversion_default_index: 1024 total in 1 blocks; 128 free (0 chunks); 896 used
    pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
    pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_enum_typid_label_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_ts_config_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_user_mapping_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_opfamily_am_name_nsp_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
    pg_type_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_aggregate_fnoid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_constraint_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_ts_parser_prsname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_ts_config_cfgname_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_ts_parser_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_operator_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_namespace_nspname_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_ts_template_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_amop_opr_fam_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_ts_dict_dictname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_auth_members_role_member_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_type_typname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_opfamily_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_class_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
    pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_proc_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_language_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_namespace_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_database_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_amproc_fam_proc_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
    pg_authid_rolname_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
    pg_foreign_server_name_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
    pg_conversion_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
    pg_user_mapping_user_server_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
    pg_conversion_name_nsp_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
  MdSmgr: 8192 total in 1 blocks; 2128 free (3 chunks); 6064 used
  LOCALLOCK hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
  Timezones: 79320 total in 2 blocks; 5968 free (0 chunks); 73352 used
  ErrorContext: 8192 total in 1 blocks; 8176 free (10 chunks); 16 used
2010-06-02 03:18:42 BSTERROR:  out of memory
2010-06-02 03:18:42 BSTDETAIL:  Failed on request of size 114.
2010-06-02 03:18:42 BSTCONTEXT: SQL statement "ANALYZE VERBOSE nlpg.match_data"
    PL/pgSQL function "mk_tbls_4e" line 8 at SQL statement
2010-06-02 03:18:42 BSTSTATEMENT: -- Modify NLPG matchdata (addr_str tssearch lookup field)
    BEGIN;
    SELECT nlpg.mk_tbls_4e();
    COMMIT;

My plan now is to try increasing shared_buffers, work_mem, maintenance_work_mem and, apparently, checkpoint_segments, and see if that fixes it. However, I am still stuck in a situation where the postgres service isn't visible to Windows services, and I fear that now I really am just pulling levers blindly, hoping to hit the right button. Also, these queries take days to run. So please help; any and all suggestions are welcome.
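For reference, those settings all live in postgresql.conf and need a service restart to take effect. Something along these lines; the values here are only illustrative placeholders to be tuned for the machine, not recommendations:

```
# postgresql.conf -- illustrative values only, to be tuned per machine
shared_buffers = 512MB          # defaults are very low, especially on Windows
work_mem = 64MB                 # per sort/hash operation, per backend
maintenance_work_mem = 256MB    # used by VACUUM, ANALYZE, CREATE INDEX
checkpoint_segments = 32        # addresses the "checkpoints are occurring
                                # too frequently" hints in the log above
```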

Thanks,
Tom

On 31/05/2010 20:14, Tom Wilcox wrote:
I am having difficulties. I have rerun my update that uses the Python functions:

(1) UPDATE nlpg.match_data SET org = normalise(org);

And some other similar queries on neighbouring fields in the table. They have all now worked, without any changes to the configuration. I have done one thing in an attempt to minimise the risk of a memory leak in normalise(): I added "toks = None" at the end of the normalise() function. However, this was done after query (1) succeeded on the rerun.

Why would I get inconsistent behaviour? Would it have anything to do with SQL Server running on the same machine (although it is not actually doing anything at the moment, just an idle server running in the background)?

Tangent: Is there any way to increase the memory allocated to postgres by Windows using Job Objects?

Cheers,
Tom

On 29/05/2010 18:55, Bill Moran wrote:
On 5/28/10 8:43:48 PM, Tom Wilcox wrote:
I ran this query:

EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org;

And I got this result:

"Seq Scan on match_data (cost=0.00..9762191.68 rows=32205168 width=206)
(actual time=76873.592..357450.519 rows=27777961 loops=1)"
"Total runtime: 8028212.367 ms"

That would seem to indicate that the problem is in your Python
functions.

Some ideas for next steps:
* Perhaps it's just a few rows that have data in them that the
  function has difficulty with.  Add some debugging/logging to
  the function and see if the row it bombs on has anything unusual
  in it (such as a very large text field)
* While large, that function is fairly simplistic.  You may want
  to consider rewriting it as an SQL function, which should be
  more efficient in any event.



On 28 May 2010 19:39, Tom Wilcox <hungrytom@xxxxxxxxxxxxxx> wrote:

    Oops. Sorry about that.

    I am having this problem with multiple queries; however, I am
    confident that a fair number may involve the custom plpython
    "normalise" function, which I wrote myself. I didn't think it
    would be complicated enough to produce a memory problem. Here it is:

    -- Normalises common address words (i.e. 'Ground' maps to 'grd')
    CREATE OR REPLACE FUNCTION normalise(s text) RETURNS text AS $$
    ADDR_FIELD_DELIM = ' '

    # Returns distinct list without null or empty elements
    def distinct_str(list):
         seen = set()
         return [x for x in list if x not in seen and not seen.add(x)
    and x!=None and len(x)>0]

    # normalise common words in given address string
    def normalise(match_data):
         if match_data==None: return ''
         import re
         # Tokenise
         toks = distinct_str(re.split(r'\s', match_data.lower()))
         out = ''
         for tok in toks:
             ## full word replace
             if tok == 'house' : out += 'hse'+ADDR_FIELD_DELIM
             elif tok == 'ground' : out += 'grd'+ADDR_FIELD_DELIM
             elif tok == 'gnd' : out += 'grd'+ADDR_FIELD_DELIM
             elif tok == 'front' : out += 'fnt'+ADDR_FIELD_DELIM
             elif tok == 'floor' : out += 'flr'+ADDR_FIELD_DELIM
             elif tok == 'floors' : out += 'flr'+ADDR_FIELD_DELIM
             elif tok == 'flrs' : out += 'flr'+ADDR_FIELD_DELIM
             elif tok == 'fl' : out += 'flr'+ADDR_FIELD_DELIM
             elif tok == 'basement' : out += 'bst'+ADDR_FIELD_DELIM
             elif tok == 'subbasement' : out += 'sbst'+ADDR_FIELD_DELIM
             elif tok == 'bsmt' : out += 'bst'+ADDR_FIELD_DELIM
             elif tok == 'lbst' : out += 'lower bst'+ADDR_FIELD_DELIM
             elif tok == 'street' : out += 'st'+ADDR_FIELD_DELIM
             elif tok == 'road' : out += 'rd'+ADDR_FIELD_DELIM
             elif tok == 'lane' : out += 'ln'+ADDR_FIELD_DELIM
             elif tok == 'rooms' : out += 'rm'+ADDR_FIELD_DELIM
             elif tok == 'room' : out += 'rm'+ADDR_FIELD_DELIM
             elif tok == 'no' : pass
             elif tok == 'number' : pass
             elif tok == 'and' : out += '&'+ADDR_FIELD_DELIM
             elif tok == 'rear' : out += 'rr'+ADDR_FIELD_DELIM
             elif tok == 'part' : out += 'pt'+ADDR_FIELD_DELIM
             elif tok == 'south' : out += 's'+ADDR_FIELD_DELIM
             elif tok == 'sth' : out += 's'+ADDR_FIELD_DELIM
             elif tok == 'north' : out += 'n'+ADDR_FIELD_DELIM
             elif tok == 'nth' : out += 'n'+ADDR_FIELD_DELIM
             elif tok == 'west' : out += 'w'+ADDR_FIELD_DELIM
             elif tok == 'wst' : out += 'w'+ADDR_FIELD_DELIM
             elif tok == 'east' : out += 'e'+ADDR_FIELD_DELIM
             elif tok == 'est' : out += 'e'+ADDR_FIELD_DELIM
             elif tok == 'first' : out += '1st'+ADDR_FIELD_DELIM
             elif tok == 'second' : out += '2nd'+ADDR_FIELD_DELIM
             elif tok == 'third' : out += '3rd'+ADDR_FIELD_DELIM
             elif tok == 'fourth' : out += '4th'+ADDR_FIELD_DELIM
             elif tok == 'fifth' : out += '5th'+ADDR_FIELD_DELIM
             elif tok == 'sixth' : out += '6th'+ADDR_FIELD_DELIM
             elif tok == 'seventh' : out += '7th'+ADDR_FIELD_DELIM
             elif tok == 'eighth' : out += '8th'+ADDR_FIELD_DELIM
             elif tok == 'ninth' : out += '9th'+ADDR_FIELD_DELIM
             elif tok == 'tenth' : out += '10th'+ADDR_FIELD_DELIM
             elif tok == 'eleventh' : out += '11th'+ADDR_FIELD_DELIM
             elif tok == 'twelfth' : out += '12th'+ADDR_FIELD_DELIM
             elif tok == 'thirteenth' : out += '13th'+ADDR_FIELD_DELIM
             elif tok == 'fourteenth' : out += '14th'+ADDR_FIELD_DELIM
             elif tok == 'fifteenth' : out += '15th'+ADDR_FIELD_DELIM
             elif tok == 'sixteenth' : out += '16th'+ADDR_FIELD_DELIM
             elif tok == 'seventeenth' : out += '17th'+ADDR_FIELD_DELIM
             elif tok == 'eighteenth' : out += '18th'+ADDR_FIELD_DELIM
             elif tok == 'nineteenth' : out += '19th'+ADDR_FIELD_DELIM
             elif tok == 'twentieth' : out += '20th'+ADDR_FIELD_DELIM
             # numbers 0 - 20
             elif tok == 'one' : out += '1'+ADDR_FIELD_DELIM
             elif tok == 'two' : out += '2'+ADDR_FIELD_DELIM
             elif tok == 'three' : out += '3'+ADDR_FIELD_DELIM
             elif tok == 'four' : out += '4'+ADDR_FIELD_DELIM
             elif tok == 'five' : out += '5'+ADDR_FIELD_DELIM
             elif tok == 'six' : out += '6'+ADDR_FIELD_DELIM
             elif tok == 'seven' : out += '7'+ADDR_FIELD_DELIM
             elif tok == 'eight' : out += '8'+ADDR_FIELD_DELIM
             elif tok == 'nine' : out += '9'+ADDR_FIELD_DELIM
             elif tok == 'ten' : out += '10'+ADDR_FIELD_DELIM
             elif tok == 'eleven' : out += '11'+ADDR_FIELD_DELIM
             elif tok == 'twelve' : out += '12'+ADDR_FIELD_DELIM
             elif tok == 'thirteen' : out += '13'+ADDR_FIELD_DELIM
             elif tok == 'fourteen' : out += '14'+ADDR_FIELD_DELIM
             elif tok == 'fifteen' : out += '15'+ADDR_FIELD_DELIM
             elif tok == 'sixteen' : out += '16'+ADDR_FIELD_DELIM
             elif tok == 'seventeen' : out += '17'+ADDR_FIELD_DELIM
             elif tok == 'eighteen' : out += '18'+ADDR_FIELD_DELIM
             elif tok == 'nineteen' : out += '19'+ADDR_FIELD_DELIM
             elif tok == 'twenty' : out += '20'+ADDR_FIELD_DELIM
             # town dictionary items
             elif tok == 'borough' : pass
             elif tok == 'city' : pass
             elif tok == 'of' : pass
             elif tok == 'the' : pass
             # a few extras (from looking at voa)
             elif tok == 'at' : pass
             elif tok == 'incl' : pass
             elif tok == 'inc' : pass
             else: out += tok+ADDR_FIELD_DELIM
         return out

    return normalise(s)
    $$ LANGUAGE plpythonu;
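    As an aside for anyone reusing this: the long elif chain can be
    collapsed into a single dict lookup, which is easier to maintain and
    avoids re-testing every comparison per token. A minimal sketch of
    the same idea in plain Python (the mapping here is abbreviated; the
    full table would carry every entry from the function above, with
    tokens that map to None being dropped):

```python
import re

# Abbreviated replacement table -- a stand-in for the full elif chain.
# A value of None means "drop this token entirely".
REPLACEMENTS = {
    'house': 'hse', 'ground': 'grd', 'gnd': 'grd', 'street': 'st',
    'road': 'rd', 'first': '1st', 'one': '1',
    'no': None, 'number': None, 'of': None, 'the': None,
}

def normalise(match_data):
    if match_data is None:
        return ''
    out = []
    seen = set()
    for tok in re.split(r'\s+', match_data.lower()):
        if not tok or tok in seen:
            continue  # skip empties and duplicates, as distinct_str did
        seen.add(tok)
        repl = REPLACEMENTS.get(tok, tok)  # default: keep the token
        if repl is not None:
            out.append(repl)
    return ' '.join(out) + ' ' if out else ''

print(normalise('The Ground Floor, One High Street'))
```

    The trailing delimiter mirrors the original, which appends
    ADDR_FIELD_DELIM after every emitted token.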


    Here's the create script for the table from pgAdmin (I hope that
    will be good enough instead of \d as I can't do that right now)..

    -- Table: nlpg.match_data

    -- DROP TABLE nlpg.match_data;

    CREATE TABLE nlpg.match_data
    (
       premise_id integer,
       usrn bigint,
       org text,
       sao text,
    "level" text,
       pao text,
    "name" text,
       street text,
       town text,
       pc postcode,
       postcode text,
       match_data_id integer NOT NULL DEFAULT
    nextval('nlpg.match_data_match_data_id_seq1'::regclass),
       addr_str text,
       tssearch_name tsvector,

       CONSTRAINT match_data_pkey1 PRIMARY KEY (match_data_id)
    )
    WITH (
       OIDS=FALSE
    );
    ALTER TABLE nlpg.match_data OWNER TO postgres;
ALTER TABLE nlpg.match_data ALTER COLUMN "name" SET STATISTICS 10000;


    -- Index: nlpg.index_match_data_mid

    -- DROP INDEX nlpg.index_match_data_mid;

    CREATE INDEX index_match_data_mid
       ON nlpg.match_data
       USING btree
       (match_data_id);

    -- Index: nlpg.index_match_data_pc

    -- DROP INDEX nlpg.index_match_data_pc;

    CREATE INDEX index_match_data_pc
       ON nlpg.match_data
       USING btree
       (pc);

    -- Index: nlpg.index_match_data_pid

    -- DROP INDEX nlpg.index_match_data_pid;

    CREATE INDEX index_match_data_pid
       ON nlpg.match_data
       USING btree
       (premise_id);

    -- Index: nlpg.index_match_data_tssearch_name

    -- DROP INDEX nlpg.index_match_data_tssearch_name;

    CREATE INDEX index_match_data_tssearch_name
       ON nlpg.match_data
       USING gin
       (tssearch_name);

    -- Index: nlpg.index_match_data_usrn

    -- DROP INDEX nlpg.index_match_data_usrn;

    CREATE INDEX index_match_data_usrn
       ON nlpg.match_data
       USING btree
       (usrn);

    As you can see, no FKs or triggers..

    I am running: EXPLAIN ANALYZE UPDATE nlpg.match_data SET org = org;

    However, as it should take around 90 minutes (if it is linear), I
    thought I would send this now and follow up with the results once it
    finishes. (It has taken 2 hours so far.)

    Thanks very much for your help.

    Tom


    On 28 May 2010 17:54, "Bill Moran" <wmoran@xxxxxxxxxxxxxxxxx> wrote:

    In response to Tom Wilcox <hungrytom@xxxxxxxxxxxxxx>:

    > In addition, I have discovered that the update query that runs
    > on each row
    > of a 27million row ta...

    You're not liable to get shit for answers if you omit the mailing
    list from the conversation, especially since I know almost nothing
    about tuning PostgreSQL installed on Windows.

    Are there multiple queries having this problem?  The original
    query didn't have normalise() in it, and I would be highly
    suspicious that a custom function may have a memory leak or other
    memory-intensive side-effects.  What is the code for that function?

    For example, does:
    UPDATE nlpg.match_data SET org = org WHERE match_data_id;
    finish in a reasonable amount of time or exhibit the same out of
    memory problem?

    It'd be nice to see a \d on that table ... does it have any
    triggers or
    cascading foreign keys?

    And stop

    --

    Bill Moran
    http://www.potentialtech.com
    http://people.collaborativefusion.com/~wmoran/







--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

