Hi,
Yes, I agree, 8.3 is out of support for a long time and this is the reason we are trying to migrate to 9.3 using SLONY to minimize downtime.
I eliminated the possibility of data corruption as the limit/offset technique indicated different rows each time it was executed. Actually, the failure is still happening and as it is running in a virtual machine, memory size configuration for this virtual machine was increased from 48GB to 64GB and we have scheduled a server shutdown/restart for the next coming weekend in order to try to get rid of the failure.
The replication activity was aborted: SLONY triggers removed, SLONY processes terminated and SLONY schemas removed.
Ulimit output was appended at the end of this note.
Memory statistics dump from postmaster log resulted from a select * from "8147_spunico"."sincdc"; command:
Ulimit output was appended at the end of this note.
Memory statistics dump from postmaster log resulted from a select * from "8147_spunico"."sincdc"; command:
Thank you!
TopMemoryContext: 80800 total in 9 blocks; 4088 free (10 chunks); 76712 used
Operator class cache: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
TopTransactionContext: 8192 total in 1 blocks; 7648 free (1 chunks); 544 used
MessageContext: 57344 total in 3 blocks; 40760 free (6 chunks); 16584 used
smgr relation table: 24576 total in 2 blocks; 13904 free (4 chunks); 10672 used
TransactionAbortContext: 32768 total in 1 blocks; 32736 free (0 chunks); 32 used
Portal hash: 8192 total in 1 blocks; 1680 free (0 chunks); 6512 used
PortalMemory: 8192 total in 1 blocks; 7888 free (0 chunks); 304 used
PortalHeapMemory: 1024 total in 1 blocks; 720 free (0 chunks); 304 used
ExecutorState: 381096528 total in 6 blocks; 49856 free (30 chunks); 381046672 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 24576 total in 2 blocks; 14912 free (3 chunks); 9664 used
CacheMemoryContext: 817392 total in 20 blocks; 230456 free (3 chunks); 586936 used
pg_toast_729119_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
idx_sincdc_situacao: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
idx_sincdc_esqtab: 2048 total in 1 blocks; 656 free (0 chunks); 1392 used
idx_sincdc_datahoraexp: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pk_sincdc: 2048 total in 1 blocks; 752 free (0 chunks); 1296 used
pg_index_indrelid_index: 2048 total in 1 blocks; 704 free (0 chunks); 1344 used
pg_attrdef_adrelid_adnum_index: 2048 total in 1 blocks; 608 free (0 chunks); 1440 used
pg_ts_dict_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_aggregate_fnoid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_language_name_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_statistic_relid_att_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_ts_dict_dictname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_namespace_nspname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_opfamily_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_opclass_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_ts_parser_prsname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_amop_fam_strat_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
pg_opclass_am_name_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
pg_trigger_tgrelid_tgname_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_cast_source_target_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_auth_members_role_member_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_attribute_relid_attnum_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_ts_config_cfgname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_authid_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_ts_config_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_conversion_default_index: 3072 total in 2 blocks; 1432 free (3 chunks); 1640 used
pg_language_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_enum_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_proc_proname_args_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
pg_ts_parser_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_database_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_conversion_name_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_class_relname_nsp_index: 3072 total in 2 blocks; 1600 free (2 chunks); 1472 used
pg_attribute_relid_attnam_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_class_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_amproc_fam_proc_index: 3072 total in 2 blocks; 1384 free (2 chunks); 1688 used
pg_operator_oprname_l_r_n_index: 3072 total in 2 blocks; 1432 free (3 chunks); 1640 used
pg_index_indexrelid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_type_oid_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_rewrite_rel_rulename_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_authid_rolname_index: 3072 total in 2 blocks; 1696 free (2 chunks); 1376 used
pg_auth_members_member_role_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_enum_typid_label_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_constraint_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_conversion_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_ts_template_tmplname_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_ts_config_map_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
pg_namespace_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_type_typname_nsp_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_operator_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_amop_opr_fam_index: 3072 total in 2 blocks; 1648 free (2 chunks); 1424 used
pg_proc_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
pg_opfamily_am_name_nsp_index: 3072 total in 2 blocks; 1624 free (3 chunks); 1448 used
pg_ts_template_oid_index: 3072 total in 2 blocks; 1744 free (3 chunks); 1328 used
MdSmgr: 8192 total in 1 blocks; 7648 free (0 chunks); 544 used
LOCALLOCK hash: 24576 total in 2 blocks; 15984 free (5 chunks); 8592 used
Timezones: 53584 total in 2 blocks; 3744 free (0 chunks); 49840 used
ErrorContext: 8192 total in 1 blocks; 8160 free (3 chunks); 32 used
Ulimit output:
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 385725
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 385725
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
On Thu, Dec 11, 2014 at 1:30 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Carlos Henrique Reimer <carlos.reimer@xxxxxxxxxxxxx> writes:
> I've facing an out of memory condition after running SLONY several hours to
> get a 1TB database with about 23,000 tables replicated. The error occurs
> after about 50% of the tables were replicated.
I'd try bringing this up with the Slony crew.
> I guess postgresql is trying to perform an atomic allocation (those which
> cannot wait for reclaim) to get a continues memory area and is failing due
> to memory fragmentation.
This theory has nothing to do with reality. More likely it's just a
garden variety memory leak. If it was an out-of-memory error reported
by Postgres, there should have been a memory statistics dump written in
the postmaster log --- can you find that and post it?
Another possible theory is that you're just looking at lots of memory
needed to hold relcache entries for all 23000 tables :-(. If so there
may not be any easy way around it, except perhaps replicating subsets
of the tables. Unless you can boost the memory available to the backend
--- since this is a 64 bit build, the only reason I can see for
out-of-memory failures would be a restrictive ulimit setting.
> After SLONY gets the out of memory condition, select * of the table also
> does not work:
> FiscalWeb=# select * from "8147_spunico"."sincdc";
> ERROR: out of memory
> DETAIL: Failed on request of size 268435456.
That's odd ... looks more like data corruption than anything else.
Does this happen even in a fresh session? What do you have to do
to get rid of the failure?
> PostgreSQL 8.3.21 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
> 4.4.6 20120305 (Red Hat 4.4.6-4)
You realize of course that this version is years out of support, and that
even if this problem traces to a bug in Postgres, 8.3 is not going to get
fixed.
regards, tom lane
--