Hi all,
tl;dr:
is postgresql syncing filesystem caches after stop or should we do it manually to be sure, that all data are on drives?
Long version:
First, we are using Postgresql 9.5 with some simple custom patches. OS is centos 6, data are stored on ext4 fs.
We were moving a bunch of Postgresql instances (I'm talking about tens of nodes with about thousand of DBs on each node) to a new HW. Because of some technical/business/... reasons, we decided to use 'rsync': we prepared new node, sync postgres' data during regular run, than stop DB on the old node, do last rsync to move increments and start DB on the new node. Quite simple and 99% of moved nodes and DBs were ok, but..
... but 1% had problems with a few of their DBs. We met (at least) two errors:
1) toast data for `pg_statistic` table were corrupted ("missing chunk number 0 for toast value....") - the solution was to delete records there and run `analyze` to recreate it.
2) some tables disappeared:
2.1) `\d table_name` says, that the table doesn't exists ("Did not find any relation named...")
2.2) `create table ..` says, that "type 'table_name' already exists"
2.3) sometimes I'm able to `drop type` the type, sometimes the drop fails with "cache lookup failed for relation ..." :(
Is it possible, that Postgresql leave some data in memory after its stop? Should we run `sync` after DB strop and before last rsync or somehow clean FS caches to avoid problems like above? I'd expect, that postgresql will do it during stopping processes, but maybe not.
The memory and FS cache are my biggest suspect, if you have idea what else can be bad, feel free to suggest.
Thanks a lot,
- jj
tl;dr:
is postgresql syncing filesystem caches after stop or should we do it manually to be sure, that all data are on drives?
Long version:
First, we are using Postgresql 9.5 with some simple custom patches. OS is centos 6, data are stored on ext4 fs.
We were moving a bunch of Postgresql instances (I'm talking about tens of nodes with about thousand of DBs on each node) to a new HW. Because of some technical/business/... reasons, we decided to use 'rsync': we prepared new node, sync postgres' data during regular run, than stop DB on the old node, do last rsync to move increments and start DB on the new node. Quite simple and 99% of moved nodes and DBs were ok, but..
... but 1% had problems with a few of their DBs. We met (at least) two errors:
1) toast data for `pg_statistic` table were corrupted ("missing chunk number 0 for toast value....") - the solution was to delete records there and run `analyze` to recreate it.
2) some tables disappeared:
2.1) `\d table_name` says, that the table doesn't exists ("Did not find any relation named...")
2.2) `create table ..` says, that "type 'table_name' already exists"
2.3) sometimes I'm able to `drop type` the type, sometimes the drop fails with "cache lookup failed for relation ..." :(
Is it possible, that Postgresql leave some data in memory after its stop? Should we run `sync` after DB strop and before last rsync or somehow clean FS caches to avoid problems like above? I'd expect, that postgresql will do it during stopping processes, but maybe not.
The memory and FS cache are my biggest suspect, if you have idea what else can be bad, feel free to suggest.
Thanks a lot,
- jj