Hi,
No, it hasn't changed since the first time I looked at it.
root@atlassian:/home/tbadmin# ps ax | grep post
1364 ? Ss 0:00 /usr/lib/postfix/master
5198 pts/3 S 0:00 su postgres
5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D /etc/postgresql/9.3/main
5222 ? Ss 0:10 postgres: startup process recovering 0000000100000005000000A3
11161 pts/4 S+ 0:00 grep --color=auto post
The /var/log/postgres logs haven't got anything new in them.
Also the screen session hasn't got anything new in them, except it adds a line every time I attempt to connect to it using psql. The last attempt below was my connection attempt for this email, a bit over 8 hours after my last connection attempt.
Screen session:
2017-01-28 12:38:35 AEDT FATAL: the database system is starting up
2017-01-28 23:00:01 AEDT FATAL: the database system is starting up
2017-01-28 23:00:01 AEDT FATAL: the database system is starting up
2017-01-29 07:14:00 AEDT FATAL: the database system is starting up
I also still can't connect.
postgres@atlassian:/home/tbadmin$ psql
psql: FATAL: the database system is starting up
Brian Mills
CTO
Mob: 0410660003
Level 1 | 600 Chapel Street | South Yarra| VIC | 3141 | Australia
On 28 January 2017 at 16:08, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
On 01/27/2017 05:40 PM, Brian Mills wrote:
First of all, Thank you for your time to assist me learning. I really
appreciate it.
root# ps ax | grep post
1364 ? Ss 0:00 /usr/lib/postfix/master
5198 pts/3 S 0:00 su postgres
5221 pts/3 S 0:00 /usr/lib/postgresql/9.3/bin/postgres -D
/etc/postgresql/9.3/main
5222 ? Ss 0:10 postgres: startup process recovering
0000000100000005000000A3
7930 pts/4 S+ 0:00 grep --color=auto post
So if you check back does the recovering XXXXXXXX part change?
If so Postgres is walking through the WAL files as it should.
Its a single machine postgres database server, so I'm assuming there is
no cluster log. If there is one, where would I look for it?
The only log in /var/log/postgres is postgresql-9.3-main.log
That would be it. A single Postgres instance has multiple databases in it, by default it starts with template0, template1 and postgres databases. Then add whatever databases you created and you have a cluster of databases.
which shows (tail):
2017-01-27 20:27:01 AEDT LOG: database system was shut down at
2017-01-27 20:26:29 AEDT
2017-01-27 20:27:01 AEDT LOG: MultiXact member wraparound protections
are now enabled
2017-01-27 20:27:01 AEDT LOG: autovacuum launcher started
2017-01-27 20:27:01 AEDT LOG: database system is ready to accept
connections
2017-01-27 20:27:02 AEDT LOG: incomplete startup packet
2017-01-27 20:28:54 AEDT ERROR: unexpected chunk size 104 (expected
1996) in chunk 3 of 22 for toast value 48862 in pg_toast_20028
2017-01-27 20:28:54 AEDT STATEMENT: COPY public.bodycontent
(bodycontentid, body, contentid, bodytypeid) TO stdout;
2017-01-27 20:30:13 AEDT LOG: received fast shutdown request
2017-01-27 20:30:13 AEDT LOG: aborting any active transactions
2017-01-27 20:30:13 AEDT LOG: autovacuum launcher shutting down
2017-01-27 20:30:13 AEDT LOG: shutting down
2017-01-27 20:30:13 AEDT LOG: database system is shut down
That would be your Attempt 1 log.
I ran the screen utility so I could leave the db started using
the pg_ctl command. The later logs in that session have not progressed,
its last entry is still
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
Which is still later datetime than the /var/log/postgres... log.
So it is just logging to stdout and not to the log file.
Connection attempt shows the same log
postgres@atlassian:/home/myuser$ psql *Brian Mills*
psql: FATAL: the database system is starting up
Nothing in the syslog that seems connected.
CTO
*Mob: *0410660003 <tel:0410660003>
*Melbourne* 03 9012 3460 <tel:03%209012%203460> or 03 8376 6327
<tel:03%208376%206327> *|* * **Sydney* 02 8064 3600
<tel:02%208064%203600> *|* *Brisbane* 07 3173 1570 <tel:07%203173%201570>
Level 1 *|* 600 Chapel Street *|* South
Yarra*|* VIC *|* 3141 *|* Australia
<https://www.facebook.com/TryBooking/ > <https://twitter.com/trybooking > <https://www.linkedin.com/company/trybooking-com >
On 28 January 2017 at 12:05, Adrian Klaver <adrian.klaver@xxxxxxxxxxxadrian.klaver@xxxxxxxxxxx <mailto:adrian.klaver@aklaver.<mailto:adrian.klaver@aklaver.com >> wrote:
On 01/27/2017 01:31 PM, Brian Mills wrote:
Hi,
I have a Atlassian Confluence Wiki that depends on postgres, but I
haven't much experience with postgres other than for this purpose.
A few days ago, the hard disk filled, so all services stopped
working.
When the admin realised this he increased the disk size (its in
a cloud,
so that was easy to do) however I think the way it shutdown left
Postgres in an inconsistent state for some reason.
Now when I start it up I get this message in the log over and
over again:
"FATAL: the database system is starting up"
I have a backup, which I have successfully recovered, but it is
24 hours
old, the next backup was the cause of the disk filling. So I'm using
this as exercise in learning a bit more about postgres.
I did some research and found a number of options. I took a file
level
backup with the service not running then tried 2 things. I
haven't found
much else on what to do though.
*Attempt 1 - Reset Log *
It sounded like this shouldn't be my first option (it wasn't)
but it did
sound like what I needed to do.
I ran this command
./pg_resetxlog /var/lib/postgresql/9.3/main -f
It worked a treat, the database did startup ok.
However when I tried to dump the DB:
root@atlassian:/home/myuser# sudo -u postgres pg_dump confluencedb >
$now-confluencedb.sql
pg_dump: Dumping the contents of table "bodycontent" failed:
PQgetResult() failed.
pg_dump: Error message from server: ERROR: unexpected chunk
size 104
(expected 1996) in chunk 3 of 22 for toast value 48862 in
pg_toast_20028
pg_dump: The command was: COPY public.bodycontent
(bodycontentid, body,
contentid, bodytypeid) TO stdout;
The dump failed, so I assume this did leave my database in an
inconsistent state.
*Attempt 2 - startup manually and let it try recovery*
I restored my file level backup and started again.
This time I tried to startup manually on the command line to see the
output (I'd done it as a service startup a number of times to
nearly the
same effect too)
postgres@atlassian:/usr/lib/postgresql/9.3/bin$ ./pg_ctl -D
/etc/postgresql/9.3/main start
server starting
postgres@atlassian:/usr/lib/postgresql/9.3/bin$ 2017-01-27
20:36:08 AEDT
LOG: database system was interrupted while in recovery at
2017-01-27
20:13:26 AEDT
2017-01-27 20:36:08 AEDT HINT: This probably means that some
data is
corrupted and you will have to use the last backup for recovery.
2017-01-27 20:36:08 AEDT LOG: database system was not properly shut
down; automatic recovery in progress
2017-01-27 20:36:08 AEDT LOG: redo starts at 5/528B4558
2017-01-27 20:36:18 AEDT LOG: redo done at 5/A3FFF9E8
2017-01-27 20:36:18 AEDT LOG: last completed transaction was at log
time 2017-01-24 02:08:00.023064+11
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
2017-01-27 23:00:01 AEDT FATAL: the database system is starting up
What does ps ax | grep post show?
Is the cluster set up to log to a file, if so what does it show?
Does the system log show anything relevant?
I've left it that way for 12 hours, and its still not allowing
connections.
I assume its doing some kind of consistency check?
What does it say when you attempt a connection?
Does anyone have any suggestions on what I should be doing to
try and
restore this database?
- The amount of change is minimal in the DB after 6pm it should be
basically no change overnight.
- The log above seems to suggest it has completed a redo ok, is
that right?
- The last completed transaction time 2017-01-24
02:08:00.023064+11 the
log mentions would be fine to use, so loosing even a few hours
before
that would be more than adequate.
I'm just not clear what the database is doing now, or how I
should be
trying to recover it.
Any help anyone can suggest would be great! I've given myself this
weekend to attempt to recover more than the last backup, after
that I
need to restore the service for my team to use and suck up the
lost last
day of updates.
Thanks,
Brian
--
Adrian Klavercom >
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx