On Mar 6, 2014, at 9:03 AM, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote: > On 03/06/2014 09:33 AM, Israel Brewster wrote: >> For starters, this happened again this morning (no data prior to 4:45 am and sequence reset), so whatever is going on appears to be reoccurring. Also, I forgot to mention if it is significant: this is running on slackware liunux 14.0 >> >> On Mar 5, 2014, at 1:00 PM, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote: >> >>> On 03/05/2014 10:22 AM, Israel Brewster wrote: >>>> My first thought was "Oh, I must have a typo in my >>>> cleanup routine, such that it is deleting all records rather than only >>>> those a week old, and it's just that no one has noticed until now". So I >>>> looked at that, but changing the delete to a select appeared to produce >>>> the proper results, in that no records were selected: >>> >>> Well it would, if the records only go back to 4 AM this morning. In other words if no records exist before 4 AM today, no records exist before 7 days ago also or am I missing something? >> >> If the delete is correct, you are absolutely right. My first theory, however, was that I made a typo, and the delete was deleting ALL records, in which case changing it to a select would select all records. As it did not, that seems to confirm the delete is correct, and therefore not the problem. >> >>> A sequence is just a special table. >>> >>> So what does SELECT * from the sequence show? >> >> tracking=> SELECT * FROM data_id_seq; >> sequence_name | last_value | start_value | increment_by | max_value | min_value | cache_value | log_cnt | is_cycled | is_called >> ---------------+------------+-------------+--------------+---------------------+-----------+-------------+---------+-----------+----------- >> data_id_seq | 1184 | 1 | 1 | 9223372036854775807 | 1 | 1 | 16 | f | t >> >> >>> >>>> >>>> Also odd is that my cleanup script runs at 1am. I have records of there >>>> being new data in the database up to 3:51am, but the oldest record >>>> currently in the DB is from 4:45am (as specified by the default of now() >>>> on the column). So I know records were added after my delete command >>>> ran, but before this reset occurred. >>> >>> I am not sure what you are calling the 'reset'? >>> Did something happen between 3:51 AM and 4:45 AM? >> >> Yes: All my data was deleted and the sequence reset to 1. >> >>> Also not sure why you call the 4:45 AM record the oldest, when you say you can identify records from 3:51 AM? >> >> As I mentioned, I archive the records to permanent storage. This archive process happens every hour (for various reasons). That is how I know we had records for 3:51 am: they exist in the permanent archive. However, they don't exist in the local database any more. > > Well something is happening. See my notes on logging below to help track down the cause. Yep. > >> >>> >>>> >>>> So my question is, aside from someone going in and mucking about in the >>>> wee hours of the morning, what could possibly cause this behavior? What >>>> sort of event could cause all data to be deleted from the table, and the >>>> sequence to be reset? Especially while there is an active connection? >>>> Thanks for any ideas, however wild or off the wall :-) >>> >>> What is in the Postgres/system logs for the time period(s) you mention? >> >> The postgres log has a lot of errors in it, some of which MAY explain the issue. For example: >> >> cp: cannot create regular file '/mnt/pglogs/000000010000000400000094': Permission denied >> LOG: archive command failed with exit code 1 >> DETAIL: The failed archive command was: test ! -f /mnt/pglogs/000000010000000400000094 && cp pg_xlog/000000010000000400000094 /mnt/pglogs/000000010000000400000094 >> WARNING: transaction log file "000000010000000400000094" could not be archived: too many failures >> LOG: received smart shutdown request >> LOG: autovacuum launcher shutting down >> LOG: shutting down >> LOG: database system is shut down > > Might be good to explain your archive setup. Ok, here goes: We have the primary system which receives the data and handles all requests for said data. There is also a hot standby server keep in sync with streaming replication. The WALs are archived to a NFS share on this machine. Once an hour a python script runs that a) Selects all unsynced records from the postgresql db, b) stores a subset of them in our permanent archive, and c) marks the previously selected records as synced (UPDATE data SET syncd=true WHERE id in (...) ) Additionally, I have a) a script that runs at 8:00pm every evening that uses pg_dump to dump the contents of the database to a backup file, and b) a script that runs at 8:00 each morning that rsync's various config files and scripts (such as my data retrieval scripts) from the primary machine to a backup location on the secondary machine. None of the scripts run anywhere near the apparent 4:40ish cutoff time for my data Make sense? Probably not the best setup, but then that's what happens when you figure out stuff for yourself rather than having formal training :-) I'm DEFINITELY open to suggestions :-) > >> >> However, there are no timestamps on any of the entries (can I fix that?), so I don't know if those are current entries, or from back before I got the mount for the logs working. At this time, the mount point IS working correctly, and from what I can tell so is the archive command. The latest entry is from yesterday (modify date on the log shows Mar 5, 9:21, when I was messing with it yesterday), however, so there are no entries from this morning when it happened again. I don't see anything of interest in the syslog or messages log. > > Yes you can, timestamps and a lot more. For all the details go here: > > http://www.postgresql.org/docs/9.2/interactive/runtime-config-logging.html > > At the least I would: > > log_statement = 'mod' > > log_connections = on > log_disconnections = on > > log_line_prefix = '%u-%m-%x' I'll get those in the config, and we'll see what happens tomorrow morning. Hopefully that will give more information. Thanks for the link and information! > >> >>> >>>> >>>> ----------------------------------------------- >>>> Israel Brewster >>>> Computer Support Technician II >>>> Era Alaska >>>> 5245 Airport Industrial Rd >>>> Fairbanks, AK 99709 >>>> (907) 450-7250 x7293 >>>> ----------------------------------------------- >>>> >>>> >>>> >>>> >>> >>> >>> -- >>> Adrian Klaver >>> adrian.klaver@xxxxxxxxxxx ----------------------------------------------- Israel Brewster Computer Support Technician II Era Alaska 5245 Airport Industrial Rd Fairbanks, AK 99709 (907) 450-7250 x7293 -----------------------------------------------
BEGIN:VCARD VERSION:3.0 N:Brewster;Israel;;; FN:Israel Brewster ORG:Frontier Flying Service;MIS TITLE:PC Support Tech II EMAIL;type=INTERNET;type=WORK;type=pref:israel@xxxxxxxxxxxxxxxxxx TEL;type=WORK;type=pref:907-450-7293 item1.ADR;type=WORK;type=pref:;;5245 Airport Industrial Wy;Fairbanks;AK;99701; item1.X-ABADR:us CATEGORIES:General X-ABUID:36305438-95EA-4410-91AB-45D16CABCDDC\:ABPerson END:VCARD
-- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general