Re: skipstamp db missing errors

"ellie timoney" <ellie@xxxxxxxxxxxx> · Mon, 07 Aug 2023 10:52:32 +1000

Interesting.  I'm not sure what "/usr/sbin/cyrus" is, but I guess it's some wrapper script provided by the Debian package.  For now, I'm going to trust that it works and not overthink it.

One other thing that can prevent a skipstamp file being created (and thus cause these errors), even when you are running ctl_cyrusdb -r, is the existence of a file called "/path/to/db/skipcleanshutdown" (i.e. in the same directory where it looks for "skipstamp").  If a file with this name exists, then ctl_cyrusdb -r will not bother to create or update the skipstamp file.  The skipcleanshutdown file is deleted by ctl_cyrusdb -r as soon as it's been used to bypass the skipstamp update, so you won't have this file on a running system.

Nothing in Cyrus actually creates this file, so I guess it must be something that one's initd/systemd scripts create on shutdown.  It looks like the intention is that, if Cyrus was shut down cleanly, then a skiplist db recovery after restart is unnecessary.  If this file doesn't exist, we assume the last shutdown may have been unclean, and then skipstamp lets us determine which databases have been altered since the last known recovery and may need recovering again.

I can see a hole in this implementation though:  if the skipcleanshutdown file always exists, because you only ever have clean shutdowns, and the skipstamp file wasn't created in the first place for whatever reason, then the skipstamp file will never be created, and you'll keep getting these log messages...
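
To make the hole concrete, here is a toy Python model of the behaviour described above.  It is only a sketch: the real logic lives in Cyrus's C code, the function names (recover, clean_shutdown, simulate) are made up for illustration, and the contents written to the stand-in skipstamp file are a placeholder rather than the real on-disk format.

```python
import os
import tempfile


def recover(db_dir):
    """Toy stand-in for what 'ctl_cyrusdb -r' does with the two marker files."""
    clean = os.path.join(db_dir, "skipcleanshutdown")
    stamp = os.path.join(db_dir, "skipstamp")
    if os.path.exists(clean):
        # Clean shutdown marker found: consume it and skip the stamp update.
        os.remove(clean)
        return "skipped"
    # No marker: assume the last shutdown may have been unclean, (re)write the stamp.
    with open(stamp, "w") as f:
        f.write("placeholder")  # real file holds a timestamp; format not modelled here
    return "stamped"


def clean_shutdown(db_dir):
    """Toy stand-in for an init script that drops the marker on shutdown."""
    open(os.path.join(db_dir, "skipcleanshutdown"), "w").close()


def simulate(restarts):
    """Start with no skipstamp, then do only clean shutdowns followed by restarts."""
    with tempfile.TemporaryDirectory() as db_dir:
        seen = []
        for _ in range(restarts):
            clean_shutdown(db_dir)
            recover(db_dir)
            seen.append(os.path.exists(os.path.join(db_dir, "skipstamp")))
        return seen
```

In this model, simulate(n) reports that skipstamp is absent after every restart, however many clean shutdown/restart cycles you run, which matches the never-created situation described above.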

You could check whether this is what's causing your problem by shutting down Cyrus as you usually would, then looking to see if the skipcleanshutdown file has been created.  If it exists, then your next Cyrus restart would continue to not create a skipstamp file, and the log messages would persist.  You should be able to just remove this file, and then the next restart will create a skipstamp file (finally), and then the log messages will go away.
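
If it helps, the check after shutdown could be scripted along these lines.  This is a minimal sketch, not a Cyrus tool: marker_status and diagnose are hypothetical helper names, and the db directory has to be whatever your installation uses (the DBERROR quoted further down suggests /var/lib/cyrus/db on your system).

```python
import os


def marker_status(db_dir):
    """Report which of the two marker files exist in the given db directory."""
    names = ("skipstamp", "skipcleanshutdown")
    return {name: os.path.isfile(os.path.join(db_dir, name)) for name in names}


def diagnose(status):
    """Turn a marker_status() result into a short human-readable verdict."""
    if status["skipcleanshutdown"] and not status["skipstamp"]:
        return "skipcleanshutdown present, skipstamp missing: remove skipcleanshutdown before restarting"
    if status["skipcleanshutdown"]:
        return "skipcleanshutdown present: next restart will not update skipstamp"
    if not status["skipstamp"]:
        return "neither file present: next 'ctl_cyrusdb -r' should create skipstamp"
    return "skipstamp present"


if __name__ == "__main__":
    # Path taken from the DBERROR message in this thread; adjust for your system.
    print(diagnose(marker_status("/var/lib/cyrus/db")))
```

Run it after stopping Cyrus and before starting it again, since the skipcleanshutdown file is removed as soon as ctl_cyrusdb -r has used it.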

Can you confirm if this is the cause and the solution for you?

I think I'm going to patch the skipcleanshutdown handler so that it only skips the skipstamp update if a skipstamp file already exists, because that'll avoid the nothing-ever-actually-creates-it situation.  But if this wasn't the cause for you, then we'll need more digging...

Do you see this in your logs: "skiplist: clean shutdown file missing, updating recovery stamp"?  This line is logged by ctl_cyrusdb -r when it creates the skipstamp file.  If you DO see this line in your logs, then you should have a skipstamp file -- in which case, where is it going?!
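
A quick way to look for that line is a small filter like the one below.  It is only a sketch: find_stamp_updates is a hypothetical helper, and it assumes nothing about your log format beyond the message text quoted above appearing somewhere on the line.

```python
STAMP_MESSAGE = "skiplist: clean shutdown file missing, updating recovery stamp"


def find_stamp_updates(lines):
    """Return the log lines that contain the skipstamp-update message."""
    return [line.rstrip("\n") for line in lines if STAMP_MESSAGE in line]
```

For example, open whichever file your syslog sends Cyrus messages to and pass the file object in: `with open(path) as f: hits = find_stamp_updates(f)`.  The location of that file is system-specific, so `path` here is a placeholder.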

Cheers,

ellie

On Sat, 5 Aug 2023, at 5:14 AM, Phil Dibowitz wrote:
> Yes, I have that:
>
> ```
> START {
>          # do not delete this entry!
>          recover         cmd="/usr/sbin/cyrus ctl_cyrusdb -r"
>
>          # this is only necessary if idlemethod is set to "idled" in imapd.conf
>          idled           cmd="idled"
>
>          # this is useful on backend nodes of a Murder cluster
>          # it causes the backend to synchronize its mailbox list with
>          # the mupdate master upon startup
>          #mupdatepush   cmd="/usr/sbin/cyrus ctl_mboxlist -m"
>
>          # this is recommended if using duplicate delivery suppression
>          delprune        cmd="/usr/sbin/cyrus expire -E 3"
>          # this is recommended if caching TLS sessions
>          tlsprune        cmd="/usr/sbin/cyrus tls_prune"
>
>          # Expire data older than 28 days.
>          deleteprune     cmd="/usr/sbin/cyrus expire -E 4 -D 28" at=0430
>          expungeprune    cmd="/usr/sbin/cyrus expire -E 4 -X 28" at=0445
>
> }
>
> ```
>
> Most of that is default, I believe.
>
> I don't know if I'm using skiplist dbs or not. I just know my logs are 
> full of those messages.
>
> That said I don't seem to have a skipstamp file anywhere.
>
>
> On 8/3/23 19:01, ellie timoney wrote:
>> Hi again,
>> 
>> I've dug a bit deeper...
>> 
>> My previous email was a little off in some of the specifics, but not 
>> enough for it to matter, and the big picture remains the same.
>> 
>> For skiplist specifically, if "ctl_cyrusdb -r" has not been run, and 
>> therefore the db/skipstamp file doesn't exist, then "recovery" will 
>> happen for every skiplist database every time anything opens it (thus 
>> "assuming the worst").  The housekeeping isn't missed though -- it's 
>> just maybe overdone.
>> 
>> The error message could be better!  While I'm in here anyway, I might 
>> see about putting together a small patch to improve logging around that 
>> condition.  If the problem is just that the skipstamp file is missing, 
>> an informational message about needing to run ctl_cyrusdb -r to create 
>> it seems appropriate.  I think alarm bells only need to sound when the 
>> skipstamp file exists but is unreadable or corrupt.
>> 
>> Cheers,
>> 
>> ellie
>> 
>> On Fri, 4 Aug 2023, at 11:28 AM, ellie timoney wrote:
>>> Hi,
>>>
>>> On Thu, 3 Aug 2023, at 8:33 AM, phil@xxxxxxxx wrote:
>>>> My logs are full of errors like:
>>>>
>>>>> 2023-08-02T22:28:36.469012+00:00 virt cyrus/imaps[3012149]: DBERROR: 
>>>>> read failed, assuming the worst: 
>>>>> filename=</var/lib/cyrus/db/skipstamp> syserror=<No such file or 
>>>>> directory> func=<myinit>
>>>>
>>>> I can't seem to find anything on this. It appears in lots of people's 
>>>> logs, but never seems to be the culprit for whatever they've 
>>>> reported. But nonetheless I'd like to either create that DB properly, 
>>>> or tell Cyrus not to look for it, or whatever the most appropriate 
>>>> remedy is.
>>>
>>> Hmm.  This happens when initialising the skiplist database backend 
>>> during process startup.  It expects that file to contain a timestamp 
>>> of the last time skiplist databases had their recovery function 
>>> called.  Off the top of my head, I couldn't tell you what exactly 
>>> "recovery()" means, or what is being recovered.  When opening a 
>>> skiplist database, recovery will be run if the database was last 
>>> recovered prior to the time in that timestamp.  Looks like in this 
>>> case, you don't have this file, so you don't have a last recovery 
>>> timestamp, and it's noisily complaining about it.
>>>
>>> This file and timestamp are created when the skiplist database backend 
>>> is initialised in recovery mode, which happens specifically when you 
>>> invoke "ctl_cyrusdb -r".  You usually have Cyrus do this on startup, 
>>> with an entry like this in cyrus.conf:
>>>
>>> START {
>>>   # do not delete this entry!
>>>   recover       cmd="ctl_cyrusdb -r"
>>> }
>>>
>>> Do you have such an entry?  If not, does adding one and restarting 
>>> Cyrus make the DBERROR go away?
>>>
>>> If you don't use the skiplist format for any databases, you will still 
>>> see this message each time a Cyrus process starts, because it occurs 
>>> when the skiplist backend is initialised, whether or not it's ever 
>>> actually used.  Even if you don't use skiplist databases specifically, 
>>> you should still have a "ctl_cyrusdb -r" entry like this in your 
>>> cyrus.conf, because the other database formats may have similar 
>>> housekeeping tasks to perform at restart, and this is how that happens.
>>>
>>> Cheers,
>>>
>>> ellie

------------------------------------------
Cyrus: Info
Permalink: https://cyrus.topicbox.com/groups/info/Tf92f39d4795cc515-M6f6396e0b9e8bd92921af6ef
Delivery options: https://cyrus.topicbox.com/groups/info/subscription