Just read the 148 file and see if there is corruption. I suspect that's the cause. Bron On Fri, 12 May 2017, at 23:09, Eric Cunningham wrote: > I have snapshot backups of that account, that particular message and the > various cyrus db files. What can I provide for review? > > > On 5/11/17 7:38 PM, Bron Gondwana wrote: > > It looked like the '148.' file was corrupted in some way. I assume you've deleted all the evidence now, so we can't see how! Oh well :( > > > > Bron. > > > > On Fri, 12 May 2017, at 04:55, Eric Cunningham wrote: > >> Just to close the loop on this, once the corrupted cdm-lit account on my > >> copy server successfully reconstructed, the backlog of replication has > >> completed successfully and is now caught up. > >> > >> -Eric > >> > >> On 05/11/2017 02:20 PM, Eric Cunningham wrote: > >>> -G didn't seem to help, but I tried a "reconstruct -f -r" command > >>> without having previously deleted all occurrences of the cyrus.* files > >>> and it was then successful. > >>> > >>> > >>> > >>> On 05/11/2017 01:44 PM, Patrick Boutilier wrote: > >>>> Try a -G with reconstruct? > >>>> > >>>> -G Force re-parsing of the underlying message (checks GUID > >>>> correctness). Reconstruct with -G should fix all possible individual > >>>> message issues, including corrupted data files. > >>>> > >>>> > >>>> On 05/11/2017 02:37 PM, Eric Cunningham wrote: > >>>>> I have to walk this back. In looking slightly further back in my > >>>>> logfiles, before every instance of a failure to sync some folder, I > >>>>> see a common error reported prior to every "bailing out! Bad > >>>>> protocol" error - "IOERROR: GUID mismatch > >>>>> /var/spool/cyrus/mail/c/user/cdm-lit/Sent/148." > >>>>> > >>>>> May 11 12:44:09 imap1 sync_client[48590]: IOERROR: GUID mismatch > >>>>> /var/spool/cyrus/mail/c/user/cdm-lit/Sent/148. > >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length > >>>>> response to MAILBOX (end of file reached) > >>>>> May 11 12:44:22 imap1 sync_client[48590]: do_folders(): update > >>>>> failed: user.cdm-lit.Sent 'Bad protocol' > >>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000 > >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length > >>>>> response to MAILBOXES (end of file reached) > >>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000 > >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length > >>>>> response to MAILBOXES (end of file reached) > >>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000 > >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length > >>>>> response to MAILBOXES (end of file reached) > >>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000 > >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length > >>>>> response to MAILBOXES (end of file reached) > >>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000 > >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length > >>>>> response to MAILBOXES (end of file reached) > >>>>> May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000 > >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length > >>>>> response to MAILBOXES (end of file reached) > >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length > >>>>> response to MAILBOXES (end of file reached) > >>>>> May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length > >>>>> response to UNMAILBOX (end of file reached) > >>>>> May 11 12:44:22 imap1 sync_client[48590]: folder_delete(): failed: > >>>>> user.guest-student-coordinator.Trash.Aarflot 'Bad protocol' > >>>>> May 11 12:44:22 imap1 sync_client[48590]: Error in do_sync(): bailing > >>>>> out! Bad protocol > >>>>> May 11 12:44:22 imap1 sync_client[48590]: Processing sync log file > >>>>> /var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol > >>>>> > >>>>> When I recontructed this cdm-lit account, it ran successfully on my > >>>>> master host, but fails on my copy host: > >>>>> > >>>>> [cyrus@imap2 ~]$ reconstruct -f -r user.cdm-lit > >>>>> user.cdm-lit > >>>>> user.cdm-lit.Sea Trail 2013 > >>>>> user.cdm-lit.Sent uid 1 found - adding > >>>>> user.cdm-lit.Sent uid 2 found - adding > >>>>> user.cdm-lit.Sent uid 3 found - adding > >>>>> user.cdm-lit.Sent uid 4 found - adding > >>>>> .... > >>>>> user.cdm-lit.Sent uid 146 found - adding > >>>>> user.cdm-lit.Sent uid 147 found - adding > >>>>> user.cdm-lit.Sent uid 148 found - adding > >>>>> fatal error: Internal error: assertion failed: imap/mailbox.c: 2847: > >>>>> record->size > >>>>> > >>>>> Since I couldn't get a successful reconstruct on this account, I > >>>>> deleted it and recreated it from my master host. However, I'm still > >>>>> unable to get a successful reconstruct with "failed to read index > >>>>> header" for every subfolder and "fatal error: Internal error: > >>>>> assertion failed: imap/mailbox.c: 2847: record->size" > >>>>> > >>>>> Any ideas on how to correct this so I can see if I can then get past > >>>>> the replication error? > >>>>> > >>>>> Thanks! > >>>>> > >>>>> -Eric > >>>>> > >>>>> > >>>>> On 05/11/2017 11:46 AM, Eric Cunningham wrote: > >>>>>> Thanks Bron, but that doesn't seem to work for me, unless I'm > >>>>>> missing something. > >>>>>> > >>>>>> I ran reconstructs for this account on both my master and copy hosts: > >>>>>> > >>>>>> [cyrus@imap1 ~]$ reconstruct -f -r user.scramer > >>>>>> user.scramer > >>>>>> user.scramer.Archive > >>>>>> user.scramer.Archives > >>>>>> user.scramer.Archives.2004 > >>>>>> ... > >>>>>> user.scramer.Trash.IS Networking and Operations > >>>>>> user.scramer.Trash.IS Security > >>>>>> user.scramer.Trash.IS Servers > >>>>>> user.scramer.Trash.IS Staff > >>>>>> user.scramer.Trash.IS Surveys > >>>>>> ... > >>>>>> > >>>>>> > >>>>>> [cyrus@imap2 ~]$ reconstruct -f -r user.scramer > >>>>>> user.scramer > >>>>>> user.scramer.Archive > >>>>>> user.scramer.Archives > >>>>>> user.scramer.Archives.2004 > >>>>>> ... > >>>>>> user.scramer.Trash.IS Networking and Operations > >>>>>> user.scramer.Trash.IS Security > >>>>>> user.scramer.Trash.IS Servers > >>>>>> user.scramer.Trash.IS Staff > >>>>>> user.scramer.Trash.IS Surveys > >>>>>> ... > >>>>>> > >>>>>> > >>>>>> I then restart the replication against the log-run file and it again > >>>>>> fails on a folder that no longer exists in that account: > >>>>>> > >>>>>> ... > >>>>>> May 11 11:40:51 imap1 sync_client[60696]: sync_mailboxes: doing 1000 > >>>>>> May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length > >>>>>> response to MAILBOXES (end of file reached) > >>>>>> May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length > >>>>>> response to MAILBOXES (end of file reached) > >>>>>> May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length > >>>>>> response to UNMAILBOX (end of file reached) > >>>>>> May 11 11:40:51 imap1 sync_client[60696]: folder_delete(): failed: > >>>>>> user.scramer.Trash.IS Software Management 'Bad protocol' > >>>>>> May 11 11:40:51 imap1 sync_client[60696]: Error in do_sync(): > >>>>>> bailing out! Bad protocol > >>>>>> May 11 11:40:51 imap1 sync_client[60696]: Processing sync log file > >>>>>> /var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol > >>>>>> > >>>>>> > >>>>>> -Eric > >>>>>> > >>>>>> > >>>>>> On 05/11/2017 02:14 AM, Bron Gondwana wrote: > >>>>>>> Looks like you have a corrupted mailboxes database - if you run a > >>>>>>> reconstruct at each end then it should be fine. Replication won't > >>>>>>> bail for an actually-doesn't-exist mailbox, but it can't fix broken > >>>>>>> mailboxes, that's what reconstruct is for. > >>>>>>> > >>>>>>> Bron. > >>>>>>> > >>>>>>> > >>>>>>> On Thu, 11 May 2017, at 01:01, Eric Cunningham wrote: > >>>>>>>> Hi list, I'm running cyrus replication on cyrus 2.5.10. It seems > >>>>>>>> whenever a folder is encountered in log-run that doesn't exist on the > >>>>>>>> client and/or the server, the replication process logs "Error in > >>>>>>>> do_sync(): bailing out! Bad protocol". Sometimes, it dies > >>>>>>>> completely, > >>>>>>>> other times it restarts at the beginning of the log-run file. But > >>>>>>>> when > >>>>>>>> it re-encounters the folder in question, the same failure occurs > >>>>>>>> and the > >>>>>>>> problem continuously loops until some manual action is taken. I > >>>>>>>> have a > >>>>>>>> checker script that attempts to restart a stopped sync but I haven't > >>>>>>>> tried to code it to get past these types of folder errors. In the > >>>>>>>> meantime, replication effectively stops and the replication log > >>>>>>>> continuously builds in size. You might imagine my horror at > >>>>>>>> discovering > >>>>>>>> yesterday that replication of my production server with 1200 accounts > >>>>>>>> had been failing for over 7 weeks unnoticed, generating 9.5 million > >>>>>>>> lines in the replication log files. I had gotten out of the habit of > >>>>>>>> daily monitoring of replication as it had been running fine and I > >>>>>>>> eventually moved on to other things. I'm trying to get > >>>>>>>> replication to > >>>>>>>> run through the enormous backlog but am running into more similar > >>>>>>>> folder > >>>>>>>> issues that cause the process to fail, I make manual fixes, and start > >>>>>>>> the process again. In my experience with this, reconstruction of the > >>>>>>>> affected folders on the master host and copy host has no effect. > >>>>>>>> > >>>>>>>> Can this otherwise excellent feature be made more robust to work past > >>>>>>>> these folder issues? > >>>>>>>> > >>>>>>>> Here are a couple of examples I've observed thus far of how the > >>>>>>>> process > >>>>>>>> is failing for me. > >>>>>>>> > >>>>>>>> -------------- > >>>>>>>> > >>>>>>>> In this example, the subfolder "Fly America Foreign Travel" no longer > >>>>>>>> existed on the master host but did exist on the copy host. To > >>>>>>>> resolve > >>>>>>>> this, I had to manually delete that folder from the copy host. > >>>>>>>> > >>>>>>>> May 9 15:17:48 imap1 sync_client[34614]: inefficient replication > >>>>>>>> (8 > > >>>>>>>> 5) user.jtaft.PO.Fly America Foreign Travel > >>>>>>>> May 9 15:17:48 imap1 sync_client[34614]: FULLMAILBOX received NO > >>>>>>>> response: IMAP_MAILBOX_NONEXISTENT No Such Mailbox > >>>>>>>> May 9 15:17:48 imap1 sync_client[34614]: do_folders(): update > >>>>>>>> failed: > >>>>>>>> user.jtaft.PO.Fly America Foreign Travel 'Mailbox does not exist' > >>>>>>>> May 9 15:17:48 imap1 sync_client[34614]: IOERROR: Mailbox does > >>>>>>>> not exist > >>>>>>>> May 9 15:17:48 imap1 sync_client[34614]: Error in do_sync(): bailing > >>>>>>>> out! Mailbox does not exist > >>>>>>>> May 9 15:17:48 imap1 sync_client[34614]: Processing sync log file > >>>>>>>> /var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Mailbox does > >>>>>>>> not exist > >>>>>>>> May 9 15:17:48 imap1 sync_client[34614]: Reprocessing sync log file > >>>>>>>> /var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run > >>>>>>>> > >>>>>>>> -------------- > >>>>>>>> > >>>>>>>> In this example, the subfolder "Sohn" did not exist on the master > >>>>>>>> host > >>>>>>>> nor on the copy host. To resolve, I had to edit log-run and > >>>>>>>> completely > >>>>>>>> remove all entries for this "Sohn" subfolder. > >>>>>>>> > >>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: IOERROR: zero length > >>>>>>>> response > >>>>>>>> to MAILBOXES (end of file reached) > >>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: IOERROR: zero length > >>>>>>>> response > >>>>>>>> to MAILBOXES (end of file reached) > >>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: IOERROR: zero length > >>>>>>>> response > >>>>>>>> to UNMAILBOX (end of file reached) > >>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: folder_delete(): failed: > >>>>>>>> user.gg-chair.Trash.Sohn 'Bad protocol' > >>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: Error in do_sync(): bailing > >>>>>>>> out! Bad protocol > >>>>>>>> May 10 10:13:26 imap1 sync_client[47011]: Processing sync log file > >>>>>>>> /var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol > >>>>>>>> > >>>>>>>> -------------- > >>>>>>>> > >>>>>>>> Often, errors are logged without "bailing out!" and replication > >>>>>>>> continues to run: > >>>>>>>> > >>>>>>>> May 10 10:45:00 imap1 sync_client[87438]: sync_mailboxes: doing 1000 > >>>>>>>> May 10 10:45:01 imap1 sync_client[87438]: sync_mailboxes: doing 1000 > >>>>>>>> May 10 10:45:24 imap1 sync_client[87438]: IOERROR: GUID mismatch > >>>>>>>> /var/spool/cyrus/mail/c/user/cdm-lit/Sent/148. > >>>>>>>> May 10 10:45:37 imap1 sync_client[87438]: IOERROR: zero length > >>>>>>>> response > >>>>>>>> to MAILBOX (end of file reached) > >>>>>>>> May 10 10:45:37 imap1 sync_client[87438]: do_folders(): update > >>>>>>>> failed: > >>>>>>>> user.cdm-lit.Sent 'Bad protocol' > >>>>>>>> May 10 10:45:37 imap1 sync_client[87438]: sync_mailboxes: doing 1000 > >>>>>>>> > >>>>>>>> -------------- > >>>>>>>> > >>>>>>>> Thank you, list! > >>>>>>>> > >>>>>>>> -Eric > >>>>>>>> > >>>>>>>> ---- > >>>>>>>> Cyrus Home Page: http://www.cyrusimap.org/ > >>>>>>>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/ > >>>>>>>> To Unsubscribe: > >>>>>>>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus > >>>>>>> > >>>>>>> > >>>>>> > >>>>> > >>>>> ---- > >>>>> Cyrus Home Page: http://www.cyrusimap.org/ > >>>>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/ > >>>>> To Unsubscribe: > >>>>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus > >>>> > >>>> > >>>> > >>>> ---- > >>>> Cyrus Home Page: http://www.cyrusimap.org/ > >>>> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/ > >>>> To Unsubscribe: > >>>> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus > >>>> > >>> > >> > >> ---- > >> Cyrus Home Page: http://www.cyrusimap.org/ > >> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/ > >> To Unsubscribe: > >> https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus > > > > > -- Bron Gondwana brong@xxxxxxxxxxx ---- Cyrus Home Page: http://www.cyrusimap.org/ List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/ To Unsubscribe: https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus