Re: "xfer doesn't work", "reconstruct fails", "mbexamine useless"


 



Hi,

Quoting Stephan Lauffer <lauffer@xxxxxxxxxxxxxx>:

Hello!

Unfortunately, I do not have a bug report with a "how to reproduce" for you. For a long time I have been thinking about whether or not to send a mail like this to the list.

I love cyrus-imapd, so please don't take my mail as bashing or anything like that. Take it just as a report from a cyrus-imapd murder admin.

We have used cyrus-imapd for years... I guess since around 2.0.something. At first not in a murder, but we started with that a long time ago. At the moment we have about 20k users with about 10T of data (in production: one proxy and 7 backends with v. 3.4.7).

My worries and my problems are that some important functions and features break too often. My feeling is that "no one" (at least not on the side of the developers) knows about cyrus-imapd in "bigger" environments. Are we big or small? Depends on the point of view.


We are in a similar situation. We like our cyrus setup. It is not quite as old
as the ph-freiburg setup, but we have mailboxes that were migrated from UW-Imap
to cyrus 2.3 (multiple standalone servers) and then to a cyrus 3.0/3.2 murder
cluster with metadata and archive partitions and replication: ~47k users, 2x 32T.

Minor Cyrus upgrades are fine, as you can run the old version if you need to.
My main problem is that during a migration some problems will most likely occur,
even with intensive testing. I have to try to debug and fix whatever has happened,
without losing mails, and with minimal downtime.

My long experience with cyrus has helped me a lot to recover in most cases.
But the fear is always there.

As I have dabbled a bit in debugging cyrus, I am willing to help and
share my knowledge.

So what fails here? What are the *main* recurring problems?

The MUA no longer sees new mails.
You can check and try everything... forget it. The only way: delete cyrus.cache and cyrus.index, touch new ones, run a reconstruct, and have every mail added as a new one. Is this nice, is this "ok"? Hm... let's say "sometimes good, sometimes bad". As a sysadmin you do not know whether an important person cannot see an important mail. But in most cases no other new mails are shown either, so you get the notice and a call from the user. Seen states are gone, and some other information, too. In most cases we can live with that.

Have you used telemetry logging to see what the client is "requesting" and
what cyrus is sending?

xfers get stuck in the middle of nowhere
The biggest trouble is an upgrade. Minor updates don't matter much: you will see some more "missing mail" problems, but you know how to fix them. The trouble with xfer is when you start it and it breaks down in the middle, so that one part is on the old node and another part on the new node. It takes too much time to fix everything by hand (with the hope that the mailbox will work on the new server!).

The last time I used xfer (migration to metadata and archive partitions)
we also had problems with xfer not transferring all subfolders in one go.
If I remember correctly, I did not have the time to debug the problem
and only fixed the missed folders with some scripts.
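
A minimal sketch of the kind of check such a script could do, assuming the lists
of mailbox names on the old and on the new backend have already been obtained by
some means (how they are collected is not shown here). The function name, the
`sep` parameter and the example mailbox names are hypothetical and only serve
as an illustration.

```python
def missing_after_xfer(old_mailboxes, new_mailboxes, root, sep="."):
    """Return the mailboxes under `root` that exist on the old backend
    but are missing on the new one, sorted by name.

    Assumption: `sep` is the hierarchy separator used in the names
    ("." here; pass "/" if your names use the unix separator).
    """
    prefix = root + sep

    def under_root(name):
        # Match the root itself and real subfolders only, so that
        # "user.alicia" is not mistaken for a child of "user.alice".
        return name == root or name.startswith(prefix)

    old = {m for m in old_mailboxes if under_root(m)}
    new = {m for m in new_mailboxes if under_root(m)}
    return sorted(old - new)


# Made-up example data, not taken from a real server.
old = ["user.alice", "user.alice.Sent", "user.alice.Archive.2020", "user.alicia"]
new = ["user.alice", "user.alice.Sent"]
print(missing_after_xfer(old, new, "user.alice"))
```

The result is the list of folders that still have to be moved or checked by hand
for that user.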


I remember one upgrade procedure with xfer which mostly worked: before the xfer we started 3(!) reconstructs on the mailbox, because some errors could be found by reconstruct, but some crashed xfer too. So you needed to loop the reconstruct sometimes. Only about 0.1% of the boxes crashed during this xfer. It was a success! Side note: we have experience with local upgrades, with rsync to another machine and an upgrade there... every way has been taken in the past.
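
The idea of looping a repair pass before an xfer can be sketched roughly as
follows. This is only an illustration under assumptions: `run_pass` is a
hypothetical callable that runs one check-and-repair pass on a mailbox (for
instance a wrapper around reconstruct) and returns the list of problems it
reported. How to get such a list out of the real tool is not specified here.

```python
def repeat_until_clean(run_pass, max_passes=3):
    """Call run_pass() until it reports no problems or max_passes is reached.

    Returns (passes_used, problems_from_last_pass). A non-empty problem
    list means the mailbox was still not clean after the last pass.
    """
    if max_passes < 1:
        raise ValueError("max_passes must be at least 1")
    problems = []
    for n in range(1, max_passes + 1):
        problems = list(run_pass())
        if not problems:
            return n, problems
    return max_passes, problems


# Stand-in for a real repair pass: the problem texts are invented.
reports = iter([["bad index record"], ["stale cache"], []])
print(repeat_until_clean(lambda: next(reports)))
```

A mailbox that is still not clean after the last pass could then be held back
from the xfer and looked at by hand.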


I like using cyrus replication, as there is almost no downtime for
the user even if something goes wrong. @Stephan, if you like, we can
talk about our setups.


So what are cyrus admins like me missing? Software has bugs. This is normal, this is "ok". It is our job to live with that and find fixes and solutions. But my feeling is that the developers don't use murder setups and don't know bigger sites... For example, see the last release, where murder was completely broken. "Aha, I am right! They don't have a murder running! It is only known on paper".

I remember the time before brong and fastmail were involved in cyrus
development. The code and testing have improved a lot since then, but there
are still some code paths that are less covered than I would like.
I understand that the main focus for developers paid by fastmail
is to improve/fix the features needed by fastmail.

So what to do? We must step up and get involved regarding the features
and use cases that are important to us. I think it would help to know who
is running murder setups.

Regarding bugs, there are three steps to fixing a bug:
1. discover the bug
2. locate the bug
3. fix the bug.

Not all steps have to be done by the same person.
I am not a programmer, so I am not much help regarding step 3.
But with steps 1 and 2 I can help.

And sometimes it helps to send a mail to the list regarding a problem,
even if it is still unclear if it is a bug, or something else.

The other REALLY missing thing is a reconstruct or test command which "fails" or reports if something has failed or is broken in the mailbox. Maybe there is no fix for the mailbox, but I need to know if something is broken there. I NEED a check to see which mail in the mailbox breaks everything. It is ok for me if I need to remove this mail, for example.

One word on mbexamine: I never understood why, when, or for what it helps. Checking my nearly empty inbox with "-c" says "Failed to parse file" on one message. But my MUA can open it and show it to me. What mbexamine does not say: why an xfer breaks down. And don't think about reconstruct - everything is perfect! So what to do with this mail? What to do next?

So it is my fault not to report the bugs and errors. But it is not easy if I cannot write down a "how to reproduce", and if I cannot pass the real data to you because of privacy. With every new release I hoped it would become better: better tests in reconstruct, fewer xfer crashes and so on. But I don't know. Now I think I'll give it a try with this mail. The main goal for me would be a "new" mailbox test, a reconstruct which REALLY can see or fix problems. But how? Is this possible?

Yes, reconstruct should be able to discover all problems.
But it is hard to catch problems you did not expect.
So let's collect and debug the problems we did find, so that others can
implement checks in reconstruct.

So are there other cyrus admins out there with the same problems? Or are we the only ones with problems like this?


You are not alone.





--------------------------------------------------------------------------------
Michael Menge                          Tel.: (49) 7071 / 29-70316
Universität Tübingen                   Fax.: (49) 7071 / 29-5912
Zentrum für Datenverarbeitung mail: michael.menge@xxxxxxxxxxxxxxxxxxxx
Wächterstraße 76
72074 Tübingen



------------------------------------------
Cyrus: Info
Permalink: https://cyrus.topicbox.com/groups/info/Teea77e88c04149de-Maf2c677c0d4f841a790771ef
Delivery options: https://cyrus.topicbox.com/groups/info/subscription
