We saw something similar: syslog() messages 'on the wire' (imap, pop3,
etcetera) when we restarted syslog on an in-production cyrus backend.

In summary, DON'T DO IT (syslog stop) with cyrus running.

On 11/11/2010 07:54 PM, Bron Gondwana wrote:
> On Thu, Nov 11, 2010 at 02:24:47PM -0200, Henrique de Moraes Holschuh wrote:
>> On Thu, 11 Nov 2010, Paul Dekkers wrote:
>>> Uhoh! And then I looked at mailboxes.db: It looks like part completely
>>> rewritten, including the skiplist header, and the first line now said:
>>> user.bla: System I/O error System I/O error
>> This is something that has plagued cyrus for a long time. Can we find a
>> way to actually keep tabs on our FDs so it cannot ever happen again,
>> please? I recall reports of crap showing inside prot streams 10 years
>> ago... if now it is leaking into even worse places, well...
> It's a standalone program. Reconstruct was running all by itself.
>
>> This probably needs a redesign of master/service fd-passing protocol,
>> and of prot streams to be fixed for good. While at it, we should
>> switch the master/service interaction to a modern design, since the
>> operating system worth bothering with nowadays deal sanely with the
>> thundering herd effect, and all of them have proper socket event support
>> (epoll-like. Would require one of the event abstraction libraries,
>> though, so as to support linux/bsd/solaris with minimum fuss).
> Since that wasn't the issue - why on earth was it allowed to have fd 2
> in the first place? Is Cyrus closing fd 2, or is truss closing it??
>
> There was no issue outside truss, it was when it ran under truss that
> the issue happened.
>
> Here's the start of an strace of a reconstruct run on my machine:
>
> execve("/usr/cyrus/bin/reconstruct", ["/usr/cyrus/bin/reconstruct", "-C", "/tmp/ct-slot2/etc/imapd.conf", "-s"], [/* 20 vars */]) = 0
> brk(0) = 0x12f1000
> access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
> mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fceb52d8000
> access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
> open("db-4.6/lib/tls/x86_64/libsasl2.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("db-4.6/lib/tls/libsasl2.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("db-4.6/lib/x86_64/libsasl2.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("db-4.6/lib/libsasl2.so.2", O_RDONLY) = -1 ENOENT (No such file or directory)
> open("/etc/ld.so.cache", O_RDONLY) = 3
>
>
> Notice the first fd allocated: 3.
>
> And here's a run under truss on FreeBSD:
>
> [root@cyrus1 /var/imap]# sudo -u cyrus truss /usr/local/cyrus/bin/reconstruct user.foo
> __sysctl(0x7fffffffe390,0x2,0x7fffffffe3ac,0x7fffffffe3a0,0x0,0x0) = 0 (0x0)
> mmap(0x0,672,PROT_READ|PROT_WRITE,MAP_ANON,-1,0x0) = 34366398464 (0x80065a000)
> munmap(0x80065a000,672) = 0 (0x0)
> __sysctl(0x7fffffffe400,0x2,0x800763428,0x7fffffffe3f8,0x0,0x0) = 0 (0x0)
> mmap(0x0,32768,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,-1,0x0) = 34366398464 (0x80065a000)
> issetugid(0x80065b015,0x800654cc4,0x80076fc50,0x80076fc20,0x6351,0x0) = 0 (0x0)
> open("/etc/libmap.conf",O_RDONLY,0666) ERR#2 'No such file or directory'
> access("/usr/lib/libsasl2.so.2",0) ERR#2 'No such file or directory'
> access("/usr/local/lib/libsasl2.so.2",0) = 0 (0x0)
> open("/usr/local/lib/libsasl2.so.2",O_RDONLY,035431400) = 2 (0x2)
>
> Note the first fd allocated: 2!!!!!
>
>
> The question is - why is fd 2 being allocated? Is it necessary to explicitly
> open stderr?
> The function that's scribbling all over everything is com_err,
> which is supposed to be a BSD error reporting library, it SHOULD know what
> it's doing...
>
> Bron ( a while later, fd 2 gets re-used as the mailboxes.db handle, and hence
> the mess is created )
> ----
> Cyrus Home Page: http://www.cyrusimap.org/
> List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
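
For what it's worth, the mechanism Bron describes (the process starts
with fd 2 closed, an early open() is handed descriptor 2, and whatever is
later written to "stderr" lands in the file that now owns that descriptor)
is commonly guarded against by pointing any missing standard descriptors
at /dev/null at start-up. Below is a minimal sketch of that idea in C. It
is not taken from the Cyrus source: the function name ensure_std_fds_open
is made up for illustration, and the sketch assumes it is called before
the program opens any file of its own and before any threads exist.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Cyrus): make sure descriptors 0, 1
 * and 2 are open. Any of them that the parent left closed is pointed at
 * /dev/null, so a later open() cannot hand out fd 0-2 for a real file.
 * Returns 0 on success, -1 if /dev/null could not be opened.
 */
int ensure_std_fds_open(void)
{
    int fd;

    for (fd = 0; fd <= 2; fd++) {
        /* F_GETFD fails with EBADF when the descriptor is not open */
        if (fcntl(fd, F_GETFD) != -1 || errno != EBADF)
            continue;

        /*
         * open() returns the lowest free descriptor. Every descriptor
         * below fd is already open at this point, so we expect fd back.
         */
        int nullfd = open("/dev/null", O_RDWR);
        if (nullfd == -1)
            return -1;

        if (nullfd != fd) {
            /* defensive only: move the descriptor into the right slot */
            if (dup2(nullfd, fd) == -1) {
                close(nullfd);
                return -1;
            }
            close(nullfd);
        }
    }
    return 0;
}
```

Called as the first statement of main(), this should mean that a data
file opened later can no longer end up on descriptor 0, 1 or 2; anything
a library prints to stderr is then discarded instead of being written
into that file. It does not answer the question of who closed fd 2 in
the first place.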