On 3/18/13, David C. Rankin <drankinatty@xxxxxxxxxxxxxxxxxx> wrote:
> Guys,
>
> I have a server that will hardlock every week or two. The log entries always
> look the same. There is a postfix/smtp transaction in progress when the lock
> occurs. After the lockup you are dropped to maintenance mode on next reboot and
> there are always 4 inodes that are part of an orphaned link list that are fixed
> with fsck and then the machine reboot normally. The log entries just prior to
> the lockup look like this:
>
> Mar 17 16:07:16 phoenix postfix/anvil[26843]: statistics: max connection rate
> 1/60s for (smtp:213.199.243.30) at Mar 17 16:01:52
> Mar 17 16:07:16 phoenix postfix/anvil[26843]: statistics: max connection count 1
> for (smtp:213.199.243.30) at Mar 17 16:01:52
> Mar 17 16:07:16 phoenix postfix/anvil[26843]: statistics: max cache size 1 at
> Mar 17 16:01:52
> Mar 17 16:14:52 phoenix postfix/qmgr[1019]: 81963E9720:
> from=<inconsiderableka04@xxxxxxxxxx>, size=7485, nrcpt=1 (queue active)
> Mar 17 16:14:52 phoenix postfix/smtp[26899]: 81963E9720:
> to=<**snipped**@3111skyline.com>, relay=3111skyline.com[66.76.63.120]:25,
> delay=1118, delays=1118/0.02/0.16/0.17, dsn=4.7.1, status=deferred (host
> 3111skyline.com[66.76.63.120] said: 450 4.7.1 Client host rejected: cannot find
> your hostname, [66.76.63.60] (in reply to RCPT TO command))
> Mar 18 07:34:19 phoenix kernel: [ 0.000000] Initializing cgroup subsys cpuset
> Mar 18 07:34:19 phoenix kernel: [ 0.000000] Initializing cgroup subsys cpu
> Mar 18 07:34:19 phoenix kernel: [ 0.000000] Linux version 3.4.7-1-ARCH
> (tobias@T-POWA-LX) (gcc version 4.7.1 20120721 (prerelease) (GCC) ) #1
>
> I cannot find any connection between the postfix/smtp and the lockup searching
> the web. So I'm asking here, has anyone else seen a lockup where the last log
> entry is a postfix/smtp entry and then experienced a 4 orphaned inode error on
> reboot?
> This has occurred multiple times over the past year or so. memtest
> completes without error and the drives show no other errors or issues. Drive
> temps are stable at:
>
> /dev/sda: ST3250410AS: 35°C
> /dev/sdb: ST3250410AS: 39°C
>
> Any feedback welcomed. Otherwise, it looks like this has to be hardware.
>

What about df/tmpfs overflows etc., to cover the obvious sources of error?

Do you have that email 81963E9720 somewhere in lost+found, or could you
otherwise make sure it survives the crash? I would be surprised if that email
is making things crash, but who knows.

One of the things that caught my eye was the 450 error, for which a quick
Google search turned up [1]. As this is something my boss was also fighting
with this week, I thought I'd read it quickly. It doesn't look that hard if
you understand English, which the people I work with don't...

To examine this more thoroughly, we'd need your postfix config; main.cf is the
file most likely to be revealing.

cheers!
mar77i

[1] http://www.postfix.org/ADDRESS_VERIFICATION_README.html
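
P.S. In case it helps with the df/tmpfs question above, here is a minimal sketch of the kind of check I mean, reporting both block and inode usage per filesystem, roughly what `df` and `df -i` show (df's percentage is computed slightly differently). The list of paths is my assumption, not something taken from your machine; in particular /var/spool/postfix is only the usual Postfix queue directory, so adjust it to whatever your setup actually uses. Run from cron, it could leave a record of how full things were shortly before a lockup.

```python
import os

# Hypothetical watch list; adjust to the machine's actual mount points.
# /var/spool/postfix is assumed here as the Postfix queue directory.
PATHS = ["/", "/tmp", "/run", "/var/spool/postfix"]


def pct_used(total, free):
    """Percentage of `total` units that are in use; 0.0 if total is 0."""
    if total == 0:
        return 0.0
    return 100.0 * (total - free) / total


def usage(path):
    """Return (block_pct, inode_pct) for the filesystem holding `path`."""
    st = os.statvfs(path)
    return pct_used(st.f_blocks, st.f_bfree), pct_used(st.f_files, st.f_ffree)


if __name__ == "__main__":
    for p in PATHS:
        if not os.path.exists(p):
            continue  # skip paths that do not exist on this system
        blocks, inodes = usage(p)
        print(f"{p}: {blocks:.1f}% blocks used, {inodes:.1f}% inodes used")
```

A filesystem sitting near 100% on either number would be worth ruling out before blaming the hardware.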