Re: Hardlock after postfix/smtp entry in log - leaves 4 lost inodes each time - ideas?

Martti Kühne <mysatyre@xxxxxxxxx> · Mon, 18 Mar 2013 23:39:23 +0100

On 3/18/13, David C. Rankin <drankinatty@xxxxxxxxxxxxxxxxxx> wrote:
> Guys,
>
>   I have a server that will hardlock every week or two. The log entries always
> look the same. There is a postfix/smtp transaction in progress when the lock
> occurs. After the lockup you are dropped to maintenance mode on next reboot and
> there are always 4 inodes that are part of an orphaned link list that are fixed
> with fsck, and then the machine reboots normally. The log entries just prior to
> the lockup look like this:
>
> Mar 17 16:07:16 phoenix postfix/anvil[26843]: statistics: max connection rate 1/60s for (smtp:213.199.243.30) at Mar 17 16:01:52
> Mar 17 16:07:16 phoenix postfix/anvil[26843]: statistics: max connection count 1 for (smtp:213.199.243.30) at Mar 17 16:01:52
> Mar 17 16:07:16 phoenix postfix/anvil[26843]: statistics: max cache size 1 at Mar 17 16:01:52
> Mar 17 16:14:52 phoenix postfix/qmgr[1019]: 81963E9720: from=<inconsiderableka04@xxxxxxxxxx>, size=7485, nrcpt=1 (queue active)
> Mar 17 16:14:52 phoenix postfix/smtp[26899]: 81963E9720: to=<**snipped**@3111skyline.com>, relay=3111skyline.com[66.76.63.120]:25, delay=1118, delays=1118/0.02/0.16/0.17, dsn=4.7.1, status=deferred (host 3111skyline.com[66.76.63.120] said: 450 4.7.1 Client host rejected: cannot find your hostname, [66.76.63.60] (in reply to RCPT TO command))
> Mar 18 07:34:19 phoenix kernel: [    0.000000] Initializing cgroup subsys cpuset
> Mar 18 07:34:19 phoenix kernel: [    0.000000] Initializing cgroup subsys cpu
> Mar 18 07:34:19 phoenix kernel: [    0.000000] Linux version 3.4.7-1-ARCH (tobias@T-POWA-LX) (gcc version 4.7.1 20120721 (prerelease) (GCC) ) #1
>
>   I cannot find any connection between the postfix/smtp and the lockup searching
> the web. So I'm asking here, has anyone else seen a lockup where the last log
> entry is a postfix/smtp entry and then experienced a 4 orphaned inode error on
> reboot? This has occurred multiple times over the past year or so. memtest
> completes without error and the drives show no other errors or issues. Drive
> temps are stable at:
>
> /dev/sda: ST3250410AS: 35°C
> /dev/sdb: ST3250410AS: 39°C
>
>   Any feedback welcomed. Otherwise, it looks like this has to be hardware.
>

What about full filesystems or tmpfs overflows (check df), just to rule out the obvious sources of error?
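
To check that quickly, here is a minimal Python sketch that reports block and inode usage per filesystem, roughly what `df -h` and `df -i` show. The mount-point list is a hypothetical example; a path that is not its own mount point simply reports the filesystem that contains it.

```python
import os

# Hypothetical list of mount points to check; adjust to the server's layout.
MOUNT_POINTS = ["/", "/tmp", "/var"]


def fs_usage(path):
    """Return (block_percent, inode_percent) used for the filesystem at path."""
    st = os.statvfs(path)
    used_blocks = st.f_blocks - st.f_bfree
    # Approximates df's Use%: used / (used + space available to normal users).
    block_denom = used_blocks + st.f_bavail
    block_pct = 100.0 * used_blocks / block_denom if block_denom else 0.0
    used_inodes = st.f_files - st.f_ffree
    # Some filesystems report no fixed inode count (f_files == 0).
    inode_pct = 100.0 * used_inodes / st.f_files if st.f_files else 0.0
    return block_pct, inode_pct


def report(paths, threshold=90.0):
    """Return (path, block_pct, inode_pct, nearly_full) for each existing path."""
    rows = []
    for path in paths:
        if not os.path.exists(path):
            continue  # skip mount points this machine does not have
        block_pct, inode_pct = fs_usage(path)
        nearly_full = max(block_pct, inode_pct) >= threshold
        rows.append((path, round(block_pct, 1), round(inode_pct, 1), nearly_full))
    return rows


if __name__ == "__main__":
    for path, blocks, inodes, full in report(MOUNT_POINTS):
        flag = "  <-- nearly full" if full else ""
        print(f"{path}: {blocks}% blocks, {inodes}% inodes{flag}")
```

Anything flagged close to 100% (blocks or inodes) would be worth a closer look before blaming hardware.
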
Is that email (queue ID 81963E9720) somewhere in lost+found, or could you
otherwise make sure it survives the crash? I would be surprised if
that email were causing the crash, but who knows.
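
While the message is still queued, Postfix's `postcat -q 81963E9720` can print it so a copy can be saved off the machine. To pull every log line for that queue ID after the next crash, a minimal Python sketch could look like this; the script name and the log path in the usage comment are assumptions that depend on the syslog setup.

```python
import re
import sys


def lines_for_queue_id(log_text, queue_id):
    """Return every log line in which Postfix mentions the given queue ID."""
    # Postfix prefixes per-message log entries with "QUEUEID: ".
    pattern = re.compile(r"\b" + re.escape(queue_id) + r":")
    return [line for line in log_text.splitlines() if pattern.search(line)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python find_queue_id.py /var/log/mail.log 81963E9720
    path, queue_id = sys.argv[1], sys.argv[2]
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in lines_for_queue_id(fh.read(), queue_id):
            print(line)
```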

One thing that caught my eye was the 450 error, for which a
quick Google search turned up [1]. Since my boss was also fighting
with this very thing this week, I read it quickly; it doesn't
look that hard if you can parse English, which the people I work with
can't...

To examine this more thoroughly, we'd need your Postfix
config; main.cf is the file most likely to be revealing.
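
Rather than the whole file, the output of `postconf -n` (only the parameters explicitly set in main.cf) is usually the handiest thing to paste. To eyeball the restriction lists locally first, here is a minimal Python sketch that reads main.cf text into a dictionary. It assumes only the basic syntax rules from postconf(5): blank lines and lines whose first non-whitespace character is `#` are ignored, and a line starting with whitespace continues the previous one. It does not expand `$parameter` references.

```python
def parse_main_cf(text):
    """Parse Postfix main.cf text into a dict of parameter -> value."""
    params = {}
    current = None
    for raw in text.splitlines():
        stripped = raw.strip()
        # Skip blank lines and comments (even indented ones).
        if not stripped or stripped.startswith("#"):
            continue
        # Indented lines continue the previous logical line.
        if raw[0].isspace():
            if current is not None:
                params[current] = (params[current] + " " + stripped).strip()
            continue
        name, sep, value = stripped.partition("=")
        if not sep:
            current = None  # not a "name = value" line; ignore it
            continue
        current = name.strip()
        # A later setting of the same parameter overrides an earlier one.
        params[current] = value.strip()
    return params
```

For example, `parse_main_cf(open("/etc/postfix/main.cf").read()).get("smtpd_recipient_restrictions")` would return that restriction list as one string, assuming the file is at its default location.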

cheers!
mar77i

[1] http://www.postfix.org/ADDRESS_VERIFICATION_README.html