Re: CIFS endless console spammage in 2.6.38.7

Steve French <smfrench@xxxxxxxxx> · Tue, 31 May 2011 15:51:05 -0500

On Tue, May 31, 2011 at 3:44 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> On Tue, 31 May 2011 12:45:37 -0700
> Ben Greear <greearb@xxxxxxxxxxxxxxx> wrote:
>
>> On 05/31/2011 12:36 PM, Steve French wrote:
>> > This is on setting up a session, so could be something like:
>> > - mount
>> > - do write
>> > - server crash
>> > - attempt to reconnect
>> > - socket returns ENOTSOCK
>> > - attempt to reconnect ...
>> > - repeat
>> >
>> > Is this repeatable enough that we could modify the client to stop on
>> > the reconnect to see what is causing the socket to go bad and which
>> > operation we are repeating the reconnect on?
>>
>> Well, ENOTSOCK sounds like a pretty serious coding problem.  Maybe
>> a use-after-close or something?
>>
>> At the least, we could look for some particular errors (such as ENOTSOCK)
>> and print more info and do a more thorough job of cleaning up.
>>
>> Maybe a WARN_ON_ONCE() when the rv is ENOTSOCK as well?
>>
>> Seems we can reproduce this only when our open-filer HA system
>> craps itself during failover, but we can get that to happen usually
>> within hours, sometimes maybe about a day.  And, CIFS errors don't always
>> happen when the HA cluster goes bad.
>>
>> So, I'm happy to test patches, but since it's a bit tricky to
>> reproduce this...I'm hoping to get the best info possible with
>> each patch iteration!
>>
>
> I had a report of a similar problem on a RHEL5 (2.6.18) kernel:
>
>    https://bugzilla.redhat.com/show_bug.cgi?id=704921
>
> In this case, it caused an oops as well. Your problem may or may not be
> the same, but if it is, I suspect that the root cause is a lack of
> clear locking rules for the TCP_Server_Info->tcpStatus.
>
> What I think happened in that case was that the client was in the
> middle of a NEGOTIATE request and got a response, and another reconnect
> occurred while it was processing it. While the client was tearing down
> and creating a new socket, the thread that issued the NEGOTIATE on the
> previous socket marked the tcpStatus as CifsGood.
>
> Fixing it looks to be anything but trivial. I'm not even quite sure how
> to approach it at this point. Suggestions welcome.
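
One possible direction, as a rough userspace sketch only and not a patch:
tag each connection with a generation counter that is bumped on every
reconnect, and let a thread mark the status good only if the generation
it saw when it sent NEGOTIATE is still current. Every name below
(race_conn, race_reconnect, race_begin_negotiate, race_finish_negotiate)
is hypothetical and does not exist in fs/cifs; the pthread mutex just
stands in for whatever lock would end up protecting tcpStatus.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the connection status, not the kernel enum. */
enum race_status { RACE_NEED_NEGOTIATE, RACE_GOOD };

struct race_conn {
    pthread_mutex_t lock;       /* protects status and generation */
    enum race_status status;
    unsigned long generation;   /* bumped on every reconnect */
};

#define RACE_CONN_INIT { PTHREAD_MUTEX_INITIALIZER, RACE_NEED_NEGOTIATE, 0 }

/* Called when the old socket is torn down and a new one is created. */
static void race_reconnect(struct race_conn *c)
{
    pthread_mutex_lock(&c->lock);
    c->generation++;
    c->status = RACE_NEED_NEGOTIATE;
    pthread_mutex_unlock(&c->lock);
}

/* Snapshot the generation just before sending NEGOTIATE. */
static unsigned long race_begin_negotiate(struct race_conn *c)
{
    unsigned long gen;

    pthread_mutex_lock(&c->lock);
    gen = c->generation;
    pthread_mutex_unlock(&c->lock);
    return gen;
}

/*
 * Mark the connection good only if no reconnect happened since the
 * snapshot. Returns false when the response belongs to a stale socket.
 */
static bool race_finish_negotiate(struct race_conn *c, unsigned long gen)
{
    bool ok;

    pthread_mutex_lock(&c->lock);
    ok = (gen == c->generation);
    if (ok)
        c->status = RACE_GOOD;
    pthread_mutex_unlock(&c->lock);
    return ok;
}
```

With this model, a NEGOTIATE response that arrives after a reconnect is
dropped instead of flipping the new socket's status to good. Whether
something like this maps cleanly onto the real code paths is an open
question.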

I thought the kernel was more recent than that - how recent is the kernel here?

I think this could be related to cifs_sendv returning ENOTSOCK
immediately when the socket has not been reconnected yet.
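
To make the suspected loop concrete, here is a minimal userspace model.
It is a sketch under assumptions, not the actual transport code: the
send path fails with -ENOTSOCK when no socket is attached, the caller
retries a reconnect, and the failure is reported once instead of on
every pass (in the spirit of the WARN_ON_ONCE() idea above). All names
(model_server, model_sendv, model_reconnect, model_send_with_retry) and
the fail_first_n knob are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-server connection state. */
struct model_server {
    void *sock;         /* NULL while no socket is attached */
    int reconnects;     /* number of reconnect attempts so far */
    int fail_first_n;   /* simulate this many failed reconnects */
    bool warned;        /* set once the first ENOTSOCK was reported */
};

static int model_dummy_socket;

/* Fails immediately with -ENOTSOCK when there is no socket. */
static int model_sendv(struct model_server *srv)
{
    if (srv->sock == NULL)
        return -ENOTSOCK;
    return 0;
}

/* Simulated reconnect: the first fail_first_n attempts fail. */
static int model_reconnect(struct model_server *srv)
{
    srv->reconnects++;
    if (srv->reconnects <= srv->fail_first_n)
        return -EAGAIN;
    srv->sock = &model_dummy_socket;
    return 0;
}

/*
 * Bounded retry: report the missing socket once, try to reconnect,
 * and give up with -ENOTCONN after max_attempts instead of looping
 * (and logging) forever.
 */
static int model_send_with_retry(struct model_server *srv, int max_attempts)
{
    assert(max_attempts > 0);

    for (int i = 0; i < max_attempts; i++) {
        int rc = model_sendv(srv);

        if (rc != -ENOTSOCK)
            return rc;
        if (!srv->warned) {
            fprintf(stderr, "send with no socket (attempt %d)\n", i + 1);
            srv->warned = true;
        }
        (void)model_reconnect(srv);
    }
    return -ENOTCONN;
}
```

If the reconnect never succeeds, the unbounded version of this loop is
what would produce endless console output; the bound and the one-shot
report are illustration choices here, not a description of what the
client does today.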

-- 
Thanks,

Steve