Re: Kernel crashes with trace ending in XFS code on RHEL6 variant kernel

Jan Kokoska <jan@xxxxxxx> · Wed, 29 Oct 2014 11:00:02 +0100

Hi Eric,

On 28 October 2014 18:42, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
On 10/28/14 10:38 AM, Jan Kokoska wrote:
> Hi,
>
> I'm running OpenVZ (OS container) kernel variant of RHEL6 kernel on

... for which we have no source code? ;)

Right, I'm sorry. The source code patch against the vanilla kernel is linked from
http://openvz.org/Download/kernel/rhel6/042stab084.20
and
http://openvz.org/Download/kernel/rhel6/042stab092.3
for the two kernel versions.

xfs_aops.c differs a bit between the older and the newer version (released 6 months apart), but both kernels crash.

1031,1032c1031,1034
< 		 * Just skip the page if it is fully outside i_size, e.g. due
< 		 * to a truncate operation that is in progress.
---
> 		 * Skip the page if it is fully outside i_size, e.g. due to a
> 		 * truncate operation that is in progress. We must redirty the
> 		 * page so that reclaim stops reclaiming it. Otherwise
> 		 * xfs_vm_releasepage() is called on it and gets confused.
1034,1037c1036,1037
< 		if (page->index >= end_index + 1 || offset_into_page == 0) {
< 			unlock_page(page);
< 			return 0;
< 		}
---
> 		if (page->index >= end_index + 1 || offset_into_page == 0)
> 			goto redirty;
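
For anyone reading along without the tree, here is a minimal userspace sketch of the test this hunk touches. It is not the kernel code: the 4 KiB page size, the outer guard on end_index, and the names page_fully_beyond_eof and SKETCH_* are assumptions made for illustration, and end_index and offset_into_page are assumed to be derived from i_size by the shift and mask shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as is usual on amd64. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

/*
 * Hypothetical userspace model of the condition in the hunk above.
 * Returns true when the page at page_index holds no data below i_size,
 * i.e. when writepage would skip (older hunk) or redirty (newer hunk) it.
 */
static bool page_fully_beyond_eof(uint64_t page_index, uint64_t i_size)
{
	/* Index of the page that contains the byte at offset i_size. */
	uint64_t end_index = i_size >> SKETCH_PAGE_SHIFT;
	/* How far i_size reaches into that page; 0 means EOF is page aligned. */
	uint64_t offset_into_page = i_size & (SKETCH_PAGE_SIZE - 1);

	/* Assumed outer guard (not part of the hunk): earlier pages are in the file. */
	if (page_index < end_index)
		return false;

	/* Same expression as in both versions of the hunk. */
	return page_index >= end_index + 1 || offset_into_page == 0;
}
```

With i_size = 8193 the page at index 2 still holds one byte of data and does not qualify, while the page at index 3 does. Both versions of the hunk agree on which pages qualify; they differ only in what happens next (unlock_page() and return 0 versus goto redirty).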

OpenVZ devs unfortunately don't publish their git tree anymore.

I don't know what's in "2.6.32-openvz-amd64" so can't help much.

What is at line 86 of xfs_aops.c in that kernel?

Stefan is right in that it's the line
bh = head = page_buffers(page);
from 
xfs_count_page_state()
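
To illustrate why that line can be fatal: in the mainline 2.6.32 sources page_buffers() is a macro that does BUG_ON(!PagePrivate(page)) before returning page_private(page), so calling it on a page with no buffers attached triggers a kernel BUG. Whether the OpenVZ tree behaves exactly the same way is an assumption here. A minimal userspace sketch, with hypothetical sketch_* names and with NULL and -1 standing in for the BUG:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct buffer_head and struct page. */
struct sketch_bh {
	struct sketch_bh *b_this_page;	/* circular list of buffers in one page */
};

struct sketch_page {
	bool private_flag;		/* models PagePrivate(page) */
	struct sketch_bh *private;	/* models page_private(page) */
};

/* Returns the buffer list, or NULL where the kernel macro would BUG. */
static struct sketch_bh *sketch_page_buffers(const struct sketch_page *page)
{
	if (!page->private_flag)
		return NULL;
	return page->private;
}

/* Walks the circular list like a page-state counter would; -1 if no buffers. */
static int sketch_count_buffers(const struct sketch_page *page)
{
	struct sketch_bh *head = sketch_page_buffers(page);
	struct sketch_bh *bh = head;
	int n = 0;

	if (head == NULL)
		return -1;
	do {
		n++;
	} while ((bh = bh->b_this_page) != head);
	return n;
}
```

The point of the sketch is only that the walk has nothing to start from once the page has lost its buffers, which is consistent with a crash on the very first line of the function.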

Eric, thanks for the pointer to ef5d437f71afdf4afdbab99213add99f4b1318fd. I'll raise it with the OpenVZ devs or with RHEL so the fix trickles downstream to OpenVZ. I simply didn't know how much the XFS parts of the kernel trees you maintain differ from those in e.g. RHEL, and thought it could be a generally occurring bug. I also wanted to get in touch with the mailing list, as I've been using XFS mostly happily for a decade.

Jan

-Eric

> several amd64 machines by different manufacturers (HP and Supermicro)
> and different RAID cards (HP and Areca).
>
> I've started seeing kernel crashes in October, as per the netconsole
> logs attached, on two of the machines (one HP, one Supermicro). The
> traces look quite similar, the machine in question cannot write
> anything to its own filesystem when this happens so the logs are made
> over the network. The XFS filesystem is not root (that's ext4), but
> one for data (OS containers), on both machines. When I run xfs_check
> and xfs_repair on the filesystem after the kernel crash & reboot, no
> issue is ever found.
>
> This may very well have nothing to do with XFS kernel code you wrote
> and maintain, but in that case, could you, from looking at the traces,
> tell me whether it maybe looks like some issue related to
> vm/paging just ending up in XFS related code path?
>
> I'm happy to test any suggestions/fixes for this if it is XFS related.
>
> Thank you,
> --
> Jan Kokoska
> Glow Internet s.r.o.
>
>
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
>

-- 
Best regards

Jan Kokoska
Glow Internet s.r.o.