Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage

Stefan Priebe <s.priebe@xxxxxxxxxxxx> · Sun, 15 May 2016 13:03:07 +0200

Hi Brian,

here's the new trace:
[310740.407263] XFS (sdf1): ino 0x27c69cd delalloc 0 unwritten 1 pgoff 
0x19f000 size 0x1a0000
[310740.407265] ------------[ cut here ]------------
[310740.407269] WARNING: CPU: 3 PID: 108 at fs/xfs/xfs_aops.c:1241 
xfs_vm_releasepage+0x12e/0x140()
[310740.407270] Modules linked in: netconsole ipt_REJECT nf_reject_ipv4 
xt_multiport iptable_filter ip_tables x_tables bonding coretemp 8021q 
garp fuse sb_edac edac_core i2c_i801 i40e(O) xhci_pci xhci_hcd vxlan 
ip6_udp_tunnel shpchp udp_tunnel ipmi_si ipmi_msghandler button btrfs 
xor raid6_pq dm_mod raid1 md_mod usbhid usb_storage ohci_hcd sg sd_mod 
ehci_pci ehci_hcd ahci usbcore libahci igb usb_common i2c_algo_bit 
i2c_core ptp mpt3sas pps_core raid_class scsi_transport_sas
[310740.407289] CPU: 3 PID: 108 Comm: kswapd0 Tainted: G           O 
4.4.10+25-ph #1
[310740.407290] Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 
1.0b 05/18/2015
[310740.407291]  0000000000000000 ffff880c4da1fa88 ffffffffa13c6d0f 
0000000000000000
[310740.407292]  ffffffffa1a51a1c ffff880c4da1fac8 ffffffffa10837a7 
ffff880c4da1fae8
[310740.407293]  0000000000000000 ffffea0000e38140 ffff8807e20bfd10 
ffffea0000e38160
[310740.407295] Call Trace:
[310740.407299]  [<ffffffffa13c6d0f>] dump_stack+0x63/0x84
[310740.407301]  [<ffffffffa10837a7>] warn_slowpath_common+0x97/0xe0
[310740.407302]  [<ffffffffa108380a>] warn_slowpath_null+0x1a/0x20
[310740.407303]  [<ffffffffa1326f6e>] xfs_vm_releasepage+0x12e/0x140
[310740.407305]  [<ffffffffa11520c2>] try_to_release_page+0x32/0x50
[310740.407308]  [<ffffffffa1166a8e>] shrink_active_list+0x3ce/0x3e0
[310740.407309]  [<ffffffffa1167127>] shrink_lruvec+0x687/0x7d0
[310740.407311]  [<ffffffffa116734c>] shrink_zone+0xdc/0x2c0
[310740.407312]  [<ffffffffa1168499>] kswapd+0x4f9/0x970
[310740.407314]  [<ffffffffa1167fa0>] ? 
mem_cgroup_shrink_node_zone+0x1a0/0x1a0
[310740.407316]  [<ffffffffa10a0d99>] kthread+0xc9/0xe0
[310740.407318]  [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100
[310740.407320]  [<ffffffffa16b58cf>] ret_from_fork+0x3f/0x70
[310740.407321]  [<ffffffffa10a0cd0>] ? kthread_stop+0x100/0x100
[310740.407322] ---[ end trace bf76ad5e8a4d863e ]---


Stefan

Am 11.05.2016 um 17:59 schrieb Brian Foster:
Dropped non-XFS cc's, probably no need to spam other lists at this
point...

On Wed, May 11, 2016 at 04:03:16PM +0200, Stefan Priebe - Profihost AG wrote:

Am 11.05.2016 um 15:34 schrieb Brian Foster:
On Wed, May 11, 2016 at 02:26:48PM +0200, Stefan Priebe - Profihost AG wrote:
Hi Brian,

i'm still unable to grab anything to the trace file? Is there anything
to check if it's working at all?


See my previous mail:

http://oss.sgi.com/pipermail/xfs/2016-March/047793.html

E.g., something like this should work after writing to and removing a
new file:

# trace-cmd start -e "xfs:xfs_releasepage"
# cat /sys/kernel/debug/tracing/trace_pipe
...
rm-8198  [000] ....  9445.774070: xfs_releasepage: dev 253:4 ino 0x69 pgoff 0x9ff000 size 0xa00000 offset 0 length 0 delalloc 0 unwritten 0

arg sorry yes that's working but delalloc is always 0.


Hrm, Ok. That is strange.

May be i have to hook that into my initramfs to be fast enough?


Not sure that would matter.. you said it occurs within 48 hours? I take
that to mean it doesn't occur immediately on boot. You should be able to
tell from the logs or dmesg if it happens before you get a chance to
start the tracing.

Well, the options I can think of are:

- Perhaps I botched matching up the line number to the warning, in which
   case we might want to try 'grep -v "delalloc 0 unwritten 0"' to catch
   any delalloc or unwritten blocks at releasepage() time.

- Perhaps there's a race that the tracepoint doesn't catch. The warnings
   are based on local vars, so we could instrument the code to print a
   warning[1] to try and get the inode number.

Brian

[1] - compile tested diff:

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 40645a4..94738ea 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1038,11 +1038,18 @@ xfs_vm_releasepage(
  	gfp_t			gfp_mask)
  {
  	int			delalloc, unwritten;
+	struct xfs_inode	*ip = XFS_I(page->mapping->host);

  	trace_xfs_releasepage(page->mapping->host, page, 0, 0);

  	xfs_count_page_state(page, &delalloc, &unwritten);

+	if (delalloc || unwritten)
+		xfs_warn(ip->i_mount,
+		"ino 0x%llx delalloc %d unwritten %d pgoff 0x%llx size 0x%llx",
+			 ip->i_ino, delalloc, unwritten, page_offset(page),
+			 i_size_read(page->mapping->host));
+
  	if (WARN_ON_ONCE(delalloc))
  		return 0;
  	if (WARN_ON_ONCE(unwritten))

Stefan

Once that is working, add the grep command to filter out "delalloc 0"
instances, etc. For example:

	cat .../trace_pipe | grep -v "delalloc 0" > ~/trace.out

Brian

This still happens in the first 48 hours after a fresh reboot.

Stefan

Am 24.03.2016 um 13:24 schrieb Brian Foster:
On Thu, Mar 24, 2016 at 01:17:15PM +0100, Stefan Priebe - Profihost AG wrote:

Am 24.03.2016 um 12:17 schrieb Brian Foster:
On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:

Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:

Am 23.03.2016 um 15:07 schrieb Brian Foster:
On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
sorry new one the last one got mangled. Comments inside.

Am 05.03.2016 um 23:48 schrieb Dave Chinner:
On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
Am 04.03.2016 um 20:13 schrieb Brian Foster:
On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:

Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@xxxxxxxxxx>:

On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
...

This has happened again on 8 different hosts in the last 24 hours
running 4.4.6.

All of those are KVM / Qemu hosts and are doing NO I/O except the normal
OS stuff as the VMs have remote storage. So no database, no rsync on
those hosts - just the OS doing nearly nothing.

All those show:
[153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
xfs_vm_releasepage+0xe2/0xf0()


Ok, well at this point the warning isn't telling us anything beyond
you're reproducing the problem. We can't really make progress without
more information. We don't necessarily know what application or
operations caused this by the time it occurs, but perhaps knowing what
file is affected could give us a hint.

We have the xfs_releasepage tracepoint, but that's unconditional and so
might generate a lot of noise by default. Could you enable the
xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
E.g., we could leave a long running 'trace-cmd record -e
"xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
-e "xfs:xfs_releasepage"' and leave something like 'cat
/sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
~/trace.out' running to capture instances.

Isn't the trace a WARN_ONCE? So it does not reoccur or can i check the
it in the trace.out even the WARN_ONCE was already triggered?


The tracepoint is independent from the warning (see
xfs_vm_releasepage()), so the tracepoint will fire every invocation of
the function regardless of whether delalloc blocks still exist at that
point. That creates the need to filter the entries.

With regard to performance, I believe the tracepoints are intended to be
pretty lightweight. I don't think it should hurt to try it on a box,
observe for a bit and make sure there isn't a huge impact. Note that the
'trace-cmd record' approach will save everything to file, so that's
something to consider I suppose.

Tests / cat is running. Is there any way to test if it works? Or is it
enough that cat prints stuff from time to time but does not match -v
delalloc 0


What is it printing where delalloc != 0? You could always just cat
trace_pipe and make sure the event is firing, it's just that I suspect
most entries will have delalloc == unwritten == 0.

Also, while the tracepoint fires independent of the warning, it might
not be a bad idea to restart a system that has already seen the warning
since boot, just to provide some correlation or additional notification
when the problem occurs.

Brian

Stefan

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs