Re: xfs trace in 4.4.2 / also in 4.3.3 WARNING fs/xfs/xfs_aops.c:1232 xfs_vm_releasepage

On Thu, Mar 24, 2016 at 09:15:15AM +0100, Stefan Priebe - Profihost AG wrote:
> 
> Am 24.03.2016 um 09:10 schrieb Stefan Priebe - Profihost AG:
> > 
> > Am 23.03.2016 um 15:07 schrieb Brian Foster:
> >> On Wed, Mar 23, 2016 at 02:28:03PM +0100, Stefan Priebe - Profihost AG wrote:
> >>> sorry new one the last one got mangled. Comments inside.
> >>>
> >>> Am 05.03.2016 um 23:48 schrieb Dave Chinner:
> >>>> On Fri, Mar 04, 2016 at 04:03:42PM -0500, Brian Foster wrote:
> >>>>> On Fri, Mar 04, 2016 at 09:02:06PM +0100, Stefan Priebe wrote:
> >>>>>> Am 04.03.2016 um 20:13 schrieb Brian Foster:
> >>>>>>> On Fri, Mar 04, 2016 at 07:47:16PM +0100, Stefan Priebe wrote:
> >>>>>>>> Am 20.02.2016 um 19:02 schrieb Stefan Priebe - Profihost AG:
> >>>>>>>>>
> >>>>>>>>>> Am 20.02.2016 um 15:45 schrieb Brian Foster <bfoster@xxxxxxxxxx>:
> >>>>>>>>>>
> >>>>>>>>>>> On Sat, Feb 20, 2016 at 09:02:28AM +0100, Stefan Priebe wrote:
> >> ...
> >>>
> >>> This has happened again on 8 different hosts in the last 24 hours
> >>> running 4.4.6.
> >>>
> >>> All of those are KVM / Qemu hosts and are doing NO I/O except the normal
> >>> OS stuff as the VMs have remote storage. So no database, no rsync on
> >>> those hosts - just the OS doing nearly nothing.
> >>>
> >>> All those show:
> >>> [153360.287040] WARNING: CPU: 0 PID: 109 at fs/xfs/xfs_aops.c:1234
> >>> xfs_vm_releasepage+0xe2/0xf0()
> >>>
> >>
> >> Ok, well at this point the warning isn't telling us anything beyond
> >> the fact that you're reproducing the problem. We can't really make
> >> progress without more information. We don't necessarily know what
> >> application or operations caused this by the time it occurs, but
> >> knowing what file is affected could give us a hint.
> >>
> >> We have the xfs_releasepage tracepoint, but that's unconditional and so
> >> might generate a lot of noise by default. Could you enable the
> >> xfs_releasepage tracepoint and hunt for instances where delalloc != 0?
> >> E.g., we could leave a long running 'trace-cmd record -e
> >> "xfs:xfs_releasepage" <cmd>' command on several boxes and wait for the
> >> problem to occur. Alternatively (and maybe easier), run 'trace-cmd start
> >> -e "xfs:xfs_releasepage"' and leave something like 'cat
> >> /sys/kernel/debug/tracing/trace_pipe | grep -v "delalloc 0" >
> >> ~/trace.out' running to capture instances.
> 
> Isn't the trace a WARN_ONCE? So does it not reoccur, or can I still
> check it in the trace.out even after the WARN_ONCE has already been
> triggered?
> 

The tracepoint is independent from the warning (see
xfs_vm_releasepage()), so the tracepoint fires on every invocation of
the function, regardless of whether delalloc blocks still exist at that
point. That is why the entries need to be filtered.
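
For reference, the filtering step can be sketched roughly as below. The
sample lines are hypothetical and only approximate the real tracepoint
output format; actual events come from
/sys/kernel/debug/tracing/trace_pipe after
'trace-cmd start -e "xfs:xfs_releasepage"'.

```shell
# Hypothetical sample of xfs_releasepage tracepoint output (field layout
# is an approximation, not the exact kernel format).
cat > /tmp/trace.sample <<'EOF'
kswapd0-109   [000] 153360.287040: xfs_releasepage: dev 8:1 ino 0x85 delalloc 1 unwritten 0
flush-8:0-321 [001] 153361.102938: xfs_releasepage: dev 8:1 ino 0x92 delalloc 0 unwritten 0
EOF

# Drop the uninteresting events; only entries that still report
# delalloc blocks (delalloc != 0) survive the filter.
grep -v "delalloc 0" /tmp/trace.sample
```

In practice you would point the same grep at trace_pipe and redirect the
surviving lines to a file, as suggested above.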

With regard to performance, I believe the tracepoints are intended to be
pretty lightweight. It shouldn't hurt to try it on a box, observe for a
bit, and make sure there isn't a huge impact. Note that the 'trace-cmd
record' approach saves everything to a file, so disk usage is something
to keep in mind.

Brian

> Stefan
> 
> 
> > 
> > Stefan
> > 
> >>
> >> Brian
> >>
> >>> Stefan
> >>>
> >>>>
> >>>> -Dave.
> >>>>
> >>>
> >>> _______________________________________________
> >>> xfs mailing list
> >>> xfs@xxxxxxxxxxx
> >>> http://oss.sgi.com/mailman/listinfo/xfs
> 



