Re: xfs bug?

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 8 Jan 2013 09:30:54 +1100

On Mon, Jan 07, 2013 at 06:12:45PM +0100, Linos wrote:
> I am hitting what seems to be a bug in one of my Linux storage layers, this is
> my system:

Something is corrupting the workqueue infrastructure. It's not an
XFS problem; XFS just uses workqueues, and it's hung waiting for
workqueues to be scheduled to run, which is not happening due to the
initial oops.

>  ------------[ cut here ]------------
>  WARNING: at kernel/workqueue.c:1550 worker_enter_idle+0xd5/0x130()
>  Hardware name: System Product Name
>  Modules linked in: tun pci_stub vboxpci(O) vboxnetflt(O) vboxnetadp(O) vboxdrv(O) ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 ip_tables x_tables xfs snd_hda_codec_hdmi raid456 async_raid6_recov async_memcpy async_pq iTCO_wdt iTCO_vendor_support snd_hda_codec_realtek raid6_pq async_xor kvm_intel kvm crc32c_intel ghash_clmulni_intel aesni_intel aes_x86_64 ablk_helper cryptd xor xts async_tx lrw gf128mul uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core snd_usb_audio videodev snd_usbmidi_lib snd_rawmidi eeepc_wmi asus_wmi snd_seq_device sparse_keymap pci_hotplug md_mod media mxm_wmi evdev joydev btusb bluetooth nvidia(PO) rfkill psmouse microcode serio_raw pcspkr lpc_ich snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_timer snd i2c_core e1000e soundcore video thermal fan msr wmi mei acpi_cpufreq mperf button processor coretemp pppoe pppox ppp_generic slhc nf_nat_snmp_basic nf_conntrack_snmp nf_conntrack_broadcast nf_nat_proto_sctp crc32c libcrc32c nf_nat_proto_dccp nf_nat_proto_udplite nf_nat_amanda ts_kmp nf_conntrack_amanda nf_nat_irc nf_conntrack_irc nf_nat_sip nf_conntrack_sip nf_nat_tftp nf_conntrack_tftp nf_nat_h323 nf_conntrack_h323 nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre nf_nat_ftp nf_conntrack_ftp nf_nat nf_conntrack fuse loop ext4 crc16 jbd2 mbcache usb_storage hid_generic usbhid hid sr_mod cdrom sd_mod ehci_hcd ahci libahci libata xhci_hcd scsi_mod usbcore usb_common
>  Pid: 27771, comm: kworker/0:2 Tainted: P           O 3.7.1-426-bfs #1
>  Call Trace:
>   [<ffffffff8105740f>] warn_slowpath_common+0x7f/0xc0
>   [<ffffffff810728d0>] ? cwq_activate_delayed_work+0xc0/0xc0
>   [<ffffffff8105746a>] warn_slowpath_null+0x1a/0x20
>   [<ffffffff810729c5>] worker_enter_idle+0xd5/0x130
>   [<ffffffff81076481>] worker_thread+0x1f1/0x400
>   [<ffffffff81492fd9>] ? preempt_schedule+0x49/0x70
>   [<ffffffff81076290>] ? rescuer_thread+0x240/0x240
>   [<ffffffff8107aec0>] kthread+0xc0/0xd0
>   [<ffffffff81010000>] ? perf_trace_xen_mmu_set_pte_at+0xb0/0x100
>   [<ffffffff8107ae00>] ? kthread_freezable_should_stop+0x70/0x70
>   [<ffffffff8149b76c>] ret_from_fork+0x7c/0xb0
>   [<ffffffff8107ae00>] ? kthread_freezable_should_stop+0x70/0x70
>  ---[ end trace b0d33c4a02723ea3 ]---

i.e. this one.

But, quite frankly - you've got a tainted kernel:

	vboxpci(O) vboxnetflt(O) vboxnetadp(O) vboxdrv(O) nvidia(PO)

and you are using an out-of-tree scheduler patch (bfs). Hence I
don't think anyone is going to waste time trying to track this down
as it stands.  If you can reproduce it on an untainted, unpatched
kernel then you'll have something we might be able to debug....
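
For anyone checking their own oops output before reporting, the
tainting modules can be picked out of a "Modules linked in:" line
mechanically. What follows is only a rough Python sketch, not part
of any kernel tooling: the function names tainting_modules() and
describe() are made up for illustration, and only the two flag
letters that appear in this report (P and O) are given a meaning.

```python
import re

# Per-module taint letters as printed in a "Modules linked in:" line.
# Assumption: only the two letters seen in this report are described;
# any other letter is reported as-is rather than guessed at.
TAINT_MEANING = {"P": "proprietary", "O": "out-of-tree"}

# A module name followed directly by its flags in parentheses, e.g. nvidia(PO).
MODULE_RE = re.compile(r"(\w+)\(([A-Z]+)\)")


def tainting_modules(modules_line):
    """Return {module_name: flags} for every module that carries taint flags."""
    return {name: flags for name, flags in MODULE_RE.findall(modules_line)}


def describe(flags):
    """Turn a flag string such as 'PO' into a readable description."""
    return ", ".join(TAINT_MEANING.get(f, f"flag {f}") for f in flags)


if __name__ == "__main__":
    # Shortened copy of the module list quoted above.
    sample = (
        "Modules linked in: tun pci_stub vboxpci(O) vboxnetflt(O) "
        "vboxnetadp(O) vboxdrv(O) xfs nvidia(PO) rfkill"
    )
    for name, flags in tainting_modules(sample).items():
        print(f"{name}: {describe(flags)}")
```

An empty result only means that no loaded module is flagged; it says
nothing about patches built into the kernel itself, such as the bfs
scheduler here, which shows up in the version string instead.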

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
