Daniel Dehennin <daniel.dehennin@xxxxxxxxxxxx> writes:

[...]

> We are using 3.1.6-0ubuntu1.
>
> Running an fsck is quite expensive for us, 4 hours with the shared FS
> unusable.
>
> I forgot to say that it stores qcow2 images, so there should not be
> concurrency on the file system except on some directories to
> create/access sub directories:
>
>     <GFS2 mount point>/<DIRECTORY OF RUNNING VMs>/<VM ID>/<QCOW2 images>
>
> Only the <DIRECTORY OF RUNNING VMs> should have concurrent write
> accesses, everything under <VM ID> is accessed only by one node at a
> time, except for monitoring which is read only.
>
> So “looks like it is trying to free a block that is already marked as
> being free” looks strange.

Now the kernel gave me a warning, if it could help:

Feb 15 14:13:07 nebula3 kernel: [16423.261927] ------------[ cut here ]------------
Feb 15 14:13:07 nebula3 kernel: [16423.261943] WARNING: CPU: 8 PID: 4410 at /build/linux-OTIHGI/linux-3.13.0/mm/page_alloc.c:1604 get_page_from_freelist+0x924/0x930()
Feb 15 14:13:07 nebula3 kernel: [16423.261945] Modules linked in: vhost_net vhost macvtap macvlan gfs2 dlm sctp configfs ip6table_filter ip6_tables iptable_filter ip_tables x_tables dm_round_robin openvswitch gre vxlan ip_tunnel nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache bonding x86_pkg_temp_thermal intel_powerclamp ipmi_devintf gpio_ich coretemp dcdbas kvm_intel kvm crct10dif_pclmul crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd dm_multipath joydev scsi_dh mei_me shpchp mei sb_edac ipmi_si edac_core lpc_ich acpi_power_meter mac_hid wmi iTCO_wdt iTCO_vendor_support ses enclosure hid_generic qla2xxx usbhid hid ahci scsi_transport_fc libahci bnx2x tg3 megaraid_sas ptp scsi_tgt pps_core mdio libcrc32c
Feb 15 14:13:07 nebula3 kernel: [16423.262017] CPU: 8 PID: 4410 Comm: rm Not tainted 3.13.0-78-generic #122-Ubuntu
Feb 15 14:13:07 nebula3 kernel: [16423.262019] Hardware name: Dell Inc. PowerEdge M620/0T36VK, BIOS 2.2.7 01/21/2014
Feb 15 14:13:07 nebula3 kernel: [16423.262022] 0000000000000009 ffff882e5f9f7820 ffffffff81725768 0000000000000000
Feb 15 14:13:07 nebula3 kernel: [16423.262028] ffff882e5f9f7858 ffffffff810678bd 0000000000000004 00000000000035de
Feb 15 14:13:07 nebula3 kernel: [16423.262033] 0000000000000001 ffff88187fffbf00 0000000000000000 ffff882e5f9f7868
Feb 15 14:13:07 nebula3 kernel: [16423.262037] Call Trace:
Feb 15 14:13:07 nebula3 kernel: [16423.262046] [<ffffffff81725768>] dump_stack+0x45/0x56
Feb 15 14:13:07 nebula3 kernel: [16423.262052] [<ffffffff810678bd>] warn_slowpath_common+0x7d/0xa0
Feb 15 14:13:07 nebula3 kernel: [16423.262056] [<ffffffff8106799a>] warn_slowpath_null+0x1a/0x20
Feb 15 14:13:07 nebula3 kernel: [16423.262060] [<ffffffff81159134>] get_page_from_freelist+0x924/0x930
Feb 15 14:13:07 nebula3 kernel: [16423.262091] [<ffffffff8101289e>] ? __switch_to+0x3fe/0x4d0
Feb 15 14:13:07 nebula3 kernel: [16423.262096] [<ffffffff811592c4>] __alloc_pages_nodemask+0x184/0xb80
Feb 15 14:13:07 nebula3 kernel: [16423.262102] [<ffffffff8114f86e>] ? find_get_page+0x1e/0xa0
Feb 15 14:13:07 nebula3 kernel: [16423.262111] [<ffffffff8114fe00>] ? find_lock_page+0x30/0x70
Feb 15 14:13:07 nebula3 kernel: [16423.262115] [<ffffffff81150404>] ? find_or_create_page+0x34/0x90
Feb 15 14:13:07 nebula3 kernel: [16423.262125] [<ffffffff8136aa2e>] ? radix_tree_lookup_slot+0xe/0x10
Feb 15 14:13:07 nebula3 kernel: [16423.262134] [<ffffffff81198153>] alloc_pages_current+0xa3/0x160
Feb 15 14:13:07 nebula3 kernel: [16423.262144] [<ffffffff8115432e>] __get_free_pages+0xe/0x50
Feb 15 14:13:07 nebula3 kernel: [16423.262157] [<ffffffff8117125e>] kmalloc_order_trace+0x2e/0xa0
Feb 15 14:13:07 nebula3 kernel: [16423.262170] [<ffffffff810ab0f5>] ? wake_up_bit+0x25/0x30
Feb 15 14:13:07 nebula3 kernel: [16423.262177] [<ffffffff811a3301>] __kmalloc+0x211/0x230
Feb 15 14:13:07 nebula3 kernel: [16423.262192] [<ffffffffa05c15f6>] gfs2_rlist_alloc+0x26/0x70 [gfs2]
Feb 15 14:13:07 nebula3 kernel: [16423.262199] [<ffffffffa059cd5d>] recursive_scan+0x29d/0x6a0 [gfs2]
Feb 15 14:13:07 nebula3 kernel: [16423.262206] [<ffffffffa059cf2c>] recursive_scan+0x46c/0x6a0 [gfs2]
Feb 15 14:13:07 nebula3 kernel: [16423.262217] [<ffffffffa05bb4f5>] ? gfs2_quota_hold+0x175/0x1f0 [gfs2]
Feb 15 14:13:07 nebula3 kernel: [16423.262224] [<ffffffffa059d25a>] trunc_dealloc+0xfa/0x120 [gfs2]
Feb 15 14:13:07 nebula3 kernel: [16423.262232] [<ffffffffa05a898e>] ? gfs2_glock_wait+0x3e/0x80 [gfs2]
Feb 15 14:13:07 nebula3 kernel: [16423.262240] [<ffffffffa05aa190>] ? gfs2_glock_nq+0x280/0x430 [gfs2]
Feb 15 14:13:07 nebula3 kernel: [16423.262247] [<ffffffffa059eef0>] gfs2_file_dealloc+0x10/0x20 [gfs2]
Feb 15 14:13:07 nebula3 kernel: [16423.262257] [<ffffffffa05c1db3>] gfs2_evict_inode+0x2b3/0x3e0 [gfs2]
Feb 15 14:13:07 nebula3 kernel: [16423.262276] [<ffffffffa05c1c13>] ? gfs2_evict_inode+0x113/0x3e0 [gfs2]
Feb 15 14:13:07 nebula3 kernel: [16423.262286] [<ffffffff811d9a40>] evict+0xb0/0x1b0
Feb 15 14:13:07 nebula3 kernel: [16423.262290] [<ffffffff811da255>] iput+0xf5/0x180
Feb 15 14:13:07 nebula3 kernel: [16423.262296] [<ffffffff811cebae>] do_unlinkat+0x18e/0x2b0
Feb 15 14:13:07 nebula3 kernel: [16423.262305] [<ffffffff811bbc06>] ? filp_close+0x56/0x70
Feb 15 14:13:07 nebula3 kernel: [16423.262310] [<ffffffff811cfadb>] SyS_unlinkat+0x1b/0x40
Feb 15 14:13:07 nebula3 kernel: [16423.262315] [<ffffffff8173635d>] system_call_fastpath+0x1a/0x1f
Feb 15 14:13:07 nebula3 kernel: [16423.262318] ---[ end trace 346ccba5c58117dc ]---

Regards.
-- 
Daniel Dehennin
Get my GPG key: gpg --recv-keys 0xCC1E9E5B7A6FE2DF
Fingerprint: 3E69 014E 5C23 50E8 9ED6 2AAD CC1E 9E5B 7A6F E2DF
-- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster