I've noticed one of our postgresql servers having a high load, so I looked into it. Kernel = 3.2.15, disk space is free: # df /dev/sda2 6281216 457888 5823328 8% /disks/work /dev/sdc1 52402180 10135176 42267004 20% /disks/pg-db And I got these warnings since Jul 12 until today: Jul 12 00:31:41 db24 kernel: XFS (sda2): xlog_space_left: head behind tail Jul 12 00:31:41 db24 kernel: tail_cycle = 4, tail_bytes = 248320 Jul 12 00:31:41 db24 kernel: GH cycle = 4, GH bytes = 248312 [snip] Aug 8 03:45:48 db24 kernel: XFS (sdc1): xlog_space_left: head behind tail Aug 8 03:45:48 db24 kernel: tail_cycle = 607, tail_bytes = 21702144 Aug 8 03:45:48 db24 kernel: GH cycle = 607, GH bytes = 21701856 And this morning it changed into warnings: Aug 9 10:34:57 db24 kernel: INFO: task vmtoolsd:1426 blocked for more than 120 seconds. Aug 9 10:34:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 9 10:34:57 db24 kernel: ffff880059e0fc78 0000000000000082 ffff880059af3ba0 0000000000000001 Aug 9 10:34:57 db24 kernel: ffff880059e0e010 ffff880058ea5cc0 0000000000010940 ffff880059e0ffd8 Aug 9 10:34:57 db24 kernel: ffff880059e0ffd8 0000000000010940 ffff880056264f80 ffff880058ea5cc0 Aug 9 10:34:57 db24 kernel: Call Trace: Aug 9 10:34:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57 Aug 9 10:34:57 db24 kernel: [<ffffffff8136fb1e>] rwsem_down_failed_common+0xe4/0x116 Aug 9 10:34:57 db24 kernel: [<ffffffff8136fb63>] rwsem_down_write_failed+0x13/0x15 Aug 9 10:34:57 db24 kernel: [<ffffffff811c4503>] call_rwsem_down_write_failed+0x13/0x20 Aug 9 10:34:57 db24 kernel: [<ffffffff8136f35c>] ? down_write+0x20/0x22 Aug 9 10:34:57 db24 kernel: [<ffffffff810d1e44>] thaw_super+0x1c/0xae Aug 9 10:34:57 db24 kernel: [<ffffffff810f6b5e>] thaw_bdev+0x45/0x74 Aug 9 10:34:57 db24 kernel: [<ffffffffa002019c>] VmSyncThawDevices+0x56/0x9e [vmsync] Aug 9 10:34:57 db24 kernel: [<ffffffffa0020542>] VmSyncUnlockedIoctl+0x1b4/0x208 [vmsync] Aug 9 10:34:57 db24 kernel: [<ffffffff81115dd1>] proc_reg_unlocked_ioctl+0x82/0xa1 Aug 9 10:34:57 db24 kernel: [<ffffffff8102c05d>] ? pick_next_task_fair+0xfc/0x10c Aug 9 10:34:57 db24 kernel: [<ffffffffa002038e>] ? VmSyncAddPath+0xff/0xff [vmsync] Aug 9 10:34:57 db24 kernel: [<ffffffff810dcf9f>] do_vfs_ioctl+0x3f6/0x47b Aug 9 10:34:57 db24 kernel: [<ffffffff810dd066>] sys_ioctl+0x42/0x65 Aug 9 10:34:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b Aug 9 10:34:57 db24 kernel: INFO: task postmaster:3112 blocked for more than 120 seconds. Aug 9 10:34:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 9 10:34:57 db24 kernel: ffff8800563f5d18 0000000000000082 0000000000000000 0000000000000000 Aug 9 10:34:57 db24 kernel: ffff8800563f4010 ffff8800590d93e0 0000000000010940 ffff8800563f5fd8 Aug 9 10:34:57 db24 kernel: ffff8800563f5fd8 0000000000010940 ffff88005ab9c240 ffff8800590d93e0 Aug 9 10:34:57 db24 kernel: Call Trace: Aug 9 10:34:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57 Aug 9 10:34:57 db24 kernel: [<ffffffff8114ed8b>] xfs_file_aio_write+0x118/0x224 Aug 9 10:34:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25 Aug 9 10:34:57 db24 kernel: [<ffffffff810cf526>] do_sync_write+0xc9/0x106 Aug 9 10:34:57 db24 kernel: [<ffffffff8100875a>] ? read_tsc+0x9/0x1b Aug 9 10:34:57 db24 kernel: [<ffffffff810528b0>] ? timekeeping_get_ns+0x15/0x34 Aug 9 10:34:57 db24 kernel: [<ffffffff81053370>] ? ktime_get_ts+0x80/0x89 Aug 9 10:34:57 db24 kernel: [<ffffffff810dda04>] ? poll_select_copy_remaining+0xc6/0xea Aug 9 10:34:57 db24 kernel: [<ffffffff810cfe57>] vfs_write+0xa9/0x105 Aug 9 10:34:57 db24 kernel: [<ffffffff810cff69>] sys_write+0x45/0x69 Aug 9 10:34:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b Aug 9 10:34:57 db24 kernel: INFO: task postmaster:3115 blocked for more than 120 seconds. Aug 9 10:34:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 9 10:34:57 db24 kernel: ffff8800563edb38 0000000000000082 0000000000000000 ffff880000000000 Aug 9 10:34:57 db24 kernel: ffff8800563ec010 ffff8800590dcf80 0000000000010940 ffff8800563edfd8 Aug 9 10:34:57 db24 kernel: ffff8800563edfd8 0000000000010940 ffff88005ab9c240 ffff8800590dcf80 Aug 9 10:34:57 db24 kernel: Call Trace: Aug 9 10:34:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57 Aug 9 10:34:57 db24 kernel: [<ffffffff8118a64a>] xfs_trans_alloc+0x79/0xa7 Aug 9 10:34:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25 Aug 9 10:34:57 db24 kernel: [<ffffffff8115b99a>] xfs_create+0x18c/0x4c7 Aug 9 10:34:57 db24 kernel: [<ffffffff81150b56>] ? xfs_ilock+0x59/0x5e Aug 9 10:34:57 db24 kernel: [<ffffffff810df065>] ? spin_lock+0x9/0xb Aug 9 10:34:57 db24 kernel: [<ffffffff810df877>] ? _d_rehash+0x1f/0x22 Aug 9 10:34:57 db24 kernel: [<ffffffff811541d1>] xfs_vn_mknod+0xca/0x15e Aug 9 10:34:57 db24 kernel: [<ffffffff81154280>] xfs_vn_create+0xb/0xd Aug 9 10:34:57 db24 kernel: [<ffffffff810da7ad>] vfs_create+0x4e/0x6e Aug 9 10:34:57 db24 kernel: [<ffffffff810db68c>] do_last+0x201/0x502 Aug 9 10:34:57 db24 kernel: [<ffffffff810dba61>] path_openat+0xd4/0x31a Aug 9 10:34:57 db24 kernel: [<ffffffff810c13f4>] ? kmem_cache_free+0x15/0x6e Aug 9 10:34:57 db24 kernel: [<ffffffff810dbd96>] do_filp_open+0x38/0x86 Aug 9 10:34:57 db24 kernel: [<ffffffff810e507a>] ? alloc_fd+0x116/0x129 Aug 9 10:34:57 db24 kernel: [<ffffffff810ce147>] do_sys_open+0x6e/0x107 Aug 9 10:34:57 db24 kernel: [<ffffffff810ce209>] sys_open+0x1b/0x1d Aug 9 10:34:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b Aug 9 10:34:57 db24 kernel: INFO: task flush-259:26214:19297 blocked for more than 120 seconds. Aug 9 10:34:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 9 10:34:57 db24 kernel: ffff880052039b10 0000000000000046 ffff880052039aa0 ffff880047536078 Aug 9 10:34:57 db24 kernel: ffff880052038010 ffff880056263500 0000000000010940 ffff880052039fd8 Aug 9 10:34:57 db24 kernel: ffff880052039fd8 0000000000010940 ffff880059620000 ffff880056263500 Aug 9 10:34:57 db24 kernel: Call Trace: Aug 9 10:34:57 db24 kernel: [<ffffffff8104c605>] ? spin_unlock_irqrestore+0x9/0xb Aug 9 10:34:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57 Aug 9 10:34:57 db24 kernel: [<ffffffff8118a64a>] xfs_trans_alloc+0x79/0xa7 Aug 9 10:34:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25 Aug 9 10:34:57 db24 kernel: [<ffffffff8109180a>] ? generic_writepages+0x4b/0x57 Aug 9 10:34:57 db24 kernel: [<ffffffff8115906b>] xfs_log_dirty_inode+0x2b/0x9f Aug 9 10:34:57 db24 kernel: [<ffffffff8115716d>] xfs_fs_write_inode+0x47/0x107 Aug 9 10:34:57 db24 kernel: [<ffffffff810ec01b>] writeback_single_inode+0x1b8/0x2ef Aug 9 10:34:57 db24 kernel: [<ffffffff810ec705>] writeback_sb_inodes+0x168/0x201 Aug 9 10:34:57 db24 kernel: [<ffffffff810eccfe>] __writeback_inodes_wb+0x6d/0xab Aug 9 10:34:57 db24 kernel: [<ffffffff810ecea6>] wb_writeback+0xf1/0x18b Aug 9 10:34:57 db24 kernel: [<ffffffff8101f406>] ? default_spin_lock_flags+0x9/0xf Aug 9 10:34:57 db24 kernel: [<ffffffff8136fbc4>] ? _raw_spin_lock_irqsave+0x22/0x2b Aug 9 10:34:57 db24 kernel: [<ffffffff810ed07f>] wb_do_writeback+0x13f/0x1a0 Aug 9 10:34:57 db24 kernel: [<ffffffff8103fdbe>] ? del_timer+0x81/0x81 Aug 9 10:34:57 db24 kernel: [<ffffffff810ed170>] bdi_writeback_thread+0x90/0x164 Aug 9 10:34:57 db24 kernel: [<ffffffff810ed0e0>] ? wb_do_writeback+0x1a0/0x1a0 Aug 9 10:34:57 db24 kernel: [<ffffffff8104c17c>] kthread+0x7d/0x85 Aug 9 10:34:57 db24 kernel: [<ffffffff81371eb4>] kernel_thread_helper+0x4/0x10 Aug 9 10:34:57 db24 kernel: [<ffffffff8104c0ff>] ? kthread_worker_fn+0x153/0x153 Aug 9 10:34:57 db24 kernel: [<ffffffff81371eb0>] ? gs_change+0x13/0x13 Aug 9 10:34:57 db24 kernel: INFO: task postmaster:26077 blocked for more than 120 seconds. Aug 9 10:34:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 9 10:34:57 db24 kernel: ffff8800257e7d18 0000000000000086 ffffea00014d6680 0000000000000000 Aug 9 10:34:57 db24 kernel: ffff8800257e6010 ffff880056261a80 0000000000010940 ffff8800257e7fd8 Aug 9 10:34:57 db24 kernel: ffff8800257e7fd8 0000000000010940 ffffffff8160d020 ffff880056261a80 Aug 9 10:34:57 db24 kernel: Call Trace: Aug 9 10:34:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57 Aug 9 10:34:57 db24 kernel: [<ffffffff8114ed8b>] xfs_file_aio_write+0x118/0x224 Aug 9 10:34:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25 Aug 9 10:34:57 db24 kernel: [<ffffffff810cf526>] do_sync_write+0xc9/0x106 Aug 9 10:34:57 db24 kernel: [<ffffffff810cfe57>] vfs_write+0xa9/0x105 Aug 9 10:34:57 db24 kernel: [<ffffffff810cff69>] sys_write+0x45/0x69 Aug 9 10:34:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b Aug 9 10:36:57 db24 kernel: INFO: task vmtoolsd:1426 blocked for more than 120 seconds. Aug 9 10:36:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 9 10:36:57 db24 kernel: ffff880059e0fc78 0000000000000082 ffff880059af3ba0 0000000000000001 Aug 9 10:36:57 db24 kernel: ffff880059e0e010 ffff880058ea5cc0 0000000000010940 ffff880059e0ffd8 Aug 9 10:36:57 db24 kernel: ffff880059e0ffd8 0000000000010940 ffff880056264f80 ffff880058ea5cc0 Aug 9 10:36:57 db24 kernel: Call Trace: Aug 9 10:36:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57 Aug 9 10:36:57 db24 kernel: [<ffffffff8136fb1e>] rwsem_down_failed_common+0xe4/0x116 Aug 9 10:36:57 db24 kernel: [<ffffffff8136fb63>] rwsem_down_write_failed+0x13/0x15 Aug 9 10:36:57 db24 kernel: [<ffffffff811c4503>] call_rwsem_down_write_failed+0x13/0x20 Aug 9 10:36:57 db24 kernel: [<ffffffff8136f35c>] ? down_write+0x20/0x22 Aug 9 10:36:57 db24 kernel: [<ffffffff810d1e44>] thaw_super+0x1c/0xae Aug 9 10:36:57 db24 kernel: [<ffffffff810f6b5e>] thaw_bdev+0x45/0x74 Aug 9 10:36:57 db24 kernel: [<ffffffffa002019c>] VmSyncThawDevices+0x56/0x9e [vmsync] Aug 9 10:36:57 db24 kernel: [<ffffffffa0020542>] VmSyncUnlockedIoctl+0x1b4/0x208 [vmsync] Aug 9 10:36:57 db24 kernel: [<ffffffff81115dd1>] proc_reg_unlocked_ioctl+0x82/0xa1 Aug 9 10:36:57 db24 kernel: [<ffffffff8102c05d>] ? pick_next_task_fair+0xfc/0x10c Aug 9 10:36:57 db24 kernel: [<ffffffffa002038e>] ? VmSyncAddPath+0xff/0xff [vmsync] Aug 9 10:36:57 db24 kernel: [<ffffffff810dcf9f>] do_vfs_ioctl+0x3f6/0x47b Aug 9 10:36:57 db24 kernel: [<ffffffff810dd066>] sys_ioctl+0x42/0x65 Aug 9 10:36:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b Aug 9 10:36:57 db24 kernel: INFO: task postmaster:3112 blocked for more than 120 seconds. Aug 9 10:36:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 9 10:36:57 db24 kernel: ffff8800563f5d18 0000000000000082 0000000000000000 0000000000000000 Aug 9 10:36:57 db24 kernel: ffff8800563f4010 ffff8800590d93e0 0000000000010940 ffff8800563f5fd8 Aug 9 10:36:57 db24 kernel: ffff8800563f5fd8 0000000000010940 ffff88005ab9c240 ffff8800590d93e0 Aug 9 10:36:57 db24 kernel: Call Trace: Aug 9 10:36:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57 Aug 9 10:36:57 db24 kernel: [<ffffffff8114ed8b>] xfs_file_aio_write+0x118/0x224 Aug 9 10:36:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25 Aug 9 10:36:57 db24 kernel: [<ffffffff810cf526>] do_sync_write+0xc9/0x106 Aug 9 10:36:57 db24 kernel: [<ffffffff8100875a>] ? read_tsc+0x9/0x1b Aug 9 10:36:57 db24 kernel: [<ffffffff810528b0>] ? timekeeping_get_ns+0x15/0x34 Aug 9 10:36:57 db24 kernel: [<ffffffff81053370>] ? ktime_get_ts+0x80/0x89 Aug 9 10:36:57 db24 kernel: [<ffffffff810dda04>] ? poll_select_copy_remaining+0xc6/0xea Aug 9 10:36:57 db24 kernel: [<ffffffff810cfe57>] vfs_write+0xa9/0x105 Aug 9 10:36:57 db24 kernel: [<ffffffff810cff69>] sys_write+0x45/0x69 Aug 9 10:36:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b Aug 9 10:36:57 db24 kernel: INFO: task postmaster:3115 blocked for more than 120 seconds. Aug 9 10:36:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 9 10:36:57 db24 kernel: ffff8800563edb38 0000000000000082 0000000000000000 ffff880000000000 Aug 9 10:36:57 db24 kernel: ffff8800563ec010 ffff8800590dcf80 0000000000010940 ffff8800563edfd8 Aug 9 10:36:57 db24 kernel: ffff8800563edfd8 0000000000010940 ffff88005ab9c240 ffff8800590dcf80 Aug 9 10:36:57 db24 kernel: Call Trace: Aug 9 10:36:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57 Aug 9 10:36:57 db24 kernel: [<ffffffff8118a64a>] xfs_trans_alloc+0x79/0xa7 Aug 9 10:36:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25 Aug 9 10:36:57 db24 kernel: [<ffffffff8115b99a>] xfs_create+0x18c/0x4c7 Aug 9 10:36:57 db24 kernel: [<ffffffff81150b56>] ? xfs_ilock+0x59/0x5e Aug 9 10:36:57 db24 kernel: [<ffffffff810df065>] ? spin_lock+0x9/0xb Aug 9 10:36:57 db24 kernel: [<ffffffff810df877>] ? _d_rehash+0x1f/0x22 Aug 9 10:36:57 db24 kernel: [<ffffffff811541d1>] xfs_vn_mknod+0xca/0x15e Aug 9 10:36:57 db24 kernel: [<ffffffff81154280>] xfs_vn_create+0xb/0xd Aug 9 10:36:57 db24 kernel: [<ffffffff810da7ad>] vfs_create+0x4e/0x6e Aug 9 10:36:57 db24 kernel: [<ffffffff810db68c>] do_last+0x201/0x502 Aug 9 10:36:57 db24 kernel: [<ffffffff810dba61>] path_openat+0xd4/0x31a Aug 9 10:36:57 db24 kernel: [<ffffffff810c13f4>] ? kmem_cache_free+0x15/0x6e Aug 9 10:36:57 db24 kernel: [<ffffffff810dbd96>] do_filp_open+0x38/0x86 Aug 9 10:36:57 db24 kernel: [<ffffffff810e507a>] ? alloc_fd+0x116/0x129 Aug 9 10:36:57 db24 kernel: [<ffffffff810ce147>] do_sys_open+0x6e/0x107 Aug 9 10:36:57 db24 kernel: [<ffffffff810ce209>] sys_open+0x1b/0x1d Aug 9 10:36:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b Aug 9 10:36:57 db24 kernel: INFO: task postmaster:13762 blocked for more than 120 seconds. Aug 9 10:36:57 db24 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Aug 9 10:36:57 db24 kernel: ffff880029025d18 0000000000000082 ffff880000000000 0000000000000000 Aug 9 10:36:57 db24 kernel: ffff880029024010 ffff880056260d40 0000000000010940 ffff880029025fd8 Aug 9 10:36:57 db24 kernel: ffff880029025fd8 0000000000010940 ffffffff8160d020 ffff880056260d40 Aug 9 10:36:57 db24 kernel: Call Trace: Aug 9 10:36:57 db24 kernel: [<ffffffff8136e926>] schedule+0x55/0x57 Aug 9 10:36:57 db24 kernel: [<ffffffff8114ed8b>] xfs_file_aio_write+0x118/0x224 Aug 9 10:36:57 db24 kernel: [<ffffffff8104c59a>] ? wake_up_bit+0x25/0x25 Aug 9 10:36:57 db24 kernel: [<ffffffff810cf526>] do_sync_write+0xc9/0x106 Aug 9 10:36:57 db24 kernel: [<ffffffff810cfe57>] vfs_write+0xa9/0x105 Aug 9 10:36:57 db24 kernel: [<ffffffff810cff69>] sys_write+0x45/0x69 Aug 9 10:36:57 db24 kernel: [<ffffffff813705d2>] system_call_fastpath+0x16/0x1b I then tried # echo 3 >/proc/sys/vm/drop_caches but the system hung completely. This is a VMware VM, and I have no idea how to send SYS-T or similar keys to it. I had to reset it (reboot hung), and everything worked again. This VM is now on kernel 3.5 -- mit freundlichen Grüssen, Michael Monnerie, Ing. BSc it-management Internet Services: Protéger http://proteger.at [gesprochen: Prot-e-schee] Tel: +43 660 / 415 6531
Attachment:
signature.asc
Description: This is a digitally signed message part.
_______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs