Hi,

any idea what the root cause of this is? Inside a KVM VM running on a qcow2 image stored on CephFS, dmesg shows:

[846193.473396] ata1.00: status: { DRDY }
[846196.231058] ata1: soft resetting link
[846196.386714] ata1.01: NODEV after polling detection
[846196.391048] ata1.00: configured for MWDMA2
[846196.391053] ata1.00: retrying FLUSH 0xea Emask 0x4
[846196.391671] ata1: EH complete
[1019646.935659] UDP: bad checksum. From 122.224.153.109:46252 to 193.24.210.48:161 ulen 49
[1107679.421951] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[1107679.423407] ata1.00: failed command: FLUSH CACHE EXT
[1107679.424871] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
[1107679.427596] ata1.00: status: { DRDY }
[1107684.482035] ata1: link is slow to respond, please be patient (ready=0)
[1107689.480237] ata1: device not ready (errno=-16), forcing hardreset
[1107689.480267] ata1: soft resetting link
[1107689.637701] ata1.00: configured for MWDMA2
[1107689.637707] ata1.00: retrying FLUSH 0xea Emask 0x4
[1107704.638255] ata1.00: qc timeout (cmd 0xea)
[1107704.638282] ata1.00: FLUSH failed Emask 0x4
[1107709.687013] ata1: link is slow to respond, please be patient (ready=0)
[1107710.095069] ata1: soft resetting link
[1107710.246403] ata1.01: NODEV after polling detection
[1107710.247225] ata1.00: configured for MWDMA2
[1107710.247229] ata1.00: retrying FLUSH 0xea Emask 0x4
[1107710.248170] ata1: EH complete
[1199723.323256] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[1199723.324769] ata1.00: failed command: FLUSH CACHE EXT
[1199723.326734] ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         res 40/00:01:00:00:00/00:00:00:00:00/a0 Emask 0x4 (timeout)

The host machine is running kernel 4.5.4. Host dmesg:

[1235641.055673] INFO: task qemu-kvm:18287 blocked for more than 120 seconds.
[1235641.056066] Not tainted 4.5.4ceph-vps-default #1
[1235641.056315] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1235641.056583] qemu-kvm D ffff8812f939bb58 0 18287 1 0x00000080
[1235641.056587] ffff8812f939bb58 ffff881034c02b80 ffff881b7044ab80 ffff8812f939c000
[1235641.056590] 0000000000000000 7fffffffffffffff ffff881c7ffd7b70 ffffffff818c1d90
[1235641.056592] ffff8812f939bb70 ffffffff818c1525 ffff88103fa16d00 ffff8812f939bc18
[1235641.056594] Call Trace:
[1235641.056603] [<ffffffff818c1d90>] ? bit_wait+0x50/0x50
[1235641.056605] [<ffffffff818c1525>] schedule+0x35/0x80
[1235641.056609] [<ffffffff818c41d1>] schedule_timeout+0x231/0x2d0
[1235641.056613] [<ffffffff8115a19c>] ? ktime_get+0x3c/0xb0
[1235641.056622] [<ffffffff818c1d90>] ? bit_wait+0x50/0x50
[1235641.056624] [<ffffffff818c0b96>] io_schedule_timeout+0xa6/0x110
[1235641.056626] [<ffffffff818c1dab>] bit_wait_io+0x1b/0x60
[1235641.056627] [<ffffffff818c1950>] __wait_on_bit+0x60/0x90
[1235641.056632] [<ffffffff811eb46b>] wait_on_page_bit+0xcb/0xf0
[1235641.056636] [<ffffffff8112c6e0>] ? autoremove_wake_function+0x40/0x40
[1235641.056638] [<ffffffff811eb58f>] __filemap_fdatawait_range+0xff/0x180
[1235641.056641] [<ffffffff811eda61>] ? __filemap_fdatawrite_range+0xd1/0x100
[1235641.056644] [<ffffffff811eb624>] filemap_fdatawait_range+0x14/0x30
[1235641.056646] [<ffffffff811edb9f>] filemap_write_and_wait_range+0x3f/0x70
[1235641.056649] [<ffffffff814383f9>] ceph_fsync+0x69/0x5c0
[1235641.056656] [<ffffffff811678dd>] ? do_futex+0xfd/0x530
[1235641.056663] [<ffffffff812a737d>] vfs_fsync_range+0x3d/0xb0
[1235641.056668] [<ffffffff810038e9>] ? syscall_trace_enter_phase1+0x139/0x150
[1235641.056670] [<ffffffff812a744d>] do_fsync+0x3d/0x70
[1235641.056673] [<ffffffff812a7703>] SyS_fdatasync+0x13/0x20
[1235641.056676] [<ffffffff818c506e>] entry_SYSCALL_64_fastpath+0x12/0x71

This happens occasionally on a healthy cluster running ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374). The OSD servers are running kernel 4.5.5.

Sometimes the VM then refuses I/O and has to be restarted; sometimes it recovers and continues.

Any input is appreciated.

Thank you!

-- 
Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Address:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402, Amtsgericht Hanau (district court)
Managing director: Oliver Dzombic

Tax no.: 35 236 3622 1
VAT ID: DE274086107