The problem got slightly better when I upgraded all kernels, on host and
guest: the "MTBF" went from 3-4 days to approx. 50. Still, the problem is
not solved yet.

A maybe stupid question: if the kernel in the guest sees an I/O error on
sda, could this be a real error on the physical disk, even if there are no
notices in the physical host's log files, or is this more of a software
problem?

As the next step, I'll try to update the physical server's firmware.

Any suggestion on this topic is welcome, even more than before.

Regards,

Peter

On 02/29/2012 08:53 AM, Peter Hopfgartner wrote:
> We have a CentOS 6.2 server with KVM. That server hosts 2 virtual
> machines, both with CentOS 6.2, too.
>
> Regularly, one or both of the virtual machines pass to state "pause"
> without apparent reason.
> On resume, I do get messages like the following in /var/log/messages.
>
> Feb 28 21:50:45 achernar fcoemon: Failed to connect to lldpad
> Feb 29 08:23:56 achernar kernel: sd 0:0:0:0: [sda] Unhandled error code
> Feb 29 08:23:56 achernar kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
> Feb 29 08:23:56 achernar kernel: sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 06 db 70 78 00 00 38 00
> Feb 29 08:23:56 achernar kernel: end_request: I/O error, dev sda, sector 115044472
> Feb 29 08:23:56 achernar kernel: Buffer I/O error on device dm-0, logical block 14252047
> Feb 29 08:23:56 achernar kernel: lost page write due to I/O error on dm-0
> Feb 29 08:23:56 achernar kernel: Buffer I/O error on device dm-0, logical block 14252048
> Feb 29 08:23:56 achernar kernel: lost page write due to I/O error on dm-0
> Feb 29 08:23:56 achernar kernel: Buffer I/O error on device dm-0, logical block 14252049
> Feb 29 08:23:56 achernar kernel: lost page write due to I/O error on dm-0
> Feb 29 08:23:56 achernar kernel: Buffer I/O error on device dm-0, logical block 14252050
> Feb 29 08:23:56 achernar kernel: lost page write due to I/O error on dm-0
> Feb 29 08:23:56 achernar kernel: Buffer I/O error on device dm-0, logical block 14252051
> Feb 29 08:23:56 achernar kernel: lost page write due to I/O error on dm-0
> Feb 29 08:23:56 achernar kernel: Buffer I/O error on device dm-0, logical block 14252052
> Feb 29 08:23:56 achernar kernel: lost page write due to I/O error on dm-0
> Feb 29 08:23:56 achernar kernel: Buffer I/O error on device dm-0, logical block 14252053
> Feb 29 08:23:56 achernar kernel: lost page write due to I/O error on dm-0
> Feb 29 08:23:57 achernar fcoemon: error 111 Connection refused
>
>
> I could not find any sensible message on the physical host, neither in
> /var/log/messages nor in /var/log/libvirt.
>
> We do have an almost identical server, same hardware, same software,
> which does not have this problem.
>
> How could I proceed to better diagnose the cause of the troubles?
>
> Regards,
>

--
Peter Hopfgartner
web : http://www.r3-gis.com

_______________________________________________
CentOS-virt mailing list
CentOS-virt@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos-virt
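
A side note on the quoted log, in case it helps anyone reading along: the bytes in the "CDB: Write(10)" line can be decoded by hand to check that they describe the same request as the "end_request" line. The sketch below is only an illustration. The function name decode_write10 is made up for this example, and it relies on the standard SCSI WRITE(10) layout (opcode 0x2a in byte 0, a big-endian logical block address in bytes 2-5, a big-endian transfer length in bytes 7-8).

```python
def decode_write10(cdb_hex):
    """Decode a SCSI WRITE(10) CDB given as a hex string.

    Hypothetical helper for illustration; returns (lba, blocks).
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, blocks


# CDB bytes copied from the log line in the quoted message
lba, blocks = decode_write10("2a 00 06 db 70 78 00 00 38 00")
print(lba, blocks)  # 115044472 56
```

The decoded address equals the sector in the "end_request" line. Assuming 512-byte sectors and 4 KiB pages (neither is stated in the log), 56 blocks are 28672 bytes, i.e. 7 pages, which would fit the seven "lost page write" messages. This only confirms that the messages belong to one timed-out write issued by the guest; it does not show whether the timeout came from the physical disk or from the virtualization layer.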