Re: vm fs corrupt after pgs stuck

On 01/02/2014 01:40 PM, James Harper wrote:

I just had to restore an MS Exchange database after a ceph hiccup (no actual
data lost - Exchange is very good like that with its no loss restore!). The order
of events went something like:

. Loss of connection on osd to the cluster network (public network was okay)
. pgs reported stuck
. stopped osd on the bad server
. resolved network problem
. restarted osd on the bad server
. noticed that the vm running exchange had hung
. rebooted and vm did a chkdsk automatically
. exchange refused to mount the main mailbox store

I'm not using rbd caching or anything, so for NTFS to lose files like that means
something fairly nasty happened. My best guess is that the loss of
connectivity and function while ceph was figuring out what was going on
meant that Windows I/O was frozen and started timing out, but I still can't see
how that could result in corruption.

NTFS may have gotten confused if some I/Os completed fine but others
timed out. It looks like NTFS journals metadata, but not data, so it
could lose data that had not been written out yet. Assuming it stops
doing I/O once some timeouts are hit, this kind of failure is similar
to a sudden power loss. If the application was not doing the Windows
equivalent of O_SYNC, it could still lose writes. I'm not too
familiar with Windows, but perhaps there's a way to configure disk
timeout behavior or NTFS writeback.
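
To illustrate the O_SYNC point, here is a minimal sketch of a write that
is only reported as done once the data has been flushed, so a later I/O
stall or reset cannot silently drop it. The helper name durable_write is
hypothetical, and this shows the general pattern rather than what
Exchange or NTFS actually do internally.

```python
import os


def durable_write(path, data):
    """Write data to path and return only after it has been flushed.

    Hypothetical helper for illustration. os.fsync() asks the OS to push
    the file's buffered data to the storage device before returning.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        # os.write() may write fewer bytes than requested, so loop.
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Without this call the data may still sit in the page cache
        # and be lost if the disk disappears underneath the guest.
        os.fsync(fd)
    finally:
        os.close(fd)
```

An application that skips the flush step can have writes acknowledged
that never reach the disk, which matches the power-loss analogy above.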

Any suggestions on how I could avoid this situation in the future would be
greatly appreciated!


Forgot to mention. This has also happened once previously when the OOM killer targeted ceph-osd.

If this caused I/O timeouts, it would make sense. If you can't adjust
the guest timeouts, you might want to decrease the ceph timeouts for
noticing and marking out osds with network or other issues.
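
As a rough way to reason about this, the sketch below compares a guest
disk timeout with an estimate of how long I/O might stall before ceph
notices a bad osd. It is a simplified model under stated assumptions:
the stall is taken to be the osd heartbeat grace period (the "osd
heartbeat grace" option) plus a guessed allowance for re-peering. The
class, the function names and the example numbers are hypothetical, so
check the real values on your own cluster and guest. On Windows the
guest-side value is, as far as I know, the TimeoutValue registry entry
under HKLM\SYSTEM\CurrentControlSet\Services\Disk.

```python
from dataclasses import dataclass


@dataclass
class Timeouts:
    guest_disk_timeout_s: float        # guest-side disk I/O timeout
    osd_heartbeat_grace_s: float       # ceph "osd heartbeat grace"
    peering_allowance_s: float = 10.0  # assumed margin, not a ceph option


def estimated_stall_s(t):
    """Very rough estimate of how long client I/O may block."""
    return t.osd_heartbeat_grace_s + t.peering_allowance_s


def guest_outlasts_stall(t):
    """True if the guest should keep waiting longer than the stall."""
    return t.guest_disk_timeout_s > estimated_stall_s(t)


if __name__ == "__main__":
    # Illustrative numbers only.
    for guest, grace in [(60, 20), (10, 20)]:
        t = Timeouts(guest_disk_timeout_s=guest, osd_heartbeat_grace_s=grace)
        print(guest, grace, estimated_stall_s(t), guest_outlasts_stall(t))
```

If the check comes out false, either raising the guest timeout or
lowering the ceph side would be the knob to look at. Keep in mind that
a very low grace period can cause osds to be marked down spuriously.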

Josh
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



