> I just had to restore an MS Exchange database after a Ceph hiccup (no actual
> data lost - Exchange is very good like that with its no-loss restore!). The
> order of events went something like:
>
> . Loss of connection on osd to the cluster network (public network was okay)
> . pgs reported stuck
> . stopped osd on the bad server
> . resolved network problem
> . restarted osd on the bad server
> . noticed that the vm running Exchange had hung
> . rebooted and vm did a chkdsk automatically
> . Exchange refused to mount the main mailbox store
>
> I'm not using rbd caching or anything, so for NTFS to lose files like that
> means something fairly nasty happened. My best guess is that the loss of
> connectivity and function while Ceph was figuring out what was going on
> meant that Windows IO was frozen and started timing out, but I still can't
> see how that could result in corruption.
>
> Any suggestions on how I could avoid this situation in the future would be
> greatly appreciated!
>

Forgot to mention: this has also happened once previously, when the OOM killer
targeted ceph-osd.

James
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com