Hi Sage,

Thanks for the information. It's good to know that at least my hardware is OK... :)
I will upgrade Ceph to the latest version and see what I find.
By the way, could you tell me how to set "osd heartbeat grace"?
I can't find how to do it in the Ceph wiki.

Thanks!

--
Best Regards,
Sylar Shen

2011/3/28 Sage Weil <sage@xxxxxxxxxxxx>:
> On Mon, 28 Mar 2011, Sylar Shen wrote:
>> Hi,
>> I set an environment of 20 servers which include 2 MDSs, 3 MONs and 18
>> OSDes(3 monitors on 18 OSDes)
>> My version is 0.24.3 and OS is Fedora 14.
>> There's a problem when I was doing the writing tests.
>> Whether I was writing the data or not, some OSDes were randomly marked
>> down and out one by one after a period of time.
>> And when that happened, the whole performance soon got worse and worse.
>> I checked the /var/log/ceph/osd.log but found nothing.
>> So I am curious that is there anyone who has the same problem with me?
>> Or maybe it's just a problem of my hardware......><
>
> Hi Sylar,
>
> This is/was a known problem.  There's a long thread from a couple weeks
> back with Jim Schutt debugging the issue.  We've fixed a few different
> things that have significantly improved the situation, but the heartbeats
> are still failing from time to time.
>
> I suspect using a more recent release will be sufficient at your scale,
> either 0.25.2 or the latest 'next' branch from git (there are autobuilt
> debs for that too).  You can also increase the 'osd heartbeat grace' to
> make the system less sensitive to the transient hangs that are preventing
> the heartbeats from going out.
>
> Please let us know what you find, either here or on #ceph.
>
> Thanks!
> sage
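
For reference, a minimal sketch of how 'osd heartbeat grace' could be raised, assuming it is read from ceph.conf like other OSD options. The section placement and the value of 40 seconds are illustrative assumptions, not settings recommended anywhere in this thread:

```ini
; ceph.conf (sketch only; the value is an assumption for illustration)
[osd]
    ; seconds an OSD may go without a heartbeat before its peers report it down
    osd heartbeat grace = 40
```

The OSD daemons would most likely need to be restarted to pick up the change. A larger grace period makes the cluster more tolerant of transient hangs, at the cost of slower detection of OSDs that have really failed.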