Hi,

> Sorry to let this drop for so long, but is this something you've seen
> happen before/again or otherwise reproduced? I'm not entirely sure how
> to best test for it (other than just jerking the time around), and
> while I can come up with scenarios where the OSD leaks memory, I've
> got nothing for how that happens to the monitors. We've also fixed a
> number of leaks recently that could account for part of the problem.

It happened very reliably at each attempt to restart the OSD, and it
stopped as soon as I fixed the clock.

To reproduce: take a working cluster, take an OSD out, let it rebalance,
set the clock of one of the OSDs 50 min fast, and restart the OSD.

I had it occur twice with the same clock sync problem (once in a test
cluster with just 2 OSDs IIRC, and once in the prod cluster). I don't see
it anymore because I patched the underlying problem that was causing the
clock to jump forward by 50 min.

If you can't reproduce it locally, I can try to reproduce it again on the
test cluster tomorrow.

My best guess is that the messages carry a timestamp, and that the
receiver refuses to process messages that are too far in the future and
maybe just queues them while waiting (but 50 min worth of messages is a
lot of memory). That's really a wild guess, though :p

Cheers,

    Sylvain
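
P.S. To make the wild guess a bit more concrete, here is a toy Python model of it. This is NOT how Ceph is known to behave; it only illustrates the hypothesis that a receiver might hold back messages stamped too far ahead of its own clock. Every name and number in it (SkewSensitiveReceiver, MAX_FUTURE_SKEW, one 4 KiB message per second) is made up for illustration.

```python
from collections import deque

# Hypothetical tolerance in seconds; not a real Ceph setting.
MAX_FUTURE_SKEW = 5.0


class SkewSensitiveReceiver:
    """Toy receiver that defers messages stamped too far in the future."""

    def __init__(self, max_future_skew=MAX_FUTURE_SKEW):
        self.max_future_skew = max_future_skew
        self.deferred = deque()  # messages held until local time catches up
        self.processed = []      # messages handled immediately

    def receive(self, msg_timestamp, payload, local_now):
        # Hold the message if the sender's timestamp is ahead of our clock
        # by more than the tolerance; otherwise process it right away.
        if msg_timestamp - local_now > self.max_future_skew:
            self.deferred.append((msg_timestamp, payload))
        else:
            self.processed.append(payload)

    def deferred_bytes(self):
        # Payload bytes currently pinned in the deferred queue.
        return sum(len(payload) for _, payload in self.deferred)


def simulate(skew_seconds, n_messages, payload_size=4096):
    """Send one message per second from a sender whose clock runs
    skew_seconds ahead of the receiver. Rate and size are arbitrary."""
    rx = SkewSensitiveReceiver()
    for t in range(n_messages):
        rx.receive(
            msg_timestamp=t + skew_seconds,
            payload=b"x" * payload_size,
            local_now=t,
        )
    return rx


if __name__ == "__main__":
    skewed = simulate(skew_seconds=50 * 60, n_messages=50 * 60)
    synced = simulate(skew_seconds=0, n_messages=50 * 60)
    print("deferred bytes with 50 min skew:", skewed.deferred_bytes())
    print("deferred bytes with synced clocks:", synced.deferred_bytes())
```

With the clocks in sync nothing is held back; with a 50 min skew every message ends up in the deferred queue, so in this toy model memory grows linearly with traffic until the receiver's clock catches up. Real message rates and sizes would be very different, so the absolute numbers mean nothing.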