Thanks Sage! I will definitely try those patches.

For this one, I finally managed to bring the new monitor in by
increasing mon_sync_timeout from its default of 60 to 60000, so that
the sync does not restart and end up in an infinite loop.

On Fri, Nov 13, 2015 at 5:04 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Fri, 13 Nov 2015, Guang Yang wrote:
>> Thanks Sage!
>>
>> On Fri, Nov 13, 2015 at 4:15 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> > On Fri, 13 Nov 2015, Guang Yang wrote:
>> >> I was wrong in the previous analysis: it was not that the iterator
>> >> got reset. The problem I can see now is that during the syncing, a
>> >> new round of election kicked off and thus it needs to probe the
>> >> newly added monitor; however, since it hasn't been synced yet, it
>> >> will restart the syncing from there.
>> >
>> > What version is this?  I think this is something we fixed a while back?
>> This is on Giant (c51c8f9d80fa4e0168aa52685b8de40e42758578), is there
>> a commit I can take a look at?
>
> Hrm, I guess it was way before that.. I'm thinking of
> b8af38b6fc161691d637631d9ce8ab84fb3d27c7 which was pre-firefly.  So I'm
> not sure exactly why an election would be restarting the sync in your
> case..
>
> You mentioned elsewhere that your mon store was very large, though (more
> than 10's of GB), which suggests you might be hitting the
> min_last_epoch_clean problem (which prevents osdmap trimming).. see
> b41408302b6529a7856a3b0a08c35e5fa284882e.  This was backported to hammer
> and firefly but not giant.
>
> sage
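
For anyone who finds this thread later, here is a minimal sketch of the
timeout override described at the top of this message, written as a
ceph.conf fragment. The thread does not say where the option was actually
set, so placing it in the [mon] section of the new monitor's ceph.conf is
an assumption. The option name and the values 60 and 60000 are the ones
quoted above.

```ini
[mon]
# Assumption: set on the host of the monitor being added, before it starts.
# The default is 60; 60000 is the value reported in this thread as long
# enough for the sync of a very large mon store to finish without timing
# out and restarting.
mon sync timeout = 60000
```

This only works around the restart. It does not address why the store is
that large (see Sage's note on min_last_epoch_clean above).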