On Mon, Nov 16, 2015 at 5:42 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Mon, 16 Nov 2015, Guang Yang wrote:
>> I spoke to a leveldb expert; it looks like this is a known pattern with
>> the LSM tree data structure - the tail latency for a range scan could be
>> far longer than the avg/median, since it might need to mmap several sst
>> files to get the record.
>>
>> Hi Sage,
>> Do you see any harm in increasing the default value for this setting
>> (e.g. 20 minutes)? Or should I add the advice to the monitor
>> trouble-shooting docs?
>
> The timeout is just for a round trip for the sync process, right? I think
> increasing it a bit (2x or 3x?) is okay, but 20 minutes to do a single
> chunk is a lot.
Yeah, the timeout is for a single round trip (there is a timeout reset
mechanism on both sides).
>
> The underlying problem in your case is that your store is huge (by ~2
> orders of magnitude), so I'm not sure we should tune against that :)
Ok, let me apply the patches and monitor the db growth.
>
> sage
>
>
>> Thanks,
>> Guang
>>
>> On Fri, Nov 13, 2015 at 9:07 PM, Guang Yang <guangyy@xxxxxxxxx> wrote:
>> > Thanks Sage! I will definitely try those patches.
>> >
>> > For this one, I finally managed to bring the new monitor in by
>> > increasing the mon_sync_timeout from its default 60 to 60000 to make
>> > sure the syncing does not restart and result in an infinite loop..
>> >
>> > On Fri, Nov 13, 2015 at 5:04 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> >> On Fri, 13 Nov 2015, Guang Yang wrote:
>> >>> Thanks Sage!
>> >>>
>> >>> On Fri, Nov 13, 2015 at 4:15 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> >>> > On Fri, 13 Nov 2015, Guang Yang wrote:
>> >>> >> I was wrong in the previous analysis; it was not that the iterator
>> >>> >> got reset. The problem I can see now is that during the syncing, a
>> >>> >> new round of election kicked off and thus it needs to probe the
>> >>> >> newly added monitor; however, since it hasn't been synced yet, it
>> >>> >> will restart the syncing from there.
>> >>> >
>> >>> > What version of this? I think this is something we fixed a while back?
>> >>> This is on Giant (c51c8f9d80fa4e0168aa52685b8de40e42758578), is there
>> >>> a commit I can take a look at?
>> >>
>> >> Hrm, I guess it was way before that.. I'm thinking of
>> >> b8af38b6fc161691d637631d9ce8ab84fb3d27c7 which was pre-firefly. So I'm
>> >> not sure exactly why an election would be restarting the sync in your
>> >> case..
>> >>
>> >> You mentioned elsewhere that your mon store was very large, though (more
>> >> than 10's of GB), which suggests you might be hitting the
>> >> min_last_epoch_clean problem (which prevents osdmap trimming).. see
>> >> b41408302b6529a7856a3b0a08c35e5fa284882e. This was backported to hammer
>> >> and firefly but not giant.
>> >>
>> >> sage
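
For reference, a minimal sketch of the moderate bump discussed in this thread (2x to 3x the 60-second default), written as a ceph.conf fragment. The [mon] section placement and the value 180 are illustrative assumptions, not a tested recommendation:

```ini
[mon]
# Assumption: 3x the default of 60 seconds mentioned in the thread.
# The timeout covers a single sync round trip (one chunk), not the
# whole sync, so it only needs to exceed the slowest chunk fetch.
mon sync timeout = 180
```

The 60000 used earlier in the thread was a one-off workaround to get a new monitor through sync against an oversized store; the larger fix suggested here is to apply the osdmap trimming patches so the store shrinks.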