Re: Newly added monitor infinitely sync store

Sage Weil <sage@xxxxxxxxxxxx> · Mon, 16 Nov 2015 17:42:45 -0800 (PST)

On Mon, 16 Nov 2015, Guang Yang wrote:
> I spoke to a leveldb expert; it looks like this is a known pattern with
> LSM-tree data structures: the tail latency of a range scan can be far
> longer than the avg/median, since it might need to mmap several sst files
> to get the record.
> 
> Hi Sage,
> Do you see any harm in increasing the default value for this setting
> (e.g. to 20 minutes)? Or should I add the advice for monitor
> troubleshooting?

The timeout is just for a round trip for the sync process, right?  I think 
increasing it a bit (2x or 3x?) is okay, but 20 minutes to do a single 
chunk is a lot.

The underlying problem in your case is that your store is huge (by ~2
orders of magnitude), so I'm not sure we should tune against that :)
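
To illustrate why a per-round-trip timeout interacts badly with long tail
latency, here is a toy model in Python. It is a sketch, not Ceph code: the
function names and latency figures are hypothetical, and it assumes (as
described in this thread) that the timeout applies to each chunk round trip
and that exceeding it restarts the sync.

```python
# Toy model only: not Ceph code. Assumes the timeout applies to each
# chunk round trip and that exceeding it restarts the sync from scratch.

def sync_completes(chunk_latencies, timeout):
    """Return True if every chunk round trip finishes within `timeout` seconds."""
    return all(latency <= timeout for latency in chunk_latencies)


def attempts_needed(attempts, timeout):
    """Return the 1-based index of the first attempt that completes, or None."""
    for index, chunk_latencies in enumerate(attempts, start=1):
        if sync_completes(chunk_latencies, timeout):
            return index
    return None


# Hypothetical per-chunk latencies in seconds; each attempt has one slow
# range scan in its tail, as described for LSM-tree stores.
attempts = [[2, 3, 150, 2], [2, 140, 3, 2], [2, 3, 2, 170]]

print(attempts_needed(attempts, timeout=60))   # None: every attempt restarts
print(attempts_needed(attempts, timeout=180))  # 1: a 3x timeout absorbs the tail
```

In this model a single slow chunk is enough to fail an attempt, so a modest
increase helps only if the slowest chunk fits under the new limit.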

sage

> 
> Thanks,
> Guang
> 
> On Fri, Nov 13, 2015 at 9:07 PM, Guang Yang <guangyy@xxxxxxxxx> wrote:
> > Thanks Sage! I will definitely try those patches.
> >
> > For this one, I finally managed to bring the new monitor in by
> > increasing mon_sync_timeout from its default of 60 to 60000, to make
> > sure the syncing does not restart and result in an infinite loop.
> >
> > On Fri, Nov 13, 2015 at 5:04 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> On Fri, 13 Nov 2015, Guang Yang wrote:
> >>> Thanks Sage!
> >>>
> >>> On Fri, Nov 13, 2015 at 4:15 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >>> > On Fri, 13 Nov 2015, Guang Yang wrote:
> >>> >> I was wrong in my previous analysis; it was not that the iterator
> >>> >> got reset. The problem I can see now is that during the syncing, a
> >>> >> new round of election kicked off and thus needs to probe the newly
> >>> >> added monitor; however, since it hasn't been synced yet, it
> >>> >> restarts the syncing from there.
> >>> >
> >>> > What version is this?  I think this is something we fixed a while back?
> >>> This is on Giant (c51c8f9d80fa4e0168aa52685b8de40e42758578), is there
> >>> a commit I can take a look at?
> >>
> >> Hrm, I guess it was way before that.. I'm thinking of
> >> b8af38b6fc161691d637631d9ce8ab84fb3d27c7 which was pre-firefly.  So I'm
> >> not sure exactly why an election would be restarting the sync in your
> >> case..
> >>
> >> You mentioned elsewhere that your mon store was very large, though (more
> >> than 10's of GB), which suggests you might be hitting the
> >> min_last_epoch_clean problem (which prevents osdmap trimming).. see
> >> b41408302b6529a7856a3b0a08c35e5fa284882e.  This was backported to hammer
> >> and firefly but not giant.
> >>
> >> sage
> >>
> 
> 