Re: osd_snap_trim_sleep keeps locks PG during sleep?

Have you also tried setting osd_snap_trim_cost to be 16777216 (16x the
default value, equal to a 16MB IO) and
osd_pg_max_concurrent_snap_trims to 1 (from 2)?
-Sam
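
For reference, both of those values can be changed at runtime with the
usual injectargs mechanism -- a rough, untested sketch using exactly the
option names and values mentioned above; the same settings could instead
go under [osd] in ceph.conf to persist across restarts:

    # raise the per-trim cost and reduce trim concurrency on all OSDs
    ceph tell osd.* injectargs '--osd_snap_trim_cost 16777216 --osd_pg_max_concurrent_snap_trims 1'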

On Thu, Jan 19, 2017 at 7:57 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> Hi Sam,
>
> Thanks for confirming both which thread the trimming happens in and my suspicion that sleeping is now a bad idea.
>
> The problem I see is that even with the trimming priority set low, it still seems to completely swamp the cluster. The trims appear to be submitted asynchronously, which leaves all my disks sitting at queue depths of 50+ for several minutes until the snapshot is removed, often also causing several OSDs to get marked out and start flapping. I'm using WPQ but haven't changed the cutoff variable yet, as I know you are working on fixing a bug with that.
>
> Nick
>
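
(A side note on the WPQ point above: the scheduler and cut-off referred
to are, as far as I know, the osd_op_queue and osd_op_queue_cut_off
options. A minimal ceph.conf sketch, assuming Jewel option names and
noting that osd_op_queue only takes effect after an OSD restart:

    [osd]
    osd_op_queue = wpq
    # cut-off left at its default until the bug mentioned above is fixed
    # osd_op_queue_cut_off = high
)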
>> -----Original Message-----
>> From: Samuel Just [mailto:sjust@xxxxxxxxxx]
>> Sent: 19 January 2017 15:47
>> To: Dan van der Ster <dan@xxxxxxxxxxxxxx>
>> Cc: Nick Fisk <nick@xxxxxxxxxx>; ceph-users <ceph-users@xxxxxxxxxxxxxx>
>> Subject: Re:  osd_snap_trim_sleep keeps locks PG during sleep?
>>
>> Snaptrimming is now in the main op threadpool along with scrub, recovery, and client IO.  I don't think it's a good idea to use any of the
>> _sleep configs anymore -- the intention is that by setting the priority low, they won't actually be scheduled much.
>> -Sam
>>
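
(The priority knob referred to here is presumably osd_snap_trim_priority;
a sketch of lowering it at runtime, assuming that option name:

    # de-prioritise snap trim work relative to client IO
    ceph tell osd.* injectargs '--osd_snap_trim_priority 1'
)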
>> On Thu, Jan 19, 2017 at 5:40 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>> > On Thu, Jan 19, 2017 at 1:28 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> >> Hi Dan,
>> >>
>> >> I carried out some more testing after doubling the op threads. It may
>> >> have had a small benefit, as potentially some threads are available,
>> >> but latency still sits more or less around the configured snap sleep
>> >> time. Even more threads might help, but I suspect you are just lowering
>> >> the chance of IOs getting stuck behind the sleep, rather than actually
>> >> solving the problem.
>> >>
>> >> I'm guessing that when the snap trimming was in the disk thread you
>> >> wouldn't have noticed these sleeps, but now that it's in the op thread
>> >> it will just sit there holding up all IO and be a lot more noticeable.
>> >> It might be that this option shouldn't be used with Jewel+?
>> >
>> > That's a good thought -- so we need confirmation of which thread is
>> > doing the snap trimming. I honestly can't figure it out from the code --
>> > hopefully a dev could explain how it works.
>> >
>> > Otherwise, I don't have much practical experience with snap trimming
>> > in jewel yet -- our RBD cluster is still running 0.94.9.
>> >
>> > Cheers, Dan
>> >
>> >
>> >>
>> >>> -----Original Message-----
>> >>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
>> >>> Behalf Of Nick Fisk
>> >>> Sent: 13 January 2017 20:38
>> >>> To: 'Dan van der Ster' <dan@xxxxxxxxxxxxxx>
>> >>> Cc: 'ceph-users' <ceph-users@xxxxxxxxxxxxxx>
>> >>> Subject: Re:  osd_snap_trim_sleep keeps locks PG during sleep?
>> >>>
>> >>> We're on Jewel and you're right, I'm pretty sure the snap stuff is also now handled in the op thread.
>> >>>
>> >>> The dump historic ops socket command showed a 10s delay at the
>> >>> "Reached PG" stage, which, going by Greg's response [1], would suggest
>> >>> that it isn't the OSD itself that is blocking but rather the PG that
>> >>> is currently sleeping whilst trimming. I think in the former case it
>> >>> would show a high time on the "Started" part of the op? Anyway, I will
>> >>> carry out some more testing with higher osd op threads and see if that
>> >>> makes any difference. Thanks for the suggestion.
>> >>>
>> >>> Nick
>> >>>
>> >>>
>> >>> [1]
>> >>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2016-March/008652.html
>> >>>
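
(For anyone wanting to reproduce the observation above, the admin socket
command in question can be run as something like the following, against
whichever OSD is showing the slow ops:

    # show recent slow ops and the stage each one spent its time in
    ceph daemon osd.0 dump_historic_ops
)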
>> >>> > -----Original Message-----
>> >>> > From: Dan van der Ster [mailto:dan@xxxxxxxxxxxxxx]
>> >>> > Sent: 13 January 2017 10:28
>> >>> > To: Nick Fisk <nick@xxxxxxxxxx>
>> >>> > Cc: ceph-users <ceph-users@xxxxxxxxxxxxxx>
>> >>> > Subject: Re:  osd_snap_trim_sleep keeps locks PG during sleep?
>> >>> >
>> >>> > Hammer or jewel? I've forgotten which thread pool is handling the
>> >>> > snap trim nowadays -- is it the op thread yet? If so, perhaps all
>> >>> > the op threads are stuck sleeping? Just a wild guess. (Maybe
>> >>> > increasing # op threads would help?).
>> >>> >
>> >>> > -- Dan
>> >>> >
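
(In Jewel the op worker count is, I believe, controlled by
osd_op_num_shards and osd_op_num_threads_per_shard; a ceph.conf sketch of
doubling the per-shard threads, which would need an OSD restart:

    [osd]
    # default is 2 threads per shard across 5 shards
    osd_op_num_threads_per_shard = 4
)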
>> >>> >
>> >>> > On Thu, Jan 12, 2017 at 3:11 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> >>> > > Hi,
>> >>> > >
>> >>> > > I had been testing some higher values with the
>> >>> > > osd_snap_trim_sleep variable to try and reduce the impact of
>> >>> > > removing RBD snapshots on our cluster and I have come across
>> >>> > > what I believe to be a possible unintended consequence. The
>> >>> > > value of the sleep seems to keep the lock on the PG held, so that
>> >>> > > no other IO can use the PG whilst the snap removal operation is
>> >>> > > sleeping.
>> >>> > >
>> >>> > > I had set the variable to 10s to completely minimise the impact,
>> >>> > > as I had some multi-TB snapshots to remove, and noticed that
>> >>> > > suddenly all IO to the cluster had a latency of roughly 10s as
>> >>> > > well; all the dumped ops show waiting on the PG for 10s too.
>> >>> > >
>> >>> > > Is the osd_snap_trim_sleep variable only ever meant to be used up
>> >>> > > to, say, a max of 0.1s, with this being a known side effect, or
>> >>> > > should the lock on the PG be released so that normal IO can
>> >>> > > continue during the sleeps?
>> >>> > >
>> >>> > > Nick
>> >>> > >
>> >>>
>> >>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
