Re: Request for Comments: Weighted Round Robin OP Queue

On Thu, 5 Nov 2015, Robert LeBlanc wrote:
> 
> Thanks Gregory,
> 
> People are most likely busy and haven't had time to digest this, and I
> may be expecting more excitement than is warranted (I'm excited by the
> results, and probably also that such a large change still works). I'll
> keep working towards a PR. This was mostly proof of concept; now that
> there is some data I'll clean up the code.

I'm *very* excited about this.  This is something that almost every 
operator has problems with so it's very encouraging to see that switching 
up the queue has a big impact in your environment.

I'm just following up on this after a week of travel, so apologies if this 
is covered already, but did you compare this implementation to the 
original one with the same tunables?  I see somewhere that you had 
max_backfills=20 at some point, which is going to be bad regardless of the 
queue.

I also see that you changed the strict priority threshold from LOW to HIGH 
in OSD.cc; I'm curious how much of the impact came from that change vs the 
queue implementation itself.
 
> I was thinking that a config option to choose the scheduler would be a
> good idea. In terms of the project, what is the better approach: create
> a new template class and select the queue at each place the template is
> instantiated, perform the queue selection inside the same template
> class, or something else I haven't thought of?

A config option would be nice, but I'd start by just cleaning up the code 
and putting it in a new class (WeightedRoundRobinPriorityQueue or 
whatever).  If we find that it's behaving better, I'm not sure how much 
value we get from a tunable.  Note that there is one other user 
(msgr/simple/DispatchQueue) that we might also want to switch over at some 
point... especially if this implementation is faster.

Once it's cleaned up (remove commented out code, new class) put it up as a 
PR and we can review and get it through testing.

Thanks, Robert!
sage


> 
> Are there public teuthology-openstack systems that could be used for
> testing? I don't remember, I'll have to search back through the
> mailing list archives.
> 
> I appreciate all the direction as I've tried to figure this out.
> 
> Thanks,
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> 
> 
> On Wed, Nov 4, 2015 at 8:20 PM, Gregory Farnum  wrote:
> > On Wed, Nov 4, 2015 at 7:00 PM, Robert LeBlanc  wrote:
> >>
> >> Thanks for your help on IRC Samuel. I think I found where I made a
> >> mistake. I'll do some more testing. So far with max_backfills=1 on
> >> spindles, the impact of setting an OSD out and in on a saturated
> >> cluster seems to be minimal. On my I/O graphs it is hard to tell where
> >> the OSD was out and in recovering. If I/O becomes blocked, it seems
> >> that they don't linger around long. All of the clients report getting
> >> about the same amount of work done with little variance so no one
> >> client is getting indefinitely blocked (or blocked for really long
> >> times) causing the results between clients to be skewed like before.
> >>
> >> So far this queue seems to be very positive. I'd hate to put a lot of
> >> work into getting this ready to merge if there is little interest in it
> >> (a lot of things to do at work and some other things I'd like to track
> >> down in the Ceph code as well). What are some of the next steps for
> >> something like this, meaning a pretty significant change to core code?
> >
> > Well, step one is to convince people it's worthwhile. Your performance
> > information and anecdotal evidence of client impact is a pretty good
> > start. For it to get merged:
> > 1) People will need to review it and verify it's not breaking anything
> > they can identify from code. Things are a bit constrained right now,
> > but this is pretty small and of high interest so I make no promises
> > for the core team but submitting a PR will be the way to start.
> > Getting positive buy-in from other contributors who are interested in
> > performance will also push it up the queue.
> > 2) There will need to be a lot of testing on something like this.
> > Everything has to pass a run of the RADOS suite. Unfortunately this is
> > a bad month for that as the lab is getting physically shipped around
> > in a few weeks, so if you can afford to make it happen with the
> > teuthology-openstack stuff that will accelerate the timeline a lot (we
> > will still need to run it ourselves but once it's passed externally we
> > can put it in a lot more test runs we expect to pass, instead of in a
> > bucket with others that will all get blocked on any one failure).
> > 3) For a new queuing system I suspect that rather than a direct merge
> > to default master, Sam will want to keep both in the code for a while
> > with a config value and run a lot of the nightlies on this one to
> > tease out any subtle races and bugs.
> > 4) Eventually we become confident that it's in good shape and it
> > replaces the old queue.
> >
> > Obviously those are the optimistic steps. ;)
> > -Greg
> >
> >>
> >> Thank you to all who took time to help point me in the right direction.
> >> ----------------
> >> Robert LeBlanc
> >> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> >>
> >>
> >> On Wed, Nov 4, 2015 at 12:49 PM, Samuel Just  wrote:
> >>> I didn't look into it closely, but that almost certainly means that
> >>> your queue is reordering primary->replica replicated write messages.
> >>> -Sam
> >>>
> >>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> >> the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html


