Re: External access to the AMQP broker

Clement Verna <cverna@xxxxxxxxxxxxxxxxx> · Thu, 28 Feb 2019 17:15:31 +0100

On Thu, 28 Feb 2019 at 16:04, Jeremy Cline <jeremy@xxxxxxxxxx> wrote:
>
> On 2/27/19 2:10 PM, Clement Verna wrote:
> > On Wed, 27 Feb 2019 at 18:27, Aurelien Bompard
> > <abompard@xxxxxxxxxxxxxxxxx> wrote:
> >>
> >> I'm assuming you're considering the solution where we have a single
> >> broker and we make it publicly accessible (option 1).
> >>
> > >>> how easy would it be to turn off the possibility for external
> > >>> publishers to flood the broker?
> >>
> >> External clients won't publish anything, they'll be read-only (with a
> >> few exceptions like the CentOS CI folks). However they can create a
> >> huge amount of queues, subscribe to everything and never consume
> >> anything.
> >> We can mitigate that by setting up another vhost (in the cluster we
> >> already have) for external clients, limit the number of queues on that
> >> vhost, and enforce a time to live on messages in the queues. It'll
> >> require some fine tuning, though, and external clients will still be
> >> able to DoS other external clients if we don't do authentication
> >> (option A).
> >>
> >> I value option 2 (separate broker) higher than option 1 (same broker)
> >> because I'm not entirely sure those limits can prevent any kind of DoS
> >> on the broker. Attackers are creative. It's easier to make sure the
> >> resources used by a 2nd cluster don't impact the resources of the 1st
> >> cluster.
> >>
> > >>> Can we configure the queues that are critical to have higher
> > >>> priority than the external ones?
> >>
> > >> Yes, by using a different vhost for internal (and CentOS) stuff and
> >> external stuff, and replicating messages from the internal to the
> >> external vhost.
> >>
> > >>> If we have one public broker with authentication, can we easily
> > >>> kill the accounts that are flooding us?
> >>
> >> Yes, that's the main advantage of option B.
> >>
> > >>> What are the consequences of the service being down? What is an
> > >>> acceptable downtime: 1 min, 1 h, 1 day, 1 week, 1 month?
> >>
> >> I would say that the internal messaging service needs a high
> >> availability, while the SLA for the external service can be lower.
> > >> That's also a reason for me preferring option 2.
> >>
> >> I hope that clarifies a bit.
> >
> > Yes it does, thanks for the answers :-). My overall feeling is that the
> > risk of DoS should be one of the factors we take into account to make
> > the decision, but we should also consider how easy it is to use, how
> > easy it is to maintain, and how much effort it is to set up. I feel that
> > we are focusing on the risk of DoS as the main factor to favour one
> > option over the other, and I am not sure this is right, but that's
> > my personal feeling and I am happy to be wrong on that.
>
> I think you are under-estimating how often denial-of-service attacks
> happen, especially in situations where you only need to have kilobytes
> of bandwidth to start causing trouble. People do it just because they
> think it's funny and I don't think it's a matter of *if* someone does
> it, it's just a matter of when. It'd take a couple minutes to create
> a few hundred thousand queues, eating through broker resources until
> no one else can do anything.

I am not under-estimating it; it was just not obvious to me how easy it
would be to cause a DoS. If you say that it is trivial, then yes, it
makes sense to worry about it.
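
For reference, here is a rough sketch of what the mitigations Aurelien
described (a cap on the number of queues on the external vhost, plus a
time to live on messages) might look like. It assumes the broker is
RabbitMQ with the management plugin enabled, which this thread does not
state. The vhost name "/public", the policy name and all the numbers are
made up for illustration, and the endpoint paths and policy keys should
be checked against the RabbitMQ documentation before use. The functions
only build the requests; they do not send anything.

```python
import json
from urllib.parse import quote


def vhost_limit_request(vhost, max_queues):
    """Build (but do not send) a request capping the queue count on a vhost.

    Assumes the RabbitMQ management HTTP API; the vhost name is
    percent-encoded because "/" is a legal character in vhost names.
    """
    return {
        "method": "PUT",
        "path": f"/api/vhost-limits/{quote(vhost, safe='')}/max-queues",
        "body": json.dumps({"value": max_queues}),
    }


def ttl_policy_request(vhost, policy_name, message_ttl_ms, queue_expires_ms):
    """Build (but do not send) a policy request for every queue on a vhost.

    "message-ttl" drops messages that sit unconsumed for too long and
    "expires" deletes queues that go unused, both in milliseconds.
    """
    definition = {"message-ttl": message_ttl_ms, "expires": queue_expires_ms}
    return {
        "method": "PUT",
        "path": (
            f"/api/policies/{quote(vhost, safe='')}/"
            f"{quote(policy_name, safe='')}"
        ),
        "body": json.dumps(
            {"pattern": ".*", "definition": definition, "apply-to": "queues"}
        ),
    }


if __name__ == "__main__":
    # Hypothetical values: 1000 queues, 1 hour message TTL, 1 day queue expiry.
    requests_to_send = [
        vhost_limit_request("/public", 1000),
        ttl_policy_request("/public", "external-limits", 3_600_000, 86_400_000),
    ]
    for request in requests_to_send:
        print(request["method"], request["path"], request["body"])
```

Limits like these would bound how much a single external vhost can
consume, but as Aurelien said they would need tuning, and they would not
stop one external client from starving another.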

>
> From a user perspective, both options are identical (except for the
> possibility of authentication being on). It's really just a question of
> whether the effort of maintaining a second cluster is worth the
> increased isolation.
>
> >
> > On the SLA, I really think that in the case of a DoS attack we would not
> > have much trouble communicating to the community that we are facing
> > an attack and the service will be down or degraded for X days.
> > Overall I think we should start being OK with taking the risk of having
> > our services down for multiple hours, days, ... if that allows us to
> > save on the daily maintenance burden.
> >
> > Again just my 2 cents on the subject, so feel free to ignore it :-)
>
> If you're still considering the single broker setup with this approach,
> the cost of being down is that everything grinds to a halt. Package
> signing is message driven. CI/CD is message driven. Pretty much
> everything relies on messages. A big CVE gets announced, and then
> someone attacks the messaging infrastructure to hinder getting the
> package built, signed, and shipped.
>
> I agree that there are plenty of services that are fine with outages of
> hours or even days (and they can recover if they use messages because
> it'll all still be queued!), but the message broker isn't one we should
> allow users to take down.

Sure, I am also trying to make us realize that this is a community service
and that in most cases aiming for enterprise-level support and
availability is not needed (also, we don't have the resources for that).
So this might not apply in this case, but I think it was important to
bring it forward.

>
> - Jeremy