draft-mm-wg-effect-encrypt-13 review

Brandon Williams and I reviewed this draft with Kathleen and Al via the IETF etherpad during last call. The link is here:

https://etherpad.tools.ietf.org/p/krose-review-draft-mm-wg-effect-encrypt-13

I have attempted to cull the discussion down to something consumable on the mailing list. Apologies in advance for formatting issues. Note that I have not subsequently added any comments to the etherpad: I'll respond to this thread instead with any follow-ups.


   this mode is deployed.  IPsec with authentication has many useful
   applications and usage has increased for infrastructure applications
   such as for virtual private networks between data centers.

KR> The above paragraph doesn't seem to have a clear point. It's mostly about opportunistic security, but has a few unrelated points inserted (e.g., the final sentence about IPsec with authentication).

KM> Looks like the final sentence provides contrast with the previous sentence:
"OS has been implemented as NULL Authentication with IPsec..."


   the application user, and hosting service providers lease computing,
   storage, and communications systems in datacenters.  In practice,
   many companies perform two or more service provider roles, but may be
   historically associated with one.

KR> Honestly, all of section 1.1 seems to be a grab bag lacking a thesis, maybe motivating the vague title "Additional Background". I think the information there could be organized in a better way to answer the question posed in the reader's mind after the introduction, which is: give me some examples of monitoring/manipulation for operability that have been defeated by encryption, with evidence that they aren't solvable without additional telemetry or cooperation with middleboxes. The following section is closer to what I wanted after the intro.

KM> Initially, we had some examples in the draft, but were asked to remove them.  I take your point and am in the process of reworking this text to make it more cohesive.  The problem in much of this draft was the number of contributions, so your sweep of the draft for points like this is very helpful.

AM> As I read it, section 1.1 is describing changes to the communication landscape re: encryption. The exception is the very last paragraph, which could be moved above section 1.1. I agree that the purpose of section 1.1 could be made more clear in the introductory paragraph. I do think that it's useful background to highlight these changes in the encryption landscape, since that's what motivates the draft.

KM> Thanks.  It is also to ensure the draft is not read as being just about TLS, so I am making that point more clear as well.  Moving the last paragraph to the introduction makes sense.


   Following the Snowden revelations, application service providers
   responded by encrypting traffic between their data centers (IPsec) to
   prevent passive monitoring from taking place unbeknownst to them
   (Yahoo, Google, etc.).  Large mail service providers also began to

KR> IIRC, companies were already encrypting infrastructure traffic that crossed the public internet: it was on their own private backbones that they started using encryption universally.

KM> Yes, good point.  I clarified the text to distinguish that, but was careful not to say that pretty much all infrastructure traffic over the Internet was encrypted, even though I think that's in line with reality - at least where I worked, which included an ISP and a big financial data provider early on.

   The EFF reported [EFF2014] several network service providers taking
   steps to prevent the use of SMTP over TLS by breaking STARTTLS

KR> I think it's important to use the phrase "downgrade attack" when describing something like this. The word "downgrade" first appears in the references list.
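
For readers unfamiliar with the attack, a minimal sketch (illustrative only, not taken from the EFF report): a middlebox strips the STARTTLS capability from the server's multiline EHLO reply, so the client never learns TLS is offered and falls back to cleartext.

    # Minimal sketch of a STARTTLS downgrade: a middlebox rewrites the
    # SMTP EHLO reply so the client never sees the STARTTLS capability
    # and continues in cleartext. Illustrative only.

    def strip_starttls(ehlo_reply: bytes) -> bytes:
        lines = ehlo_reply.split(b"\r\n")
        kept = [line for line in lines if b"STARTTLS" not in line.upper()]
        return b"\r\n".join(kept)

    reply = b"250-mail.example.com\r\n250-STARTTLS\r\n250 SIZE 35882577"
    assert b"STARTTLS" not in strip_starttls(reply)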

KM> Done, thanks.


   (section 3.2 of [RFC7525]), essentially preventing the negotiation
   process resulting in fallback to the use of clear text.  In other
   cases, some service providers have relied on middle boxes having
   access to clear text for the purposes of load balancing, monitoring
   for attack traffic, meeting regulatory requirements, or for other
   purposes.  These middle box implementations, whether performing
   functions considered legitimate by the IETF or not, have been
   impacted by increases in encrypted traffic.  Only methods keeping
   with the goal of balancing network management and PM mitigation in
   [RFC7258] should be considered in solution work resulting from this
   document.

KR> I feel like this section could be better organized by:
 * Moving the examples to 1.1 as a bulleted list of sample situations in which network operators attempted to and/or succeeded in defeating encryption to preserve existing operational mechanisms, or in which performance suffered for users (whether of the encrypted flows or of other flows impacted by encrypted flows).

KM> Interesting point, but we'd need more examples.  I'll think about this more and chat with Al in case he has ideas.  For now, I went with Brandon's easier suggestion, but moving to this would be nice for the document readers.

AM> Although I see how these examples could be part of the background, I think those who will
eventually remove their objections will prefer the reduced emphasis on these examples where
they are (in section 2). In one view, the entire memo is background, since nothing new is proposed.

KR>
 * Using this section as an introduction to the methodology for cataloging operational mechanisms depending on cleartext traffic monitoring, with the various caveats on what will be considered (e.g., only mechanisms required heretofore for operability), and for describing the approach to seeking mitigations and/or substitutions.

KM> Hmm, interesting point.  I'll have to think about this more as it could be a lot of work at this stage.

AM> Unfortunately, we've already implemented many AD-level suggestions on the organization of Section 2.
We're at the stage of "what can everybody live with", and re-re-re-org falls out now, IMO.


   Network service providers use various techniques to operate, manage,
   and secure their networks.  The following subsections detail the
   purpose of each technique and which protocol fields are used to
   accomplish each task.  In response to increased encryption of these
   fields, some network service providers may be tempted to undertake
   undesirable security practices in order to gain access to the fields
   in unencrypted data flows.  To avoid this situation, ideally new
   methods could be developed to accomplish the same goals without
   service providers having the ability to see session data.

BW> I think the above paragraph is the core point of the section; describing what the whole of section 2 is about.  The previous paragraphs, while important information, don't seem to belong in this section. Perhaps a separate section focused on observed bad behavior would be better.

KM> That would get at both your point and Kyle's, thanks.

AM> I agree this last paragraph could be moved up (after the definition of Network SP).
The Snowden and EFF paragraphs would be best positioned as footnotes to the
first sentence "Network service providers use various techniques ...", but we don't
have that mechanism available.

Also, the neutral exposition that we've been asked to provide a million times actually
comes from multiple perspectives expressed in contributions that we would combine
in a balanced way, without value judgements (no good or bad).
Where we lack balance, we lack specific contributions.

(So, I think the new -14 section 1.2 does not have an appropriate title.)


   heuristics grows, and accuracy suffers.  For example, the traffic
   patterns between server and browser are dependent on browser supplier
   and version, even when the sessions use the same server application
   (e.g., web e-mail access).  It remains to be seen whether more
   complex inferences can be mastered to produce the same monitoring
   accuracy.

KR> This might be too formal an approach for this doc, but it might be possible to construct a taxonomy of layers of metadata made unavailable by encryption at each layer to show the completeness/comprehensiveness of the survey (a rough sketch follows the list). So, for instance:
 * Protocol and port number are still available as a way of characterizing traffic over the public internet even if the payload is encrypted, but this information is lost if (e.g.) the traffic is traversing an IPsec tunnel or if radically different kinds of traffic all use port 443/tcp without any other way to distinguish between them.
 * TCP is open to optimization/measurement even if using TLS, except when tunneled and encrypted: congestion signals (like rexmits) previously transparent to the middlebox are then lost.
 * Encrypting the payload defeats attempts to survey traffic by user agent (if there's no other way to distinguish, e.g., by fingerprinting).
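
Here's a rough sketch of what such a taxonomy could look like, just to make the idea concrete (categories and entries illustrative, not exhaustive):

    # Rough sketch of a metadata taxonomy: which fields stay observable
    # to an on-path element under each layer of encryption. Categories
    # and entries are illustrative, not exhaustive.

    VISIBLE = {
        "cleartext HTTP": {"5-tuple", "TCP state", "HTTP headers", "payload",
                           "sizes/timing"},
        "TLS over TCP":   {"5-tuple", "TCP state", "SNI", "sizes/timing"},
        "QUIC":           {"UDP 5-tuple", "connection ID", "sizes/timing"},
        "IPsec tunnel":   {"outer 2-tuple", "sizes/timing"},
    }

    def lost(before: str, after: str) -> set:
        """Fields an on-path element loses when traffic moves layers."""
        return VISIBLE[before] - VISIBLE[after]

    # Moving flows from TLS into an IPsec tunnel hides ports, TCP
    # congestion signals, and the SNI; only sizes/timing remain
    # (plus the outer 2-tuple).
    print(lost("TLS over TCP", "IPsec tunnel"))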

KM> I think this would be a really helpful follow-on document.  I'd be willing to work on it if you're game.  I've been thinking about something similar, specific to TLS, but it should be broadened.


   It is important to note that the push for encryption by application
   providers has been motivated by the application of the described
   techniques.  Some application providers have noted degraded
   performance and/or user experience when network-based optimization or
   enhancement of their traffic has occurred, and such cases may result
   in additional operator troubleshooting, as well.

KR> Observation: additionally, I think you'll encounter the argument that the responsibility for diagnosing bad interactions between applications and networks falls on the application owner rather than the network operator. Basically, I feel like the desire among protocol designers is for operators to provide a pipe with certain key characteristics that interact well with established transport protocol mechanisms, and otherwise to leave the traffic alone and let the application developers do what they want to within the expected constraints. If that's infeasible (e.g., in edge cases, or with respect to new technologies that interact badly with existing transports, such as the loss=congestion assumption of TCP on wifi), that's precisely the case that needs to be made by this document.

KM> We have encountered this argument already.  It's a tough one as SPs have the SLAs with customers, so they are the first call.  Many don't know how to get in touch with app providers.  I understand the application developers' perspective, but also see that there has to be some ability to troubleshoot.  Sure, providers could wrap the protocols for transport to provide some way of measuring, but information is lost.  IPv6 with flow identifiers is another way to do it, but you might not be able to prioritize a call or protocol that has little tolerance for delay over one that does, for instance.  And I realize that app providers just want all traffic to have the same priority, but emergency calls are important.

BW> I think the point made by the document is correct though: operators are nearly always the first call, not the application provider.

KM> We were asked to remove text that said that.  I agree that it is the case as the providers have the SLAs and you don't typically have a number for app providers.

BW> The operators are looking for ways to demonstrate that they did not cause the problem (or determine that they did) for efficient hand-off to the correct party for resolution. There are certainly problems with an approach that changes the behavior of the protocol, but it's difficult to argue with the diagnostic need.

AM> Using Netflix as an example, the first source of problem they mention is the network when
addressing the question "Why doesn't Netflix work?":
    "If Netflix isn’t working, you may be experiencing a network connectivity issue, an issue with your device, or an issue with your Netflix app or account."
    from https://help.netflix.com/en/node/461?ui_action=kb-article-popular-categories
They previously had even stronger wording, something like "First, make sure your network connection meets the Netflix requirements ... URL"
One of the causes of re-buffering is CDN-related pauses when accessing the next segment: completely hidden from users so far.
Additional frequent cause: the unlicensed WiFi network owned and operated by the customer.

Another way to look at this strategy: App providers are transferring as much overhead cost to the network operators as possible
(troubleshooting customer problems is expensive - rolling a truck negates months of revenue), while preserving as
much value/control/revenue as they can for themselves. The greed-thingy plays poorly over time.
A user-focused strategy would be to form partnerships for troubleshooting of shared customers, but that might result in exposing
the real causes and some would rather hide for now, it seems.


   For example, browser fingerprints are comprised of many
   characteristics, including User Agent, HTTP Accept headers, browser
   plug-in details, screen size and color details, system fonts and time
   zone.  A monitoring system could easily identify a specific browser,
   and by correlating other information, identify a specific user.
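
KR> Concretely, a fingerprint of this sort is just a stable hash over the observable attributes; each attribute adds entropy until the combination is effectively unique. A minimal sketch (attribute names illustrative):

    # Minimal sketch of browser fingerprinting: hash a canonical encoding
    # of observable attributes. Attribute names are illustrative.

    import hashlib

    def browser_fingerprint(attrs: dict) -> str:
        canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
        return hashlib.sha256(canonical.encode()).hexdigest()[:16]

    print(browser_fingerprint({
        "user_agent": "Mozilla/5.0 ...",
        "accept": "text/html,application/xhtml+xml",
        "screen": "2560x1440x24",
        "timezone": "UTC-5",
        "fonts": "Arial;Helvetica;Times",
    }))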

KR> Subsections of 2.1 cover the following in what feels like arbitrary and inconsistent order: technique description, justification for the technique, reason why the technique is bad (for privacy), how the technique is defeated by protocol designers, and examples of the technique. It really reads like a laundry list rather than a systematic analysis of the problems faced, the metadata required for diagnostics, and how these techniques are defeated by encryption.

KM> Hmm, this section was reorganized by others in the last IESG review, so that's probably part of the problem.  I'll read through and see what I can do to help it out more. It cleared a discuss to make the changes.
** didn't tackle this one and will go back to it.

AM> To re-iterate: this isn't the optimization phase. We've done 10 months of that.
We've reached "what can you live with" phase, IMO.


   packet is able to provide stateless load balancing.  This ability
   confers great reliability and scalability advantages even if the
   flow remains in a single POP, because the load balancing system is
   not required to keep state of each flow.  Even more importantly,
   there's no requirement to continuously synchronize such state among
   the pool of load balancers.

KR> An important point is that an integrated load balancer repurposing limited existing bits in transport flow state must maintain and synchronize per-flow state occasionally: using the sequence number as a cookie only works for so long given that there aren't that many bits available to divide across a pool of machines.
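
To make the bit-budget point concrete, a back-of-the-envelope sketch (numbers illustrative): with 6 bits of the 32-bit ISN reserved for a server index, routing on the client's ACK numbers only works until the flow has transferred about 2^26 bytes (64 MiB) and the top bits are overrun, at which point per-flow state is needed.

    # Back-of-the-envelope sketch: packing a server index into the top
    # bits of a TCP initial sequence number. Numbers are illustrative.

    SERVER_BITS = 6                    # pool of up to 2**6 = 64 servers
    LOW_MASK = (1 << (32 - SERVER_BITS)) - 1

    def encode_isn(server_index: int, entropy: int) -> int:
        assert 0 <= server_index < (1 << SERVER_BITS)
        return (server_index << (32 - SERVER_BITS)) | (entropy & LOW_MASK)

    def decode_server(ack_number: int) -> int:
        # Valid only while the flow has moved fewer than 2**26 bytes;
        # after that the top bits are overrun and per-flow state (and
        # synchronization across the pool) is required.
        return ack_number >> (32 - SERVER_BITS)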

KM> I added in this point, but have to check back on flow of text.


   Current protocols, such as TCP, allow the development of stateless
   integrated load balancers by availing such load balancers of
   additional plain text information in client-to-server packets.  In
   case of TCP, such information can be encoded by having server-
   generated sequence numbers (that are ACK'd by the client), segment
   values, lengths of the packet sent, etc.

KR> Is it worth mentioning that the use of some of these mechanisms for load balancing negates some of the security assumptions associated with those primitives (e.g., that an off-path attacker guessing valid sequence numbers for a flow is hard)?

KM> I added the above in as it may offer some balance to the discussion.

KR> A dedicated mechanism for storing load balancer state, such as QUIC's proposed connection ID, is strictly better from the load balancer's point of view, and is probably even better from a privacy perspective than bolting it on to an unrelated transport signal because it can be tightly controlled by one of the endpoints and rotated to avoid roving client linkability: in other words, being a specific, separate signal, it can be governed in a way that is finely targeted at that specific use-case. (I'm thinking the advantages of separate mechanisms belongs in a different part of the doc; this section is more like the problem statement than the solution statement.)

KM> This (above) needs to be reworded to be neutral and this does go towards solution space, which we were trying to avoid. How about:

Another possibility is a dedicated mechanism for storing load balancer state, such as QUIC's proposed connection ID, to provide visibility to the load balancer.  An identifier could be used for tracking purposes, but this may be an improvement over bolting it onto an unrelated transport signal.  This method allows for tight control by one of the endpoints and can be rotated to avoid roving client linkability: in other words, being a specific, separate signal, it can be governed in a way that is finely targeted at that specific use-case.
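
For illustration only (field layout hypothetical, not proposed draft text), this is the kind of stateless routing such an identifier enables, including rotation by the endpoint:

    # Sketch: stateless routing on a dedicated connection identifier.
    # The server embeds a routing hint when issuing the ID, so the load
    # balancer keeps no per-flow state, and the ID can be re-issued
    # (rotated) to limit linkability. Layout is hypothetical.

    import os

    def issue_connection_id(server_index: int) -> bytes:
        return bytes([server_index]) + os.urandom(7)

    def route(connection_id: bytes, pool: list) -> str:
        return pool[connection_id[0] % len(pool)]

    pool = ["server-a", "server-b", "server-c"]
    assert route(issue_connection_id(1), pool) == "server-b"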


   In future Network Function Virtualization (NFV) architectures, load
   balancing functions are likely to be more prevalent (deployed at
   locations throughout operators' networks)[.  NFV environments will
   require some type of identifier (IPv6 flow identifiers, the proposed
   QUIC connection ID, etc.) for managing]
   traffic using encrypted tunnels.[  The shift to increased encryption
   will have an impact on visibility of flow information and will require
   adjustments to perform similar load balancing functions within an NFV.]

KR> I'm not sure what architecture this paragraph is discussing: are you talking about encrypted tunnels between NFV nodes? Is this something obvious to people involved in NFV? A diagram (or informational reference) would be helpful to me here.

KM> I see your point, the language here could be more clear. Do the above adjustments (ed: in []) help?


2.2.2.  Differential Treatment based on Deep Packet Inspection (DPI)
   ...
   These effects and potential alternative solutions have been discussed
   at the accord BoF [ACCORD] at IETF95.

KR> This section is labeled DPI, but really, the underlying issue is what you stated in the first paragraph: different kinds of traffic have different QoS needs, yet a network provider can't rely on a voluntary signal from an untrusted device to decide on QoS or every packet is simply going to be marked "high importance" and so we're back to treating all traffic equivalently. I'd argue against one of the memes I heard at the accord BoF, that it's down to latency vs. throughput, by pointing out that some applications (e.g., live video with low hand-wave latency) need both.

Even after reading this, I'm still skeptical of the need for any more granularity than flow; using AQM on a per-flow (e.g., 5-tuple) or flow-aggregate (some subset of the 5-tuple) basis should prevent an application or user from consuming resources unfairly (sketched below). What, for instance, prevents a carrier from privileging VoIP traffic by looking at endpoints? Would there be a way for someone else to masquerade non-VoIP traffic as VoIP traffic given this kind of setup? This is the kind of question that I need answered by this doc.
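
The sketch I have in mind, which needs no payload access at all (queue counts illustrative):

    # Sketch of per-flow / flow-aggregate scheduling: hash the 5-tuple
    # (or a subset of it) into a queue and let a fair scheduler stop any
    # one flow or user from monopolizing capacity. No payload inspection.

    import zlib

    NUM_QUEUES = 1024

    def flow_queue(src_ip, dst_ip, proto, sport, dport) -> int:
        """Per-flow isolation: full 5-tuple."""
        key = f"{src_ip}|{dst_ip}|{proto}|{sport}|{dport}".encode()
        return zlib.crc32(key) % NUM_QUEUES

    def subscriber_queue(src_ip) -> int:
        """Flow-aggregate isolation: per-subscriber subset of the 5-tuple."""
        return zlib.crc32(src_ip.encode()) % NUM_QUEUES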

BW> It might be useful to note in this section that QUIC and H2 both combine multiple micro-flows, possibly of different types, within a single encrypted transport-layer flow. They share this with IPsec tunnels and the like. IOW, the increased use of encrypted aggregating encapsulation can hide even the most basic representation of a flow from the differentiated service element. This same concern applies to the load balancing elements discussed in section 2.2.1.

KM> **  Want to talk with Al on this set of comments.

AM> We were asked not to refer to QUIC, for various reasons (e.g., still under development).

There will always be areas where the network can make the best decision, because of the
information available to the network operators (and the lack of that same info at end-points).

When network resources are constrained, only the network can manage priorities.
This has been organized according to applications that can be identified, but there
can be other solutions requiring cooperation between user devices and the network
according to subscription to a special service (QCI above).


2.2.3.  Network Congestion Management

   For User Plane Congestion Management (3GPP UPCON) - ability to
   understand content and manage network during congestion.  Mitigating
   techniques such as deferred download, off-peak acceleration, and
   outbound roamers.

KR> This seems like a special case of 2.2.2.

KM> Al - is there a reason this shouldn't get moved into 2.2.2?

AM> I think there is some text missing here.
The text seems to have been one list item in old
section 7.2, dating back to version 11.  The list description was:

"7.2.  Effect of Encrypted Transport Headers

   When the Transport Header is encrypted, it prevents the following
   mobile network features from operating:
       <and then a list of many items> "

I suggest to delete this text, but...

Kathleen - if you delete this section, please leave the section header
marked "Blank - to be deleted" to keep the section numbering as-is,
and the diffs/comments will still correlate easily.  Thanks!


2.2.4.  Performance-enhancing Proxies

   Due to the characteristics of the mobile link, performance-enhancing
   TCP proxies may perform local retransmission at the mobile edge.  In
   TCP, duplicated ACKs are detected and potentially concealed when the
   proxy retransmits a segment that was lost on the mobile link without
   involvement of the far end (see section 2.1.1 of [RFC3135] and
   section 3.5 of [I-D.dolson-plus-middlebox-benefits]).

BW> Starting the first paragraph in this way suggests that such use cases are for mobile links only, which is not correct. Performance enhancing proxies of this sort can be used on any long RTT path to improve performance over a constrained uplink.

KM> How about:  Performance-enhancing TCP proxies may perform local retransmission at the network edge; this also applies to mobile networks.

   This optimization at network edges measurably improves real-time
   transmission over long delay Internet paths or networks with large
   capacity-variation (such as mobile/cellular networks).

AM> FYI - the following sentence was added here in the -14pre version I sent on Nov 19:

        However, such optimizations can also cause problems with performance,
        for example if the characteristics of some packet streams begin to vary
        significantly from those considered in the proxy design.

This was intended to address one of Mark Nottingham's comments.


   An application-type-aware network edge (middlebox) can further
   control pacing, limit simultaneous HD videos, or prioritize active
   videos against new videos, etc.

KR> Observation: This subsection provides the first really compelling argument I've seen for exposing flow metadata to the path. On long paths, physics gets in the way of tight control feedback loops. If nothing else, this should provide motivation for protocol designers and operators to break down the characteristics of different kinds of flows, determine where control points are needed in each of them, and figure out how to implement those.

I think there is this conceit among protocol designers that quality problems can all be solved at the endpoints without any cooperation from path elements; the really killer arguments are examples of where that cannot possibly be the case. ECN is a great example of this, and is a signal explicitly targeted at middleboxes with opt-in by the endpoints: it allows a middlebox to report congestion without dropping packets, which produces measurably better QoS for the user.
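
For concreteness, the whole ECN signal fits in two IP header bits (RFC 3168), which is what makes it workable as a middlebox-targeted, endpoint-opt-in mechanism:

    # Sketch of the ECN signal: a congested queue marks CE on packets
    # whose endpoints opted in (ECT), instead of dropping them.

    ECT1, ECT0, CE = 0b01, 0b10, 0b11      # RFC 3168 codepoints

    def mark_if_congested(tos: int, queue_congested: bool) -> int:
        ecn = tos & 0b11
        if queue_congested and ecn in (ECT0, ECT1):
            return (tos & ~0b11) | CE      # signal congestion, keep packet
        return tos                         # Not-ECT packets get no mark;
                                           # a real queue would drop instead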

KM> Ack, thanks.  You're not looking for additional text here, is that right?  If you are, what are you thinking should be added?


   Content replication in caches (for example live video, DRM protected
   content) is used to most efficiently utilize the available limited
   bandwidth and thereby maximize the user's Quality of Experience
   (QoE).  Especially in mobile networks, duplicating every stream
   through the transit network increases backhaul cost for live TV.  The
   Enhanced Multimedia Broadcast/Multicast Services (3GPP eMBMS) -
   trusted edge proxies facilitate delivering same stream to different
   users, using either unicast or multicast depending on channel
   conditions to the user.

KR> There are on-going efforts to support multicast inside carrier networks while preserving end-to-end security: AMT, for instance, allows CDNs to deliver a single (potentially encrypted) copy of a live stream to a carrier network over the public internet and for the carrier to then distribute that live stream as efficiently as possible within its own network using multicast.

KM> Text added, thanks.


   Alternate approaches such as blind caches [I-D.thomson-http-bc] are
   being explored to allow caching of encrypted content; however, they
   still need to intercept the end-to-end transport connection.

KM> [s/need to intercept the end-to-end transport connection/require cooperation between the content owners/CDNs and blind caches and fall outside the scope of what is covered in this document/

Content delegation solves a data visibility problem with the delegated cache; the impact remains for the use case where HTTPS encryption limits the cache's ability to offload traffic from congested links.]

KR> This last point isn't strictly speaking true: many proposals (including I believe Martin's) require cooperation between content owners/CDNs and these blind caches. From Martin's draft:
   q( This document describes a method for conditionally delegating the
   hosting of secure content to the same server.  This delegation allows
   a client to send a request for an "https" resource via a proxy rather
   than insisting on an end-to-end TLS connection.  This enables shared
   caching for a limited set of "https" resources, as selected by the
   server. )

BW> I'm not sure that use cases where there is explicit cooperation between the content provider and the cache are necessarily relevant for this document, since in those cases the cache is an extension of the content provider (by some definition) and the cache will most likely not be inhibited by increased encryption. The more relevant caching case is one meant for network offload on the receiver side where there is no explicit cooperation between the content provider and the cache. That's the case where the use of HTTPS inhibits the cache's ability to offload from congested links. IOW, content delegation solves a data visibility problem with the delegated cache; it does not solve a problem introduced to the cache through the use of encryption.


2.2.6.  Content Compression

   In addition to caching, various applications exist to provide data
   compression in order to conserve the life of the user's mobile data
   plan and optimize delivery over the mobile link.  The compression
   proxy access can be built into a specific user level application,
   such as a browser, or it can be available to all applications using a
   system level application.  The primary method is for the mobile
   application to connect to a centralized server as a proxy, with the
   data channel between the client application and the server using
   compression to minimize bandwidth utilization.  The effectiveness of
   such systems depends on the server having access to unencrypted data
   flows.

KR> Observation: given the side channels exposed by data compression that is blind to content, the inability to compress arbitrary payloads is likely to be regarded as a feature of encryption. (Though I recognize this is a catalog, not an endorsement.) Furthermore, in most cases eliminating compression is still 2-competitive with compression, so I'm not sure it's a really compelling use-case.

BW> Per-object content compression might not be a compelling use case here. Aggregated data stream content compression that spans objects and data sources is compelling, though. A network element close to the receiver that sees all content destined for the receiver and can treat it all as part of a unified compression scheme (e.g., through the use of a shared segment store) will often be much more effective at providing data offload.
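
A rough sketch of the shared-segment-store idea (chunking and naming schemes illustrative):

    # Sketch of shared-segment-store offload: a network element near the
    # receiver chunks the aggregate byte stream and replaces segments it
    # has already delivered on ANY flow with short references.

    import hashlib

    SEG = 4096
    store = {}                       # digest -> segment, shared across flows

    def offload(stream: bytes):
        out = []
        for i in range(0, len(stream), SEG):
            seg = stream[i:i + SEG]
            digest = hashlib.sha256(seg).digest()
            if digest in store:
                out.append(("ref", digest))    # already cached downstream
            else:
                store[digest] = seg
                out.append(("data", seg))
        return out

This only works when the element can see the cleartext streams; encrypt each flow end-to-end and the shared store degenerates to per-flow compression at best.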

KM> Thanks, we'll add this text (modified) to make those helpful points clear.

How about:
    Aggregated data stream content compression that spans objects and data sources, treated as part of a unified compression scheme (e.g., through the use of a shared segment store), is often effective at providing data offload when there is a network element close to the receiver with access to all the content.


   Another form of content filtering is called parental control, where
   some users are deliberately denied access to age-sensitive content as
   a feature to the service subscriber.  Some sites involve a mixture of
   universal and age-sensitive content and filtering software.  In these
   cases, more granular (application layer) metadata may be used to
   analyze and block traffic.  Methods that accessed cleartext
   application-layer metadata no longer work when sessions are
   encrypted.  This type of granular filtering could occur at the
   endpoint.  However, the lack of ability to efficiently manage
   endpoints as a service reduces providers' ability to offer parental
   control.

KR> It might be worth discussing the typical opt-in strategy for these things in the presence of TLS, adding a new intercept CA to willing clients, which has the downside that it potentially exposes every https connection to an active MitM.

BW> +1

KM> OK, we hadn't done that before since the option doesn't change, but you make a good point, so I'll add in text.  Thanks.

I added the following:

    This method is also used by other types of network providers enabling
    traffic inspection, but not modification.

    Content filtering via a proxy can also utilize an intercepting
    certificate where the client's session is terminated at the proxy,
    enabling cleartext inspection of the traffic.  A new session is
    created from the intercepting device to the client's destination;
    this is an opt-in strategy for the client.  Changes to TLSv1.3 do
    not impact this more invasive method of interception, which has the
    potential to expose every HTTPS session to an active man in the
    middle (MitM).

KR> Random comment: especially with respect to government content filtering, I'm worried that the IETF's current approach of playing chicken with regulators on end-to-end encryption is going to result in normalization of intercept CAs, which will be strictly worse than a compromise solution in which a subset of traffic can be inspected (but not modified) with the user's knowledge and consent (e.g., distinct optics in the browser). I wouldn't like either outcome, frankly, but it would be nice if we had a game plan for what to do for user privacy if intercept CAs become a requirement for using the web in large parts of the world (something we might be one "crisis" away from), and an honest evaluation of the alternatives. Fundamentally, I don't like it when discussion gets shut down because people want to bury their heads in the sand in the name of ideology.</rant>

BW> +1. I also note that this concern applies to some of the other performance related use cases too.

KM> I think the real argument here is a control one between the application and management folks and not security/privacy even though that's what is often discussed.  This is all about control.


   In addition, mobile network operators often sell tariffs that allow
   free-data access to certain sites, known as 'zero rating'.  A session
   to visit such a site incurs no additional cost or data usage to the
   user.  This feature is impacted if encryption hides the details of
   the content domain from the network.

KR> There's the related issue that zero-rating, as implemented, typically applies only to direct connections to a particular endpoint (e.g., by IP): if a user accidentally tunnels traffic from Spotify through a corporate VPN, that traffic won't be zero-rated, encrypted tunnel or not. (This goes back to the taxonomy of metadata layers comment I made near the top.) Carriers aren't going to trust e.g., a Host header for zero-rating, because that provides a simple way to tunnel traffic for free: consequently, determination of zero-rating will always involve some hard-to-impersonate credential, like an IP address or server certificate in the public trust web.
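
In other words, classification ends up looking like this (addresses illustrative), which is exactly why a tunnel breaks it:

    # Sketch: zero-rating keyed on hard-to-impersonate destination
    # addresses rather than client-supplied fields like a Host header.

    import ipaddress

    ZERO_RATED = [ipaddress.ip_network("198.51.100.0/24")]  # partner CDN, say

    def is_zero_rated(dst_ip: str) -> bool:
        addr = ipaddress.ip_address(dst_ip)
        return any(addr in net for net in ZERO_RATED)

    # Traffic tunneled through a VPN egresses toward the VPN's address,
    # so it no longer matches -- encrypted or not.
    assert is_zero_rated("198.51.100.7") and not is_zero_rated("203.0.113.9")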

KM> Not sure what to add here, any ideas, Al?


   When RTSP stream content is encrypted, the 5-tuple information within
   the payload is not visible to these ALG implementations, and
   therefore they cannot provision their associated middleboxes with
   that information.

KR> I would argue that this is a protocol design issue. This was originally a problem with firewalls and NATs, with content inspection as a hack to work around the protocol/network impedance mismatch. I'm not the only one who would argue the right solution today is to design protocols to not require linkage across connections by middleboxes that do basic filtering.

KM> I think we are in agreement here on solution direction, but the document specifically tries to avoid solutions.  This example has been raised in the IESG by Warren and the apps side hadn't considered his view of it previously.  It would be good for protocols to have these considerations in their designs; designers were mostly thinking it didn't matter since sessions were end-to-end.  But poor video streaming sessions are an issue.  Not sure we should add any text here???


2.3.4.  HTTP Header Insertion

   Some mobile carriers use HTTP header insertion (see section 3.2.1 of
   [RFC7230]) to provide information about their customers to third
   parties or to their own internal systems [Enrich].  Third parties use
   the inserted information for analytics, customization, advertising,
   to bill the customer, or to selectively allow or block content.  HTTP
   header insertion is also used to pass information internally between
   a mobile service provider's sub-systems, thus keeping the internal
   systems loosely coupled.  When HTTP connections are encrypted, mobile
   network service providers cannot insert headers to accomplish the
   functions above.

KR> See my first comment re: compression. I'm dithering on how best to present these cases that are going to trigger some folks. ;-)

KM> Yes, this one is a hot button.  For the compression one, I clarified with added text and a use case that Brandon provided.


3.1.  Management Access Security
   ...
   Application service providers, by their very nature, control the
   application endpoint.  As such, much of the information gleaned from
   sessions is still available on that endpoint.  However, when a gap
   exists in the application's logging and debugging capabilities, this
   has led the application service provider to access data-in-transport
   for monitoring and debugging.

BW> How is DLP part of the management access discussion? It seems like a separate use case to me. The above two paragraphs seem out of place in this section.

KM> Good point, I moved this to the SP Content monitoring of Applications subsection and added a bullet for DLP.


   Overlay networks (e.g.  VXLAN, Geneve, etc.) may be used to indicate
   desired isolation, but this is not sufficient to prevent deliberate
   attacks that are aware of the use of the overlay network.  It is
   possible to use an overlay header in combination with IPsec, but this
   adds the requirement for authentication infrastructure and may reduce
   packet transfer performance.  Additional extension mechanisms to
   provide integrity and/or privacy protections are being investigated
   for overlay encapsulations.  Section 7 of [RFC7348] describes some of
   the security issues possible when deploying VXLAN on Layer 2
   networks.  Rogue endpoints can join the multicast groups that carry
   broadcast traffic, for example.

BW> I'm a little confused about the overall point of this section. I think that it might be "Hosted environments sometimes use content inspection to differentiate between management traffic and service traffic." but I don't think this point is very clearly stated. Or is there a different central point?

KM> It was supposed to be management access, but text got added and the DLP doesn't fit, so that has been moved.  I can reach out to Alia who contributed the VXLAN text to work that better into the intent of this section as I can see your point on lack of flow.


   Data center operators may also maintain packet recordings in order to
   be able to investigate attacks, breach of internal processes, etc.
   In some industries, organizations may be legally required to maintain
   such information for compliance purposes.  Investigations of this

KR> I think you'll get a "[citation needed]" from folks on the TLS mailing list.

KM> I suspect this is one where, once you have that recorded text, you have to maintain it for chain of custody with investigation handling.  I'll have to figure out if there is anything that would require the capture; I suspect not, but could be wrong.


3.2.  Hosted Applications

   Organizations are increasingly using hosted applications rather than
   in-house solutions that require maintenance of equipment and
   software.  Examples include Enterprise Resource Planning (ERP)
   solutions, payroll service, time and attendance, travel and expense
   reporting among others.  Organizations may require some level of
   management access to these hosted applications and will typically
   require session encryption or a dedicated channel for this activity.

KR> I'm not sure how encryption of the management session is relevant to this doc. The way I've framed this document in my mind is "What information from flows is being used for network management by entities other than the endpoints, who have a need-to-know for the cleartext payload?", where "management" includes things like compliance and satisfying regulatory requirements.

KM> Leaving as-is for now.  While there is no impact since these sessions are already encrypted, it is one more connection point to using encryption successfully.
Al?


3.2.2.  Mail Service Providers
   ...
   STARTTLS ought to have zero effect on anti-SPAM efforts for SMTP
   traffic.  Anti-SPAM services could easily be performed on an SMTP
   gateway, eliminating the need for TLS decryption services.  The
   impact to Anti-SPAM service providers should be limited to a change
   in tools, where middle boxes were deployed to perform these
   functions.

KR> Here you're discussing a potential change to the operational technique, which doesn't match the rest of the subsections.

KM> You're right.  This text came in from Stephen when he was the sponsoring AD.  The point is valid and doesn't fit exactly, but I'm hesitant to remove it.


3.3.  Data Storage

BW> I'm having trouble with the Data Storage section at large. For the most part, it seems to be describing use cases for the deployment of encryption, as opposed to things that people are doing today that would be harder if the network flows were encrypted.

KM> Flows were encrypted in recent years per customer demand, and the engineers worked to ensure monitoring was possible, improving logging, etc.  I'll try to make that more clear in the introduction.  This is a positive example, at least it was meant that way, and is included for completeness.


3.3.1.  Object-level Encryption

KR> End-to-end or object encryption seems like a better description here: host-level encryption implies that anything on the originating or target host has access to it, when (for instance) it could be encrypted to a TPM resident key or an SGX enclave key. The distinguishing question seems to be "Do middleboxes/intermediate nodes have access to the cleartext?"

KM> You're right.  The term host-level is an internal one and shouldn't be in the document.  I'll fix that, thanks.  No, middleboxes don't have access in the EMC use cases at least.  This is specific to object level encryption.


3.3.1.1.  Monitoring for Hosted Storage

BW> This is one of the few subsections that seems to be describing a method in current use that will be made more difficult if all the data flows are encrypted.


3.3.2.1.  Monitoring Session Flows for DAR Solutions

   Monitoring for transport of data to storage platforms, where object
   level encryption is performed close to or on the storage platform are
   similar to those described in the section on Monitoring for Hosted
   Storage.  The primary difference for these solutions is the possible
   exposure of sensitive information, which could include privacy
   related data, financial information, or intellectual property if
   session encryption via TLS is not deployed.  Session encryption is
   typically used with these solutions, but that decision would be based
   on a risk assessment.

BW> What would be a monitoring use case in current use that will be made more difficult due to the increased use of encryption? The previous sentence seems to be suggesting that session encryption is already prevalent, so I would think that people deploying DAR solutions have already come up with different monitoring approaches.

KM> The storage engineers improved monitoring capabilities through logging, etc.  This was a requirement, so they made it happen with full agreement on direction.


   There are use cases where DAR or disk-level
   encryption is required.  Examples include preventing exposure of data
   if physical disks are stolen or lost.

KR> I don't see how these last two sentences are relevant, as they have nothing to do with the network flows.

KM> I'm happy to remove.  Do they help a reader who is not familiar with the technology to understand the layers of encryption used, or is it better to remove the sentence?


   In the case where TLS is in
   use, monitoring and the exposure of data is limited to a 5-tuple.

KR> This is an example of implicit use of the taxonomy I referred to earlier. I feel like this should be used systematically throughout the survey (i.e., what metadata does your technique rely on, and why?).

KM> That was the intent, but we didn't get that information from enough contributors.  I provided the above in a very early draft.  We'll have to go back through and see if that's possible at this point.  I know it would be helpful.


3.3.3.1.  Monitoring Of IPSec for Data Replication Services

   Monitoring for data replication services is described in this
   subsection.

   Monitoring of data flows between data centers may be performed for
   security and operational purposes and would typically concentrate
   more on operational aspects since these flows are essentially virtual
   private networks (VPN) between data centers.  Operational
   considerations include capacity and availability monitoring.  The
   security monitoring may be to detect anomalies in the data flows,
   similar to what was described in the "Monitoring for Hosted Storage
   Section".  If IPsec tunnel mode is in use, monitoring is limited to a
   2-tuple, or with transport mode, a 5-tuple.

BW> What monitoring is done when encryption is not in use? Is there something being done that requires access to higher layer protocols? If this traffic is already most-often encrypted, then maybe the use case isn't relevant for this document.

KM> It's just here for completeness.

   Security monitoring in the enterprise may also be performed at the
   endpoint with numerous current solutions that mitigate the same
   problems as some of the above mentioned solutions.  Since the
   software agents operate on the device, they are able to monitor
   traffic before it is encrypted, monitor for behavior changes, and
   lock down devices to use only the expected set of applications.
   Session encryption does not affect these solutions.  Some might argue
   that scaling is an issue in the enterprise, but some large
   enterprises have used these tools effectively.

KR> This is another example of mixing proposed solutions in among the problem statement. I would argue for a clear separation, which may mean that this document needs to have a single-minded focus on "here are the problems and here's how enterprises currently address them."

BW> Also, enterprises increasingly allow BYOD programs for their employees, and such programs make it more difficult to ensure that adequate endpoint-based defenses are active. This is especially true when the area of risk in question is the above #5 "track misuse and abuse by employees". Note too that endpoint-based defenses can be less effective when the device is already compromised, in which case detection of the compromised device and remediation can be made more effective through the additional use of an on-path element.

KM> [made some subsequent edits to this section]


4.1.3.2.  TCP Pipelining/Session Multiplexing

   TCP Pipelining/Session Multiplexing, used mainly by middle boxes today,
   allows for multiple end user sessions to share the same TCP
   connection. [This raises several points of interest with an
            increased use of encryption.  TCP session multiplexing should still
            be possible when TLS or TCPcrypt is in use since the TCP header
            information is exposed, leaving the 5-tuple accessible.  The use of
            TCP session multiplexing with an IP-layer encryption, e.g. IPsec,
            that only exposes a 2-tuple would not be possible.  With encrypted
            sessions, troubleshooting from the middlebox may be limited to the
            use of logs from the endpoints performing the TCP multiplexing, or
            from the middleboxes prior to any additional encryption that may be
            added to tunnel the TCP multiplexed traffic.]
   KM> I deleted the following, after adding in some of this content above where I think it makes more sense:
   [Today's network troubleshooter often relies upon session
   decryption to tell which packet belongs to which end user, since the
   logs are currently inadequate for the analysis performed.]

KR> It's not clear to me how these two sentences are related.

KM> Good point.  I think deleting the second sentence is important as it's a general point and not specific to this section.  E2E with FS should prevent the use of this multiplexing as I read it, so that is an impact that should be fine to document on its own. [edits subsequently above; maybe best viewed from the etherpad]


   Increased use of HTTP/2 will likely further increase the prevalence
   of session multiplexing, both on the Internet and in the private data
   center.[  HTTP pipelining requires both the client and server to participate,
   and once packets are encrypted, the use of HTTP pipelining is hidden from any
   monitoring that takes place outside of the endpoint or proxy solution.
   Visibility for middleboxes includes anything exposed by TLS and the 5-tuple.
   Note: left the text like this as SNI encryption will be optional, so it may
   also be exposed and this could vary by version used.]

KR> And further complicate analysis of cleartext payloads from individual packets.


4.2.  Techniques for Monitoring Internet Session Traffic
   ...
   (PII), or personal health information (PHI).  Various techniques are
   used to intercept HTTP/TLS sessions for DLP and other purposes, and
   are described in "Summarizing Known Attacks on TLS and DTLS"
   [RFC7457].  Note: many corporate policies allow access to personal
   financial and other sites for users without interception.  Another
   option is to terminate a TLS session prior to the point where
   monitoring is performed.

KR> The last two sentences seem like a non-sequitur.

KM> Leaving as-is for now.


5.4.  Botnets

   Botnet detection and mitigation is complex and may involve hundreds
   or thousands of hosts with numerous Command and Control (C&C)
   servers.  The techniques and data used to monitor and detect each may
   vary.  Connections to C&C servers are typically encrypted, therefore
   a move to an increasingly encrypted Internet may not affect the
   detection and sharing methods used.

KR> This is one of a general category of traffic that is intentionally protected from interference by the application.

BW> For that reason, this one almost seems like a counter example ... botnets encrypt to evade many of the earlier described methods, therefore many of the earlier described methods are already inadequate. On the other hand, maybe the point is that increased use of encryption for botnet C&C demands new methods to handle many of the previously described use cases.

KM> Well, from an incident investigation standpoint, many advanced operators have found detection methods that don't hinder their capabilities when traffic is encrypted.  Maybe I need to make this text more clear as I think I may have written it, not positive.

I think I'm going to leave the text alone as I think it reads as a positive example, one which techniques could be applied to other areas.  The DoS section talks a little about use of fingerprinting traffic already, which is one common technique.


5.7.  Further work

   Although incident response work will continue, new methods to prevent
   system compromise through security automation and continuous
   monitoring [SACM] may provide alternate approaches where system
   security is maintained as a preventative measure.

KR> Not clear how the unknowns relate to the purpose of this document. Being sarcastic for a minute, I'm interpreting this as "Any cleartext metadata just *might* be used in the future for some kind of enterprise security monitoring!"

KM> Hmm, it's meant to say endpoints (which you control) should be used and technology like what is expected out of SACM will help with automating this.  We are open to text suggestions.


6.1.  IP Flow Information Export
   ...
   The collection of IPFIX data itself, of course, provides a point of
   centralization for potentially business- and privacy-critical
   information.  The IPFIX File Format specification [RFC5655]
   recommends encryption for this data at rest, and the IP Flow
   Anonymization specification [RFC6235] defines a metadata format for
   describing the anonymization functions applied to an IPFIX dataset,
   if anonymization is employed for data sharing of IPFIX information
   between enterprises or network operators.

KR> I don't understand how IPFIX relates to the purpose of this document. Each of the IEs should be covered by one of the functional categories described earlier in the document.

KM> Leaving as-is for now.  Was added per Benoit and Brian Trammell for completeness on network management and encryption.


6.4.  Content Length, BitRate and Pacing

   Although block ciphers utilise padding, this makes a
   negligible difference.  Bitrate and pacing are generally application
   specific, and do not change much when the content is encrypted.
   Multiplexed formats (such as HTTP/2 and QUIC) may however incorporate
   several application streams over one connection, which makes the
   bitrate/pacing no longer application-specific.

KR> Are the four items listed here all the application-level flow information available to the network for encrypted flows? I think I need some more comprehensive top-down analysis to be confident of that. It seems like this could be folded into the metadata taxonomy, if you decide to go that route.

This section is also weird in that it doesn't describe a problem caused by encrypted flows. There's a lack of parallelism in structure between the other sections and this one.
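
To make the "doesn't change much when encrypted" point concrete: bitrate and pacing are recoverable from packet sizes and timestamps alone, with no payload access (inputs illustrative):

    # Sketch: estimating bitrate from sizes/timing of encrypted packets.

    def bitrate_bps(samples):
        """samples: list of (timestamp_seconds, packet_length_bytes)."""
        (t0, _), (t1, _) = samples[0], samples[-1]
        total_bytes = sum(length for _, length in samples)
        return 8 * total_bytes / (t1 - t0)

    # Three 1500-byte packets over 12 ms -> 3 Mbps, payload unseen.
    print(bitrate_bps([(0.000, 1500), (0.006, 1500), (0.012, 1500)]))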

KM> Good point, but this is not my area and I am open to suggestions.  Is it just another example where things are OK when encrypted?  I think a few of those are a good thing.


7.  Impact on Mobility Network Optimizations and New Services

   This section considers the effects of transport level encryption on
   existing forms of mobile network optimization techniques, as well as
   potential new services.  The material in this section assumes
   familiarity with mobile network concepts, specifications, and
   architectures.

KR> Good warning, but the entire section is very specific to a particular set of technologies, while the rest of the document is much more general. This feels like a case of "writing to what you know", which is fine in principle, but it feels out of place in this document to use so much jargon and to discuss metrics in the context of a single technology when many of these KPIs have equivalents outside of 3GPP.


   c.  Performance-enhancing proxy with low RTT determines the
       responsiveness of TCP flow control, and enables faster adaptation
       in a delay & capacity varying network due to user mobility.  Low
       RTT permits use of a smaller send window, which makes the flow
       control loop more responsive to changing mobile network
       conditions.

KR> Again, as with section 2.2.4, this section provides the clearest and most convincing arguments for the need for middlebox cooperation on flows.
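
A back-of-the-envelope illustration of why (numbers illustrative): both the window needed to fill a path and the reaction time to changing conditions scale with RTT, so a split connection whose edge segment has a 20 ms RTT adapts an order of magnitude faster than a 200 ms end-to-end loop.

    # Sketch: bandwidth-delay product and feedback interval vs. RTT.

    def window_bytes(bandwidth_bps: float, rtt_s: float) -> float:
        """Bytes in flight needed to fill the path (BDP)."""
        return bandwidth_bps * rtt_s / 8

    for rtt in (0.200, 0.020):       # end-to-end path vs. edge-proxy segment
        print(f"RTT {rtt*1000:.0f} ms: window {window_bytes(50e6, rtt)/1024:.0f} KiB,"
              f" feedback every {rtt*1000:.0f} ms")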

