Re: [LARTC] Intelligent P2P detection

Gordan Bobic <lartc@xxxxxxxxxx> · Thu, 27 Mar 2003 10:16:37 +0000

On Thursday 27 Mar 2003 09:24, Luman wrote:
> >Assumptions:
> >  Determine and mark 'good traffic' -- i.e. smtp, ftp, http, ssh, etc.,
> >  everything which uses well known ports.  Probably most people do it
> >  anyway, at least to some level.
>
> The problem is that P2P software now often uses these well-known
> ports too. So the assumption that port 80 is only for HTTP is bad.

Yes, but you can then analyze the traffic just on those ports for
verification. HTTP packets have a certain anatomy to them. This can be
detected. Same goes for FTP, SMTP, etc. The problems start with HTTPS, SSH,
IMAPS, POP3S, etc, as they are encrypted, and therefore cannot be analyzed.
With those, you simply have a problem. However, for SSH, IMAPS and POP3S you
don't need lots of bandwidth. You could therefore throttle them to
low-latency low-bandwidth. P2P networks will not like this.
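
To make the "anatomy" point concrete, here is a minimal sketch in Python of
what such a check might look like for HTTP/1.x. It only inspects the first
line of a TCP payload; the function name looks_like_http and the list of
methods are my own assumptions for illustration, not part of any existing
tool, and a client that deliberately imitates HTTP would pass it.

```python
import re

# Request line: METHOD SP request-target SP HTTP/1.x CRLF
_REQUEST_LINE = re.compile(
    rb"^(GET|HEAD|POST|PUT|DELETE|OPTIONS|TRACE|CONNECT) \S+ HTTP/1\.[01]\r\n"
)
# Status line: HTTP/1.x SP three-digit status code SP
_STATUS_LINE = re.compile(rb"^HTTP/1\.[01] \d{3} ")


def looks_like_http(payload: bytes) -> bool:
    """Return True if the start of a TCP stream resembles HTTP/1.x."""
    return bool(_REQUEST_LINE.match(payload) or _STATUS_LINE.match(payload))
```

A router-side classifier would feed this the first data segment of each
connection on port 80 and treat a mismatch as "not really HTTP".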

SFTP runs over SSH so you may have a problem with that. HTTPS is also
problematic.

However, you can scan to verify if SSH/HTTPS is being used. You can simply
write a bot that scans the ports when your router detects traffic. It can
send valid SSH/HTTPS connection requests and see if it talks back as
SSH/HTTPS should.
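
A rough sketch of such a probe for the SSH case, in Python. An SSH server
announces itself with an identification string beginning "SSH-", so the
probe connects and looks at the greeting. The names is_ssh_banner and
probe_ssh are hypothetical, and this is deliberately simplistic: a real
server may send other lines before its identification string, and a P2P
client could fake the banner.

```python
import socket


def is_ssh_banner(data: bytes) -> bool:
    """True if the greeting starts like an SSH identification string."""
    return data.startswith(b"SSH-2.0-") or data.startswith(b"SSH-1.")


def probe_ssh(host: str, port: int, timeout: float = 3.0) -> bool:
    """Connect to host:port and report whether the peer greets us as SSH would."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = sock.recv(256)
    except OSError:
        # Refused, timed out, or unreachable: treat as "not SSH".
        return False
    return is_ssh_banner(banner)
```

The router would call probe_ssh() on the endpoint of a suspicious flow and
only then decide which class the flow belongs in.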

Unfortunately, it gets progressively more difficult when P2P clients learn to
masquerade as the real protocols, and there is at least one P2P application
out there that can operate over SMTP, sending valid requests. :-(

I hope you are prepared to accept that eventually it all comes down to a
battle of wills between the sysadmins writing filters and the P2P developers
finding more ways to outsmart the filters.

> The intention of bringing my problem forward is to open a wider
> discussion aimed at finding or, if need be, building a "tool" (it might be
> a kernel patch, or whatever) which can thoroughly analyze traffic and its
> content, and on that basis decide (likely not with 100%
> certainty) what the content is. For instance, it could detect that the
> traffic is HTTP even if it is sent to port 46723, based on the content
> of the data.

How do you deal with HTTPS/SFTP/SSH/IMAPS/POP3S? Automatically do a
man-in-the-middle attack on everything, at your router?

> Such a tool should be based on a modular architecture that allows
> adding new testers or new knowledge for guessing the protocol.

This can generally be done for unencrypted connections, but once things start
to run over SSL (some already do), the chances of "recognizing" traffic very
quickly approach zero. :-(
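
For the unencrypted case, the modular "testers" idea could be sketched
roughly like this in Python. Everything here (the register decorator, the
tester names, the byte prefixes chosen) is an assumption made for
illustration; no such tool is implied to exist. Each tester sees only the
first payload bytes and ignores the port number entirely.

```python
from typing import Callable, Dict, Optional

# A tester inspects the first payload bytes and returns True on a match.
Tester = Callable[[bytes], bool]
_TESTERS: Dict[str, Tester] = {}


def register(name: str) -> Callable[[Tester], Tester]:
    """Decorator that adds a tester to the registry under a protocol name."""
    def wrap(func: Tester) -> Tester:
        _TESTERS[name] = func
        return func
    return wrap


@register("http")
def _http(payload: bytes) -> bool:
    return payload.startswith((b"GET ", b"POST ", b"HEAD ", b"HTTP/1."))


@register("smtp")
def _smtp(payload: bytes) -> bool:
    return payload.startswith((b"220 ", b"220-", b"HELO ", b"EHLO "))


@register("ssh")
def _ssh(payload: bytes) -> bool:
    return payload.startswith(b"SSH-")


def classify(payload: bytes) -> Optional[str]:
    """Return the first matching protocol name, or None if nothing matches."""
    for name, tester in _TESTERS.items():
        if tester(payload):
            return name
    return None
```

New knowledge is added by registering another tester. Note that anything
wrapped in SSL simply falls through to None, which is the limitation
described above.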

> Obviously, it should track connections, sessions and everything else that
> can be used for traffic classification.

In order to write a rule for traffic analysis, you must first know what to
look for. If you can come up with a method to analyze SSL traffic, especially
in real-time (or close to), I am sure a lot of people in the security
industry will want to hear from you.

> As a result, packets would be marked with a standardized number
> identifying the type of protocol, for instance HTTP, KaZaa, MSN, SSH, SCP,
> etc.

If you can get as far as distinguishing packets, then that's great. How to
get that far is the difficult part.

> This knowledge could be used for traffic shaping and whatever. Can
> you imagine the comfort of administrators' work if, at the border router
> or in the firewall configuration, they could work with this well-determined
> content? The number of rules would be reduced dramatically. Obviously, the
> classification knowledge would grow day by day.

Sounds great, in theory.

> The whole idea is very similar to the Unix 'file' command. For instance, I
> had an "a.gz" file on my system. The type of this file is obvious: it is
> gzip. However, I changed its name to "a.txt". That should suggest that it is
> a text file; however, when I run file a.txt I get the following answer:
> ~# file a.txt
> a.txt: gzip compressed data, deflated, original filename,
> `ucspi-tcp-0.88.tar', last modified: Sat Mar 18 16:21:39 2000, max
> compression, os: Unix.
> This program doesn't care about extensions; it tries to guess the type by
> analyzing content. Of course, many times it gives the wrong answer, but that
> is due to a lack of knowledge.

Your knowledge of encrypted traffic is non-existent. That is the whole point
of encryption. How do you allow valid encrypted traffic while disallowing P2P
encrypted traffic? What happens when the method of using SMTP for P2P becomes
more widespread? You can send perfectly valid-looking emails that are PGP
encrypted, with all the SMTP headers in place to make them indistinguishable
from real PGP-encrypted email.
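
To illustrate why content inspection tells you so little here: about the
only thing you can measure on an encrypted payload is that it looks random.
A small Python sketch of a byte-entropy measure follows (shannon_entropy is
a name I made up for this example). A high value says "probably encrypted
or compressed" and nothing about which application produced it.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

An SSH session, an HTTPS download and an encrypted P2P transfer would all
score close to the maximum, so this cannot separate them from one another.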

> Summarizing my pretty long mail, I think our present methods are similar
> to determining the content of a file based only on the extension of its
> name. But I believe we strongly require something more.

And that will work until encryption is used. As most P2P networks are now
starting to use encryption on the connection streams, this very quickly
becomes unworkable.

> >  All that is left are P2P connections and some other misc connections.
> >  A bit unfair to other protocols using non-standard ports, like
> >  Instant Messenger-style software, and lots of other stuff.  So here we
> >  introduce a trick.  IMs and other low-bandwidth traffic will use small
> >  packets ( < 512 or even < 256), P2P will use the maximum MTU available
> >  (usually 1500, but I've seen some using 576 packets, hence I treat > 512
> >  as P2P).
> >
> >  Probably you've noticed that I mention round numbers, such as 512 or
> >  1024; that's because I use u32 for marking packets.  How I do it, we
> >  leave as an exercise to the reader. ;-)))
>
> I like your solution very much. I'll try to apply it for my system, as a
> temporary solution.

That sounds like an interesting idea, provided you have some real evidence of
this being the case. And this will only work until P2P network software
starts to randomly change packet sizes to obfuscate itself. :-(
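
As for why round numbers fall out of u32: the u32 selector compares a
masked header field against a pattern, so a power-of-two size threshold is
a single mask-and-compare on the IP Total Length field. Below is a Python
sketch of that logic. It is one plausible reading of the trick, not
necessarily how the original poster did it; SIZE_MASK and the function
names are my own. Note the exact boundary: this treats 512 and above as
"large".

```python
import struct

# High bits of a 16-bit length; any of them set means length >= 512.
SIZE_MASK = 0xFE00


def ip_total_length(ip_header: bytes) -> int:
    """Read the Total Length field (bytes 2-3 of the IPv4 header)."""
    (length,) = struct.unpack_from("!H", ip_header, 2)
    return length


def is_small_packet(ip_header: bytes) -> bool:
    """True when total length < 512, using one mask-and-compare."""
    return (ip_total_length(ip_header) & SIZE_MASK) == 0
```

Using 0xFF00 as the mask would move the threshold to 256 in the same way.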

But, I guess we have to work with what we have now, and not worry about the
future advancements before they happen. :-)

I hope you will all forgive me for being... restrained (for want of a better
word) in my expectations of the success of such network traffic analysis. It
is a depressing subject to talk about. :-(

I cannot help but think that this is also starting to get slightly off-topic
for this mailing list...

Gordan

Linux Advanced Routing and Traffic Control