Re: Block IP (Apache Users)

On 6/6/08, Mohit Anchlia <mohitanchlia@xxxxxxxxx> wrote:

On 6/6/08, André Warnier <aw@xxxxxxxxxx> wrote:

Mohit Anchlia wrote:

On 6/6/08, André Warnier <aw@xxxxxxxxxx> wrote:

Mohit Anchlia wrote:

On 6/5/08, André Warnier <aw@xxxxxxxxxx> wrote:

Mohit Anchlia wrote:

On 6/5/08, André Warnier <aw@xxxxxxxxxx> wrote:

Mohit Anchlia wrote:

On 6/5/08, André Warnier <aw@xxxxxxxxxx> wrote:

Mohit Anchlia wrote:

On 6/4/08, Dragon <dragon@xxxxxxxxxxxxxxxxxx> wrote:

André Warnier wrote:

Mohit Anchlia wrote:

2. Another question I had: sometimes we don't get the real physical IP
of the machine, but the IP of something in between, like a "router".
Is there a way to get the real IP, so that we don't end up blocking
people coming from that "router" or "proxy"?

In my opinion, you cannot. The whole point of such routers and proxies
is to make the requests look like they are coming from the
router/proxy, so that is the sender IP address you are seeing at your
server level, and that's it. Your server never receives the original
requester IP address.

---------------- End original message. ---------------------

There are legitimate reasons for this to be done as well.
Indiscriminately blocking such access is a bad idea, as it will affect
legitimate users. NAT and IP address sharing are among the reasons.
This allows an organization to have a router with one public IP address
serve a larger internal network with private IP addresses. Without
this, we would have run out of IPv4 addresses a long time ago.

Dragon

If there is no way to get the real IP address, then how would the
router know which machine to direct the response to? It has got to have
some information in the packet. For example: if A sends to router B and
the router sends to C, then when C responds, how would B know that the
response is for A?

You are perfectly right : the router knows the real IP address. But it
will not tell you, haha.

Seriously, this is how it works :
the original system sends out an "open session" packet, through the
router, to the final destination.
The router sees this packet, and analyses it. It extracts the IP
address and port of the original sender, and keeps them in a table.
Then it replaces the IP address by its own, adds some port number, and
also memorises this new port number in the same table entry.
Then it sends the modified packet to the external server (yours).
It knows that the server on the other side is going to respond to this
same IP address and port (the ones of the router).
When the return packet from the server comes back, the router looks at
the port in it, finds the corresponding entry in its table, and now it
knows to whom it should send the packet internally.
And so on.
So :
- the router knows everything
- the internal system thinks it is talking directly to the external
server
- the external server (yours) only sees the router IP and port, so it
thinks that is where the packet comes from.

That's NAT for you, in a nutshell.
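
The table bookkeeping described above can be sketched in a few lines of
Python. This is only a toy model to make the steps concrete, not how any
real router is implemented: the class name NatRouter, the starting port
40000 and the example addresses are all made up for illustration, and
things like protocols, timeouts and port reuse are ignored.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class NatRouter:
    """Toy model of the translation table (hypothetical, for illustration)."""
    public_ip: str
    # Maps the router's public port -> (internal IP, internal port).
    table: dict = field(default_factory=dict)
    # Source of "some port number"; 40000 is an arbitrary starting point.
    _ports: count = field(default_factory=lambda: count(40000))

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Rewrite an outgoing packet so it appears to come from the router."""
        public_port = next(self._ports)
        # Remember who the original sender was, keyed by the new port.
        self.table[public_port] = (src_ip, src_port)
        return (self.public_ip, public_port, dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        """Send a reply addressed to the router back to the internal sender."""
        internal_ip, internal_port = self.table[dst_port]
        return (src_ip, src_port, internal_ip, internal_port)


router = NatRouter("203.0.113.1")
# Internal machine 192.168.1.10 talks to the external server 198.51.100.7.
sent = router.outbound("192.168.1.10", 51000, "198.51.100.7", 80)
# The server replies to what it saw: the router's IP and port.
reply = router.inbound("198.51.100.7", 80, sent[0], sent[1])
```

In this sketch `sent` carries only the router's address as its source,
which is all the external server ever gets to see, while `reply` ends up
addressed to the internal machine again.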

Yes ?

---

Thanks for the great explanation. But I wonder how people design apps
against Denial of Service attacks. Say computer A uses a Cox/Time Warner
(cable) Internet connection and starts attacking B; how would a system
be configured so that not all the users of Time Warner/Cox are
affected? Should it be granular enough to give both IP and source port
in the IP blocking rules?

I think that is quite a different case. Not all users of an ISP (like
the one you mention I suppose) are "behind" a NAT router that hides
their IP address. Instead, these ISPs have a large pool of public IP
addresses which they "own", and they assign them dynamically to users
when they connect (and put the address back in the pool when the user
disconnects).

If a DOS attack came from a router with a fixed IP address, and
everyone knew that this IP address belongs to company xyz, I'm sure
that it would not be long before company xyz would be facing a big
lawsuit.

But in the case of an ISP, with tens of thousands of customers, each
one of which gets a different IP address each time he turns on his
computer (and anyway once per 24 hours in general), finding out who
exactly was "a234d-45hjk-dialin-atlanta.cox-t-warner.net" between 17:45
and 17:53 yesterday is a bit more time-consuming.

But in that case anyway, you do have a real individual sender IP
address when the packet reaches your server, so you can decide to block
it.
And keep blocking all packets from this address for the next 24 hours.
And that's exactly what many servers do.
And that is also why sometimes you may turn on your PC at home (getting
a brand-new IP address) and find out that you cannot connect to some
server because it is rejecting your IP address. Chances are that you
are unlucky enough to have received today the IP address that was used
yesterday by someone else who used it to send out 1M emails.
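
As a rough illustration of "block this address for the next 24 hours",
here is a minimal Python sketch of a blocklist whose entries expire. The
class name TemporaryBlocklist and the injectable clock are assumptions
made for the example, not a description of what any particular server
does.

```python
import time

BLOCK_SECONDS = 24 * 60 * 60  # the 24-hour window mentioned above


class TemporaryBlocklist:
    """Remembers blocked IPs and forgets them after BLOCK_SECONDS."""

    def __init__(self, clock=time.time):
        self._expires = {}   # ip -> time at which the block ends
        self._clock = clock  # injectable so the expiry logic can be tested

    def block(self, ip):
        self._expires[ip] = self._clock() + BLOCK_SECONDS

    def is_blocked(self, ip):
        expiry = self._expires.get(ip)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # The block has run out: whoever holds this address today
            # is allowed in again.
            del self._expires[ip]
            return False
        return True
```

The expiry is what keeps the unlucky next owner of a recycled address
from being rejected forever.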

But isn't this getting a bit off-topic ?
If you want to know more about this, I suggest you Google a bit on
"blacklists", "greylists" and "whitelists" for example.
or start here : http://en.wikipedia.org/wiki/DNSBL

Thanks... it did go off-track a little bit, but it helps me understand
what I should expect when doing such blocking. Thanks for your
explanation.

Now coming back on track, out of the 2 approaches below, which one is
better:

1. Use "deny from IP" in <LocationMatch>
2. Use RewriteCond and call a perl script dynamically. This helps me
configure IPs dynamically without having to stop and start servers
every time I change httpd.conf

Is there any performance impact of using 2 over 1, or any other issues?

There will be a very big difference : in case (1), the IP addresses or
ranges are pre-processed by Apache at startup time, and the comparison
will be made by an internal (and fast) Apache module, on the basis of
information in memory. In case (2), not only are you using a rewrite of
the URI, but in addition you will be executing a script, which itself
is going to read an external file. That is going to be several hundred
times slower, at least. Thousands of times slower if you recompile and
execute the script with perl each time (if not under mod_perl).
Now whether it matters or not in your case depends on the load of your
server. If it is doing nothing anyway 90% of the time, it doesn't
matter.
An Apache restart may or may not be such a big problem either, it all
depends on your circumstances.

But rather than using a perl script, I would definitely in that case
use a mod_perl add-on module written as a PerlAccessHandler. But that's
another story, and one more for the mod_perl list.
I would bet that there exists already such a mod_perl module by the
way.
Have a look here :
http://cpan.uwinnipeg.ca/search?query=apache2&mode=dist
or, there is probably an example in the mod_perl Cookbook

As per your suggestion I looked at PerlAccessHandler. How would this
approach compare, in terms of performance, to "deny from IP"? Is it
still going to be really bad?
<Location /URL>
PerlAccessHandler Example::AccessHandler
</Location>
I will try running some tests also.

Well again, it all depends on your circumstances, what you want to achieve,
how many accesses you expect, why exactly you want to block or allow some
IPs, how many different IPs or IP ranges you would want to allow/block, how
often they change, what makes them change, whether it is a big
problem or not for you to do an Apache restart, how loaded your system is
expected to be, etc..
Even if one solution looks like it is 200 times slower than another, if
your server is only loaded at 10% (happens more frequently than you would
think), and it really makes your life easier for the next 3 years, it's
worth looking at.
And even if one solution is 200 times slower than another, that can still
mean 0.1 millisecond, so is it important for you ?

A simple tip :
in the Apache configuration file, you can use an "include" directive, I
believe just about anywhere, to insert at that point another bit of
configuration file.
You could have a simple text file containing all your
Deny from 1.2.3.4
Deny from 2.3.4.5
...
lines, and include it wherever you want.
Then a simple Apache restart would re-read it.
And this file could be written and re-written by some external script which
decides which IPs are allowed or not. Or edited with vi manually, if that is
how often changes happen.
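
The "external script" part of this tip could look roughly like the
Python sketch below. It is only an illustration under a few assumptions:
the function names and the file path conf/blocked_ips.conf are made up,
and the script only writes the file; Apache still needs the restart
mentioned above before it re-reads it.

```python
import ipaddress
import os
import tempfile


def format_deny_line(entry):
    """Turn '1.2.3.4' or '10.0.0.0/8' into one 'Deny from ...' line."""
    # ip_network() raises ValueError on malformed input, so a typo in the
    # list cannot end up in the Apache configuration.
    network = ipaddress.ip_network(entry, strict=False)
    if network.prefixlen == network.max_prefixlen:
        # Single host: write the bare address, as in the lines above.
        return f"Deny from {network.network_address}\n"
    return f"Deny from {network}\n"


def write_deny_file(entries, path):
    """Write the include file, replacing any previous version in one step."""
    lines = [format_deny_line(entry) for entry in entries]
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first so Apache never sees a half-written
    # list, then move it into place.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.writelines(lines)
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, path)
```

For example, write_deny_file(["1.2.3.4", "2.3.4.5"], "conf/blocked_ips.conf")
would produce the two "Deny from" lines shown above, and the main
configuration would pull them in with an Include of that (hypothetical)
file name.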

If you have a PerlAccessHandler under mod_perl :
- perl itself is part of the server, so it does not have to be reloaded
each time
- the handler gets compiled once the first time it is run, and the compiled
code is re-used afterward
- it can be smart, and only re-read the IP address list, and rebuild its
internal table when the file changes
- and in the meantime, it uses the table in memory
So in that case you would not have to restart Apache, and any changes would
take effect immediately.

Also, something else :
So far, you have been talking about blocking HTTP accesses at the Apache
level. But maybe you want to block more than port 80 from those IP
addresses, and maybe you should do this outside of Apache, before it even
gets to Apache ?

There are many solutions, but you are the one to decide which one you
implement.

Thanks. You are right, we should not even let these people get to Apache. We
have that process in place, but it often takes time to get that request
approved and processed by the Network team. Meanwhile we want something that
lets us block ASAP. I am not sure how often this list will change. To begin
with, this list is going to be empty. Only when we experience a DOS will we
update the IPs.

We expect to get 1000s of requests per second. Since it's going to be a highly
loaded server, I started to think about something that would change
dynamically. You mentioned the code is compiled when Apache restarts, which
means that if I keep the list of IPs as an array inside the perl script, it is
not going to take effect until the next restart.

The following is a bit academic, because I believe that with this kind of volume you will be better off with a solution outside of Apache anyway, but for the sake of argument :

That is not exactly what I meant. The list of IP's to block is in an external file, which can change from time to time.
With mod_perl,
- the perl interpreter is "embedded" in Apache from the start. To say it another way, you have an Apache with a built-in perl compiler and run-time. That means that later, to run compiled perl code, Apache does not have to start an instance of the perl run-time anymore, it is already loaded and ready-to-run.
- the perl add-on modules (the code), are also compiled (by perl) when Apache starts, and the "compiled" version is in memory, ready to run. Just like one of the standard C-based Apache modules like mod_mime, mod_rewrite etc..
- however, the list of IP addresses is outside, in a file, and the perl module, at start, has an empty table.
- the first time the module is called, it checks the table and sees that it is empty. Then it reads the file, fills the table, and notes the timestamp of the file. Then it handles the current request, to see if the IP matches or not, and rejects/approves the request.
- the next time the module is called, it checks the table, and it is not empty. It then checks the timestamp of the file. If it has changed, it reloads the table from the file, otherwise not. Then it processes the current request. (If you want to not check the file at each request, but only every 30 seconds or every 10,000 requests, you can do that too.)
You can do this kind of thing with mod_perl in this case, because you only read from the table (except when you totally reload it), and because it does not matter if several Apache "children" each have their own copy of the table.
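
The "reload only when the file changes" logic described above can be sketched as follows. This is written in Python purely to show the idea; a real PerlAccessHandler would be written in perl against the mod_perl API, which is not shown here. The class name ReloadingBlocklist and the one-address-per-line file format are assumptions made for the example.

```python
import os


class ReloadingBlocklist:
    """In-memory table of blocked IPs, refreshed when the file changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None           # no file read yet: the table is empty
        self._blocked = frozenset()

    def _reload_if_changed(self):
        # One stat() call per check is much cheaper than re-reading the file.
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self._mtime:
            return
        entries = set()
        with open(self.path) as handle:
            for line in handle:
                entry = line.strip()
                # Assumed format: one address per line, '#' for comments.
                if entry and not entry.startswith("#"):
                    entries.add(entry)
        self._blocked = frozenset(entries)
        self._mtime = mtime

    def is_blocked(self, ip):
        self._reload_if_changed()
        # Between file changes, this is a pure in-memory lookup.
        return ip in self._blocked
```

Each Apache child would hold its own instance of such a table, and each would pick up a changed file the next time it handles a request. To check the timestamp only every 30 seconds or every 10,000 requests, `_reload_if_changed` would just need an extra counter or time check in front of the stat() call.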

(In the above, I put "compile" between quotes, because perl compiles a script into "byte-code", which is later interpreted by the run-time portion of perl. But it is very fast, sometimes even faster than compiled C code. And it is very much easier, and more fun, to write an Apache add-on module in perl, than in C. At least for me.)

The only option, I think, is then to read the list from a flat file. I just
have one basic question about mod_perl: does the Apache web server execute
one perl process per request ? The reason I am asking is that you mentioned
I could read the list from memory, and I am not sure how it would read from
memory when this script is executed every time it processes a request.
Because if I try to read from the file, then every request will try to open
the file and read from it. It looks stateless.

Thanks for the detailed explanation. It does clear up a lot of things and is
also giving me different viewpoints. The Include directive was a great tip
that I wasn't aware of.

But it will not work in your case, because you would need to restart Apache, which will take a few seconds, during which there will be a huge number of unsatisfied HTTP requests piling up.

Now, if you are really going to have 1,000's of requests/s on this server, I would be very interested in writing such a mod_perl module for you, and have you try it out on your server. Just for the sake of seeing if it would work. And if it does, I'll put it in my CV.

André

Thanks. It doesn't look like you need to put it on your CV, people probably know you by your name :).

Were you really serious ? Did you mean that the mod_perl module you are proposing will read the file, or provide a mechanism for reading the file, only once ? Thanks a lot!!