Rejecting requests for unknown virtual hosts

Hello all,

What I'd like to do is serve requests only for my exact virtual hosts and deny every request that either supplies an unknown virtual host or doesn't supply one at all (that is, sends no "Host:" header). This is my usual tactic on all of my servers. I do it primarily to keep out at least those vulnerability scanner bots that are primitive enough to target servers by IP address rather than by domain name, and that therefore don't supply a valid hostname in their "Host:" header.

Although I've read some documentation on virtual hosting, mostly the official guide on the Apache website, I still haven't achieved what I wanted. I thought I had, but I noticed the problem only today. Most tutorials don't deal with the problem I found.

If the following text is TL;DR for you, please jump to my questions at the end. If you're curious why I ask them, read on.

What I tried:

# Include the virtual host configurations:
Include sites-enabled/

# Deny all unknown virtual host names
<VirtualHost *:80>
    ServerName *
    DocumentRoot /var/www
    <Location />
        Order allow,deny
#        Allow from googlebot.com
        Allow from 127.0.0.1
    </Location>
    SetEnvIf Remote_Addr "127\.0\.0\.1" localhostlog
    CustomLog "/var/log/apache2/access.log" combined env=localhostlog
    CustomLog "/var/log/apache2/reject.log" vhost_combined env=!localhostlog
    ErrorLog "/var/log/apache2/reject_error.log"
</VirtualHost>


First there is an include for my virtual hosts, whose configs are stored in separate files. Then there is a "default" virtual host config that is supposed to take effect when a request doesn't match any of my defined virtual hosts. It is still a valid virtual host, though one that only I can access, from localhost. I keep my phpMyAdmin there, for example, so it is inaccessible to outsiders; I reach it myself through an SSH tunnel. Since no one else has access to the server and I don't run any proxies, I can be quite sure that no one besides me can access my stuff.

(Irrelevant side note for the curious: sometimes I still allow Googlebot in, just to present it with a robots.txt file that forbids it from crawling anything. I did this when I got an IP address for a VPS that had previously been used by someone who was dumb enough either not to point a domain name at his site or to leave his domain name pointing at his old IP. Many users were looking for this guy's site on my server, and most of them were referred by Google. I thought it might help Google drop its index entries more quickly if I showed it a valid site with a denying robots.txt. That's the story behind the commented-out "Allow" line for Googlebot.)

I also log these invalid requests to reject.log (it's better if I can see and analyze the hopeless attempts of vulnerability scanners anyway); the exception is valid requests from localhost, which are logged to access.log. Requests for the respective virtual hosts are logged to their own separate files.

So what's the problem? Everything seems to work. Valid virtual hosts are served accurately, and requests are logged to the correct logfiles. Clients that supply an invalid virtual host are presented with a cute, well-deserved 403.

Vulnerability scanners that don't merely send an invalid virtual hostname but send no "Host:" header at all are still served by my first virtual host, not by this last virtual host that would give a 403 by design. As I've just been informed, though, this is the expected, documented behaviour:

If no matching virtual host is found, then the first listed virtual host that matches the IP address will be used.
(Apache website on Name-based Virtual Host Support)

OK, so make it the first virtual host config! Naive! If I put my all-rejecting virtual host before the include for my specific virtual hosts, then all requests are served by the rejecting virtual host, even requests for my legitimate virtual host names. But this, too, is the expected behaviour:

Now when a request arrives, the server will first check if it is using an IP address that matches the NameVirtualHost. If it is, then it will look at each <VirtualHost> section with a matching IP address and try to find one where the ServerName or ServerAlias matches the requested hostname. If it finds one, then it uses the configuration for that server.
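
The two quoted passages can be read as a small lookup procedure. What follows is only a rough Python model of that reading, not Apache's actual code: the `VHost` class, the `select_vhost` function and the hostnames are all made up for illustration, names are compared by plain case-insensitive equality (wildcards are not modelled), and every virtual host is assumed to sit on the same address and port.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VHost:
    """One <VirtualHost> section, reduced to the names the lookup uses."""
    server_name: str
    aliases: List[str] = field(default_factory=list)


def select_vhost(vhosts: List[VHost], host_header: Optional[str]) -> VHost:
    """Pick a vhost as the quoted documentation describes.

    `vhosts` must be non-empty and in configuration order.
    """
    if host_header:
        # Drop an optional ":port" suffix (IPv6 literals are not handled)
        # and compare case-insensitively.
        requested = host_header.split(":", 1)[0].lower()
        for vhost in vhosts:
            names = [vhost.server_name] + vhost.aliases
            if requested in (name.lower() for name in names):
                return vhost
    # No Host header, or no name matched: the first listed vhost wins.
    return vhosts[0]


# Hypothetical configuration order: the legitimate site first, the rejecting one last.
vhosts = [VHost("www.example.test"), VHost("reject.invalid")]
print(select_vhost(vhosts, "www.example.test").server_name)
print(select_vhost(vhosts, None).server_name)
```

In this model both calls end up at the first listed virtual host: the first because its name matches, the second because nothing matches and the fallback applies. That is the situation described in this mail, where a request without a "Host:" header never reaches the rejecting virtual host listed last.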
 
Apache looks for a suitable virtual host config from top to bottom. As I understand it, "*" matches everything, so the all-rejecting virtual host config catches all requests and the other virtual hosts are never checked.

Interestingly, even my first virtual host gives a 400 (Bad Request) response to any request that lacks the "Host:" header. I don't know the reason, but I have no problem with it, since this is what I originally wanted: to reject requests without a "Host:" header. The problem is that, because the request is still processed by one of my legitimate virtual hosts, it is logged to that virtual host's own access log file and not to reject.log. Secondly, the legitimate virtual host reveals its name in the text of the 400 response:

Bad Request

Your browser sent a request that this server could not understand.

Apache/2.2.16 (Debian) Server at my_legit_virtual_host.domain.tld Port 80

Why should I help the vulnerability scanner bot by telling it a valid virtual hostname it didn't know before?
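
For anyone who wants to reproduce this, here is a minimal Python sketch that sends a GET request with no "Host:" header and returns the status line of the reply. It assumes the scanners speak HTTP/1.1, which the logs above do not confirm. If they do, that may explain the 400: HTTP/1.1 makes the Host header mandatory and requires servers to answer 400 when it is missing (RFC 2616, section 14.23). The function names `build_request` and `status_line` and the address in the usage note are placeholders, not anything from my setup.

```python
import socket
from typing import Optional


def build_request(path: str = "/", host: Optional[str] = None,
                  version: str = "HTTP/1.1") -> bytes:
    """Build a bare GET request; the Host header is omitted when host is None."""
    lines = [f"GET {path} {version}"]
    if host is not None:
        lines.append(f"Host: {host}")
    lines.append("Connection: close")
    # A blank line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def status_line(address: str, request: bytes, port: int = 80) -> str:
    """Send the raw request and return only the first line of the response."""
    with socket.create_connection((address, port), timeout=5) as sock:
        sock.sendall(request)
        reply = sock.recv(4096)
    return reply.split(b"\r\n", 1)[0].decode("latin-1")
```

Calling `status_line("192.0.2.10", build_request())` against a test server (192.0.2.10 is a documentation address, to be replaced with the real one) shows how a Host-less request is answered; passing `host="some.name"` to `build_request` gives the comparison case with a header.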

Sorry for the long, elaborate e-mail. My questions are:

Thanks for your help in advance,
MegaBrutal
