Re: Port-based vhosts

André Warnier <aw@xxxxxxxxxx> · Wed, 11 Mar 2009 10:15:01 +0100

Charles Sprickman wrote:
[...]
Under what conditions does Apache then get involved and alter the URL?
Just redirects? I understand a common redirect is just adding a
trailing slash when the user does not supply it. What are some other
common cases? Whose call is it when a simple static site uses
non-absolute URLs for all the links? Is the browser building the
fully-qualified links or Apache (I suspect the former)?

If you suspect that it is the browser, you suspect correctly. But the
explanation gets somewhat messy (and lengthy) unless you really
understand the basics. Let me try an explanation that is not entirely
accurate but hopefully instructive.

Say the browser retrieves a first HTML page from a server, using the URL
"http://server.company.com/mydir/mypage.html". This URL, from which the
browser retrieved the current page, is now, for the browser, the "base
URL" of the currently displayed document.

Now say that this page contains a relative reference like
<img src="images/myface.gif" />.

When the browser needs that resource (immediately for an image, or when
the user clicks in the case of a link), it constructs a new URL by
- removing the last component of the base URL (in this case
  "mypage.html"), leaving "http://server.company.com/mydir/"
- appending the relative reference "images/myface.gif" to that, giving
  "http://server.company.com/mydir/images/myface.gif"
- retrieving this new URL

None of that happens on the server side. It is all done by the
browser, and every browser does it the same way.
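
The resolution steps above can be reproduced outside a browser. The
following is only a small sketch in Python, using the standard library's
urljoin, which applies the same kind of relative-reference resolution;
the variable names are mine, chosen for illustration.

```python
from urllib.parse import urljoin

# The URL the current page was retrieved from (its "base URL").
base_url = "http://server.company.com/mydir/mypage.html"

# A relative reference found inside that page.
relative_link = "images/myface.gif"

# urljoin drops the last component of the base ("mypage.html")
# and appends the relative reference to what is left.
resolved = urljoin(base_url, relative_link)
print(resolved)
```

No request is sent to any server here; the new URL is computed purely
from the two strings.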

In reality, what happens is a bit different, because a URL like
"http://server.company.com/mydir/mypage.html" consists of several parts
which are processed differently and independently, and an HTTP request
is not really sent to "http://server.company.com/mydir/mypage.html".
The real HTTP request sequence is more like this:

a) the browser opens a TCP connection to port 80 (the default for
"http://") of the host whose IP address the hostname
"server.company.com" resolves to in DNS

b) on that connection, the browser writes an HTTP request like
GET /mydir/mypage.html HTTP/1.1
Host: server.company.com

then it switches to reading and waits for the server's response to
arrive on that same connection.

So in my first explanation above, you have to set aside the "protocol"
and "host:port" parts of the current page's base URL, since only the
path appears in the request line, but the general idea remains.
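
To make the split between "where to connect" and "what to ask for"
concrete, here is a rough Python sketch. The function name
build_request is hypothetical, it assumes a plain "http://" URL
(default port 80), and it only builds the request text without opening
any connection.

```python
from urllib.parse import urlsplit

def build_request(url):
    """Split a URL into connection target and request text (sketch only)."""
    parts = urlsplit(url)
    host = parts.hostname          # used for the DNS lookup / TCP connect
    port = parts.port or 80        # assumption: plain http, default port 80
    path = parts.path or "/"       # only this part goes in the request line
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "\r\n"
    )
    return host, port, request

host, port, request = build_request(
    "http://server.company.com/mydir/mypage.html"
)
print(host, port)
print(request)
```

The hostname and port decide which machine gets the TCP connection;
the server itself only ever sees the path and the Host header.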

Now about the redirects, re-using the above logic.
(This is what is called an "external" redirect; "internal" ones are
covered further down.)

b) the browser sends a request to the server, like
GET /mydir HTTP/1.1
Host: server.company.com

c) the server sends a response to the browser, like
HTTP/1.1 301 Moved Permanently  (this thing has moved, for good)
Location: /mydir/  (here is the new location)

d) now the browser automatically sends a new request on the same
connection:

GET /mydir/ HTTP/1.1
Host: server.company.com

e) and, presumably, the server now responds with the requested content.
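
The exchange in steps b) to e) can be imitated with a toy model. This
is only a sketch: toy_server is a hypothetical stand-in for the real
server (it knows just the two paths from the example), and fetch mimics
the way a browser follows a redirect by resolving the Location value
against the URL it just requested.

```python
from urllib.parse import urljoin, urlsplit

def toy_server(path):
    """Hypothetical stand-in for the web server in the example."""
    if path == "/mydir":
        return 301, {"Location": "/mydir/"}, ""
    if path == "/mydir/":
        return 200, {}, "<html>directory index</html>"
    return 404, {}, ""

def fetch(url, max_redirects=5):
    """Follow redirects the way a browser would (simplified)."""
    requested = []                       # every path actually requested
    for _ in range(max_redirects + 1):
        path = urlsplit(url).path or "/"
        requested.append(path)
        status, headers, body = toy_server(path)
        if status in (301, 302) and "Location" in headers:
            # The new location becomes the URL of the next request,
            # and later the base URL of the page.
            url = urljoin(url, headers["Location"])
            continue
        return url, status, body, requested
    raise RuntimeError("too many redirects")

final_url, status, body, requested = fetch("http://server.company.com/mydir")
print(requested)   # the two requests from steps b) and d)
print(final_url)   # what ends up in the URL bar
```

Note that fetch returns the final URL, not the one originally asked
for; that final URL is what relative links are resolved against.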

In addition, if the browser is smart, it will remember that the URL
"/mydir" has moved to "/mydir/", and the next time it will request the
latter directly, even if the forgetful user asks for "/mydir" again. It
will also show "/mydir/" in the URL bar for that page, because that is
the real URL it got the page from (and in the vain hope of teaching the
user that the URL "/mydir" is the wrong one and should not be used
anymore).

So the penalty of using a 301 redirect is one additional round-trip
between browser and server (see c and d above). But it is a relatively
small one, because the redirect response is very short, and because
nowadays, with keep-alive connections, the same TCP connection between
browser and server can be used for all of it.

The benefit is that the browser has the correct idea of what the "base
URL" is at all times, and can thus correctly interpret relative URLs
and compose the correct follow-up requests.

"Internal" redirects :

These are things that the server does internally, without telling the
browser about it. mod_rewrite, for example, allows you to modify a
request URL internally before the rest of the server attempts to find
and serve the requested resource. In that case, the browser sends a
request like

GET /mydir HTTP/1.1
Host: server.company.com

and the server internally changes this "/mydir" to "/anotherdir/", then
immediately proceeds to serve the content of "/anotherdir/", without
sending a redirect to the browser and without telling the browser
anything about it. The browser gets a response:

200 OK
...
.. content of "/anotherdir/"

This is obviously faster, because you avoid a round-trip to the browser
and back over a potentially slow connection.

But now the browser does not know about the substitution, and genuinely
believes that what it got was the content corresponding to the "/mydir"
URL. So if it finds relative links like "images/myface.gif" in this
content, it will interpret them relative to the base URL "/mydir". It
strips the last component ("mydir") and requests "/images/myface.gif",
not "/anotherdir/images/myface.gif", and that may cause further
problems.

So by doing this, you may be saving one round-trip for the original
"/mydir", but at best you force additional round-trips for other links,
and at worst you confuse the browser into requesting further invalid
URLs.
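
The difference between the two cases comes down to which base URL the
browser ends up with. A small Python sketch, again using urljoin as a
stand-in for the browser's own resolution logic (the variable names are
only for illustration):

```python
from urllib.parse import urljoin

link = "images/myface.gif"

# After an external redirect, the browser knows the page lives at "/mydir/".
after_external = urljoin("http://server.company.com/mydir/", link)

# After an internal rewrite, the browser still believes the page is "/mydir",
# so "mydir" is treated as the last component and dropped.
after_internal = urljoin("http://server.company.com/mydir", link)

print(after_external)
print(after_internal)
```

The first result stays inside "/mydir/", while the second one points
at "/images/myface.gif" at the top of the site, which is most likely
not where the image actually is.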

Which of the two scenarios is better in your case depends on many
factors, and you have to evaluate those yourself in light of your
website and what is really going on there.
