Re: Google search indexing

> This is really two questions about Google indexing
> Let's say I have a site in PHP & MySQL.
> Let's say that I have some links on my site that use GET variables to
> call other PHP pages and pass them a GET variable, like
>
> http://www.somesite.com/somedir/somepage.php?flag=15
>
> The "flag" variable is passed to somepage.php and read by the script
> using $_GET['flag'] etc, etc. When you look at the page with flag=15,
> you get one page, when you look at it with flag=14 you see a similar
> page with completely different content (record #14 instead of #15
> obviously)
>
> Will Google see both pages if I have both linked with <A HREF="">
> tags? Or will it stop at the question mark, only loading the page
> somepage.php and ignore the ?flag=14 and ?flag=15 or whatever? Will
> it index ?flag=14 and ?flag=15 as two separate pages (which is really
> what I want, since they produce different content), or will it treat
> both as the same page?

If Google (et al) see the pages AT ALL, they will see them as separate
pages.  They have different URLs <==> they are different pages.

Some search engines skip all URLs with ? in them.

Others selectively use the URLs with ? in them.

It's possible (but unlikely) that some search engines use *all* the URLs
with ? in them.

I don't know what Google, specifically, will do under what circumstances,
much less what they might decide to do tomorrow or next year.

> SECOND QUESTION, RELATED:
>
> Same scenario, but with a POSTed form. I have several hidden FORM
> fields, and a drop-down, and depending on how you submit the form you
> get different content on the resulting page.
>
> Will Google submit the form, perhaps a couple of different ways and
> treat each resulting page differently, or will it just bypass the
> form altogether?

I don't know of *ANY* search engine that will POST data to get to content.

I sincerely doubt they would want to do that, really.

> THIRD QUESTION:
>
> If the answers to the questions above are Yes and No, then I could
> use a dynamically generated list of links with ?flag= to make Google
> crawl through the part of the MySQL content (as displayed through the
> scripts in HTML) that I want it to, using links and GET variables,
> right?

Maybe.

Google might do it, while others won't and vice versa.
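
For what it's worth, here is a minimal sketch of that kind of generated
link list. The function name flag_links() and the "Record N" link text are
made up for illustration, and the IDs would really come from your MySQL
query rather than a hard-coded array. It only helps with engines that
follow URLs with ? in them.

```php
<?php
// Hypothetical helper: build a plain list of <a href> links, one per
// record ID, so a crawler that follows "?" URLs can reach each record.
function flag_links(string $page, array $ids): string
{
    $items = [];
    foreach ($ids as $id) {
        // http_build_query() URL-encodes the value for us.
        $url = $page . '?' . http_build_query(['flag' => (int) $id]);
        $items[] = '<li><a href="' . htmlspecialchars($url) . '">Record '
                 . (int) $id . '</a></li>';
    }
    return "<ul>\n" . implode("\n", $items) . "\n</ul>";
}

// Sample IDs only; in practice these would be read from the database.
echo flag_links('somepage.php', [14, 15]);
```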

> If the answers to the questions above are No and No, do I have to set
> up a static .php page for EVERY record in my MySQL database to make
> it see that content I want it to see? Does anyone use the error.php
> page to catch for a 404 Not Found error, see if it can match the
> "ghost" name to a record in the DB, and display a page anyway (even
> though technically there is no somepage.php page, the error.php page
> knows to go look in the database for "somepage" and displays its
> content)?  I wonder if this would be a good optimization strategy.

There are several other possible solutions:
1. Build a plain index page that links to all your content, and link to it
from a page the search engines already crawl.  (robots.txt itself can only
tell crawlers what to stay out of; it cannot send them to a page.)

2. Use Apache's mod_rewrite module to change URLs like:
http://example.com/page.php?flag=14
to URLs like:
http://example.com/page.php/flag/14
or
http://example.com/page.php/flag=14
or
http://example.com/page/flag=14/page.htm

3. Use PHP and $_SERVER['PATH_INFO'] to do all the same things as in #2.
There are many examples/articles "out there" on how to do this.
Google for "PHP $_SERVER PATH_INFO" and you should find some.
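
As a rough sketch of #3, something like the following could pull the
variables back out of the extra path.  The function name parse_path_info()
is made up, and it assumes your web server is set up to pass the trailing
path to page.php in $_SERVER['PATH_INFO'], which depends on your
configuration.

```php
<?php
// Hypothetical helper: turn a PATH_INFO string such as "/flag/14" or
// "/flag=14" into an array of name => value pairs.
function parse_path_info(string $pathInfo): array
{
    $vars = [];
    // Split on "/" and drop the empty pieces left by leading slashes.
    $parts = array_values(array_filter(explode('/', $pathInfo), 'strlen'));
    for ($i = 0; $i < count($parts); $i++) {
        if (strpos($parts[$i], '=') !== false) {
            // "name=value" form, as in page.php/flag=14
            [$name, $value] = explode('=', $parts[$i], 2);
            $vars[$name] = $value;
        } elseif (isset($parts[$i + 1])) {
            // "name/value" form, as in page.php/flag/14
            $vars[$parts[$i]] = $parts[$i + 1];
            $i++; // skip the value we just consumed
        }
    }
    return $vars;
}

// PATH_INFO is only set when the request has extra path after the script.
$vars = parse_path_info($_SERVER['PATH_INFO'] ?? '');
$flag = isset($vars['flag']) ? (int) $vars['flag'] : null;
```

A trailing piece with no value after it, like the "page.htm" in the last
URL form above, is simply ignored by this sketch.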

Also be sure to Google for "robots.txt search engines" to find out more
about the robots.txt file -- While I don't use it much myself, others find
it useful.

Finally, a note of caution.  At some point, if you have *enough* records,
you don't want to make the URLs look static.  If you do, and force Google
(et al) to index, say, a MILLION relatively un-interesting pages...

Put yourself in Google's shoes:  "Hey, here's this goofball that made us
index 1,000,000 pages of uninteresting content.  Let's just put him on the
blacklist and not index his site at all."

Use some common sense here, or suffer the consequences.

Let me give you an example:

Suppose you were responsible for maintaining a list of, oh, I don't know,
registered Republicans/Democrats.  Further suppose, for some reason, that
making this list public on the web was legal (I dunno) and you wanted to,
or, more likely, your boss wanted you to do that.

You *could* have a site where every registered voter was on their own page
with an elephant or donkey, and you *could* make static-looking URLs to
force them to get Googled...

Or you could have static-looking *pages* so every page has a couple
hundred, or even a thousand people, to get Googled, with a nice common
masthead with the elephant or donkey.

If you force the search engines to index those zillion pages, one for each
person, you're going to make somebody cranky.  Somebody you *want* to be
friends with.

OTOH, if you arrange it so they aren't indexing *too* many pages, and the
content is useful to potential visitors, they'll like you.
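
To make the "many records per page" idea concrete, here is a small sketch.
The function name paginate(), the page size, and the sample names are all
made up; pick whatever grouping actually makes sense for your visitors.

```php
<?php
// Hypothetical helper: split a long list of records into pages of at most
// $perPage entries, so a crawler sees a handful of pages instead of one
// page per record.
function paginate(array $records, int $perPage = 200): array
{
    // array_chunk() returns a list of arrays, each with up to $perPage items.
    return array_chunk($records, max(1, $perPage));
}

// Made-up sample data; real records would come from the database.
$names = ['Adams', 'Baker', 'Clark', 'Davis', 'Evans'];
$pages = paginate($names, 2);
echo count($pages), " pages\n";
```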

I can definitely state, for the record, that it's VERY effective to make
your URLs look static -- I maintain a free online database of music venues
for touring indie musicians, and used to have dynamic URLs.

Only a few days after changing to static URLs, I suddenly noticed that
when I was searching for the venues that were out of date, *my* pages were
popping up very high in the rankings.

In fact, if the venue had a site, my page was usually right after theirs. 
If they had no site, my page was turning out #1 almost all the time.  (For
venues that had distinctive names.)

This is not because I'm some search engine expert, but because the content
being searched for matched the content I was delivering.

If you Google for those venue names, the CHaT site will pretty much always
come up really high in the list.  Google for a venue that's been closed for
a while, and we're pretty much the only result.

[aside]
We track closed venues because artists are often steered toward them by
other out-of-date references or well-meaning former residents of their
target destinations as they tour around the country.

It also helps to know about a venue that closes/re-opens, Under New
Management, and closes/re-opens again on a routine basis.

These are generally music venues you want to avoid, because often the
reasons for their closing/re-opening will adversely affect your desire to
perform there.
[/aside]

Be sure to convert your internal links to the static-looking links, even
if you code it so both work equally well, as I did to keep legacy URLs
valid.  The search engines will like your pages better because they'll
find your internal links.
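
Here is one way that could look, as a sketch only.  This is not the actual
code behind my site; the function names requested_flag() and record_url()
and the /page.php/flag/14 layout are just assumptions for illustration.

```php
<?php
// Hypothetical helper: find the record number from either URL style.
// $pathInfo is what $_SERVER['PATH_INFO'] would hold, $get is $_GET.
function requested_flag(string $pathInfo, array $get): ?int
{
    // New static-looking form: page.php/flag/14
    if (preg_match('#^/flag/(\d+)#', $pathInfo, $m)) {
        return (int) $m[1];
    }
    // Legacy form: page.php?flag=14
    if (isset($get['flag']) && is_string($get['flag'])
        && ctype_digit($get['flag'])) {
        return (int) $get['flag'];
    }
    return null; // no usable record number in the request
}

// Hypothetical helper: every internal link goes through this, so the
// pages only ever advertise the static-looking form.
function record_url(int $flag): string
{
    return '/page.php/flag/' . $flag;
}

$flag = requested_flag($_SERVER['PATH_INFO'] ?? '', $_GET);
```

Old bookmarks with ?flag=14 keep working, but anything the site prints
itself uses record_url(), so the engines only find the new form.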

You can see examples here:
http://chatmusic.com/venuealpha/a

Feel free to add your favorite music venue if it's not in there!

-- 
Like Music?
http://l-i-e.com/artists.htm

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

