Re: Join on single table

"Matthew Weier O'Phinney" <matthew@xxxxxxxxxx> · 12 Feb 2005 17:23:31 -0000

* Micah Stevens <micah@xxxxxxxxxxxxxxxxxx>:
> What kind of speed do you need? 

I was aiming at something that would run in 2 hours or so. Currently, it
takes around 3 - 4.

> I had to go to temp tables for a logging application, but not until my
> table got upwards of a million records or so..  For the numbers your
> quoting, it should be pretty quick unless your engine needs
> optimization.. 

Well, actually... The index currently has around a million records, with
approximately 350 words per resource id, several thousand resource ids
per app_id.... you get the idea. I've optimized my indices, and I've got
appropriate indices for the selection criteria I'm using.

As I've noted, I also realize that this is a one shot deal -- after the
initial link table is created, I won't be needing to recreate it every
time I have data that changes -- I'll only be operating on deltas from
here out, and those can be done quite quickly.

Anyways, it's a moot question now -- I've got the data in, and I'm
pretty happy with the results I've seen.

> On Friday 11 February 2005 11:56 am, Matthew Weier O'Phinney wrote:
>> * Micah Stevens <micah@xxxxxxxxxxxxxxxxxx>:
>> > Sounds like a self join should work wonders.
>> >
>> > I didn't test this, but the idea should work:
>> >
>> >
>> > select t1.app_id as a1, t2.app_id as a2, t2.word, t2.score
>> > from tablename as t1
>> > left join tablename as t2
>> > on t1.resource_id = t2.resource_id and t1.app_id != t2.app_id
>> > group by word
>> > order by word
>> >
>> > voila, no temp tables. Am I missing something?
> >
>> Speed. :-)
> >
>> I had tried this as well. The temporary tables really do offer much
>> better performance. The issues I was having were (1) bad resultsets
>> (I've now got that fixed) and (2) speed. I still don't have (2)
>> completely fixed, and it may be something I can't fix.
> >
>> > On Thursday 10 February 2005 07:56 pm, Matthew Weier O'Phinney wrote:
>> >> I have a table which contains the following:
>> >>     id (primary key, auto incrementing)
>> >>     app_id (integer, foreign key)
>> >>     resource_id (integer, foreign key)
>> >>     word
>> >>     score
>> >>
>> >> (This is a search index.) I want to find all resource_ids from one
>> >> app_id that match resource_ids in another app_id by word.
>> >>
>> >> I have created a temporary table 'tmp1' that contains all resource_ids
>> >> from the second app_id (the one whose resources I wish to retrieve). I
>> >> am then looping through all resource_ids in the main table with the
>> >> first app_id, and doing the following:
>> >>
>> >>     * Creating a temporary table tmp2 with a single column 'word'
>> >>       populated by the words associated with resource_id in the main
>> >>       table
>> >>     * Selecting all distinct resource_ids from tmp1 INNER JOIN'd on tmp2
>> >>       on the word field
>> >>
>> >> The issues I'm running into are that (1) each resource_id cycle takes a
>> >> good amount of time, and (2) I seem to be getting either too many
>> >> resource_ids or not enough.
>> >>
>> >> (1) may be something I just have to deal with. As it is, I'm planning on
>> >> running the full indexing once, and then doing incremental updates, so
>> >> it may not be that big of an issue (unless it takes too much time to
>> >> create the initial index). As for (2), unfortunately, I'm not sur ehow
>> >> to really trouble shoot the issue. I know, for instance, that in once
>> >> case, I have a list of 343 words that generates a list of ~12,000
>> >> resource_ids (of a possible 18,000) -- but I don't quite know how to
>> >> spot check 300 values to be certain that this is reasonable.
>> >>
>> >> In a previous incarnation of the script, I was looping through each word
>> >> of each resource_id and then selecting out of tmp1 based on the single
>> >> word value. The results were very different (very few matches), and,
>> >> again, the script ran long.
>> >>
>> >> Any ideas?

-- 
Matthew Weier O'Phinney           | WEBSITES:
Webmaster and IT Specialist       | http://www.garden.org
National Gardening Association    | http://www.kidsgardening.com
802-863-5251 x156                 | http://nationalgardenmonth.org
mailto:matthew@xxxxxxxxxx         | http://vermontbotanical.org

-- 
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php