Re: what would a c extension buy me

joseph <kmh496@xxxxxxxxxx> · Mon, 13 Mar 2006 22:32:55 +0900

mr. maas,

psychic me ........ you are a man.

i created an index, no change.  but i already knew that because one of
the cases where an index is never used is when 
> The key used to fetch the rows is not the same as the one used in the
> > ORDER BY: 
> 
> which is the case here.

(index creation stuff follows)

i'm just going to bite the bullet for a few months and do a massive code
rehaul.  move all the matching code preferably to the sql-server.   or
pre-compiled C code for the php [ make an extension ].  i think the mb_
functions in php are very slow ( reportedly ) ... so move those
somewhere else is a good idea.  maybe the 'select ... like' clauses to
the sql-server 
maybe the improvements you suggest will make a difference, maybe not,
but  when i think about how fast mysql runs its  

mysqlimport --local --fields-terminated-by='*X*X' dict_explicit
french_english french_english 

compared with 

mysql < a.sql  (bunches of insert statements)

that's the kind of improvement i am hoping for.  more than 500 queeries
a second i am getting now, i need to really really improve that if i am
going to parse html on the fly with the technology i have.

mysql> describe korean_english;
+-----------+---------------------+------+-----+---------------------+----------------+
| Field     | Type                | Null | Key | Default             |
Extra          |
+-----------+---------------------+------+-----+---------------------+----------------+
| wordid    | int(11)             |      | PRI | NULL                |
auto_increment |
| word      | varchar(130)        | YES  | MUL | NULL                |
|
| syn       | varchar(190)        | YES  |     | NULL                |
|
| def       | blob                | YES  |     | NULL                |
|
| posn      | int(2)              | YES  |     | 1                   |
|
| pos       | varchar(13)         | YES  |     | 1                   |
|
| submitter | varchar(25)         | YES  |     | NULL                |
|
| doe       | datetime            |      |     | 0000-00-00 00:00:00 |
|
| wordsize  | tinyint(3) unsigned | YES  |     | NULL                |
|
+-----------+---------------------+------+-----+---------------------+----------------+
9 rows in set (0.00 sec)

mysql> explain select
    -> word,def,wordid,pos,posn,wordsize,syn from  korean_english  where
word
    -> like '운전할 %'  order by wordsize desc;
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-----------------------------+
| id | select_type | table          | type  | possible_keys | key      |
key_len | ref  | rows | Extra                       |
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | korean_english | range | word_idx      | word_idx |
391 | NULL |    1 | Using where; Using filesort |
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-----------------------------+
1 row in set (0.01 sec)

mysql> CREATE INDEX wordsize_index USING BTREE ON
korean_english(wordsize);
Query OK, 205265 rows affected (11.16 sec)
Records: 205265  Duplicates: 0  Warnings: 0

mysql> explain select word,def,wordid,pos,posn,wordsize,syn from
korean_english  where word like '운전할 %'  order by wordsize desc;e);
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-----------------------------+
| id | select_type | table          | type  | possible_keys | key      |
key_len | ref  | rows | Extra                       |
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-----------------------------+
|  1 | SIMPLE      | korean_english | range | word_idx      | word_idx |
391 | NULL |    1 | Using where; Using filesort |
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-----------------------------+
1 row in set (0.00 sec)

mysql>                                                               

2006-03-13 (월), 14:04 +0100, Jochem Maas 쓰시길:
> ...
> 
> >>>word,def,wordid,pos,posn,wordsize,syn from  korean_english  where word
> >>>like '운전할 %'  order by wordsize desc
> >>
> >>oh would you look at this....
> >>you're ordering by WORDSIZE.
> >>stick an index on WORDSIZE!!!
> > 
> > 
> > 
> > http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
> > 
> > 
> > In some cases, MySQL cannot use indexes to resolve the ORDER BY,
> 
> note 'In Some Cases'.
> 
> so stick an index on WORDSIZE and find out.
> 
> > although it still uses indexes to find the rows that match the WHERE
> > clause. These cases include the following: 
> > 
> > The key used to fetch the rows is not the same as the one used in the
> > ORDER BY: 
> > 
> > SELECT * FROM t1 WHERE key2=constant ORDER BY key1;
> > 
> > 2) 
> > 
> > 
> >>you have an iceballs' chance in hell that I'm going to even read
> >>the 500+ lines of code that followed here ... let alone try to
> >>optimize it. ;-)
> >>
> > 
> > 
> > it was just a 40 line summary of 550 lines of code ........  
> 
> ah, talk about being caught out :-)
> regardless a 40line summary won't cut it either - you have to take
> the block as a whole.
> 
> > 
> > 
> > 
> > questions i may deem myself to answer: 
> > 
> >>>) i get this:
> >>>root@www mysql]# tail -f /var/lib/mysql/mysqld_query.log  > out
> >>>
> >>>[root@www mysql]# cat out | wc -l
> >>>15910    <<< ----------- that's line count
> >>
> >>how many queries?
> > 
> > 15, 910 queries 
> 
> so roughly (15 * 60) seconds to run 15,000+
> queries and do the processing? that doesn't actually
> sound so bad.
> 
> > 
> > 
> > 
> > things already taken care of:
> > 1) 
> > 
> >>>                   9795 Query       select
> >>>word,def,wordid,pos,posn,wordsize,syn from  korean_english where word
> >>>like '운전할'  order by wordsize desc
> >>
> >>in cases when you are not using the wildcard tokens (percentage signs)
> >>try changing the query to use something like:
> >>
> >>	... word = '운전할' ...
> > 
> > 
> > your suggestion and a line from the query_log match exactly.
> > 
> > 
> > 2) 
> > 
> >><snip>
> >>
> >>>then, it sends each token to CallmatchThis (line 14) which calls
> >>>matchThis (line 27 - 47 below)  
> >>>matchThis may be called twice (2 sql queeries)
> >>>(line 51) select * where word = '$token' and another  (take that, but if
> >>>it's not there .... issue the next sql )
> >>>(line 55) select * where word like '$token%';
> >>
> >>Dont do "SELECT *" - always explicitly specify the fields you want.
> >>
> > 
> > that was just a paraphrase. the previous email points to the line number
> > of the code summary 
> 
> never paraphrase code - you only end up with smart ass comments like mine!
> 
> have you reordered you fields in the db yet? adn made as many VARCHARs as
> possible into CHARs?
> 
> > 
> > 
> >>
> >>
> >>>------------------------------------------+
> >>>| korean_english | CREATE TABLE `korean_english` (
> >>>  `wordid` int(11) NOT NULL auto_increment,
> >>>  `word` varchar(130) default NULL,
> >>>  `syn` varchar(190) default NULL,
> >>>  `def` blob,
> >>>  `posn` int(2) default '1',
> >>>  `pos` varchar(13) default '1',
> >>>  `submitter` varchar(25) default NULL,
> >>>  `doe` datetime NOT NULL default '0000-00-00 00:00:00',
> >>>  `wordsize` tinyint(3) unsigned default NULL,
> >>>  PRIMARY KEY  (`wordid`),
> >>>  KEY `word_idx` (`word`),
> >>>  KEY `wordid_idx` (`wordid`)
> >>>) ENGINE=MyISAM DEFAULT CHARSET=utf8 |
> >>
> >>reorder the fields so that the VARCHARS are at the end of the
> >>table (and the BLOB field at the very, very end of the table).
> >>
> >>also change VARCHARs to CHARs where you can.
> >>
> >>
> >>>+----------------+-----------------------------------------------------------------------------------------------------------------------------------------
> >>>-----------------------------------------------------------------------------------------------------------------------------------------------------------
> >>>---------------------------------------------------------------------
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>     1 function MainLoop()
> > 
> > 
> 

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php