Code for search engine in PHP (in case someone needs it).

The files in this email are:

* create_index_directories.php
* create_index_or_add_to_existing_index.php
* search_index.php
* ReadMe.txt

Brief description:

This search engine has a new architecture compared to other search engines. I invented and implemented this new search engine architecture. This search engine has been developed mainly for the English alphabet. It is based on the fact that no letter in the English alphabet has more than 30,000 words starting with it. This search engine works on text/html files only.

This search engine was mainly developed so that it could be used on websites. A website can integrate this search engine on its platform, index all of its pages with it, and give the user a search box so that the user can search anything on the website. Websites then do not have to rely on third-party search engines.

All code in this email has been released under APACHE LICENSE, VERSION 2.0. The license details can be found here: https://www.apache.org/licenses/LICENSE-2.0

---------------------------------------
create_index_directories.php
---------------------------------------

<?php

/* This program creates index directories for storing index files.
 * Required argument: Path to directory where the top level index
 * directory and its subdirectories will be created. The top level index
 * directory will be named index_directory.
 */

$num_directories_given = 0;
$index_dir = "";

function print_usage()
{
    echo ("Usage:\n\n" .
          " Syntax:\n\n" .
          " create_index_directories [OPTIONS] [dir_path]\n\n" .
          " Description:\n\n" .
          " create_index_directories creates index directories for storing index files.\n" .
          " \"dir_path\" is the path to directory where the top level index directory\n" .
          " and sub directories will be created. The top level index directory will\n" .
          " be named index_directory.\n\n" .
          " Options:\n\n" .
          " --help\n" .
          " Print this usage/help and exit.\n");
} // end of print_usage

for ($i = 1; $i < $argc; $i++) {
    //echo "Option " . $i . ": " . $argv[$i] . "\n";
    $arg = $argv[$i];
    if ($arg[0] === '-') {
        if ($arg === "--help") {
            print_usage();
            exit(0);
        } else {
            echo "create_index_directories: Unknown option: " . $arg . "\n";
            echo "Try create_index_directories --help to see the help.\n";
            exit(1);
        }
    } else {
        $index_dir = $arg;
        $num_directories_given++;
    }
} // end of for loop

if ($num_directories_given == 0) {
    echo "create_index_directories: One directory argument is required.\n";
    echo "Try create_index_directories --help to see the help.\n";
    exit(1);
} else if ($num_directories_given > 1) {
    echo "create_index_directories: Only one directory argument is allowed.\n";
    echo "Try create_index_directories --help to see the help.\n";
    exit(1);
}

if (is_dir($index_dir) != TRUE) {
    echo "create_index_directories: \"" . $index_dir . "\" is not a directory.\n";
    echo "Try create_index_directories --help to see the help.\n";
    exit(1);
}

$create_dir = $index_dir . "/index_directory";

// create the top level index directory
if (file_exists($create_dir) != TRUE) {
    if (mkdir($create_dir) != TRUE) {
        echo "create_index_directories: Failed to create directory \"" . $create_dir . "\". Exiting...\n";
        exit(1);
    } else {
        echo "Created directory " . $create_dir . "\n";
    }
} else {
    echo $create_dir . " already exists.\n";
}

// create subdirectories 0-9
for ($i = 0; $i < 10; $i++) {
    $sub_dir = $create_dir . "/" . $i;
    if (file_exists($sub_dir) != TRUE) {
        if (mkdir($sub_dir) != TRUE) {
            echo "create_index_directories: Failed to create directory \"" . $sub_dir . "\". Exiting...\n";
            exit(1);
        } else {
            echo "Created directory " . $sub_dir . "\n";
        }
    } else {
        echo $sub_dir . " already exists.\n";
    }
}

// create subdirectories a-z
foreach (range('a', 'z') as $letter) {
    $sub_dir = $create_dir . "/" . $letter;
    if (file_exists($sub_dir) != TRUE) {
        if (mkdir($sub_dir) != TRUE) {
            echo "create_index_directories: Failed to create directory \"" . $sub_dir . "\". Exiting...\n";
            exit(1);
        } else {
            echo "Created directory " .
                $sub_dir . "\n";
        }
    } else {
        echo $sub_dir . " already exists.\n";
    }
}

?>

-----------------------------------------------------------
create_index_or_add_to_existing_index.php
-----------------------------------------------------------

<?php

/* This program takes files/directories as arguments and parses the
 * files (present in directories or given on command line) to create the
 * search index files. The directories are processed recursively if -r option
 * is given. This program also requires the path to directory where
 * a directory called index_directory exists. This index_directory
 * contains 36 folders named 0, 1, 2, .., 9 and a, b, c, .., y, z.
 * Index files are created in subdirectories of index_directory.
 * This program works on text/html files only. You can use program
 * create_index_directories.php to create index_directory and its subdirectories.
 */

// error handler function
function custom_error_handler($errno, $errstr, $errfile, $errline)
{
    //echo "Got error/notice/warning, etc. Exiting..\n";
    echo "Got error/notice/warning, etc.\n";
    echo $errno. "\n";
    echo $errstr . "\n";
    echo $errfile . "\n";
    echo $errline . "\n";
    //echo "Exit status is 1.\n";
    //exit(1);
} // end of custom_error_handler

// set to the user defined error handler
$old_error_handler = set_error_handler("custom_error_handler");

function print_usage()
{
    echo ("Usage:\n\n" .
          " Syntax:\n\n" .
          " create_index_or_add_to_existing_index OPTION[S] [FILE...] [DIR...]\n\n" .
          " Description:\n\n" .
          " create_index_or_add_to_existing_index parses a file and creates search index files\n" .
          " or adds to already existing index files. It works on text/html files only.\n" .
          " The file can be given as an argument or it may be present in a directory\n" .
          " which itself has been given as an argument. This program also requires\n" .
          " the path to directory where a directory called index_directory\n" .
          " and its subdirectories (0-9, a-z) exist. You can use\n" .
          " program create_index_directories.php to create index_directory\n" .
          " and its subdirectories. The paths to file/dir to be indexed should be\n" .
          " relative to server_root_directory_path (to be given by specifying -s option).\n\n" .
          " Options:\n\n" .
          " -i path_to_index_directory (MANDATORY option)\n" .
          " Use -i option to specify the path to directory where directory\n" .
          " called index_directory and its subdirectories (0-9, a-z) exist.\n" .
          " Index files are created in subdirectories of index_directory.\n\n" .
          " -r\n" .
          " Specify -r option to process directory/directories recursively.\n\n" .
          " -p prefix_path\n" .
          " Please give a prefix to add before the file path that will be written to\n" .
          " index files. It could be something like https://mywebsite.com. If the\n" .
          " file path abcd/tyr.html is going to be written to index file then it\n" .
          " will actually write https://mywebsite.com/abcd/tyr.html in the index\n" .
          " file if -p option is present.\n\n" .
          " -s server_root_directory_path (MANDATORY option)\n" .
          " The \"absolute\" path to server root directory (from where index.html or index.php will be served).\n" .
          " The paths to file/dir to be indexed should be relative to server_root_directory_path.\n\n" .
          " --help\n".
          " Print this usage/help and exit.\n\n" .
          " So, basically the file to be indexed is found by combining server_root_directory_path\n" .
          " and path to files/directories given on command line while the file contents\n" .
          " to be written is formed by combining prefix and path to files/directories given\n" .
          " on command line.\n");
} // end of print_usage

$iOptionPresent = FALSE;
$rOptionPresent = FALSE;
$pOptionPresent = FALSE;
$sOptionPresent = FALSE;
$index_dir_parent = "";
$index_dir = "";
$prefix = "";
$server_root_path = "";
$file_dir_array = array();
$num_files_processed = 0;

for ($i = 1; $i < $argc; $i++) {
    echo "debug: Argument/Option " . $i . ": " . $argv[$i] . "\n";
    $arg = $argv[$i];
    if ($arg[0] === '-') {
        if ($arg === "--help") {
            print_usage();
            exit(0);
        } else if ($arg === "-r") {
            $rOptionPresent = TRUE;
        } else if ($arg === "-i") {
            $iOptionPresent = TRUE;
            if (($i+1) < $argc) {
                $index_dir_parent = $argv[$i+1];
                $index_dir = $index_dir_parent . "/" . "index_directory";
                $i++;
                continue;
            }
        } else if ($arg === "-p") {
            $pOptionPresent = TRUE;
            if (($i+1) < $argc) {
                $prefix = $argv[$i+1];
                // make sure the prefix ends with a path separator
                if ((substr($prefix, -1, 1) != "/") && (substr($prefix, -1, 1) != "\\")) {
                    $prefix = $prefix . "/";
                }
                $i++;
                continue;
            }
        } else if ($arg === "-s") {
            $sOptionPresent = TRUE;
            if (($i+1) < $argc) {
                $server_root_path = $argv[$i+1];
                // make sure the server root path ends with a path separator
                if ((substr($server_root_path, -1, 1) != "/") && (substr($server_root_path, -1, 1) != "\\")) {
                    $server_root_path = $server_root_path . "/";
                }
                $i++;
                continue;
            }
        } else {
            echo "create_index_or_add_to_existing_index: Unknown option: " . $arg . "\n";
            echo "Try create_index_or_add_to_existing_index --help to see the help.\n";
            exit(1);
        }
    } else {
        array_push($file_dir_array, $arg);
    }
} // end of for loop

// debug info
echo "\nDEBUG_INFO_START:\n\n";

if ($rOptionPresent === TRUE) {
    echo "-r option is present.\n";
} else {
    echo "-r option is NOT present.\n";
}

if ($iOptionPresent === TRUE) {
    echo "-i option is present.\n";
    echo "index_dir_parent = " . $index_dir_parent . "\n";
} else {
    echo "-i option is NOT present.\n";
}

if ($pOptionPresent === TRUE) {
    echo "-p option is present.\n";
    echo "prefix = " . $prefix . "\n";
} else {
    echo "-p option is NOT present.\n";
}

if ($sOptionPresent === TRUE) {
    echo "-s option is present.\n";
    echo "server_root_path = " . $server_root_path . "\n";
} else {
    echo "-s option is NOT present.\n";
}

$num_entries = count($file_dir_array);
echo "Entries in file_dir_array are:\n";
for ($i = 0; $i < $num_entries; $i++){
    echo $file_dir_array[$i] . "\n";
}

echo "\nDEBUG_INFO_END\n\n";
// end debug info

if ($index_dir_parent == "") {
    echo "create_index_or_add_to_existing_index: Please give the path to directory where index_directory exists.\n";
    echo "Try create_index_or_add_to_existing_index --help to see the help.\n";
    echo "Exiting..\n";
    exit(1);
}

if ($server_root_path == "") {
    echo "create_index_or_add_to_existing_index: Please give the path to server root directory.\n";
    echo "Try create_index_or_add_to_existing_index --help to see the help.\n";
    echo "Exiting..\n";
    exit(1);
}

if (file_exists($index_dir_parent) != TRUE) {
    echo "create_index_or_add_to_existing_index: \"" . $index_dir_parent . "\" does not exist.\n";
    echo "Please give a valid path to directory where index_directory exists.\n";
    echo "Try create_index_or_add_to_existing_index --help to see the help.\n";
    echo "Exiting..\n";
    exit(1);
}

if (is_dir($index_dir_parent) != TRUE) {
    echo "create_index_or_add_to_existing_index: \"" . $index_dir_parent . "\" is not a directory.\n";
    echo "Please give a valid path to directory where index_directory exists.\n";
    echo "Try create_index_or_add_to_existing_index --help to see the help.\n";
    echo "Exiting..\n";
    exit(1);
}

if (file_exists($index_dir) != TRUE) {
    echo "create_index_or_add_to_existing_index: \"index_directory\" does not exist in \"" . $index_dir_parent . "\".\n";
    echo "Please give a valid path to directory where index_directory exists.\n";
    echo "Try create_index_or_add_to_existing_index --help to see the help.\n";
    echo "Exiting..\n";
    exit(1);
}

if (is_dir($index_dir) != TRUE) {
    echo "create_index_or_add_to_existing_index: index_directory \"" . $index_dir . "\" is not a directory.\n";
    echo "Please give a valid path to directory where index_directory exists.\n";
    echo "Try create_index_or_add_to_existing_index --help to see the help.\n";
    echo "Exiting..\n";
    exit(1);
}

if (count($file_dir_array) < 1) {
    echo "create_index_or_add_to_existing_index: No files/directories given for indexing.\n";
    echo "Try create_index_or_add_to_existing_index --help to see the help.\n";
    exit(0);
}

// Check if all index directories exist
echo "create_index_or_add_to_existing_index: checking whether all index directories exist..\n";

for ($i = 0; $i < 10; $i++) {
    $sub_dir = $index_dir . "/" . $i;
    if (file_exists($sub_dir) != TRUE) {
        echo $sub_dir . " does not exist.\n";
        echo "Exiting..\n";
        exit(1);
    }
    if (is_dir($sub_dir) != TRUE) {
        echo $sub_dir . " is not a directory.\n";
        echo "Exiting..\n";
        exit(1);
    }
} // end of for loop

foreach (range('a', 'z') as $letter) {
    $sub_dir = $index_dir . "/" . $letter;
    if (file_exists($sub_dir) != TRUE) {
        echo $sub_dir . " does not exist.\n";
        echo "Exiting..\n";
        exit(1);
    }
    if (is_dir($sub_dir) != TRUE) {
        echo $sub_dir . " is not a directory.\n";
        echo "Exiting..\n";
        exit(1);
    }
} // end of foreach loop

echo "All index directories exist.\n\n";

echo "\n\n**** Starting Indexing.. ****\n\n";

$num_entries = count($file_dir_array);
for ($i = 0; $i < $num_entries; $i++) {
    $file_rl_path = $file_dir_array[$i];
    $file = $server_root_path . $file_rl_path;
    if (file_exists($file) != TRUE) {
        echo "\"" . $file . "\" does not exist.\n";
    } else if (is_file($file) == TRUE) {
        process_file($file, $file_rl_path);
    } else if (is_dir($file) == TRUE) {
        process_dir($file);
    } else {
        echo "\"" . $file . "\": No such file or directory.\n";
    }
} // end of for loop

function process_dir($dir)
{
    //echo $dir . "\n";
    $files = scandir($dir);
    if ($files == FALSE) {
        return;
    }
    $num = count($files);
    for ($i = 0; $i < $num; $i++) {
        if (($files[$i] === ".") || ($files[$i] === "..")) {
            continue;
        }
        $file_entry = $dir . "/" .
            $files[$i];
        if (file_exists($file_entry) != TRUE) {
            echo "\"" . $file_entry . "\" does not exist.\n";
        } else if (is_file($file_entry) == TRUE) {
            // strip the server root so that the path is relative again
            $empty_string = "";
            $root_path = $GLOBALS['server_root_path'];
            $file_rl_path = str_replace($root_path, $empty_string, $file_entry);
            //echo "Old file_rl_path = " . $file_entry . ", New file_rl_path = " . $file_rl_path . "\n";
            process_file($file_entry, $file_rl_path);
        } else if (is_dir($file_entry) == TRUE) {
            if ($GLOBALS['rOptionPresent'] === TRUE) {
                process_dir($file_entry);
            } else {
                //echo $file_entry . "\n"; // remove this later
                // TODO
            }
        } else {
            echo "\"" . $file_entry . "\": No such file or directory.\n";
        }
    } // end of for loop
} // end of process_dir

function process_file($file, $file_rl_path)
{
    //echo $file . "\n";
    $handle = fopen($file, "r");
    if ($handle == FALSE) {
        echo "Error: Failed to open file \"" . $file . "\"\n";
        return;
    }

    echo "\n\nIndexing file \"" . $file . "\"\n";

    // read file
    $line_num = 0;
    // strict comparison so that a final line containing only "0" is not skipped
    while (($line = fgets($handle)) !== FALSE) {
        /*
        //echo $line;
        $line_num++;
        $len = strlen($line);
        echo "line number " . $line_num . " length = " . $len . "\n";
        */
        // match runs of three or more alphanumeric characters
        // (the outer parentheses act as the regex delimiters)
        $pattern = "([0-9A-Za-z][0-9A-Za-z][0-9A-Za-z][0-9A-Za-z]*)";
        preg_match_all($pattern, $line, $matches, PREG_SET_ORDER);
        $match_count = count($matches);
        for ($j = 0; $j < $match_count; $j++) {
            $word = $matches[$j][0];
            //echo $word . "\n";
            $word_l = strtolower($word);
            //echo $word_l . "\n";
            process_word_l($word_l, $file, $file_rl_path);
        }
    }
    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail when reading file \"" . $file . "\"\n";
    }
    fclose($handle);

    echo "Indexing file \"" . $file . "\" completed.\n";
    $GLOBALS['num_files_processed'] = $GLOBALS['num_files_processed'] + 1;
    echo "Total files indexed = " . $GLOBALS['num_files_processed'] . "\n";
} // end of process_file

function process_word_l($word_l, $file, $file_rl_path)
{
    $letter = substr($word_l, 0 , 1);
    $dir_to_check = $GLOBALS['index_dir'] . "/" . $letter;
    $file_to_check = $dir_to_check . "/" .
        $word_l;
    $content_without_newline = $GLOBALS['prefix'] . $file_rl_path;
    $content = $content_without_newline . "\n";

    //create file if file does not exist
    if (file_exists($file_to_check) != TRUE) {
        //echo "\"" . $file_to_check . "\" does not exist. Creating it..\n";
        if (file_put_contents($file_to_check, $content) == FALSE) {
            echo "Error: file_put_contents failed for file \"" . $file_to_check . "\"\n";
        }
        return;
    }

    //echo "debug: file_to_check = " . $file_to_check . "\n";

    $handle = fopen($file_to_check, "r+");
    if ($handle == FALSE) {
        echo "Error: Failed to open file \"" . $file_to_check . "\"\n";
        return;
    }

    // check if entry exists and if not then append at the end
    while (($line = fgets($handle)) !== FALSE) {
        if ($line === $content) {
            //echo "Entry \"" . $content_without_newline . "\" already exists in file \"" . $file_to_check ."\"\n";
            fclose($handle); // close the handle before returning early
            return;
        }
    }
    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail when reading file \"" . $file_to_check . "\"\n";
    }
    fwrite($handle, $content);
    fclose($handle);
} // end of process_word_l

echo "\n\n**** Indexing complete.**** \n\n";

?>

------------------------
search_index.php
------------------------

<?php

/* This program searches for search words in index files. This program
 * requires the path to directory where a directory called index_directory exists.
 * This index_directory contains 36 subdirectories named 0, 1, 2, .., 9 and a, b, c, .., y, z.
 * The index files are present in these subdirectories.
 */

function print_usage()
{
    echo ("Usage:\n\n" .
          " Syntax:\n\n" .
          " search_index OPTION[S] [search_word[s]...]\n\n" .
          " Description:\n\n" .
          " search_index searches for search_word[s] in index files. One or more\n" .
          " search words can be specified. This program requires the path to directory\n" .
          " where a directory called index_directory and its subdirectories (0-9, a-z)\n" .
          " exist. The index files are present in these subdirectories.\n\n" .
          " Options:\n\n" .
          " -i path_to_index_directory (MANDATORY option)\n" .
          " Use -i option to specify the path to directory where directory\n" .
          " called index_directory exists.\n\n" .
          " --help\n".
          " Print this usage/help and exit.\n");
} // end of print_usage

$iOptionPresent = FALSE;
$index_dir_parent = "";
$index_dir = "";
$search_keyword_array = array();
$search_results_array = array();

for ($i = 1; $i < $argc; $i++) {
    echo "debug: Argument/Option " . $i . ": " . $argv[$i] . "\n";
    $arg = $argv[$i];
    if ($arg[0] === '-') {
        if ($arg === "--help") {
            print_usage();
            exit(0);
        } else if ($arg === "-i") {
            $iOptionPresent = TRUE;
            if (($i+1) < $argc) {
                $index_dir_parent = $argv[$i+1];
                $index_dir = $index_dir_parent . "/" . "index_directory";
                $i++;
                continue;
            }
        } else {
            echo "search_index: Unknown option: " . $arg . "\n";
            echo "Try search_index --help to see the help.\n";
            exit(1);
        }
    } else {
        array_push($search_keyword_array, $arg);
    }
} // end of for loop

// debug info
echo "\nDEBUG_INFO_START:\n\n";

if ($iOptionPresent === TRUE) {
    echo "-i option is present.\n";
    echo "index_dir_parent = " . $index_dir_parent . "\n";
} else {
    echo "-i option is NOT present.\n";
}

$num_entries = count($search_keyword_array);
echo "Entries in search_keyword_array are:\n";
for ($i = 0; $i < $num_entries; $i++){
    echo $search_keyword_array[$i] . "\n";
}

echo "\nDEBUG_INFO_END\n\n";
// end debug info

if ($index_dir_parent == "") {
    echo "search_index: Please give the path to directory where index_directory exists.\n";
    echo "Try search_index --help to see the help.\n";
    echo "Exiting..\n";
    exit(1);
}

if (file_exists($index_dir_parent) != TRUE) {
    echo "search_index: \"" . $index_dir_parent . "\" does not exist.\n";
    echo "Please give a valid path to directory where index_directory exists.\n";
    echo "Try search_index --help to see the help.\n";
    echo "Exiting..\n";
    exit(1);
}

if (is_dir($index_dir_parent) != TRUE) {
    echo "search_index: \"" . $index_dir_parent . "\" is not a directory.\n";
    echo "Please give a valid path to directory where index_directory exists.\n";
    echo "Try search_index --help to see the help.\n";
    echo "Exiting..\n";
    exit(1);
}

if (file_exists($index_dir) != TRUE) {
    echo "search_index: \"index_directory\" does not exist in \"" . $index_dir_parent . "\".\n";
    echo "Please give a valid path to directory where index_directory exists.\n";
    echo "Try search_index --help to see the help.\n";
    echo "Exiting..\n";
    exit(1);
}

if (is_dir($index_dir) != TRUE) {
    echo "search_index: index_directory \"" . $index_dir . "\" is not a directory.\n";
    echo "Please give a valid path to directory where index_directory exists.\n";
    echo "Try search_index --help to see the help.\n";
    echo "Exiting..\n";
    exit(1);
}

if (count($search_keyword_array) < 1) {
    echo "search_index: No search word given for searching.\n";
    echo "Try search_index --help to see the help.\n";
    exit(0);
}

// count, for every document path, how many of the search words it contains
$num_entries = count($search_keyword_array);
for ($i = 0; $i < $num_entries; $i++) {
    $word = $search_keyword_array[$i];
    $word_l = strtolower($word);
    $letter = substr($word_l, 0 , 1);
    $dir_to_check = $GLOBALS['index_dir'] . "/" . $letter;
    $file_to_check = $dir_to_check . "/" . $word_l;
    if (file_exists($file_to_check) != TRUE) {
        continue;
    }
    if (is_file($file_to_check) != TRUE) {
        continue;
    }
    $handle = fopen($file_to_check, "r");
    if ($handle == FALSE) {
        //echo "Error: Failed to open file \"" . $file_to_check . "\"\n";
        continue;
    }
    while (($line = fgets($handle)) !== FALSE) {
        // remove newline from line
        $line = str_replace(array("\n", "\r"), '', $line);
        //$old_value = $search_results_array[$line];
        //if (($old_value == NULL) || ($old_value == FALSE)) {
        //    $old_value = 0;
        //}
        if (array_key_exists($line, $search_results_array) == FALSE) {
            $search_results_array[$line] = 1;
        } else {
            $search_results_array[$line]++;
        }
    }
    if (!feof($handle)) {
        echo "Error: unexpected fgets() fail when reading file \"" . $file_to_check . "\"\n";
    }
    fclose($handle);
} // end of for loop

// dump search_results_array after sorting
arsort($search_results_array);
//var_dump($search_results_array);

$keys = array_keys($search_results_array);
$num_entries = count($keys);
for ($i = 0; $i < $num_entries; $i++) {
    echo $keys[$i]. "\n";
} // end of for loop

?>

----------------
ReadMe.txt
----------------

Architecture of this Search Engine
----------------------------------------------

This search engine has a new architecture compared to other search engines. I invented and implemented this new search engine architecture. This search engine has been developed mainly for the English alphabet. It is based on the fact that no letter in the English alphabet has more than 30,000 words starting with it. This search engine works on text/html files only.

This search engine was mainly developed so that it could be used on websites. A website can integrate this search engine on its platform, index all of its pages with it, and give the user a search box so that the user can search anything on the website. Websites then do not have to rely on third-party search engines.

The structure of the search index is that there is a top level directory called index_directory. This directory has 36 folders. The folders are named: 0, 1, 2, .., 8, 9 and a, b, c, .., y, z.
Every word has an index file with the same name as the word, stored in the folder named after the word's first letter. So, since no letter has more than 30,000 words starting with it, there will be at most 30,000 files in that folder. Modern OSes can handle many more files than that in one directory.

For example, if the word is "server", then there will be a file called "server" in the "index_directory/s" folder. This file will contain the paths of all documents that contain the word "server". So, the contents of the file server can be:

https://www.myexample.com/abcd.html
https://www.myexample.com/1234.html
https://www.myexample.com/hello.html

These three html documents contain the word "server". Now, if someone wants to search for the word "server", then the contents of this file will be printed on the output page/screen, which means that these 3 documents contain the word "server".

Now, let's suppose there is another word, "hello". So, there will be a file in "index_directory/h" called hello and it will contain the paths of all documents that contain the word "hello". Let's suppose that the index file "hello" has the following contents:

https://www.myexample.com/xyz.html
https://www.myexample.com/new.html
https://www.myexample.com/hello.html

Now, if someone searches for both keywords "server" and "hello", the output will be:

https://www.myexample.com/hello.html
https://www.myexample.com/abcd.html
https://www.myexample.com/1234.html
https://www.myexample.com/xyz.html
https://www.myexample.com/new.html

So, you see that "https://www.myexample.com/hello.html" is the first URL to be printed because it contains both "server" and "hello". The document which contains the largest number of search words is printed first, followed by documents which contain fewer search words. So, basically the output is sorted in descending order according to the number of search words present in the document.

There are three programs developed in PHP in this Search Engine. So, it will run on all platforms that have PHP installed. The three programs are:

* create_index_directories.php
* create_index_or_add_to_existing_index.php
* search_index.php

* create_index_directories.php:

This program creates index directories for storing index files. Required argument: Path to directory where the top level index directory and its subdirectories will be created. The top level index directory will be named index_directory.

Usage:

  Syntax:

    create_index_directories [OPTIONS] [dir_path]

  Description:

    create_index_directories creates index directories for storing index files.
    "dir_path" is the path to directory where the top level index directory
    and sub directories will be created. The top level index directory will
    be named index_directory.

  Options:

    --help
      Print this usage/help and exit.

* create_index_or_add_to_existing_index.php:

This program takes files/directories as arguments and parses the files (present in directories or given on command line) to create the search index files or add to already existing index files. The directories are processed recursively if -r option is given. This program also requires the path to directory where a directory called index_directory exists. This index_directory contains 36 folders named 0, 1, 2, .., 9 and a, b, c, .., y, z. Index files are created in subdirectories of index_directory. This program works on text/html files only. You can use program create_index_directories.php to create index_directory and its subdirectories.

Usage:

  Syntax:

    create_index_or_add_to_existing_index OPTION[S] [FILE...] [DIR...]

  Description:

    create_index_or_add_to_existing_index parses a file and creates search index files
    or adds to already existing index files. It works on text/html files only.
    The file can be given as an argument or it may be present in a directory
    which itself has been given as an argument. This program also requires
    the path to directory where a directory called index_directory
    and its subdirectories (0-9, a-z) exist. You can use
    program create_index_directories.php to create index_directory
    and its subdirectories. The paths to file/dir to be indexed should be
    relative to server_root_directory_path (to be given by specifying -s option).

  Options:

    -i path_to_index_directory (MANDATORY option)
      Use -i option to specify the path to directory where directory
      called index_directory and its subdirectories (0-9, a-z) exist.
      Index files are created in subdirectories of index_directory.

    -r
      Specify -r option to process directory/directories recursively.

    -p prefix_path
      Please give a prefix to add before the file path that will be written to
      index files. It could be something like https://mywebsite.com. If the
      file path abcd/tyr.html is going to be written to index file then it
      will actually write https://mywebsite.com/abcd/tyr.html in the index
      file if -p option is present.

    -s server_root_directory_path (MANDATORY option)
      The "absolute" path to server root directory (from where index.html or index.php will be served).
      The paths to file/dir to be indexed should be relative to server_root_directory_path.

    --help
      Print this usage/help and exit.

    So, basically the file to be indexed is found by combining server_root_directory_path
    and path to files/directories given on command line while the file contents
    to be written is formed by combining prefix and path to files/directories given
    on command line.

* search_index.php:

This program searches for search words in index files. This program requires the path to directory where a directory called index_directory exists. This index_directory contains 36 subdirectories named 0, 1, 2, .., 9 and a, b, c, .., y, z. The index files are present in these subdirectories.

Usage:

  Syntax:

    search_index OPTION[S] [search_word[s]...]

  Description:

    search_index searches for search_word[s] in index files. One or more
    search words can be specified. This program requires the path to directory
    where a directory called index_directory and its subdirectories (0-9, a-z)
    exist. The index files are present in these subdirectories.

  Options:

    -i path_to_index_directory (MANDATORY option)
      Use -i option to specify the path to directory where directory
      called index_directory exists.

    --help
      Print this usage/help and exit.

Example
-----------

There are three programs developed in PHP in this Search Engine. So, it will run on all platforms that have PHP installed. I have used xampp/PHP on Windows to develop this search engine, so I will give an example of how to use it on Windows.

Step 1:
---------

Let's suppose that you have installed xampp in C:\ on Windows. So, your server root directory will be C:\xampp\htdocs.

Step 2:
---------

Let's suppose that you have copied all search engine files to C:\search_engine.

Step 3:
---------

Now, let's create index_directory and its subdirectories in your server root directory, which is C:\xampp\htdocs.
The command and output are given below:

C:\search_engine>php create_index_directories.php C:\xampp\htdocs
Created directory C:\xampp\htdocs/index_directory
Created directory C:\xampp\htdocs/index_directory/0
Created directory C:\xampp\htdocs/index_directory/1
Created directory C:\xampp\htdocs/index_directory/2
Created directory C:\xampp\htdocs/index_directory/3
Created directory C:\xampp\htdocs/index_directory/4
Created directory C:\xampp\htdocs/index_directory/5
Created directory C:\xampp\htdocs/index_directory/6
Created directory C:\xampp\htdocs/index_directory/7
Created directory C:\xampp\htdocs/index_directory/8
Created directory C:\xampp\htdocs/index_directory/9
Created directory C:\xampp\htdocs/index_directory/a
Created directory C:\xampp\htdocs/index_directory/b
Created directory C:\xampp\htdocs/index_directory/c
Created directory C:\xampp\htdocs/index_directory/d
Created directory C:\xampp\htdocs/index_directory/e
Created directory C:\xampp\htdocs/index_directory/f
Created directory C:\xampp\htdocs/index_directory/g
Created directory C:\xampp\htdocs/index_directory/h
Created directory C:\xampp\htdocs/index_directory/i
Created directory C:\xampp\htdocs/index_directory/j
Created directory C:\xampp\htdocs/index_directory/k
Created directory C:\xampp\htdocs/index_directory/l
Created directory C:\xampp\htdocs/index_directory/m
Created directory C:\xampp\htdocs/index_directory/n
Created directory C:\xampp\htdocs/index_directory/o
Created directory C:\xampp\htdocs/index_directory/p
Created directory C:\xampp\htdocs/index_directory/q
Created directory C:\xampp\htdocs/index_directory/r
Created directory C:\xampp\htdocs/index_directory/s
Created directory C:\xampp\htdocs/index_directory/t
Created directory C:\xampp\htdocs/index_directory/u
Created directory C:\xampp\htdocs/index_directory/v
Created directory C:\xampp\htdocs/index_directory/w
Created directory C:\xampp\htdocs/index_directory/x
Created directory C:\xampp\htdocs/index_directory/y
Created directory C:\xampp\htdocs/index_directory/z

Step 4:
---------

Now, let's suppose that all files to be indexed are in the directory files_to_be_indexed in your server root directory (C:\xampp\htdocs\files_to_be_indexed). Files can also be given on the command line, but in this example I am giving a directory. The command to create the index from the files in files_to_be_indexed is given below:

C:\search_engine>php create_index_or_add_to_existing_index.php -r -i C:\xampp\htdocs -p http://localhost -s C:\xampp\htdocs files_to_be_indexed

Step 5:
---------

Now, let's search for four words "server hello stop start". The command and output are given below:

C:\search_engine>php search_index.php -i C:\xampp\htdocs server hello stop start
http://localhost/files_to_be_indexed/2/catalina_service.txt
http://localhost/files_to_be_indexed/3/ctlscript.html
http://localhost/files_to_be_indexed/readme_de.txt
http://localhost/files_to_be_indexed/readme_en.html
http://localhost/files_to_be_indexed/3/4/5/filezilla_start.html
http://localhost/files_to_be_indexed/3/4/5/filezilla_stop.html
http://localhost/files_to_be_indexed/3/4/mercury_start.html
http://localhost/files_to_be_indexed/3/catalina_stop.txt
http://localhost/files_to_be_indexed/2/apache_stop.txt
http://localhost/files_to_be_indexed/3/4/5/mysql_stop.html
http://localhost/files_to_be_indexed/3/4/mysql_start.html

---- End of code and ReadMe ----