Eddie Drapkin wrote:
> Hey all,
> we've got a repository here at work, with something like 55,000 files
> in it. For the last few years, we've been naming $variables_like_this
> and functions_the_same($way_too). And now we've decided to switch to
> camelCasing everything and I've been tasked with somehow determining
> if it's possible to automate this process. Usually, I'd just use the
> IDE refactoring functionality, but doing it on a
> per-method/per-function and a per-variable basis would take weeks, if
> not longer, not to mention driving everyone insane.
>
> I've tried with regular expressions, but I can't make them smart
> enough to distinguish between builtins and userland code. I've looked
> at the tokenizer and it seems to be the right way forward, but that's
> also a huge project to get that to work.
>
> I was wondering if anyone had had any experience doing this and could
> either point me in the right direction or just down and out tell me
> how to do it.

Hi Eddie,

That's quite the task :). You're going to need to scan the source to
generate a list of every variable and function name using the
tokenizer. Fortunately, this is easy, with the caveat that if you do
this anywhere in your source:

$a = $this->{$constructed . '_name'}();

you will have to handle these manually.

Basically, run token_get_all() on the source, scanning for T_VARIABLE,
and record every T_VARIABLE in an array.
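To make that first pass concrete, here is a minimal sketch of just the T_VARIABLE scan on a single string of source. The helper name collectVariables() and the sample input are made up for illustration and are not part of the scripts in this message.

```php
<?php
// Hypothetical helper for illustration: return the distinct variable names
// that token_get_all() reports as T_VARIABLE in a chunk of PHP source.
function collectVariables($source)
{
    $found = array();
    foreach (token_get_all($source) as $token) {
        // Single-character tokens such as ';' or '(' come back as plain strings.
        if (!is_array($token)) {
            continue;
        }
        if ($token[0] == T_VARIABLE) {
            $found[$token[1]] = true; // key by name so duplicates collapse
        }
    }
    return array_keys($found);
}

// Made-up sample input, including the dynamic call from the caveat above.
$sample = '<?php $user_id = 1; $a = $this->{$constructed . "_name"}();';
print_r(collectVariables($sample));
```

For the dynamic call, the tokenizer only reports $this and $constructed; the method name never shows up as an identifier token, which is why those call sites have to be handled manually.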
Then, scan for:

1) T_FUNCTION T_WHITESPACE* T_STRING
2) T_OBJECT_OPERATOR T_WHITESPACE* T_STRING

<?php
$replace = array(); // old text => new text
$files = new RegexIterator(
    new RecursiveIteratorIterator(new RecursiveDirectoryIterator('/path/to/src')),
    '/\.php$/', RegexIterator::MATCH, RegexIterator::USE_KEY);
foreach ($files as $path => $file) {
    $source = file_get_contents($path);
    $checkForID = false;
    $last = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            // punctuation such as "(" or "&" ends a pending match
            $checkForID = false;
            $last = '';
            continue;
        }
        $prefix = '';
        if ($checkForID) {
            if ($token[0] == T_WHITESPACE) {
                $last .= $token[1];
                continue;
            }
            // whatever follows, the pending "function"/"->" is used up
            $checkForID = false;
            $prefix = $last;
            $last = '';
            if ($token[0] != T_STRING) {
                continue;
            }
        } elseif ($token[0] == T_FUNCTION || $token[0] == T_OBJECT_OPERATOR) {
            $checkForID = true;
            $last = $token[1];
            continue;
        } elseif ($token[0] == T_STRING) {
            if (function_exists($token[1])) {
                continue; // skip internal functions
            }
            if (strtolower($token[1]) != $token[1]) {
                continue; // assuming you UPPER-CASE constants, this skips them
            }
        } elseif ($token[0] != T_VARIABLE) {
            continue;
        }
        // leave names with a leading underscore alone ($_POST, __construct,
        // ...): the explode() below would mangle them
        if (strpos(ltrim($token[1], '$'), '_') === 0) {
            continue;
        }
        // we get to here if we've found one to process
        $new = explode('_', $token[1]);
        $new = array_map('ucfirst', $new);
        $new[0] = lcfirst($new[0]); // for your camelCasing
        $new = implode('', $new);
        if ($new != $token[1]) {
            // keyed by the old text, so each name is recorded only once
            $replace[$prefix . $token[1]] = $prefix . $new;
        }
    }
}
?>

Next, load each file again (the same RecursiveIteratorIterator /
RecursiveDirectoryIterator / RegexIterator combination grabs the PHP
source files) and apply the list of replacements, somewhat like this:

<?php
$files = new RegexIterator(
    new RecursiveIteratorIterator(new RecursiveDirectoryIterator('/path/to/src')),
    '/\.php$/', RegexIterator::MATCH, RegexIterator::USE_KEY);
foreach ($files as $path => $file) {
    $source = file_get_contents($path);
    // strtr() with an array tries the longest keys first and never
    // re-scans replaced text, so $user_id cannot clobber $user_id_list
    $source = strtr($source, $replace);
    foreach ($replace as $old => $new) {
        if ($old[0] == '$') {
            // a property declared as $some_name is accessed as ->some_name
            $source = preg_replace(
                '/->(\s*)' . preg_quote(substr($old, 1), '/') . '\b/',
                '->${1}' . substr($new, 1),
                $source);
        }
    }
    file_put_contents($path, $source);
}
?>

Voila, code refactored. I trust you know this, but don't run that
example code without testing it on a limited sandbox and comparing the
results first :). I did not test anything except the RegexIterator part
to make sure that it actually grabbed PHP files; the rest is based on my
experience tokenizing for parsing PHP when writing tools like
phpDocumentor. If I made any mistakes, it would be good for you to post
your final scripts for posterity back on here.

Greg

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php