I know this is a bit of an old topic, but I feel really strongly about this type of thing, as it's been a source of a *lot* of frustration over the years, and it took me a long time to figure out something that works well for me. Some of the stuff below has been mentioned by other people, but I think a few things deserve further elaboration. I hope you, and others, find the following useful.

AmirBehzad Eslami <behzad.eslami@xxxxxxxxx> writes:

> Hi,
>
> i'm going to join a mid-size company with a few PHP-driven projects
> written in procedural PHP, million years old.

Oooh, I've been in that boat more times than I'd care to admit.

> At the moment, they don't have a wiki or any documentation about their
> projects. For me, the first challenge in probation period is to understand
> how their code works.

If there's one thing I've learned in my adventures at PHP shops with old, crumbly codebases, it's that the wiki and docs are *wrong* anyhow. You *have* to understand the *code*, and you *have* to make sure that understanding is your own. Even in ideal circumstances, it's tough to learn how to grok an existing codebase. In *this* type of circumstance it can feel damn near impossible.

> Considering that there is no wiki or docs, how can I see the Big Picture?
> i'm sure this is a common problem for programmers everywhere.
>
> What approach do you use in a similar situation?
> Is there a systematic approach for this?
> Is there a reverse-engineering technique to understand the design of code?
>
> Please share your experience and thoughts.

Well, there's a "very big picture", which relates to conceptual paths of action with particular intent through the software, and relates less to code and more to conceptual application design -- you get this by knowing what the software is for, talking to the people who are creating it and using it, and understanding what they're trying to do.
The rest of this reply is going to focus on the "Big Picture" as it pertains to understanding the code you're working on, particularly because you mention that you're doing a "refactor". Refactors without certain kinds of rigor often fail. Hard. They fail due to lack of understanding of the "Big Picture" of the codebase being worked on, and due to lack of separation of concerns in the existing codebase. Subtle lack of separation of concerns. Drive-you-insane subtle.

I can't stress this enough -- it sounds like you are going down a path that I, and many other much more competent devs, have been down too many times. It's a path that *seems* simple, certainly simpler than the stuff I'm about to talk about, but that's because you don't know what you don't know. You *must* know that you don't know what you don't know, and you *must* learn to keep that as part of your active mindset when dealing with "legacy" deployed codebases. The approach you need to take is different than it is when starting fresh. The "Big Picture", and your understanding of it, is an emergent property of the functionality of the existing codebase and your understanding of the parts that make up that functionality.

Let's get to it.

First, if things aren't already in version control, get them there. Use a DVCS. Do this even if it's just your local copy, and even if you have to do silly things like tell svn to ignore ".git" and tell git to ignore ".svn" dirs because git-svn won't work with this codebase.

I prefer git, because it focuses on letting you manipulate, and get information about, the DAG that represents the various changes you've made. That's invaluable for getting a codebase back to some amorphous time between then and now when things worked -- but a bunch of stuff has happened since, and something previously unnoticed is broken.
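The local-only "git over svn" setup described above can be sketched roughly as follows. This is just one way to do it, and the paths are hypothetical -- in practice you'd run this inside the existing svn working copy (a temp dir stands in for it here):

```shell
# Sketch of wrapping a local git repo around an svn checkout,
# keeping each VCS from tracking the other's metadata.
cd "$(mktemp -d)"                      # stand-in for the svn checkout
git init -q
git config user.name "Me"              # so commits work on any machine
git config user.email "me@example.com"

# Keep git from tracking svn metadata. .git/info/exclude is local-only,
# unlike .gitignore, so it never pollutes the shared codebase:
echo ".svn/" >> .git/info/exclude

# And keep svn from picking up the git dir (needs svn installed):
# svn propset svn:ignore ".git" .

git add -A
git commit -q --allow-empty -m "snapshot of the deployed legacy code"
```

Using `.git/info/exclude` instead of a committed `.gitignore` matters here: nothing you do leaks into the svn repository the rest of the team uses.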
That said, any DVCS will do -- but seriously, you need to be able to track changes for yourself and yourself only while poking around, and you need to be able to revert them or keep them as needed.

Second, Xdebug: get familiar with it. XHProf: get familiar with it. Get familiar with the options they have for generating profiling and execution traces. Turn them on, do something really simple like "load the homepage", and then start looking a bit at what's going on. Any programming environment worth its salt will let you take an execution trace and follow it around in your editor. (I use Emacs. I do use geben occasionally, but I don't do much step debugging.)

Try to understand at least a bit of the surrounding code at each point in the trace. Don't bother trying to understand all of it at once -- there's too much. (I've worked on PHP codebases that could produce multi-gigabyte execution traces on a single page load when the verbosity of the trace was high. I hope you and everyone else never have to work on them.) Doing this gives you a feel for the typical paths of execution, and exposes you to a fair amount of the existing code.

Now for the absolute most important thing to do here: do *not* refactor untested code unless absolutely necessary. And when you think it's absolutely necessary, wait until getting it under test has stumped more than one person -- preferably more than one person currently working on the code, ideally one who has more experience than you do in some way. I'm not trying to be insulting or demeaning here; it's seriously that important, and I follow my own advice.

Read the book "Working Effectively with Legacy Code" by Michael Feathers. Seriously. Read this book. It has saved more than one refactoring project that I've worked on. There's quite a bit in it that isn't relevant to PHP for one reason or another, but it's written in a modular-enough fashion that you can skip those parts.
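For the execution traces mentioned above, a php.ini fragment along these lines is roughly what I mean -- this assumes the Xdebug 2-era setting names (current Xdebug 3 folded these into `xdebug.mode=trace` and `xdebug.start_with_request`), and the output directory is a placeholder:

```ini
; Hypothetical php.ini fragment for Xdebug 2-style function tracing.
xdebug.auto_trace = 1                  ; trace every request automatically
xdebug.trace_output_dir = /tmp/xdebug  ; placeholder; pick a real dir
xdebug.trace_format = 1                ; machine-readable, tab-separated
xdebug.collect_params = 3              ; include full argument contents
```

Be warned that `collect_params = 3` is what produces those enormous traces; start with it off, and turn it up only when you need the detail.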
Even if you're not familiar with the languages used in the examples, you shouldn't have a problem understanding them. Read it. If I were going to make this post way longer than it already is by talking about the specifics -- the fine-grained, useful-at-the-code-level specifics, the kind that come with actual code listings from actual programs and show what was done to refactor them and why -- I'd just be restating "Working Effectively with Legacy Code".

Seriously: figure out what you need to do to get things under test before changing them, test your changes, and be prepared to delete lots of tests as well. Since isolation is difficult, you might have to mimic other parts of the system in your test code, and that setup code may later become redundant cruft because of other changes that have been made. That can be a huge pain, but the combination of execution traces and thinking carefully makes it much easier. It's certainly less pain than the inevitable flood of functionality changes and breakages that you and your co-coders didn't know you made and aren't sure when you made.

Focus on changing as little as possible -- maintain the "API" of the current code as long as you can, even if it means you need to stick "class_alias" or similar at the bottom of a new source file because there are huge swaths of existing code that don't use namespaces, or because you just implemented something that does the job of 10 existing classes. Or even if it means taking a static function that a shitton of code is calling -- one that modifies global state in a bad, bad way -- and making it dispatch to an instance method of a replacement class. The other code will be changed when your understanding of the "Big Picture" is complete enough for that to be an appropriate change, and you'll know when that time comes.

PHPUnit and php-code-coverage are your friends. There are other code-analysis tools worth getting to know as well, such as PHP_CodeSniffer.
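The "static function dispatching to an instance method" move above looks something like this. All the names here are hypothetical (a `Config::get()` that legacy code calls statically while it reads a global): the point is that the old call sites keep working unchanged, while tests can swap in an instance built from fixture data.

```php
<?php
// New, testable implementation -- no global state.
class SiteConfig
{
    private $values;

    public function __construct(array $values)
    {
        $this->values = $values;
    }

    public function get($key)
    {
        return isset($this->values[$key]) ? $this->values[$key] : null;
    }
}

// The old entry point, preserved as a thin dispatcher.
class Config
{
    private static $instance;

    // Tests inject a SiteConfig built from fixture data here.
    public static function setInstance(SiteConfig $instance)
    {
        self::$instance = $instance;
    }

    // Same "API" the legacy callers use: still a static call,
    // but it now dispatches to the instance method.
    public static function get($key)
    {
        if (self::$instance === null) {
            // Fall back to the old global for untouched code paths.
            self::$instance = new SiteConfig(
                isset($GLOBALS['config']) ? $GLOBALS['config'] : array()
            );
        }
        return self::$instance->get($key);
    }
}

// If the new class lives under a namespace while old code expects the
// bare name, a class_alias at the bottom of the file bridges the gap:
// class_alias('App\\Config', 'Config');
```

A test can then do `Config::setInstance(new SiteConfig(['db_host' => 'localhost']))` and exercise the legacy call sites without touching `$GLOBALS` at all.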
This process is slow at first, but then it gets faster. As long as you have tests (and well-done tests at that...) in place, you can refactor and rewrite whatever you like and know what broke where, if anything. And know whether things ... well ... work.

This does bring up a bit of an issue: what do you test? I've found, particularly with this "refactor an existing freaking old ball of mud of a codebase toward tests" kind of work, that you should focus on testing the stuff that makes up a class or module's "public API", if that makes sense. It's a rule of thumb, not a hard rule. And of course, in all things, be pragmatic.

The goal is to improve this software. This software does something, and the only thing documenting it is the code. You want to produce tests that serve as executable documentation of the intent of the code, and you want to be able to undo the things you did -- potentially sweeping changes -- at a pretty fine-grained level if it turns out they were a bad idea. You want to preserve existing functionality as far as the user is concerned; that's what the software is for. Refactoring toward tests helps create those "executable docs", and helps *immensely* in preserving existing functionality. A DVCS lets you play with untested larger changes and do exploratory coding with the assurance that you can go back.

On that note, this is long enough. I hope it was time well spent for the reader.

> -Thanks in advance,
> Behzad

Best of luck,

-- 
Jeremiah Dodds

blog : http://jdodds.github.com
github : https://github.com/jdodds
freenode/skype : exhortatory
twitter : kaens

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php