I know this is a bit of an old topic, but I feel really strongly about this type of thing, as it's been a source of a *lot* of frustration over the years, and it took me a long time to figure out something that works well for me. Some of the stuff below has been mentioned by other people, but I think a few things deserve further elaboration. I hope you, and others, find the following useful.

AmirBehzad Eslami <behzad.eslami@xxxxxxxxx> writes:

> Hi,
>
> i'm going to join a mid-size company with a few PHP-driven projects
> written in procedural PHP, million years old.

Oooh, I've been in that boat more times than I'd care to admit.

> At the moment, they don't have a wiki or any documentation about their
> projects. For me, the first challenge in probation period is to understand
> how their code works.

If there's one thing I've learned in my adventures at PHP shops with old, crumbly codebases, it's that the wiki and docs are *wrong* anyhow. You *have* to understand the *code*, and you *have* to make sure that understanding is your own. Even in ideal circumstances, it's tough to learn how to grok an existing codebase. In *this* type of circumstance it can feel damn near impossible.

> Considering that there is no wiki or docs, how can I see the Big Picture?
> i'm sure this is a common problem for programmers everywhere.
>
> What approach do you use in a similar situation?
> Is there a systematic approach for this?
> Is there a reverse-engineering technique to understand the design of code?
>
> Please share your experience and thoughts.

Well, there's a "very big picture", which relates to conceptual paths of action with particular intent through the software, and relates less to code and more to conceptual application design -- you get this by knowing what the software is for, talking to the people who are creating it and using it, and understanding what they're trying to do.
The rest of this reply is going to focus on the "Big Picture" as it pertains to understanding the code you're working on, particularly because you mention that you're doing a "refactor". Refactors without certain kinds of rigor often fail. Hard. They fail due to lack of understanding of the "Big Picture" of the codebase being worked on, and due to lack of separation of concerns in the existing codebase. Subtle lack of separation of concerns. Drive-you-insane subtle.

I can't stress this enough -- it sounds like you are going down a path that I, and many other much more competent devs, have been down too many times. It's a path that *seems* simple, certainly simpler than the stuff I'm about to talk about, but that's because you don't know what you don't know. You *must* know that you don't know what you don't know, and you *must* learn to keep that as part of your active mindset when dealing with "legacy" deployed codebases. The approach you need to take is different than it is when starting fresh. The "Big Picture", and your understanding of it, is an emergent property of the functionality of the existing codebase and your understanding of the parts that make up that functionality.

Let's get to it.

First, if things aren't already in version control, get them there. Use a DVCS. Do this even if it's just your local copy, and even if you have to do silly things like tell svn to ignore ".git" and tell git to ignore ".svn" dirs because git-svn won't work with this codebase.

I prefer git, because it focuses on letting you manipulate, and get information about, the DAG that represents the various changes you've made. That's invaluable for getting a codebase back to some amorphous time between then and now when things worked -- but a bunch of stuff has happened since, and something previously unnoticed is broken.
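The local-only "git over svn" setup described above can be sketched roughly as follows. This is just one way to do it, and the paths are hypothetical -- in practice you'd run this inside the existing svn working copy (a temp dir stands in for it here):

```shell
# Sketch of wrapping a local git repo around an svn checkout,
# keeping each VCS from tracking the other's metadata.
cd "$(mktemp -d)"                      # stand-in for the svn checkout
git init -q
git config user.name "Me"              # so commits work on any machine
git config user.email "me@example.com"

# Keep git from tracking svn metadata. .git/info/exclude is local-only,
# unlike .gitignore, so it never pollutes the shared codebase:
echo ".svn/" >> .git/info/exclude

# And keep svn from picking up the git dir (needs svn installed):
# svn propset svn:ignore ".git" .

git add -A
git commit -q --allow-empty -m "snapshot of the deployed legacy code"
```

Using `.git/info/exclude` instead of a committed `.gitignore` matters here: nothing you do leaks into the svn repository the rest of the team uses.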
That said, any DVCS will do -- but seriously, you need to be able to track changes for yourself and yourself only while poking around, and you need to be able to revert them or keep them as needed.

Second, Xdebug: get familiar with it. XHProf: get familiar with it. Get familiar with the options they have for generating profiling and execution traces. Turn them on, do something really simple like "load the homepage", and then start looking a bit at what's going on. Any programming environment worth its salt will let you take an execution trace and follow it around in your editor. (I use Emacs. I do use geben occasionally, but I don't do much step debugging.)

Try to understand at least a bit of the surrounding code at each point in the trace. Don't bother trying to understand all of it at once -- there's too much. (I've worked on PHP codebases that could produce multi-gigabyte execution traces on a single page load when the verbosity of the trace was high. I hope you and everyone else never have to work on them.) Doing this gives you a feel for the typical paths of execution, and exposes you to a fair amount of the existing code.

Now for the absolute most important thing to do here: do *not* refactor untested code unless absolutely necessary. And when you think it's absolutely necessary, wait until getting it under test has stumped more than one person -- preferably more than one person currently working on the code, ideally one who has more experience than you do in some way. I'm not trying to be insulting or demeaning here; it's seriously that important, and I follow my own advice.

Read the book "Working Effectively with Legacy Code" by Michael Feathers. Seriously. Read this book. It has saved more than one refactoring project that I've worked on. There's quite a bit in it that isn't relevant to PHP for one reason or another, but it's written in a modular-enough fashion that you can skip those parts.
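For the execution traces mentioned above, a php.ini fragment along these lines is roughly what I mean -- this assumes the Xdebug 2-era setting names (current Xdebug 3 folded these into `xdebug.mode=trace` and `xdebug.start_with_request`), and the output directory is a placeholder:

```ini
; Hypothetical php.ini fragment for Xdebug 2-style function tracing.
xdebug.auto_trace = 1                  ; trace every request automatically
xdebug.trace_output_dir = /tmp/xdebug  ; placeholder; pick a real dir
xdebug.trace_format = 1                ; machine-readable, tab-separated
xdebug.collect_params = 3              ; include full argument contents
```

Be warned that `collect_params = 3` is what produces those enormous traces; start with it off, and turn it up only when you need the detail.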
Even if you're not familiar with the languages used in the examples, you shouldn't have a problem understanding them. Read it. If I were going to make this post way longer than it already is by talking about the specifics -- the fine-grained, useful-at-the-code-level specifics, the kind that come with actual code listings from actual programs and show what was done to refactor them and why -- I'd just be restating "Working Effectively with Legacy Code".

Seriously: figure out what you need to do to get things under test before changing them, test your changes, and be prepared to delete lots of tests as well. Since isolation is difficult, you might have to mimic other parts of the system in your test code, and that setup code may later become redundant cruft because of other changes that have been made. That can be a huge pain, but the combination of execution traces and thinking carefully makes it much easier. It's certainly less pain than the inevitable flood of functionality changes and breakages that you and your co-coders didn't know you made and aren't sure when you made.

Focus on changing as little as possible -- maintain the "API" of the current code as long as you can, even if it means you need to stick "class_alias" or similar at the bottom of a new source file because there are huge swaths of existing code that don't use namespaces, or because you just implemented something that does the job of 10 existing classes. Or even if it means taking a static function that a shitton of code is calling -- one that modifies global state in a bad, bad way -- and making it dispatch to an instance method of a replacement class. The other code will be changed when your understanding of the "Big Picture" is complete enough for that to be an appropriate change, and you'll know when that time comes.

PHPUnit and php-code-coverage are your friends. There are other code-analysis tools worth getting to know as well, such as PHP_CodeSniffer.
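The "static function dispatching to an instance method" move above looks something like this. All the names here are hypothetical (a `Config::get()` that legacy code calls statically while it reads a global): the point is that the old call sites keep working unchanged, while tests can swap in an instance built from fixture data.

```php
<?php
// New, testable implementation -- no global state.
class SiteConfig
{
    private $values;

    public function __construct(array $values)
    {
        $this->values = $values;
    }

    public function get($key)
    {
        return isset($this->values[$key]) ? $this->values[$key] : null;
    }
}

// The old entry point, preserved as a thin dispatcher.
class Config
{
    private static $instance;

    // Tests inject a SiteConfig built from fixture data here.
    public static function setInstance(SiteConfig $instance)
    {
        self::$instance = $instance;
    }

    // Same "API" the legacy callers use: still a static call,
    // but it now dispatches to the instance method.
    public static function get($key)
    {
        if (self::$instance === null) {
            // Fall back to the old global for untouched code paths.
            self::$instance = new SiteConfig(
                isset($GLOBALS['config']) ? $GLOBALS['config'] : array()
            );
        }
        return self::$instance->get($key);
    }
}

// If the new class lives under a namespace while old code expects the
// bare name, a class_alias at the bottom of the file bridges the gap:
// class_alias('App\\Config', 'Config');
```

A test can then do `Config::setInstance(new SiteConfig(['db_host' => 'localhost']))` and exercise the legacy call sites without touching `$GLOBALS` at all.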
This process is slow at first, but then it gets faster. As long as you have tests (and well-done tests at that...) in place, you can refactor and rewrite whatever you like and know what broke where, if anything. And know whether things ... well ... work.

This does bring up a bit of an issue: what do you test? I've found, particularly with this "refactor an existing freaking old ball of mud of a codebase toward tests" kind of work, that you should focus on testing the stuff that makes up a class or module's "public API", if that makes sense. It's a rule of thumb, not a hard rule. And of course, in all things, be pragmatic.

The goal is to improve this software. This software does something, and the only thing documenting it is the code. You want to produce tests that serve as executable documentation of the intent of the code, and you want to be able to undo the things you did -- potentially sweeping changes -- at a pretty fine-grained level if it turns out they were a bad idea. You want to preserve existing functionality as far as the user is concerned; that's what the software is for. Refactoring toward tests helps create those "executable docs", and helps *immensely* in preserving existing functionality. A DVCS lets you play with untested larger changes and do exploratory coding with the assurance that you can go back.

On that note, this is long enough. I hope it was time well spent for the reader.

> -Thanks in advance,
> Behzad

Best of luck,

-- 
Jeremiah Dodds

blog : http://jdodds.github.com
github : https://github.com/jdodds
freenode/skype : exhortatory
twitter : kaens

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php