Friday, February 05, 2010

Automated refactoring without heavy IDEs

Unix programs are beautiful and universally compatible: you can chain them to accomplish incredible tasks. Let's do some automated refactoring, such as renaming PHP classes and methods, without IDEs, using only sed, find and grep plus your version control system of choice. You won't have to edit hundreds of files by hand, and these tools are available in every Linux distribution and on Mac OS X, although its BSD sed treats some options, such as -i, differently from GNU sed.

Renaming classes
START
Commit your work and make sure that the output of svn status is empty.
svn update
svn commit -m "what I have done until now..."
svn status
Refactoring is not an exact science, so wrong commands can destroy a codebase. If you forget a \ and all your <?php tags get replaced by an apt-get cow, simply run
svn revert -R .
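to restore the working copy to its state at the last commit. Files added in the meantime, such as the target of an svn move, are left on disk as unversioned files, which you can delete by hand.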
I'm sure your version control system has similar commands.
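The START check can also be scripted. The sketch below assumes Subversion and a POSIX shell; the function name require_clean_wc is made up for this example. It refuses to go on when svn status prints anything or fails:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if `svn status` runs fine and prints
# nothing, i.e. there are no local modifications or unversioned files.
require_clean_wc() {
  # The assignment takes the exit status of svn, so a failure
  # (e.g. not a working copy) is reported instead of passing silently.
  changes=$(svn status) || return 1
  if [ -n "$changes" ]; then
    echo "working copy is not clean, commit or revert first:" >&2
    printf '%s\n' "$changes" >&2
    return 1
  fi
  return 0
}

# Usage at the top of a refactoring script:
# require_clean_wc || exit 1
```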

STEP 1
Provided that you use autoloading, moving or renaming the file that contains the class definition is the first step of the operation.
mv Package1/OldClassName.php Package2/NewClassName.php
Usually you will do it directly through the version control system, so that the file keeps its history:
svn move Package1/OldClassName.php Package2/NewClassName.php
or whatever equivalent your VCS provides.
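If your autoloader maps both underscores (PEAR style) and namespace separators to directories, the file paths can be derived from the class names. That mapping is an assumption, so adapt it to your own loader; the helper name class_to_path is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: turn a class name into the file path an autoloader
# would look for, assuming both "_" and "\" map to directory separators.
class_to_path() {
  # tr reads '\\' as a single backslash, so "_" and "\" both become "/".
  printf '%s.php\n' "$1" | tr '_\\' '//'
}

# Usage: --parents makes svn create missing intermediate directories.
# svn move --parents "$(class_to_path 'Package1_OldClassName')" \
#                    "$(class_to_path 'Package2_NewClassName')"
```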

STEP 2
The syntax of the sed substitution command is analogous to vim's search and replace:
sed -i -e 's/\<OldClassName\>/NewClassName/g' \
    `find folder1 folder2 -name '*.php'`
This command renames all OldClassName occurrences:
  • in place, without creating new files (-i)
  • using the expression that follows on the command line (-e)
  • where OldClassName appears as a whole word (\< and \> match the start and end of a word), so that, for example, AbstractOldClassName is left alone
  • substituting them with NewClassName
  • replacing every occurrence on each line, not only the first one (/g)
After the expression we have to list all the files to edit, but we can simply let a find command produce the list of all files that:
  • are in folder1 or folder2, or in their subdirectories (you can list as many folders as you want)
  • have a name matching *.php, i.e. the .php extension (quote the pattern, as in '*.php', so that the shell passes it to find instead of expanding it)
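The backticks (`) tell the shell to replace the enclosed command with its output. This command substitution is a powerful way to use sub-commands (otherwise we would have been stuck with find -exec). The resulting list is split on whitespace, so it breaks on file names containing spaces. The \ at the end of the first line tells the shell that the command continues on the next line.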
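The same substitution can be wrapped in a small shell function. This sketch assumes GNU sed (for -i and the \< \> word anchors) and uses find -exec ... {} + instead of backticks, which copes with file names containing spaces and with very long file lists; the name rename_class is made up:

```shell
#!/bin/sh
# Hypothetical helper: rename_class OLD NEW DIR...
# Replaces OLD with NEW as a whole word in every *.php file under the DIRs.
# Requires GNU sed: -i edits in place, \< and \> are word boundaries.
rename_class() {
  old=$1 new=$2
  shift 2
  # '{} +' hands many file names to a single sed invocation.
  find "$@" -name '*.php' -exec sed -i -e 's/\<'"$old"'\>/'"$new"'/g' {} +
}

# Usage: rename_class OldClassName NewClassName folder1 folder2
```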

When the PHP classes to rename do not take advantage of PHP 5.3's new features, renaming is very simple, since OldClassName is always the fully qualified name of the class. Since PHP 5.3, namespaces are available to organize classes into packages without very long names, but the namespace separator unfortunately coincides with the backslash, which sed uses as its escape character. Thus to match it you have to double it:
sed -i -e 's/\<Old\\ClassName\>/New\\KlassName/g' `find folder1 folder2 -name '*.php'`
sed -i -e 's/\<ClassName\>/KlassName/g' `find folder1 folder2 -name '*.php'`
The first sed rewrites the use statements and the fully qualified references, which is how 99% of classes are referred to. The second sed renames the remaining references to the short class name in PHP files; it is not necessary if you are only moving a class to another namespace without renaming it.
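Doubling every backslash by hand is error-prone, so a variant of the helper can do it itself. This is again a sketch assuming GNU sed, with the hypothetical name rename_ns_class; it escapes only backslashes, which should be enough for PHP class names since they contain no other characters special to sed:

```shell
#!/bin/sh
# Hypothetical helper: rename_ns_class 'Old\ClassName' 'New\KlassName' DIR...
# Doubles each backslash so sed reads the namespace separator literally,
# both in the pattern and in the replacement. Requires GNU sed.
rename_ns_class() {
  old=$(printf '%s' "$1" | sed 's/\\/\\\\/g')
  new=$(printf '%s' "$2" | sed 's/\\/\\\\/g')
  shift 2
  find "$@" -name '*.php' -exec sed -i -e 's/\<'"$old"'\>/'"$new"'/g' {} +
}

# Usage: single quotes stop the shell from touching the backslashes.
# rename_ns_class 'Old\ClassName' 'New\KlassName' folder1 folder2
```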
A problem arises when you move a class out of a namespace: its former siblings did not need a use statement to import its name, but now they do. In the mirror case, when you move a class into a namespace, you will want the now superfluous use statements to vanish. In both cases you'll have to edit the affected files by hand.
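The files to fix by hand can at least be narrowed down for the first case. This rough heuristic (a sketch; the name files_missing_use is hypothetical) lists the PHP files that mention the short class name but lack a use line for the fully qualified one. It also reports files inside the class's own namespace and the class file itself, which need no import, so treat the output as a list of candidates:

```shell
#!/bin/sh
# Hypothetical helper: files_missing_use 'New\KlassName' DIR...
# Prints PHP files that mention KlassName as a word but contain no
# "use New\KlassName;" line. Heuristic only: expect false positives.
files_missing_use() {
  fqcn=$1
  shift
  short=${fqcn##*\\}   # strip everything up to the last backslash
  find "$@" -name '*.php' -exec grep -lw -- "$short" {} + |
    while IFS= read -r f; do
      grep -qF -- "use $fqcn;" "$f" || printf '%s\n' "$f"
    done
}

# Usage: files_missing_use 'New\KlassName' folder1 folder2
```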

Don't forget to repeat the two steps for the test classes as well. The exact commands depend on your naming convention, but test cases often mirror the production code hierarchy, so that OldClassNameTest should be replaced with NewClassNameTest. In xUnit frameworks, test cases are first-class citizens (classes), so renaming them is no different from renaming production code.
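Because the \< \> anchors make OldClassName and OldClassNameTest different words, one pass renames both only if the pattern allows an optional Test suffix. A sketch assuming GNU sed and the ClassNameTest convention; the name rename_with_test is hypothetical, and the test file itself still has to be moved as in Step 1:

```shell
#!/bin/sh
# Hypothetical helper: rename_with_test OLD NEW DIR...
# Renames OLD -> NEW and OLDTest -> NEWTest in one pass. The outer group
# always matches (possibly the empty string), so \1 is either "Test" or "".
# Requires GNU sed for -i, \< and \>.
rename_with_test() {
  old=$1 new=$2
  shift 2
  find "$@" -name '*.php' -exec sed -i -e \
    's/\<'"$old"'\(\(Test\)\{0,1\}\)\>/'"$new"'\1/g' {} +
}

# Usage: rename_with_test OldClassName NewClassName src tests
```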

END
Check the results by looking for every occurrence of the old or new name in your working copy:
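grep -r --exclude-dir=.svn "OldClassName" .
Use grep -rl instead to list only the files where the pattern occurs. The --exclude-dir option keeps grep out of the .svn directories, where Subversion stores pristine copies of your files that still contain the old name.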
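The grep check above can also become a guard that fails while the old name is still around, which is handy at the end of a refactoring script. A sketch assuming GNU grep (-w and --exclude-dir); the name assert_gone is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: assert_gone NAME DIR...
# Succeeds if NAME no longer appears as a whole word under the DIRs;
# otherwise prints the offending files and fails. .svn is skipped because
# Subversion keeps pristine copies of every file there.
assert_gone() {
  name=$1
  shift
  status=0
  grep -rlw --exclude-dir=.svn -- "$name" "$@" || status=$?
  if [ "$status" -eq 0 ]; then
    echo "still referenced: $name" >&2
    return 1
  fi
  # grep exits with 1 when nothing matched and 2 on errors.
  [ "$status" -eq 1 ]
}

# Usage: assert_gone OldClassName folder1 folder2 || exit 1
```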
svn diff shows you the current changeset (every modified, added and deleted line), and svn status lists the modified files.
After checking manually, run your whole test suite. If some of your tests fail, track down the errors (that's what tests are for) and fix them by hand. If the working copy is compromised, revert to the original revision.

Renaming methods
The START and END phases are the same as for classes, and they are very good practices to adopt in any refactoring. The sed substitution, however, assumes that no unrelated classes have methods with the same name, because a textual replacement cannot tell them apart.
Renaming a method usually has a more limited impact on the codebase than renaming a class:
  • To rename a private method, simply open the class file in your editor of choice and edit it directly, since all its references live in that file.
  • For protected methods, you should also check subclasses; when they are only a handful, it's often faster to edit the files directly than to run sed.
  • For public methods, the problem is analogous to renaming a class, but there are no file moves or namespacing issues. Follow Step 2.
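For public methods, the word-boundary substitution of Step 2 can be narrowed so that it only touches declarations and calls, i.e. names preceded by function, -> or ::, leaving variables and unrelated words alone. A sketch assuming GNU sed (\| alternation is a GNU extension to basic regular expressions); rename_method is a hypothetical name. It does not help with same-named methods in unrelated classes, and it misses method names held in strings, such as callbacks written as array($obj, 'oldMethod'):

```shell
#!/bin/sh
# Hypothetical helper: rename_method OLD NEW DIR...
# Renames OLD only after "->", "::" or "function ", keeping the prefix
# via \1. Requires GNU sed (-i, \| alternation, \> word boundary).
rename_method() {
  old=$1 new=$2
  shift 2
  find "$@" -name '*.php' -exec sed -i -e \
    's/\(->\|::\|function[[:space:]]\{1,\}\)'"$old"'\>/\1'"$new"'/g' {} +
}

# Usage: rename_method oldMethod newMethod folder1 folder2
```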

Of course, both these refactorings break backward compatibility, so make sure you're not modifying any published interface in a minor release of your application. They are also dangerous if uncontrolled, because regular expressions are an advanced tool that can quickly mangle all your code. Make sure you can return to the original copy of the code via version control at any time, and that plenty of tests cover the functionality of the classes you want to refactor.
Refactoring shouldn't be hard. I hope this little guide helps you.

5 comments:

Unknown said...

You can also do all the things you used sed and grep for with that little-known IDE called Emacs

Giorgio said...

You can do them with vim too, just prepend ! to each line. :)

Anonymous said...

This post is really helpful.
But keep in mind that "sed -e 's/\<old_classname\>/New_ClassName/g' \"
will replace ALL occurrences of old_classname. This could lead to semantic errors. I suggest you check for something like "class old_classname" or "extends old_classname" or "new old_classname" or "old_classname::" ....

Giorgio said...

Each of these refactorings can be destructive; you should have a test suite to run after execution, or at least look at the svn|git diff before accepting the result.

