One of my recent projects at Raizlabs required distributing PHP software to
customers as a trial, so that they could evaluate the
software before they committed to a purchase. To limit piracy, we
wanted to protect the PHP code itself.
We looked into various solutions for PHP
source code protection, among them open source encoders and
obfuscators, as well as some closed source byte-code compilers. Our
issue with the byte-code compilers was that, in every case, they
either required additional run-time loadable modules to be shipped with
our application, or they required server extensions such as
Zend to be installed. Given that we could not guarantee the
state of the servers on which our scripts would be installed, we
decided against any solution involving byte-code compilation.
The popular open source obfuscators,
without fail, either broke our source code or required us to change it
to suit the obfuscator. It was quicker to write our own
obfuscator in C# than to change all of our PHP code to conform to
the arbitrary code guidelines imposed by those tools. Our
obfuscator, of course, imposes guidelines of its own, which follow
from our coding style and internal PHP coding practices. The major rule
is that it does not understand PHP variables that are created
implicitly from HTML form fields (as with PHP's register_globals
setting). This means that if you name your input tag foo, you
cannot use the variable $foo in your PHP code
unless you explicitly exclude it in the PHP Obfuscator
application. The way around this is to use the
$_REQUEST, $_POST, or
$_GET arrays for all HTML input variables:
$_REQUEST['foo'] remains valid even after
the scripts are obfuscated.
Background
This obfuscator was written mainly to
encode a piece of PHP software called
PHP Email Manager. This
application required that we be able to exclude various files and
variables from the obfuscation process, in order to support
user-defined configuration in a config.php file. This file
sets up variables for use by the rest of the application, so it is
important that the file remains readable, and that the variable
names it defines remain unchanged throughout the rest of
the application.
There are three main parts to the
obfuscation application:
-
The PHP Obfuscator GUI, which allows the
user to select source code to be encoded, source code to be
excluded, functions and variables to be excluded, and
obfuscation preferences. Obfuscation can be executed from the
GUI, or an obfuscation project file can be persisted in XML for
use by the command line tool.
-
The PHP Obfuscator command line tool, which
allows the user to run obfuscation as an automated step in their
build process. The only argument accepted by the
command line tool is the filename of an obfuscation project
file, which is created by the GUI.
-
The Obfuscator class,
which is used by both the GUI and the command line tool, to
perform the obfuscation on the target PHP code.
Using the code
Using the Obfuscator
class is a simple matter of creating an instance of the
Obfuscator class with an
ObfuscatorUI object as a parameter, and calling
Start. ObfuscatorUI is an
implementation of the IObfuscatorUI interface, which provides
the obfuscator with the following functions:
StatusUpdate(string), Done(), and
Error(string).
Through these three functions, the Obfuscator
class can report back to whatever component called it as it
proceeds to encode your source code.
The first parameter to the
Start function on the obfuscator is an
ObfuscatorConfig object. This object is simply the
deserialized form of what the GUI persisted when the
user chose to save a project file: a PHP obfuscator project file
is a persisted ObfuscatorConfig object, written using
built-in .NET XML serialization.
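// config is an ObfuscatorConfig loaded from a saved project file
ObfuscatorUI ui = new ObfuscatorUI();
Obfuscator obfuscator = new Obfuscator(ui);
// false = run synchronously, as the command line tool does
obfuscator.Start(config, false);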
In the above example, the
ObfuscatorUI object is a simple class, defined in
Program.cs, which just writes the status reported by the
obfuscator to the console window.
class ObfuscatorUI : Obfuscation.IObfuscatorUI
{
    #region IObfuscatorUI Members
    public void StatusUpdate(string status)
    {
        Console.WriteLine(status);
    }
    public void Done()
    {
        Console.WriteLine("Done.");
    }
    public void Error(string errorText)
    {
        Console.WriteLine("Error: " + errorText);
    }
    #endregion
}
In the case of the main PHP obfuscator
UI, the implementation of the IObfuscatorUI
interface is actually done on the main Form
class of the application.
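Since a project file is described above as an ObfuscatorConfig persisted with built-in .NET XML serialization, saving and loading one could look roughly like the sketch below. The ProjectConfig class and its fields are hypothetical stand-ins, because the real members of ObfuscatorConfig are not listed in this article; only XmlSerializer itself is the standard .NET API.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for ObfuscatorConfig; the real class has its own members.
public class ProjectConfig
{
    public string SourceDirectory = "";
    public string TargetDirectory = "";
    public List<string> ExcludedVariables = new List<string>();
}

public static class ProjectFile
{
    // Persist a config object as XML, as the GUI is described as doing.
    public static void Save(ProjectConfig config, string path)
    {
        var serializer = new XmlSerializer(typeof(ProjectConfig));
        using (var writer = new StreamWriter(path))
        {
            serializer.Serialize(writer, config);
        }
    }

    // Load a config object back, as the command line tool would need to.
    public static ProjectConfig Load(string path)
    {
        var serializer = new XmlSerializer(typeof(ProjectConfig));
        using (var reader = new StreamReader(path))
        {
            return (ProjectConfig)serializer.Deserialize(reader);
        }
    }
}
```

XmlSerializer only handles public fields and properties of a public type with a parameterless constructor, which is why the stand-in class is written that way.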
Just a note: the second parameter to the
Start function on the obfuscator is the
asynchronous operation flag. This determines whether the
obfuscation of your source code occurs synchronously (as is the
case in the command line tool) or asynchronously (as is the case in
the GUI). Running asynchronously allows any GUI that uses this class to remain
responsive while encoding is taking place.
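One way a single flag can switch between blocking and background execution is sketched below. This is not the Obfuscator class's actual code: SketchWorker, IProgressSink, ConsoleSink, and the use of a plain Thread are all assumptions made for illustration.

```csharp
using System;
using System.Threading;

// Hypothetical callback interface, modeled loosely on IObfuscatorUI.
public interface IProgressSink
{
    void StatusUpdate(string status);
    void Done();
}

// Console implementation that also exposes an event a caller can wait on.
public class ConsoleSink : IProgressSink
{
    public readonly ManualResetEventSlim Finished = new ManualResetEventSlim(false);

    public void StatusUpdate(string status)
    {
        Console.WriteLine(status);
    }

    public void Done()
    {
        Console.WriteLine("Done.");
        Finished.Set();
    }
}

public class SketchWorker
{
    private readonly IProgressSink sink;

    public SketchWorker(IProgressSink sink)
    {
        this.sink = sink;
    }

    // When runAsync is false the caller blocks until the work finishes;
    // when true the work runs on a background thread and Start returns at once.
    public void Start(string[] files, bool runAsync)
    {
        if (runAsync)
        {
            var thread = new Thread(() => Run(files));
            thread.IsBackground = true;
            thread.Start();
        }
        else
        {
            Run(files);
        }
    }

    private void Run(string[] files)
    {
        foreach (string file in files)
        {
            sink.StatusUpdate("Encoding " + file);
        }
        sink.Done();
    }
}
```

In the asynchronous case the callbacks arrive on the worker thread, so a Windows Forms implementation of the interface would need to marshal them back to the UI thread (for example with Control.Invoke) before touching any controls.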
All regular expressions to be used
during obfuscation for detection of variable names, function calls,
function declarations, class declarations, and strings are stored in
the Settings class of the obfuscation DLL.
How the encoding works
The Obfuscator
class has three functional components:
-
Encode variable names
-
Encode function names
-
Remove whitespace
Through the application of these three
functions, the source code is rendered somewhat unreadable, but
still fully functional.
When the obfuscation process is started,
if the target directory already exists, the user is prompted so it
can be removed. Its removal is essential because this is the
directory into which every file from the source directory will be
copied and modified. The target directory will become an exact
replica of the source directory hierarchy, aside from the encoding
that is performed on all the files.
After the target directory is created, a
recursive copy from the source directory to the target directory
takes place. All files are copied, regardless of whether they were
selected to be encoded. The copying of every file takes place so
that the end result in the target directory is a complete solution,
not just the encoded files.
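The recursive copy described above can be sketched in a few lines. DirectoryCopier is a hypothetical name and this is only an illustration of the step, not the code the obfuscator ships with.

```csharp
using System.IO;

public static class DirectoryCopier
{
    // Recursively copy every file and subdirectory from source to target,
    // so the target mirrors the source hierarchy before any encoding happens.
    public static void CopyAll(string sourceDir, string targetDir)
    {
        Directory.CreateDirectory(targetDir);

        foreach (string file in Directory.GetFiles(sourceDir))
        {
            string destination = Path.Combine(targetDir, Path.GetFileName(file));
            File.Copy(file, destination, true);
        }

        foreach (string subDir in Directory.GetDirectories(sourceDir))
        {
            string destination = Path.Combine(targetDir, Path.GetFileName(subDir));
            CopyAll(subDir, destination);
        }
    }
}
```

Copying everything first means that images, stylesheets, and excluded files such as config.php end up in the target without any special handling; only the selected files are later rewritten in place.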
Once all the un-encoded files exist in
the target directory, the obfuscation begins. Each file that was
selected is opened, and its PHP code blocks are extracted; the HTML parts
of the file are not processed. From every block of code,
comments are removed, then variables are renamed (the new name is
derived from an MD5 hash of the original name), then whitespace is removed. Function and
class declarations within the code block are then detected using a
regular expression and, provided they do not match a
list of function names that are built into PHP (see the
phpFunctions class), they are added to a list for renaming in the
second pass over all the selected files.
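To make the variable-renaming step concrete, here is a minimal sketch that extracts PHP blocks with a regular expression and replaces each variable name with a name derived from its MD5 hash. The regular expressions, the VariableRenamer name, the reserved-name list, and the "v" prefix are all assumptions; the expressions actually used live in the Settings class and are not reproduced here. Comment and whitespace removal are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;

public static class VariableRenamer
{
    // Matches <?php ... ?> blocks; HTML outside the tags is never touched.
    private static readonly Regex PhpBlock =
        new Regex(@"<\?php(.*?)\?>", RegexOptions.Singleline);

    // Matches PHP variable references such as $foo or $_REQUEST.
    private static readonly Regex Variable =
        new Regex(@"\$([A-Za-z_][A-Za-z0-9_]*)");

    // Names that must keep their meaning: superglobals and $this.
    private static readonly HashSet<string> Reserved = new HashSet<string>
    {
        "this", "GLOBALS", "_SERVER", "_GET", "_POST", "_REQUEST",
        "_COOKIE", "_FILES", "_SESSION", "_ENV"
    };

    // Hex MD5 of the name, prefixed with a letter because a PHP variable
    // name may not start with a digit (the prefix is an assumption).
    public static string HashName(string name)
    {
        using (MD5 md5 = MD5.Create())
        {
            byte[] hash = md5.ComputeHash(Encoding.UTF8.GetBytes(name));
            var hex = new StringBuilder("v");
            foreach (byte b in hash)
            {
                hex.Append(b.ToString("x2"));
            }
            return hex.ToString();
        }
    }

    // Rename variables inside PHP blocks, skipping reserved and excluded names.
    public static string Rename(string fileText, ISet<string> excluded)
    {
        return PhpBlock.Replace(fileText, block =>
            "<?php" + Variable.Replace(block.Groups[1].Value, v =>
            {
                string name = v.Groups[1].Value;
                bool keep = Reserved.Contains(name) || excluded.Contains(name);
                return keep ? v.Value : "$" + HashName(name);
            }) + "?>");
    }
}
```

Because the hash depends only on the original name, the same variable gets the same replacement in every file without any shared state. The sketch also shows why $_REQUEST['foo'] survives while a form-populated $foo does not: the superglobal is skipped and the string key is not a variable, whereas $foo is rewritten to a name no form field will ever supply. Real code would also have to cope with cases this sketch ignores, such as class property declarations, variables inside single-quoted strings, and ?> appearing inside a string literal.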
After the above process has been applied
to all the blocks in a file, the file is rewritten with the
replacement PHP code blocks, and we proceed to the next file. After
all the files have been processed, a second pass over the files is
made, renaming every function call and class
instantiation whose name was collected in the first pass.
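The reason for two passes is that a call can appear in a file that is processed before the file declaring the function, so the full list of names has to exist before anything is renamed. A minimal sketch of the idea follows. FunctionRenamer, the regular expressions, and the f0, f1, ... naming scheme are assumptions for illustration, and class names are not handled.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FunctionRenamer
{
    // Matches "function name(" declarations.
    private static readonly Regex Declaration =
        new Regex(@"\bfunction\s+([A-Za-z_][A-Za-z0-9_]*)\s*\(");

    // Matches any identifier that is directly followed by "(".
    private static readonly Regex Call =
        new Regex(@"\b[A-Za-z_][A-Za-z0-9_]*(?=\s*\()");

    // First pass: collect user-defined names and assign each a replacement.
    public static Dictionary<string, string> Collect(
        IEnumerable<string> sources, ISet<string> builtIns)
    {
        var map = new Dictionary<string, string>();
        foreach (string source in sources)
        {
            foreach (Match m in Declaration.Matches(source))
            {
                string name = m.Groups[1].Value;
                if (!builtIns.Contains(name) && !map.ContainsKey(name))
                {
                    map[name] = "f" + map.Count;
                }
            }
        }
        return map;
    }

    // Second pass: rewrite declarations and calls in one sweep, so a
    // replacement name is never itself replaced again.
    public static string Apply(string source, Dictionary<string, string> map)
    {
        return Call.Replace(source, m =>
            map.TryGetValue(m.Value, out string renamed) ? renamed : m.Value);
    }
}
```

The builtIns set plays the role that the phpFunctions class plays in the text above. In this sketch it would also need to contain magic method names such as __construct, since those are declared with the function keyword but must keep their names.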
Points of Interest
There is no one "right way" to encode
PHP files for distribution; everyone has their own preferences and
their own technique. We developed what works best for us for a very
specific situation; others will have different techniques that are
better suited for their unique situations. This is just the one that
suited us, and I hope someone else can find it useful, and maybe add
to its functionality.