Sayonara PHP

"Martin Alterisio" <malterisio777@xxxxxxxxx> · Wed, 26 Dec 2007 02:04:09 -0300

Please let me do a little explanation of the title first. Japanese is an
interesting language where context is vital to the meaning of a word.
Sayonara usually means a simple "good bye", but within a different context
can mean "we'll probably never meet again".

To understand this mail you'll have to know that I was just another user of
PHP, a user that was probably too eager. I wanted to get more involved with
the development of PHP, as I do believe in the philosophy of open source.
In the end my attempts ended in frustration but, nevertheless, I learned a
lot in just a few months. I don't want this mail to be a display of that
frustration. Instead I want to leave here all my findings: the things I
researched, the few things I managed to actually code, and mostly the ideas
that someone else might find useful.

---- To those who may want to get involved in the PHP internals ----

For those on the general list who may ever venture into the internals of
PHP, remember that you have to back your point of view with a patch. So sit
down, remember the old days in college using the C compiler, and code like a
cowboy before trying to promote anything in the internals. It's the status
quo of the PHP development community, as I learned too late.

---- Namespaces: function imports ----

Here is the patch to add function imports to 5.3. To be consistent, constant
imports have to be added too:

http://martinalterisio.com.ar/php5.3/use.function.v2b.patch

If you don't know what imports are, they are just a way to refer to a longer
name with a shorter name. For example:

<?php
class MyRowset extends Zend::Db::Table::Rowset::Abstract {
...

or with imports:

<?php
use Zend::Db::Table::Rowset::Abstract as Rowset;
class MyRowset extends Rowset {
...

The use statement currently supports aliasing of class names only.
Functions and constants have to be referred to by their full name, although
these too can be namespaced.
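
As an illustration of what function and constant imports look like in
practice, here is a minimal sketch. It is not the syntax of the patch above:
it assumes a released PHP (5.6 or later), where the namespace separator is a
backslash instead of :: and the use statement accepts "use function" and
"use const". The namespace, function and constant names are hypothetical.

```php
<?php
namespace Acme\Text\Formatting;   // hypothetical library namespace

const DEFAULT_WIDTH = 8;

// Left-pads $text with spaces up to $width characters.
function pad_left($text, $width = DEFAULT_WIDTH)
{
    return \str_pad($text, $width, ' ', \STR_PAD_LEFT);
}

namespace App;                    // hypothetical calling code

// Import the function and the constant under shorter names.
use function Acme\Text\Formatting\pad_left as pad;
use const Acme\Text\Formatting\DEFAULT_WIDTH as WIDTH;

echo pad('42'), '|', WIDTH, "\n";
```

Without the two use lines, the calling code would have to spell out
\Acme\Text\Formatting\pad_left() and \Acme\Text\Formatting\DEFAULT_WIDTH on
every reference.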

---- Import statement is broken: why, and how it can be fixed ----

While doing the previous patch I realized that the use statement is broken.
It should generate an error when you try to override an existing name. But
the use statement is handled at compile time, when it's unknown whether a
name will be overridden or not. What happens is that the error may or may
not be triggered depending on the conditions and order of compilation. If
you have an opcode cache, this error may not appear until the cache is
invalidated.

Dmitry made a suggestion that solves this issue (I really don't know whether
he knew about this problem with use or not). Following his idea, I made this
patch:

http://martinalterisio.com.ar/php5.3/file.scope.use.patch

With this the use statement is checked only against the names used in the
current file (or namespace if using multiple namespaces per file). Since the
imports only affect the current file, this is more sensible, and the issue
mentioned before disappears.

---- Name clash and ambiguity issue introduced by namespaces ----

There's another pending issue with namespaces: a name clash that currently
goes undetected and makes static methods partially unavailable. This is
because using :: as the namespace separator creates ambiguity. foo::bar()
can refer to the static method bar in class foo, or to the function bar in
the namespace foo. This is an issue for php library developers: someone can
inject a namespaced function which overrides your static method.

One possible solution I approached was to prevent the name clash altogether,
but I found this approach inappropriate for 2 reasons: the performance
impact is too big, and it is not consistent with how other name clashes are
handled in php (classes and functions may have the same name).

Another approach, which I believe is the correct one but never got the
chance to implement in a patch, is to change the order of name resolution:
search first for the static method and then for the namespaced function, and
if the user wants to refer to the function he can import it. This way both
remain accessible, although the user has to resolve the ambiguity. This also
reduces the impact of adding namespaces on legacy code, since right now
every static method call is affected (because the namespaced function is
searched first).

---- Reducing impact on performance introduced by namespaces ----

I found out that, although the philosophy behind the namespaces
implementation is to do as much as possible at compile time, much is still
pushed to the executor. Most of it could be resolved at compile time. Much
can be optimized by changing the name resolution rules. If these become more
explicit, the compiler can determine which actual name is being referred to.
As of now, code can be optimized only by using imports and explicit names,
which are the alternative notation. In other words, the normal use of
namespaces is not optimal.

There's still one name resolution that inevitably seems to fall to the
executor: the ambiguity mentioned earlier between static methods and
namespaced functions. This could be resolved by the user if the use
statement also allowed the type of import to be stated explicitly: use class
X; use namespace X; use function X; use const X;

---- Fix name resolution rules for better coding practices ----

Also, as of now, I'm more than confident when I say that the current name
resolution rules will bring many headaches to users. For starters you'll
have to make a habit of prefixing :: to all internal function calls (such as
::count, ::strlen, etc). This is safer when creating php libraries, since
otherwise another user could inject a namespaced function that overrides
those functions. This is because a function call without that prefix tries
first a function in the same namespace, then the internal one. For this same
reason, using the :: prefix will also be faster (since it's resolved at
compile time). And if you want to refer to an element of the current
namespace, it is better to use namespace::
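
The override hazard can be reproduced with a minimal sketch. It assumes a
released PHP (5.3 or later), where the separator is a backslash, so the ::
prefix discussed here is written \ and namespace:: is written namespace\.
The namespace and the lengths() helper are hypothetical.

```php
<?php
namespace Acme\Strings;   // hypothetical namespace

// A namespaced function that shadows the internal strlen() for every
// unqualified call made inside this namespace.
function strlen($s)
{
    return -1;
}

function lengths($s)
{
    return [
        strlen($s),            // unqualified: Acme\Strings\strlen is tried first
        \strlen($s),           // global prefix: always the internal function
        namespace\strlen($s),  // explicit reference to the current namespace
    ];
}

print_r(lengths('abc'));
```

Only the second call is guaranteed to reach the internal function. The first
one silently changes meaning as soon as someone defines strlen() in the same
namespace.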

If you don't know about the name resolution rules, check what's written in
the manual:

http://php.net/language.namespaces.rules

What I wanted to implement, but will never get the chance to, is name
resolution rules that are explicit and not context aware:

foo(); // is always global foo (except if foo is an alias)
new A(); // is always global class A (except if A is an alias)
A::B(); // try static method of global class, then namespaced function
        // (except if A is an alias)
namespace::foo(); // is always foo() in current namespace
new namespace::A(); // is always class A in current namespace
::foo(); // is always global foo (aliases ignored)
new ::A(); // is always global class A (aliases ignored)

I think this will improve readability, maintainability and debugging,
because of its explicitness.

---- Autoloading issue with namespaces ----

There is also an issue with autoloading and internal names under the current
name resolution rules. Autoload has to be the last thing tried, therefore
even if there's a namespaced name that overrides an internal name, it won't
be seen if its loading is subject to autoloading. That's another reason to
change the name resolution rules. With the rules I explained earlier this
issue with autoload goes away.

---- Possible enhancement for autoloading with namespaces ----

Regarding the autoloading, I think there's an enhancement that can be
achieved with the implementation of namespaces. Consider the possibility of
a namespaced __autoload(). Autoloading in PHP has one important issue: as
the system grows, and external libraries grow, the complexity of the
autoloading increases. Using the spl autoloader, each library adds its
autoloading. If you have many libraries, autoload can cost too much. If a
namespaced __autoload() is implemented, this can reduce the impact by
distributing the autoloading behavior, ie, first use the namespace autoload,
then try the global autoload. A package should know better where its classes
are.
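
The idea can be approximated in userland with a minimal sketch. This is not
a namespaced __autoload() in the engine: it is a single spl autoloader that
dispatches by namespace, assuming a released PHP (5.4 or later) with the
backslash separator. The NamespaceAutoload class, the Acme namespace and the
lib/ directory layout are all hypothetical.

```php
<?php
// Hypothetical registry that maps a namespace to its own loader.
final class NamespaceAutoload
{
    private static $loaders = [];

    public static function register($namespace, callable $loader)
    {
        self::$loaders[trim($namespace, '\\')] = $loader;
    }

    // Walk from the innermost namespace outwards and hand the class to
    // the first loader registered for one of the enclosing namespaces.
    public static function load($class)
    {
        $ns = $class;
        while (($pos = strrpos($ns, '\\')) !== false) {
            $ns = substr($ns, 0, $pos);
            if (isset(self::$loaders[$ns])) {
                call_user_func(self::$loaders[$ns], $class);
                return true;
            }
        }
        return false; // nothing registered: left to the global autoloaders
    }
}

spl_autoload_register(['NamespaceAutoload', 'load']);

// Usage: the Acme package knows where its own classes live.
NamespaceAutoload::register('Acme', function ($class) {
    $file = __DIR__ . '/lib/' . str_replace('\\', '/', $class) . '.php';
    if (is_file($file)) {
        require $file;
    }
});
```

The lookup cost depends on how deeply the class name is nested, not on how
many libraries have registered a loader, which is the point of distributing
the autoloading.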

---- Constrained scope for imports is impractical ----

When trying to refactor code to use namespaces, as a test, I also found that
having the use statement limited to the outermost scope is impractical. One
necessary addition, which is not very complicated, is an extra scope for use
statements, such as imports in a function scope. It's only a matter of
keeping an extra table for the function scope at compile time.

---- Namespaces keyword issue: it can be solved without taking a keyword ----

There's still the issue of the keyword taken by the namespaces
implementation. It doesn't matter if it's "package" or "namespace". Both
words are widely used as identifiers in existing php code (use google code
search if you don't believe me). I know they have tried to remove the need
for the keyword, but I still think there's a way. Consider the following:

<?php
class Foo::Bar {
  use bla::bla;
}
?>

Instead of:

<?php
namespace Foo;
use bla::bla;
class Bar {
}
?>

In the first there's no need for a namespace declaration; the namespace is
declared with the class name. The same can be used for functions and consts:

<?php
function Foo::test() {
  use bla::bla;
}
const Foo::CONSTANT = 101;
?>

This approach restricts namespaces to class, function or constant scope.
If you want to execute code in a namespace you'll have to be in one of these
scopes. But I think it's a price worth paying in favor of all those
libraries that will break because they use the fatal keyword (think of all
the XML related libraries that use "namespace").

Also, using namespace:: or package:: doesn't need to take a keyword (think
of self:: and parent::, they aren't keywords, just special names that can't
be used for naming classes).

---- Namespaces as nested classes? ----

Reading about how the previous implementation of namespaces went down the
drain, one recurring thought among some users and developers caught my
attention. Maybe namespaces and nested classes should be one and the same
thing in php. Considering that many are using classes as namespaces for
functions, this is not such an illogical approach to the problem. I have not
considered the technical feasibility of this approach much, but one thing
that would probably be needed is the ability to forward declare members.
Without this, all definitions must be clustered in one place.

Example:
<?php
final class A { // should be final to have nested classes?
  public class B; // forward declaration
}
?>

other file:
<?php
class A::B {
}
?>

I can't say much about this approach. It's just one wild idea.

---- Type hints: improvements could drastically improve performance ----

I thought much about type hints. Right now they are only seen as syntactic
sugar for system designers, and something that reduces performance. Actually
quite the opposite can be achieved, but not with the current implementation
of type hinting. The guys behind flash 9 obtained a 10x improvement in
performance thanks to type hinting. Doing the same with PHP is quite
sensible, since one of the bottlenecks for performance is the zval. Knowing
beforehand that a variable is a native type, just-in-time compilation to
native code can be done to drastically improve performance.

For that to happen first type hinting must be improved. Here are some
thoughts I shared with another user some time ago:

http://martinalterisio.com.ar/php5.3/php-typehints.txt

---- Taints ----

Last but not least, I thought about taints. As far as I know, PHP6 will
remove safe mode and magic quotes. If nothing else is there to prevent users
from being users, PHP6 might be considered too insecure. Taints should be
the solution to this, but approaches copied from other languages do not seem
feasible in PHP. Variable level taints are not the way to go: not much can
be added to the zval without suffering the consequences, and a simple model
of tainted/not tainted is not safe enough, as there are many taints to be
considered (XSS, SQL injection, HTML injection, to name a few).

I think one possible approach to consider is scope taints. Instead of
tracking taints on variable level, do it on scope level, ie, attach taints
to functions, classes, global scope. Taints should be an arbitrarily sized
list of elements, where the user can also add taints of his own (we don't
know where security holes might appear in the near future, so let's leave
that door open). Taints tracking is to be attached to classes, functions or
global scope (methods use class scope).

When function or class code refers to another scope (function call, method
call, member access, global access) a pollution occurs. In a pollution the
involved scopes become infected with the taints of both. The pollution
operation needs a new opcode that can handle a reference to a scope either
statically or by an object reference. For each function/class the user has
to be able to mark the taints that infect it, the taints it can
handle/resist, and the taints it rejects. A function/class ignores pollution
by taints that it can handle/resist. If a function/class is polluted by a
taint that it rejects, an error occurs. Internal functions should also
define how they are affected by taints, and some default taints should be
specified for known security issues.
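
To make the pollution rules concrete, here is a minimal userland sketch of
the model. It is only a simulation under my own assumptions: the real thing
would live in the engine with its own opcode. The TaintScope class, its
method names and the example scopes are all hypothetical.

```php
<?php
// Hypothetical model of one scope (function, class or global scope).
final class TaintScope
{
    public $name;
    public $taints;    // taints currently infecting this scope
    private $handles;  // taints this scope can handle/resist
    private $rejects;  // taints that raise an error on pollution

    public function __construct($name, array $taints = [],
                                array $handles = [], array $rejects = [])
    {
        $this->name = $name;
        $this->taints = $taints;
        $this->handles = $handles;
        $this->rejects = $rejects;
    }

    // Take in taints from another scope: ignore the handled ones, fail
    // on a rejected one, and become infected with the rest.
    private function absorb(array $incoming)
    {
        $incoming = array_diff($incoming, $this->handles);
        $rejected = array_intersect($incoming, $this->rejects);
        if ($rejected) {
            throw new RuntimeException(
                $this->name . ' rejects taint: ' . implode(', ', $rejected)
            );
        }
        $this->taints = array_values(
            array_unique(array_merge($this->taints, $incoming))
        );
    }

    // One scope referring to another: each is exposed to the other's taints.
    public static function pollute(TaintScope $a, TaintScope $b)
    {
        $fromA = $a->taints;
        $fromB = $b->taints;
        $a->absorb($fromB);
        $b->absorb($fromA);
    }
}

$global = new TaintScope('global', ['xss', 'sql']);      // raw user input
$escape = new TaintScope('html_escape', [], ['xss']);    // handles xss
$query  = new TaintScope('run_query', [], [], ['sql']);  // rejects sql

TaintScope::pollute($global, $escape);    // html_escape picks up only 'sql'
try {
    TaintScope::pollute($escape, $query); // 'sql' reaches run_query
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

Note how the 'sql' taint travels through html_escape, which neither handles
nor rejects it, and only raises an error once it reaches a scope that
rejects it.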

The problem with this approach is that it is not an automagical solution. It
requires the user to be conscious of the security issues. If he does nothing
about it an error occurs, but he can mark the scope as one that handles the
taint and still do nothing about it.

There are two alternatives for how to keep track of taints:
1) keep a list of taints that pollute the scope
2) keep a list of taints that DO NOT pollute the scope

The second alternative is harder to understand. It assumes that any scope
cannot be trusted by nature. Instead of adding threats, you remove threats.
I think this approach is more secure.

---- The end ----

Well that wraps it all, I think. That's as much as I can download from my
brain which is related to PHP. Do whatever you want with all this, even the
spam folder is fine.

Anyway, it's been fun, and I learned a lot.
My thanks to everyone that ever gave a hand.

A former PHP user says to you all:

Sayonara PHP

P.S.: Please be understanding if I don't answer replies to this email.