Autoloading (Revisited)

September 19th, 2011

PHP 5.0 introduced the ability to autoload classes. At the time, autoloading was so new that it was hardly adopted. Many applications being ported from PHP 4 to PHP 5 still contained lots of procedural code (code that cannot be autoloaded) and many class files with long require_once lists. It wasn’t until years later that certain best practices emerged and the prolific use of require_once/include_once throughout large bodies of code started drying up. Even after autoloading had been adopted by larger, more visible projects, a common pattern had yet to emerge. The PEAR project already had its one-class-per-file rule and a class-to-filesystem naming convention, but this was hardly the rule elsewhere, and as a result there were many different autoloading strategies.

As time passed, more and more projects went through rewrites, and the strategy most of them landed on was the one that came from the PEAR group. Fast-forward to today, and this autoloading convention has been agreed upon by a large number of projects and has come to be named the “PSR-0 autoloading standard”.

What We’ve Learned

Having attained consistency in how our code utilizes autoloading, we’ve attempted to find the most efficient, performance-optimized way of executing our autoloading strategy. Matthew Weier O’Phinney has blogged about this in the past; it’s a good read if you have not already read it. To summarize, he found the following things to be true:

  • disk based class name to filesystem location maps are the fastest lookups
  • class filesystem paths that are absolute and that do not rely on include_path are fastest
  • lightweight autoload functions that utilize class maps directly are the fastest

For more information about the above generalizations, see Matthew’s blog post.

Nearly a year ago, in conjunction with his findings, Matthew also wrote a classmap generation tool. This tool produced a .classmap.php file that would reside in the directory containing the class files. The general idea is that a developer could use an automatic, mapping-based autoloader such as the PSR-0 autoloader, or could use this .classmap.php file to build a more performance-centric autoloading strategy.

This approach presents developers with two primary problems. First, dot files are generally hidden on a filesystem, which means that this PHP data array is part of a code path hidden from most developers’ view of the codebase. This leads to moments of confusion when something related to the location of classes goes awry. The second problem is that this strategy assumes the consumer has some way of consuming the contents of the class-map file. ZF users could utilize one of the shipped Zend\Loader classes that are designed to use a class map. The problem here is not necessarily for ZF users, but that it promotes a strategy that is more ZF-specific than generic in nature.

The addition, and swift adoption, of namespace support in PHP 5.3 has presented us with both a platform for standardization and a few challenges. Traditionally, when we thought of the PEAR naming convention, we assumed that a given class (in prefix notation) Alpha_Beta_Gamma would map to a single place on the filesystem, namely: some/path/Alpha/Beta/Gamma.php. This inherently presents no problems. What does present a problem is another project that utilizes part of this prefix, but in a different location. Assume that you want to use part of the prefix, for example the Alpha_Beta_ portion, with a different logical component/module/project within your organization. In this case, it might make sense that class Alpha_Beta_Gamma live in one project on disk, and that Alpha_Beta_Omega live somewhere completely different. Any number of situations could realistically present this problem, but the most apparent is that your organization wants to utilize a naming scheme that allows for MyCompany_MyDivisionWithinMyCompany_PerhapsSomeLogicalComponent_ClassName.

In any of these scenarios, a simple rule that maps one class name to a filesystem location will not work for another class that could conceivably live within the same project, at least not without some kind of autoloader filter or filesystem munging. Either way, we can no longer assume that a simple mapping of class names to one location on disk will suffice.

More and more, we are seeing this pattern emerge (this time with namespaces):
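
A minimal sketch of such a class, assuming the VendorName\ComponentName\SomeComponentClass name that the project layout in this post uses (the class body is a placeholder):

[code lang="php"]
<?php
// src/VendorName/ComponentName/SomeComponentClass.php
// The namespace mirrors the directory path below src/.
namespace VendorName\ComponentName;

class SomeComponentClass
{
    // Component code would go here.
}
[/code]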

This class is then found inside its own logical project, with its own data files, web files, or test files in a project structure that looks similar to this:

[code lang="bash"]
path/to/VendorName_Component/
    src/
        VendorName/
            ComponentName/
                SomeComponentClass.php
    data/
        some-data-file.txt
    tests/
        phpunit.xml
        phpunit-bootstrap.php
        VendorName/
            ComponentName/
                SomeComponentClassTest.php
    docs/
        some-documentation-format.xml
    README.md
[/code]

As you can imagine, any vendor or organization in the business of building software will more than likely have more than one project that both utilizes this kind of naming scheme and takes advantage of this project structure for developing and releasing code. This being the case, unless the project is merged with other code for the purposes of a consuming project, parts of the namespace will exist in two separate parts of the filesystem, which is something a specialized autoloader will need to take into consideration.

Ideally, we should find a solution that presents class-map based autoloading in a way that is an easily identifiable code pattern, is simple and expressive, works well with common development practices, and takes advantage of the current-day PHP platform (namespaces and autoloading facilities).

And, What I’ve Found Is This …

And, what I’ve found is that projects should present a few different options for how they provide an “out-of-the-box” autoloading experience. Such a solution should offer the consumer a usage story with the most minimal of requirements when it comes to bootstrapping this 3rd party code. Let’s examine the following project structure (expanded from our example above):

[code lang="bash"]
path/to/VendorName_Component/
    src/
        VendorName/
            ComponentName/
                SomeComponentClass.php
    data/
        some-data-file.txt
    tests/
        phpunit.xml
        phpunit-bootstrap.php
        VendorName/
            ComponentName/
                SomeComponentClassTest.php
    docs/
        some-documentation-format.xml
    autoload_classmap.php
    autoload_function.php
    autoload_register.php
    README.md
[/code]

What you’ll notice is the addition of 3 autoload_*.php files. Let’s have a look at what these files provide and the reasons for their existence. First the autoload_classmap.php:
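
A minimal sketch of what such a class map might contain, assuming the single class from the layout above (a real component would list every class it ships):

[code lang="php"]
<?php
// autoload_classmap.php (sketch)
// Maps each fully qualified class name to an absolute path built from
// __DIR__, so lookups never depend on include_path.
return array(
    'VendorName\ComponentName\SomeComponentClass'
        => __DIR__ . '/src/VendorName/ComponentName/SomeComponentClass.php',
);
[/code]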

This file provides the exact map of each class name to the location on disk where that class can be found. It takes advantage of PHP’s ability to return a value from an included file. A simple usage story for this file might be:
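
One way that usage story could look, sketched with a hypothetical helper named register_classmap_autoloader() (it is not part of ZF or any other library) and assuming the map file returns an array of class name to absolute path:

[code lang="php"]
<?php
// Hypothetical helper: loads a class map file and registers a small
// autoloader that consults the map directly.
function register_classmap_autoloader($classmapFile)
{
    // include evaluates to whatever the included file returns.
    $map = include $classmapFile;

    spl_autoload_register(function ($class) use ($map) {
        if (isset($map[$class])) {
            include $map[$class];
        }
    });

    return $map;
}

// Usage, with an illustrative path:
// register_classmap_autoloader('path/to/VendorName_Component/autoload_classmap.php');
[/code]

A consumer that already has its own class-map aware autoloader could instead merge the returned array into that autoloader's map.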

Let’s next look at the autoload_function.php file:
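
A sketch of how such a file might be written, assuming it sits next to autoload_classmap.php and that the map is only loaded on first use:

[code lang="php"]
<?php
// autoload_function.php (sketch)
return function ($class) {
    static $map;

    // Load the class map lazily, the first time the autoloader is consulted.
    if (!$map) {
        $map = include __DIR__ . '/autoload_classmap.php';
    }

    // Unknown class: decline so other autoloaders in the queue can try.
    if (!isset($map[$class])) {
        return false;
    }

    return include $map[$class];
};
[/code]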

This file provides a closure based autoloader as its return value. This function can then be used by the consumer directly for injecting into their own autoloader stack/queue, or directly into the autoloader queue provided by PHP:
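
A sketch of the second option, assuming autoload_function.php returns a closure as described; register_component_autoloader() is a hypothetical helper name used only for illustration:

[code lang="php"]
<?php
// Hypothetical helper: pulls the closure out of a component's
// autoload_function.php and hands it to PHP's autoloader queue.
function register_component_autoloader($autoloadFunctionFile)
{
    $autoloader = include $autoloadFunctionFile; // the file returns a closure
    spl_autoload_register($autoloader);

    return $autoloader;
}

// Without the helper, the same thing is a single line (illustrative path):
// spl_autoload_register(include 'path/to/VendorName_Component/autoload_function.php');
[/code]

Returning the closure lets the caller also place it into an autoloader stack of its own, if it maintains one.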

Either way, the consumer is provided with a callback that can be used, in a single line, to bootstrap this component’s autoloading needs.

Finally, the complete, one-line solution can be found by utilizing autoload_register.php directly:
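
A sketch of that file, assuming it lives beside autoload_function.php in the component root (on its own, without that sibling file, it will not run):

[code lang="php"]
<?php
// autoload_register.php (sketch)
// Registers the closure returned by the sibling autoload_function.php
// at the end of PHP's autoloader queue.
spl_autoload_register(include __DIR__ . '/autoload_function.php');
[/code]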

While the above is so trivial as to ask why it should be included, it does offer a single-line usage story:
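
A sketch of that usage, with an illustrative install path. The is_file() guard is not part of the usage story; it is only there so the sketch does not fail when the illustrative path does not exist:

[code lang="php"]
<?php
// Illustrative path: wherever the component has been installed.
$file = 'path/to/VendorName_Component/autoload_register.php';

// The whole bootstrap is the require_once line.
if (is_file($file)) {
    require_once $file;
}
[/code]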

Why not do this in the first place? Well, this approach assumes the consumer does not necessarily care about how the autoload function is loaded into PHP’s spl_autoload queue. One thing to keep in mind is that when spl_autoload_register() is called, autoloaders are placed at the end of the queue by default. This behavior can be changed by passing true as the 3rd parameter of spl_autoload_register(). This type of performance optimization might be important when you know some autoloadable code will be utilized more often than other code, and thus you want the autoloader for that code to be consulted first. Another reason for this kind of user registration is that some autoloaders might be so generic as to want to act as a fallback autoloader. For these kinds of autoloaders, it is important that they always be last in the queue, since they might throw an error or exception when they cannot find a class, as opposed to returning false and letting other autoloaders have an attempt at finding the requested class.
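
The effect of the 3rd parameter can be sketched with two stand-in autoloaders that only record the order in which they are consulted (the names $generic, $hot and Acme\NoSuchClass are made up for this illustration):

[code lang="php"]
<?php
$calls = array();

// Stand-in autoloaders: they load nothing and only log that they were asked.
$generic = function ($class) use (&$calls) { $calls[] = 'generic'; };
$hot     = function ($class) use (&$calls) { $calls[] = 'hot'; };

spl_autoload_register($generic);          // default: appended to the end
spl_autoload_register($hot, true, true);  // 3rd argument true: prepended

// Asking for an unknown class walks the queue from front to back.
class_exists('Acme\NoSuchClass');

print_r($calls); // 'hot' is recorded before 'generic'
[/code]

Even though $hot was registered second, it is consulted first because it was prepended.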

Conclusion

The strategy described above is something to consider if you are creating reusable PHP components that you wish to provide, perhaps as Pyrus packages and/or as PHP phar archives, for 3rd party consumption. This autoloading strategy provides an out-of-the-box usability experience in a minimal amount of code. It also plays nicely with other autoloaders, provides a solution that is opcode cacheable, and, since it utilizes absolute paths (via __DIR__), minimizes the number of stat() calls to the filesystem your application will generate during its runtime.

Where am I?

You are currently viewing the archives for September, 2011 at Ralph Schindler.