Autoloading (Revisited)

September 19th, 2011 § 14 comments

When PHP 5.0 arrived, it introduced the ability to autoload classes. At the time, autoloading was such a new feature that it was hardly adopted. As a result, many applications being ported from PHP 4 to PHP 5 still had lots of procedural code in them (code incapable of being autoloaded) and many class files with long ‘require_once’ lists. It wasn’t until years later that certain best practices emerged and the prolific use of require_once/include_once throughout large bodies of code started drying up. Even after autoloading had been adopted by larger, more visible projects, a common pattern had yet to emerge. The PEAR project already had its one-class-per-file rule and a class-to-filesystem naming convention, but this was hardly the norm at the time, and so there were many different autoloading strategies in use.

As time passed, more and more projects went through rewrites, and the strategy most of them landed on was the one that came from the PEAR group. Fast-forward to today, and we see that this standard for autoloading has been agreed upon by a large number of projects and has come to be named the “PSR-0 autoloading standard”.

What We’ve Learned

Having attained consistency in how our code utilizes autoloading, we’ve attempted to find the most efficient, performance-optimized way of executing our autoloading strategy. Matthew Weier O’Phinney has blogged about this in the past; it’s a good read if you have not already read it. To summarize, he found the following things to be true:

  • disk based class name to filesystem location maps are the fastest lookups
  • class filesystem paths that are absolute and that do not rely on include_path are fastest
  • lightweight autoload functions that utilize class maps directly are the fastest

For more information about the above generalization, see Matthew’s blog post.

Nearly a year ago, in conjunction with his findings, Matthew also wrote a classmap generation tool. This tool produced a .classmap.php file that would reside in the directory containing the class files. The general idea is that a developer could use an automatic mapping-based autoloader, like the PSR-0 autoloader, or could use this .classmap.php file to build a more performance-centric autoloading strategy.

This approach presents developers with two primary problems. First, dot files are generally hidden on a filesystem, which means this PHP data array is part of a code path that is hidden from most developers’ view of the codebase. This leads to moments of confusion when something related to the location of classes goes awry. Second, the strategy assumes that the consumer has some way of consuming the contents of the class-map file. ZF users can use one of the shipped Zend\Loader classes that are designed to use a class map. The problem is not so much for ZF users as that this promotes a strategy that is more ZF-specific than generic in nature.

The addition and swift adoption of namespace support in PHP 5.3 have presented us with both a platform for standardization and a few challenges. Traditionally, when we thought of the PEAR naming convention, we assumed that for a given class (in prefix notation) Alpha_Beta_Gamma, there would be a single mapping of this class to a single place on the filesystem, namely: some/path/Alpha/Beta/Gamma.php. This inherently presents no problems. What does present a problem is another project that uses part of this prefix, but in a different location. Assume that you want to use part of the prefix, for example the Alpha_Beta_ portion, with a different logical component/module/project within your organization. In this case, it might make sense that class Alpha_Beta_Gamma live in one project on disk, and that Alpha_Beta_Omega live somewhere completely different. Any number of situations could realistically present this problem, but the most apparent is that your organization wants to use a naming scheme that allows for MyCompany_MyDivisionWithinMyCompany_PerhapsSomeLogicalComponent_ClassName.
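The prefix convention is mechanical enough to sketch in a few lines. This is only an illustration: the helper name pearClassToPath() and the base directories are made up for the example.

```php
<?php
/**
 * Translate a PEAR-style (prefix notation) class name into a file path.
 * Every underscore in the class name becomes a directory separator.
 * (Hypothetical helper, not part of any library.)
 */
function pearClassToPath($class, $baseDir)
{
    return rtrim($baseDir, '/') . '/' . str_replace('_', '/', $class) . '.php';
}

echo pearClassToPath('Alpha_Beta_Gamma', 'some/path'), PHP_EOL;
// some/path/Alpha/Beta/Gamma.php
```

Since the rule takes a single base directory, it has no way to express that Alpha_Beta_Omega lives under a different root than Alpha_Beta_Gamma.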

In any of these scenarios, a simple rule that maps class name to filesystem location for one class will not work for another class that could conceivably live within the same project, at least not without some kind of autoloader filter or filesystem munging. Either way, we can no longer assume that a simple mapping of class name to one location on disk will suffice.

More and more, we are seeing this pattern emerge (this time with namespaces):
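A minimal sketch of such a class, assuming the vendor, component and class names used in the project layout in this post (the class body is a placeholder):

```php
<?php
// File (assumed): src/VendorName/ComponentName/SomeComponentClass.php
namespace VendorName\ComponentName;

class SomeComponentClass
{
    // component code goes here
}
```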

This class is then found inside its own logical project, with its own data files, web files, or test files in a project structure that looks similar to this:

[code lang="bash"]
path/to/VendorName_Component/
    src/
        VendorName/
            ComponentName/
                SomeComponentClass.php
    data/
        some-data-file.txt
    tests/
        phpunit.xml
        phpunit-bootstrap.php
        VendorName/
            ComponentName/
                SomeComponentClassTest.php
    docs/
        some-documentation-format.xml
    README.md
[/code]

As you can imagine, any vendor or organization in the business of building software will more than likely have more than one project that uses this kind of naming scheme and this project structure for developing and releasing code. This being the case, unless the project is merged with other code for the purposes of a consuming project, parts of the namespace will exist in two separate parts of the filesystem, something a specialized autoloader will need to take into consideration.

Ideally, we should find a solution that presents class-map based autoloading in a way that is an easily identifiable code pattern, is simple and expressive, works well with common development practices, and takes advantage of the current-day PHP platform (namespaces and autoloading facilities).

And, What I’ve Found Is This …

What I’ve found is that projects should present a few different options for how they provide an “out-of-the-box” autoloading experience. Such a solution should offer the consumer a usage story with the most minimal of requirements when it comes to bootstrapping this 3rd party code. Let’s examine the following project structure (expanded from our example above):

[code lang="bash"]
path/to/VendorName_Component/
    src/
        VendorName/
            ComponentName/
                SomeComponentClass.php
    data/
        some-data-file.txt
    tests/
        phpunit.xml
        phpunit-bootstrap.php
        VendorName/
            ComponentName/
                SomeComponentClassTest.php
    docs/
        some-documentation-format.xml
    autoload_classmap.php
    autoload_function.php
    autoload_register.php
    README.md
[/code]

What you’ll notice is the addition of 3 autoload_*.php files. Let’s have a look at what these files provide and the reasons for their existence. First the autoload_classmap.php:
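A sketch of what such a file might contain, assuming the single class from the layout above and assuming the file sits at the root of the component; a real map would have one entry per class:

```php
<?php
// autoload_classmap.php (sketch): fully qualified class name => absolute path
return array(
    'VendorName\ComponentName\SomeComponentClass'
        => __DIR__ . '/src/VendorName/ComponentName/SomeComponentClass.php',
);
```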

This file provides the exact map of the classname to the location on disk that this class can be found in. This file takes advantage of PHP’s ability to have return values returned from the inclusion of a file. A simple usage story for this file might be:
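As a rough sketch of that usage story: so that it runs on its own, the snippet first writes a throwaway one-entry class map into a temp directory; in practice the file ships with the component. The small closure at the end is a hypothetical stand-in for whatever class-map aware autoloader the consumer already has.

```php
<?php
// Scaffolding only: a throwaway stand-in for path/to/VendorName_Component/.
$component = sys_get_temp_dir() . '/VendorName_Component_' . uniqid();
mkdir($component);
file_put_contents($component . '/autoload_classmap.php', <<<'PHP'
<?php
return array(
    'VendorName\ComponentName\SomeComponentClass'
        => __DIR__ . '/src/VendorName/ComponentName/SomeComponentClass.php',
);
PHP
);

// The usage itself: include evaluates the file and yields the returned array.
$classmap = include $component . '/autoload_classmap.php';

// Hand the map to an autoloader of the consumer's choosing. This closure is
// a minimal placeholder for one.
spl_autoload_register(function ($class) use ($classmap) {
    if (isset($classmap[$class])) {
        include $classmap[$class];
    }
});
```

Because the map is a plain array, a consumer could just as well merge the maps of several components and serve them all from a single autoloader.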

Let’s next look at the autoload_function.php file:

This file provides a closure based autoloader as its return value. This function can then be used by the consumer directly for injecting into their own autoloader stack/queue, or directly into the autoloader queue provided by PHP:
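The sketch below shows one way such a file might look, together with the consumer-side call. To keep it runnable in isolation it first writes a throwaway component into a temp directory; everything marked as scaffolding stands in for files that would normally ship with the component. The closure body is an assumption about a reasonable implementation, not a listing taken from an actual component.

```php
<?php
// Scaffolding: build a throwaway copy of the component layout.
$component = sys_get_temp_dir() . '/VendorName_Component_' . uniqid();
mkdir($component . '/src/VendorName/ComponentName', 0777, true);
file_put_contents(
    $component . '/src/VendorName/ComponentName/SomeComponentClass.php',
    "<?php\nnamespace VendorName\\ComponentName;\nclass SomeComponentClass {}\n"
);
file_put_contents($component . '/autoload_classmap.php', <<<'PHP'
<?php
return array(
    'VendorName\ComponentName\SomeComponentClass'
        => __DIR__ . '/src/VendorName/ComponentName/SomeComponentClass.php',
);
PHP
);

// autoload_function.php: returns a closure that loads classes from the map.
file_put_contents($component . '/autoload_function.php', <<<'PHP'
<?php
return function ($class) {
    static $classmap = null;
    if ($classmap === null) {
        // Load the map lazily, on the first autoload request.
        $classmap = include __DIR__ . '/autoload_classmap.php';
    }
    if (!isset($classmap[$class])) {
        return false; // not ours; let the next autoloader try
    }
    return include_once $classmap[$class];
};
PHP
);

// Consumer side: fetch the closure, then register it wherever it is wanted.
$autoloader = include $component . '/autoload_function.php';
spl_autoload_register($autoloader);

// The class is now loaded on first use.
$object = new VendorName\ComponentName\SomeComponentClass();
```

Returning false for unknown classes, rather than raising an error, is what lets this closure share the queue with other autoloaders.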

Either way, the consumer is provided with a callback that can be used, in a single line, to bootstrap this component’s autoloading needs.

Finally, the complete, one-line solution can be found by using autoload_register.php directly:

While the above is so trivial as to ask why it should be included, it does offer a single-line usage story:
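A sketch of both halves, the file and its one-line use. As before, the temp directory is scaffolding so the snippet can run by itself, and the autoload_function.php written here is a deliberately empty stub standing in for the closure-returning file described earlier.

```php
<?php
// Scaffolding: a throwaway component with a stub autoload_function.php.
$component = sys_get_temp_dir() . '/VendorName_Component_' . uniqid();
mkdir($component);
file_put_contents($component . '/autoload_function.php', <<<'PHP'
<?php
// Stub: a real component would consult its class map here.
return function ($class) {
    return false;
};
PHP
);

// autoload_register.php: the whole file is a single statement.
file_put_contents($component . '/autoload_register.php', <<<'PHP'
<?php
spl_autoload_register(include __DIR__ . '/autoload_function.php');
PHP
);

// Consumer side: one line bootstraps the component's autoloading.
require_once $component . '/autoload_register.php';
```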

Why not do this in the first place? Well, this approach assumes the consumer does not necessarily care about how the autoload function is loaded into PHP’s spl_autoload queue. One thing to keep in mind is that when spl_autoload_register() is called, autoloaders are placed at the end of the queue by default. This behavior can be changed by passing true as the 3rd parameter of spl_autoload_register(). This type of performance optimization might be important when you know some autoloadable code will be used more often than other code, and thus you want the autoloader for that code to be consulted first. Another reason for leaving registration to the user is that some autoloaders are so generic that they are meant to act as a fallback autoloader. For these kinds of autoloaders, it is important that they always be last in the queue, since they might throw an error or exception when they cannot find a class, as opposed to returning false and letting other autoloaders have an attempt at finding the requested class.
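A small sketch of the ordering behaviour. The two closures are placeholders that never load anything; the point is only where each one ends up in the queue.

```php
<?php
$generic = function ($class) {
    return false; // placeholder: a catch-all, fallback-style autoloader
};
$frequentlyUsed = function ($class) {
    return false; // placeholder: autoloader for code that is used most often
};

// Default behaviour: appended to the end of the queue.
spl_autoload_register($generic);

// Third argument true: prepended, so this one is consulted first.
spl_autoload_register($frequentlyUsed, true, true);

// Inspect the resulting order of the queue.
$queue = spl_autoload_functions();
```

A consumer who includes autoload_function.php directly keeps this choice; one who requires autoload_register.php accepts the default of being appended.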

Conclusion

The strategy described above is something to consider if you are creating reusable PHP components that you wish to provide, perhaps as Pyrus packages and/or as PHP phar archives, for 3rd party consumption. This autoloading strategy provides an out-of-the-box usability experience in a minimal amount of code. It also plays nicely with other autoloaders, provides a solution that is opcode cacheable, and, since it uses absolute paths (via __DIR__), minimizes the number of stat() calls to the filesystem your application will generate during its runtime.


  • http://blog.stuartherbert.com/php/ Stuart Herbert

    Nice post.

    The concern I have with the classmap approach is simple: the whole point of PSR0 was to get away from each app, each component, having to inject additional autoloaders onto the autoloader queue.
    There’s a second concern – with the approach you’ve blogged about, developers almost have to initialise each component (by loading its classmap) before making use of it. Bootstrap files can already be a major performance drag – filling it with include() statements for components somewhat negates the ease-of-use and lightweight benefits that autoloading brings.

    I’d love to see a follow-up looking at how components can take advantage of classmaps without having to ship their own autoloader, and without having to make an explicit include() call to load the classmap either :)

  • Shein Alexey

    Hello, Ralph!
    Thanks for very interesting read, finally I could organize this topic in my mind, will definitely use this approach in my project :)
    IMO autoload_function.php file is excessive, I’d merge it with autoload_register.php (inline closure into spl_autoload_register call) since very few users bother about implementation details, and if they are, they can edit the call right at hand.

  • http://phpdeveloper.org chris

    Weeeeird….I was just thinking about this last night (more on the topic of standardized interfaces, though). Thanks for the update on this – I’ve actually be wondering if there’s been a namespace move like this.

  • http://weierophinney.net/matthew/ Matthew Weier O’Phinney

    Stu — The point of the autoloader functionality Ralph describes is to serve several use cases, not just one. First, the code would still follow PSR-0, so a PSR-0 autoloader would still work; you simply potentially have additional paths for your include_path (though depending on how the package is installed, the src may be merged into a single path!).

    Next, for standalone installs, such as importing via git submodules, or one-off scripts, you may not want to create an autoloader. That’s where autoload_register.php shines — simply require it, and start using the classes.

    A third use case exists in ZF2. Having the classmap available as a discrete object allows us to merge classmaps from several modules/components together to be consumed in a single autoloader. This will give huge performance benefits — while still maintaining PSR-0 compatibility, and reducing the number of code paths/stat calls necessary to provide the autoloading functionality.

    Shein — the main reason for separating the closure and the registration is for folks who want to control the order in which autoloaders are registered. I can see where it can be considered excessive, but if you look at flexibility of use cases, it makes sense.

  • http://pooteeweet.org Lukas

    I can see that it makes sense to provide an auto loader to get up and running quickly. But I don’t see a point in trying to standardize on a way for each lib to provide class map and other optimizations. If you want these optimizations you don’t want them per lib, you want a single class map etc.

  • Pingback: PHPDeveloper.org: Ralph Schindler's Blog: Autoloading (Revisited)

  • http://eugenioz.blogspot.com/ OZ

    This Autoloader https://github.com/jamm/Autoload can load classes by PSR-0 standart, can map classes and can map namespaces.

  • http://pooteeweet.org Lukas

    you might be interested in the discussion on the composer list http://groups.google.com/group/composer-dev/browse_thread/thread/3514e3aa1bb77bfa

  • Christian Weiske

    Lowercasing the vendor would solve your problem. Since all other names are uppercase, you have a clear distinction between \foo\Bar\Baz and \foo\bar\Baz.

  • http://answers.com Aaron C. Meadows

    Fantastic article! Minor correction: Line 6 of the usage example for ‘autoload_function.php’ has ‘autoload_classmap.php’ but should be ‘autoload_function.php’

  • Elodie

    Really informative post! This page is a nice reference on autoloading in PHP as well:

    Example of autoloading in PHP

You are currently reading Autoloading (Revisited) at Ralph Schindler.
