Autoloading (Revisited)

September 19th, 2011 by Ralph Schindler

Upon the arrival of PHP 5.0, the ability to autoload classes was introduced. At the time, autoloading was such a new feature that it was hardly adopted. Many applications being ported from PHP 4 to PHP 5 still had lots of procedural code in them (code incapable of being autoloaded) and many class files with long require_once lists. It wasn’t until years later that certain best practices emerged and the prolific usage of require_once/include_once throughout large bodies of code started drying up. Even after autoloading had been adopted by larger, more visible projects, a common pattern had yet to emerge. The PEAR project already had its one-class-per-file rule and a class-to-filesystem naming convention, but this was hardly the norm at the time, and as such, there were many different autoloading strategies.

As time passed, more and more projects went through rewrites, and the strategy that most projects landed on was the one that came from the PEAR group. Fast-forward to today, and we see that this standard for autoloading has been agreed upon by a large number of projects and has come to be named the “PSR-0 autoloading standard”.
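
As a rough illustration of that convention, the class-name-to-path translation can be sketched as below. The function name psr0ClassToPath() is made up for this example; it only computes a relative path and does not load anything.

```php
<?php
// Hypothetical helper: translate a class name into a PSR-0 style relative path.
function psr0ClassToPath($class)
{
    $class = ltrim($class, '\\');
    $path  = '';

    // Namespace separators always become directory separators.
    $lastNsPos = strrpos($class, '\\');
    if ($lastNsPos !== false) {
        $namespace = substr($class, 0, $lastNsPos);
        $class     = substr($class, $lastNsPos + 1);
        $path      = str_replace('\\', '/', $namespace) . '/';
    }

    // Underscores are only significant in the class name itself (PEAR style).
    return $path . str_replace('_', '/', $class) . '.php';
}

echo psr0ClassToPath('Alpha_Beta_Gamma'), "\n";
// Alpha/Beta/Gamma.php
echo psr0ClassToPath('VendorName\ComponentName\SomeComponentClass'), "\n";
// VendorName/ComponentName/SomeComponentClass.php
```

An autoloader built on this would prepend a base directory (or rely on include_path) and require the resulting file.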

What We’ve Learned

After having attained consistency in how we utilize autoloading, we’ve attempted to find the most efficient and performance-optimized way of executing our autoloading strategy. Matthew Weier O’Phinney has blogged about this in the past; it’s a good read if you have not already read it. To summarize, he found the following things to be true:

  • disk based class name to filesystem location maps are the fastest lookups
  • class filesystem paths that are absolute and that do not rely on include_path are fastest
  • lightweight autoload functions that utilize class maps directly are the fastest

For more information about the above generalization, see Matthew’s blog post.

Nearly a year ago, in conjunction with his findings, Matthew also wrote a classmap generation tool. This tool produced a .classmap.php file that would reside in the directory responsible for containing class files. The general idea here is that a developer could utilize an automatic, mapping-based autoloader, like the PSR-0 autoloader, or could utilize this .classmap.php file in order to build a more performance-centric autoloading strategy.

This approach presents developers with two primary problems. First, dot files are generally hidden on a filesystem, which means that this PHP data array is part of a code path that is hidden from most developers’ view of the codebase. This leads to moments of confusion when something related to the location of classes goes awry. The second problem is that this strategy assumes that the consumer has some way of consuming the contents of this class-map file. ZF users could utilize one of the shipped Zend\Loader classes that are designed to use a class map. The problem here is not necessarily for ZF users, but that it promotes a strategy that is more ZF-specific than generic in nature.

The addition of, and swift adoption of PHP’s namespace support in PHP 5.3 has also presented us with both a platform for standardization as well as a few challenges. Traditionally, when we thought of the PEAR naming convention, we assumed that for a given class (in prefix notation) Alpha_Beta_Gamma, there would be a single mapping of this class to a single place on the filesystem, namely: some/path/Alpha/Beta/Gamma.php. This inherently presents no problems. What does present a problem is if we have another project that utilizes part of this prefix, but in a different location. Assume that you want to use part of the prefix, for example, the Alpha_Beta_ portion, with a different logical component/module/project within your organization. In this case, it might make sense that class Alpha_Beta_Gamma live in one project on disk, and that Alpha_Beta_Omega live somewhere completely different. Any number of situations could realistically present this problem, but the most apparent is that your organization wants to utilize a naming scheme that allows for MyCompany_MyDivisionWithinMyCompany_PerhapsSomeLogicalComponent_ClassName.

In any of the likely scenarios above, a simple mapping rule that governs how one class name maps to a filesystem location will not work for another class that could conceivably live within the same project, at least not without some kind of autoloader filter or filesystem munging. Either way, we can no longer assume that a simple mapping of class name to a single location on disk will suffice.
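
One way to picture such an “autoloader filter” is a prefix-to-directory table where the most specific prefix wins. The sketch below is only an assumption about how this could be done; resolveClassFile() and both directories are invented for the example.

```php
<?php
// Hypothetical resolver: the longest matching prefix decides the base directory.
function resolveClassFile($class, array $prefixDirs)
{
    // Sort so that more specific (longer) prefixes are tried first.
    uksort($prefixDirs, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    foreach ($prefixDirs as $prefix => $baseDir) {
        if (strpos($class, $prefix) === 0) {
            return rtrim($baseDir, '/') . '/' . str_replace('_', '/', $class) . '.php';
        }
    }

    return null; // no rule matched; let another autoloader try
}

// Both directories are placeholders.
$prefixDirs = array(
    'Alpha_Beta_'      => '/projects/core/library',
    'Alpha_Beta_Omega' => '/projects/omega/library',
);

echo resolveClassFile('Alpha_Beta_Gamma', $prefixDirs), "\n";
// /projects/core/library/Alpha/Beta/Gamma.php
echo resolveClassFile('Alpha_Beta_Omega', $prefixDirs), "\n";
// /projects/omega/library/Alpha/Beta/Omega.php
```

The cost of this flexibility is a loop over prefixes on every lookup, which is part of why an exact class map is attractive.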

More and more, we are seeing this pattern emerge (this time with namespaces):

namespace VendorName\ComponentName {

    class SomeComponentClass {

    }

}

This class is then found inside its own logical project, with its own data files, web files, or test files in a project structure that looks similar to this:

path/to/VendorName_Component/
    src/
        VendorName/
            ComponentName/
                SomeComponentClass.php
    data/
        some-data-file.txt
    tests/
        phpunit.xml
        phpunit-bootstrap.php
        VendorName/
            ComponentName/
                SomeComponentClassTest.php
    docs/
        some-documentation-format.xml
    README.md

As you can imagine, any vendor or organization that is in the business of building software will more than likely have more than one project that both utilizes this kind of naming scheme and takes advantage of this project structure for developing and releasing code. This being the case, unless the project is merged with other code for the purposes of a consuming project, parts of the namespace will exist in two separate parts of the filesystem … something that a specialized autoloader will need to take into consideration.

Ideally, we should find a solution that presents class-map-based autoloading in a way that is an easily identifiable code pattern, is simple and expressive, works well with common development practices, and takes advantage of the present-day PHP platform (namespaces and autoloading facilities).

And, What I’ve Found Is This …

And, what I’ve found is that projects should present a few different options for how they provide an “out-of-the-box” experience as it relates to autoloading. Such a solution should offer the consumer a usage story with the most minimal of requirements when it comes to bootstrapping this 3rd party code. Let’s examine the following project structure (expanded from our example above):

path/to/VendorName_Component/
    src/
        VendorName/
            ComponentName/
                SomeComponentClass.php
    data/
        some-data-file.txt
    tests/
        phpunit.xml
        phpunit-bootstrap.php
        VendorName/
            ComponentName/
                SomeComponentClassTest.php
    docs/
        some-documentation-format.xml
    autoload_classmap.php
    autoload_function.php
    autoload_register.php
    README.md

What you’ll notice is the addition of 3 autoload_*.php files. Let’s have a look at what these files provide and the reasons for their existence. First the autoload_classmap.php:

<?php
return array(
    'VendorName\ComponentName\SomeComponentClass' => __DIR__ . '/src/VendorName/ComponentName/SomeComponentClass.php',
    /* .. other classes here .. */
);

This file provides the exact map of each class name to the location on disk where that class can be found. It takes advantage of PHP’s ability to return a value from an included file. A simple usage story for this file might be:

<?php
// ...
$classmapAutoloader = new MyClassMapAutoloader();
$classmapAutoloader->loadClassMap(include __DIR__ . '/vendors/VendorName_Component-1.5/autoload_classmap.php');
// ...
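
The snippet above only names MyClassMapAutoloader and loadClassMap(); the article does not define them. A minimal sketch of what such a consumer-side class could look like, with autoload() and register() added as assumptions:

```php
<?php
// Hypothetical class: only its name and loadClassMap() come from the usage above.
class MyClassMapAutoloader
{
    private $map = array();

    // Merge one component's map into the combined map.
    public function loadClassMap(array $map)
    {
        $this->map = array_merge($this->map, $map);
        return $this;
    }

    // Autoload callback: a plain array lookup, no path guessing.
    public function autoload($class)
    {
        if (!isset($this->map[$class])) {
            return false; // let the next autoloader in the queue try
        }
        require_once $this->map[$class];
        return true;
    }

    // Hook this instance into PHP's autoload queue.
    public function register()
    {
        spl_autoload_register(array($this, 'autoload'));
    }
}
```

After loading one or more class maps, calling register() once is enough; several components can share the same instance.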

Let’s next look at the autoload_function.php file:

<?php
return function ($class) {
    static $classmap = null;
    if ($classmap === null) {
        $classmap = include __DIR__ . '/autoload_classmap.php';
    }
    if (!isset($classmap[$class])) {
        return false;
    }
    return include_once $classmap[$class];
};

This file provides a closure based autoloader as its return value. This function can then be used by the consumer directly for injecting into their own autoloader stack/queue, or directly into the autoloader queue provided by PHP:

<?php
// ...
spl_autoload_register(include __DIR__ . '/vendors/VendorName_Component-1.5/autoload_function.php');
// ... or ...
$autoloader = new MyFancyAutoloader();
$autoloader->registerAutoloaderFunction(include __DIR__ . '/vendors/VendorName_Component-1.5/autoload_function.php');

Either way, the consumer is provided with a callback that can be utilized, in a single line, to bootstrap this component’s autoloading needs.

Finally, the complete, one-line solution can be found by utilizing autoload_register.php directly:

<?php
// autoload_register.php
spl_autoload_register(include __DIR__ . '/autoload_function.php');

While the above is so trivial as to ask why it should be included, it does offer a single-line usage story:

<?php
// ...
require_once __DIR__ . '/vendors/VendorName_Component-1.5/autoload_register.php';

Why not do this in the first place? Well, this approach assumes the consumer does not necessarily care about how the autoload function is loaded into PHP’s spl_autoload queue. One thing to keep in mind is that when spl_autoload_register() is called, autoloaders are placed at the end of the queue by default. This behavior can be changed by passing true as the 3rd parameter of spl_autoload_register(), which prepends the autoloader instead. This type of performance optimization might be important when you know some autoload-able code will be utilized more often than other code, and thus you want the autoloader for that code to be consulted first. Another reason for this kind of user registration is that some autoloaders might be so generic that they are meant to act as a fallback autoloader. For these kinds of autoloaders, it is important that they always be last in the queue, since they might throw an error or exception when they cannot find a class, as opposed to returning false and letting other autoloaders have an attempt at finding the requested class.
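
The queue ordering described above can be checked with a small experiment. The two closures are stand-ins that never load anything; they exist only to show where each one lands in the queue.

```php
<?php
// Two stand-in autoloaders; neither loads anything.
$generic  = function ($class) { return false; };
$frequent = function ($class) { return false; };

spl_autoload_register($generic);              // default: appended to the end of the queue
spl_autoload_register($frequent, true, true); // third argument true: prepended

$queue       = spl_autoload_functions();
$posFrequent = array_search($frequent, $queue, true);
$posGeneric  = array_search($generic, $queue, true);

var_dump($posFrequent < $posGeneric); // bool(true): $frequent is consulted first
```

The second argument only controls whether a failed registration throws; it is the third that changes the position.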

Conclusion

The above-mentioned strategy is something to consider if you are creating reusable PHP components that you wish to provide, perhaps as Pyrus packages and/or as PHP phar archives, for 3rd party consumption. This autoloading strategy provides an out-of-the-box usability experience in a minimal amount of code. It also plays nicely with other autoloaders, provides a solution that is opcode cacheable, and, since it utilizes absolute paths (via __DIR__), minimizes the number of stat() calls to the filesystem your application will generate during its runtime.

Learning About Dependency Injection and PHP

May 18th, 2011 by Ralph Schindler

Over the past few years, a few concepts and programming patterns have muscled their way into the hearts and minds of PHP developers from other languages and programming communities. These concepts range from the MVC application architecture and various modeling techniques (think ActiveRecord and Data Mapper) to a pure shift in the way we think about application architectures, like aspect-oriented programming (AOP) and event-driven programming. Perhaps it’s because PHP has been adopted at an enterprise level, thus increasing the demand for what developers might call enterprise quality programming patterns, or perhaps it’s simply because PHP’s ever evolving object model makes new things possible. After all, who doesn’t like new shiny things? Whatever the reason, one of the newest concepts (at least over the past 3 years or so) to emerge as a heated topic of debate is how to manage object dependencies. Interestingly, the argument over how to manage dependencies is generally named after the solution its proponents offer: dependency injection (the abstract principle is actually called inversion of control).

In any circle of developers of the object-oriented persuasion, you’ll never hear an argument that dependency injection itself is bad. In these circles, it is generally accepted that injecting dependencies is the best way to go. Injecting object dependencies in PHP looks like this:


// constructor injection
$dependency = new MyRequiredDependency;
$consumer = new ThingThatRequiresMyDependency($dependency);

That’s basically it. There are many variations of this: setter injection, interface injection, call time injection, in addition to the above mentioned constructor injection. These are all valid ways of injecting the dependencies into the consuming object. Ultimately, the goal here is to avoid this:


class ThingThatHasAnExternalDependency
{
    public function __construct() {
        $this->dependency = new ARequiredDependency;
        // or
        $this->secondDependency = ARequiredDependency::getInstance();
    }
}

The above code is an example of a violation of the Hollywood Principle, which basically states: “Don’t call us, we’ll call you.”
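
For comparison, the injection styles mentioned earlier (constructor, setter and call time) can be sketched side by side. Apart from MyRequiredDependency, the class names are invented for illustration.

```php
<?php
class MyRequiredDependency {}

class ConstructorInjected
{
    private $dependency;

    // Required dependency: the object cannot exist without it.
    public function __construct(MyRequiredDependency $dependency)
    {
        $this->dependency = $dependency;
    }

    public function getDependency() { return $this->dependency; }
}

class SetterInjected
{
    private $dependency;

    // Optional or swappable dependency, supplied after construction.
    public function setDependency(MyRequiredDependency $dependency)
    {
        $this->dependency = $dependency;
    }

    public function getDependency() { return $this->dependency; }
}

class CallTimeInjected
{
    // The dependency is handed to the one method that needs it.
    public function doWork(MyRequiredDependency $dependency)
    {
        return get_class($dependency);
    }
}

$dependency = new MyRequiredDependency;

$a = new ConstructorInjected($dependency);
$b = new SetterInjected;
$b->setDependency($dependency);
$c = new CallTimeInjected;
echo $c->doWork($dependency), "\n"; // MyRequiredDependency
```

In all three cases the consumer never calls new (or getInstance()) on its dependency; something outside the class decides which instance it receives.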

Yet, this is not the heart of the argument. Perhaps it was 4-5 years ago in the PHP community, but it’s not anymore. The heart of the argument is not should we be doing it, but how do we go about doing it.

This article is not about the intricacies and implementation details of DI containers and DI frameworks. It’s also not about the various ways and means of injecting dependencies into other objects, or which method might be better. In fact, this article has no opinion on whether injecting dependencies is even good for you or your application. This article is an exploration of how adopting any DI framework for PHP affects the lifecycle of a project: both the code and the developer, team or organization that is constructing it.

A Brief History of Dependency Management In PHP

It is important to know why PHP is as popular as it is; after all, it’s this popularity that DI frameworks fight against for adoption inside a PHP application framework. To understand PHP’s popularity, history, and evolution, let’s look at this code:

// these 6 lines actually represent 5 different web centric "languages"!
include_once 'includes/config.php'; // ultimately there is a mysql_connect() call in here somewhere
include_once 'templates/header.php';
$result = mysql_query('SELECT * FROM users'); // magically uses the mysql_connect() resource
while ($row = mysql_fetch_assoc($result)) { // mysql_query() returns a result resource, so fetch row by row
    echo '<div class="user-row"><a href="/delete-user.php" onclick="someJSFunction();">' . $row['username'] . '</a></div>';
}
include_once 'templates/footer.php';

From the beginning, we’ve been trained into thinking that our dependencies are magically managed. As you can see above, the mysql_query() function, while it will accept a connection resource, does not require it. In fact, if it’s not supplied, it will use the last connection opened by mysql_connect() inside the PHP runtime. Assuming that the above mentioned delete-user.php script is part of a larger collection of PHP scripts, which we will call “the application” … it is important to note that even this script itself is pulling in its dependencies instead of them being injected. For all intents and purposes, config.php, header.php and footer.php are all dependencies of this script, as they are of other scripts similar in nature to this delete-user.php. To sum it up, if a new dependency is required by the business logic portion of this application (i.e., the lines between the header and footer), it now has to be introduced to every script in this application. This does not exactly adhere to the DRY principle.

But, let’s take a step back and look at this snippet of code from the organizational perspective. To do this, we must first understand the various phases of the code’s lifecycle within any organization. For the purposes of this example, let’s assume that from idea to production, code will go through the following phases: development, build, deployment, and application start-up (in production). If this were a C/C++ or Java project, code would have been written (developed), compiled (built), and then packaged or run through some deployment tool’s process (deployed); it then would have been run (executed via some startup script, or by executing a binary). PHP, and Perl at the time, achieved all of the same objectives in fewer steps, making it a wildly popular platform for highly iterative web projects. This same application in PHP would have been coded in some text editor (developed) and FTP’d up to a production server (deployed). You’ll notice that it neither had to be built/compiled nor started on the server, since the target, Apache, was already running with PHP embedded into it. For all intents and purposes, a cheap and easy FTP tool was both the build and deployment tool for this application’s lifecycle.

It was this simplicity that made PHP the popular choice for web applications. This popularity was attained because the simplicity of the PHP platform allowed for two extremely important facets of development to emerge: the idea of building an application became approachable to even the novice individual, and without all the cruft that came along with the application lifecycle, building and deploying applications in PHP increased PHP’s “fun-ness” factor.

While this style of building applications allowed for a proliferation of PHP applications, there was in fact a negative side that would be revealed later. As applications quickly grew, their ability to be maintained decreased. We gave the result the name “spaghetti code”, and for all the right reasons. Objects, if they were even being used, were generally wrappers around procedural functionality, so object dependency management wasn’t even a consideration for most developers. Looking back, perhaps it was this original simplicity that allowed developers to create applications without even having to know what a dependency was or how to find it. In any case, as these applications grew uncontrollably, maintaining and hacking on them rapidly lost the PHP fun factor.

A Brief History of DI Frameworks

As PHP developers started identifying the problems with their Model 1 applications, they started looking for solutions in other programming communities. At this time, the Java community was still heavily rooted in the enterprise/software development/software engineering world, and problems such as dependency management already had some interesting solutions. Most notably, there was the Spring Framework, whose primary facility for dependency management was a component called the IoC Container, or the Inversion of Control container. This container managed the full lifecycle of object creation using callbacks. This meant that you no longer had to use the “new” keyword (the same new keyword as in PHP). It also wired the dependencies for you at instantiation time, which meant that you no longer had to concern yourself with how dependencies were injected, be it through the constructor, properties or setter methods. The Spring Framework was one of the first frameworks that encouraged the use of definition files to manage the knowledge required to wire all your dependencies together. True to form in the Java community, these definition files were created in XML.

As it might seem, this is indeed a deviation from the philosophy that had made PHP so popular. PHP allowed you to write the most minimal amount of code to complete your application. In the Java/DI world, particularly with the Spring framework, you had a much richer application lifecycle. Not only were you developing code for your application, but you were creating code about code to manage code. This is known as meta-programming. In addition to this meta-programming, you also had the compilation phase required by the Java platform, which was generally tucked away inside your build-time tasks. Moreover, this application had to be deployed (there were generally tools for this too), and (for good measure), due to the platform, your application had to be started. Needless to say, this application lifecycle might seem heavier, for lack of a better term, to the average PHP developer.

Since then, several frameworks have cropped up that sport some kind of dependency management. Before this technique was picked up in PHP, they were all heavily rooted in the Java and .NET communities. A quick google search will return a few notable names like PicoContainer, Spring.NET, Unity, Butterfly and google-guice to name a few. These frameworks attain popularity since they attempt to ease some of the burdens that DI places upon the developer whether it be by using reflection to create definitions, or even adding an annotation system so that DI definitions can be written inside the code they are set to manage.

DI and PHP

To understand the attainability of having a dependency management framework for PHP, one should first understand how the counterparts in Java and .NET rely upon their respective platforms to do certain jobs. For a quick reference, see the images from this blog post. One of the more important facets to remember is that the expected application lifecycle of a Java/.NET application is much richer. You are expected to have build-time tasks. You are expected to have deployment tasks. And, generally, your application understands the difference between being in development, staging and production, so it can adjust how it runs accordingly. Moreover, the platform itself has facilities in place that aid the developer both at development time, with code generation, and in production.

PHP never expects or facilitates the usage of any kind of build-time tasks. PHP also does not have any kind of built-in annotation support (a meta-programming technique), nor does it have any kind of application scope or per-application memory space. What does this mean for someone who is creating a DI container? Let’s explore.

Development Time

Generally speaking, any time you are writing, altering or just shifting code around, you are in development mode, and your application should be running in a development environment. The structure of your application’s classes, functions and files within the filesystem is probably changing each time you click save. Dependency management systems require knowledge of your code in order to effectively do their job. This knowledge generally comes in the form of some kind of definition.

This definition can be created by hand, by the developer, generated at runtime by some application hooks, or generated with the use of a special tool. If this is done by hand, a developer is required to explicitly map the various functions/methods that will need to be called in order to inject a particular object dependency. The more dependencies you have, the more verbose this definition might become.

A better route would be to generate this definition file; after all, the code you’ve written, if written correctly, will self-describe its dependencies. There are two options for generation: manual and automatic. An example of manual generation would be a developer giving a command line tool the minimal information it needs to be able to parse your code, figure out the dependency map for itself, and generate some kind of definition to be used during runtime. Minimal information might include some kind of seed information, like where to find your classes or what filters to use when inspecting classes. Sometimes, these tools might make use of special interfaces (also called interface injection) to understand that their purpose is to describe the various dependencies of the class implementing said interface. Another approach might be to utilize special annotations on classes and class methods that describe the various required and optional dependencies and how they are to be injected.

The same techniques employed in this manual approach could also be put to use in an automatic approach. In an automatic approach, imagine that the command line tool from the manual approach is now a service of the application itself. While in development mode, it would run as often as need be in order to determine whether code changes have happened. If they have, the service would regenerate the dependency definition file so that the rest of the application can utilize the dependency definition inside the DI container available to the application during runtime.
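
To make “the code self-describes its dependencies” concrete, here is a rough sketch of deriving a definition from constructor type hints via reflection. The classes and buildDefinition() are invented for the example, and it is written against current PHP (ReflectionNamedType, PHP 7.1+); code from 2011 would have used ReflectionParameter::getClass() instead.

```php
<?php
// Example classes, invented for the sketch.
class DbConfig {}
class DbAdapter { public function __construct(DbConfig $config) {} }
class UserRepository { public function __construct(DbAdapter $adapter, $tableName = 'users') {} }

// Hypothetical generator: list the class-typed constructor parameters of each class.
function buildDefinition(array $classNames)
{
    $definition = array();
    foreach ($classNames as $className) {
        $definition[$className] = array();
        $constructor = (new ReflectionClass($className))->getConstructor();
        if ($constructor === null) {
            continue; // no constructor, nothing to inject
        }
        foreach ($constructor->getParameters() as $parameter) {
            $type = $parameter->getType();
            // Only class/interface type hints describe object dependencies.
            if ($type instanceof ReflectionNamedType && !$type->isBuiltin()) {
                $definition[$className][$parameter->getName()] = $type->getName();
            }
        }
    }
    return $definition;
}

$definition = buildDefinition(array('DbConfig', 'DbAdapter', 'UserRepository'));
var_export($definition);
```

This only covers constructor injection and ignores untyped parameters such as $tableName; setter or annotation-driven definitions would need additional rules.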

There are a couple of concerns that are specific to PHP with regards to dependency management. Since PHP is a share-nothing architecture with no application level memory, this definition would need to be loaded and parsed and put into memory on each request. The larger the dependency tree that you track, the larger the memory footprint of the dependency definition graph. Furthermore, since this definition has to be loaded on each request, if it is in a non-native format (meaning anything other than PHP code), there are certain costs with converting this format, be it XML, YAML, JSON, or INI to the in-memory structure that the dependency management container requires. What’s more, the PHP platform does not keep track of file changes. So without some kind of user-land tracking, it is hard to know what files during development have changed. Thus, your dependency management system, if it’s taking an automatic approach, would have to rescan the filesystem for changes upon each request during development – which has its own consequences.
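
A user-land change check of the kind described above could be as simple as comparing modification times. The following is a sketch under that assumption; both function names are made up, and it would not notice deleted files.

```php
<?php
// Hypothetical helper: newest modification time of any .php file below $dir.
function newestSourceMtime($dir)
{
    $newest = 0;
    $files  = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            $newest = max($newest, $file->getMTime());
        }
    }
    return $newest;
}

// Regenerate only when some source file is newer than the stored definition.
function definitionIsStale($sourceDir, $definitionFile)
{
    if (!is_file($definitionFile)) {
        return true;
    }
    return newestSourceMtime($sourceDir) > filemtime($definitionFile);
}
```

Note that this still walks the whole source tree on every request, which is exactly the development-time cost the paragraph above warns about.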

Deployment Time

When one is done writing code and is ready to push this application into production, the act of pushing this application is called deployment. The mode for this application is now considered “production”. In production, you can be sure that the structure of your code is stable and will not change, thus your dependency graph is now safe from changes too. Since this is the case, there is no longer a need to keep updating and regenerating this dependency definition file like you were during development.

Even though the definition is no longer changing, there is still the concern about how expensive it is to load this definition on each request. Naturally, the cheapest form of definition would be a PHP array or structure that can be loaded directly into memory. Other file types like XML, YAML, JSON, etc. first have to go through a parsing phase before they can be used. Parsing these files could be expensive and could benefit from some kind of caching. Caching the definition in some way, shape or form would ensure there is minimal overhead per request when the application is using this dependency management container.
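
One plausible way to get that “cheapest form” is to export the definition as a PHP file that simply returns an array, so production requests only pay for an include. The helper names and the sample definition below are assumptions made for this sketch.

```php
<?php
// Hypothetical helper: persist a definition array as plain PHP.
function writeDefinitionCache($file, array $definition)
{
    $code = "<?php\nreturn " . var_export($definition, true) . ";\n";
    // Write to a temporary name first so a request never includes a half-written file.
    $tmp = $file . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $code);
    rename($tmp, $file);
}

// Hypothetical helper: load the cached definition, or null when none exists.
function loadDefinitionCache($file)
{
    if (!is_file($file)) {
        return null;
    }
    return include $file;
}

$definition = array(
    'UserRepository' => array('adapter' => 'DbAdapter'),
    'DbAdapter'      => array('config' => 'DbConfig'),
);

$cacheFile = sys_get_temp_dir() . '/di-definition-' . uniqid() . '.php';
writeDefinitionCache($cacheFile, $definition);
$loaded = loadDefinitionCache($cacheFile);
var_dump($loaded === $definition); // bool(true)
```

Because the cache is ordinary PHP, an opcode cache can keep it compiled between requests, which an XML or YAML definition cannot benefit from directly.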

Other Observations & Criticisms

It is important to realize that dependency management solutions are, in every sense of the word, full frameworks. They require that you understand their philosophy as well as have a minimal understanding of what facilities they offer in order to use them effectively. To understand the true benefits of any framework, one must first know the pain points the framework is attempting to solve. Seeing the end result of a framework without knowing what it is facilitating might lead one to dismiss it as overkill or unintuitive. For example, take the following code (typical of dependency management systems):

$userRepository = $dic->get('UserRepository');

If you encounter this line of code without fully understanding the dependency injection container being used, you wouldn’t be able to appreciate its usefulness. You could instantiate your Application\Model\UserRepository yourself, sure, but you’d also have to locate and inject the database adapter to use, and into that adapter you’d have to load and inject the configuration for that database connection. If you are doing this in multiple controller actions, there is a lot of repeated boilerplate code required to “wire” the UserRepository object. Internally, the DiC object is loading and consulting a definition, creating objects, injecting those objects, and returning the requested object, fully wired and ready to use.

The above code also demonstrates two common criticisms of dependency management frameworks, which are also criticisms of frameworks in general. First, by using this framework, you are moving further away from the facilities of the language or platform itself. Instead of using the “new” keyword to create a new object, you’ve asked another object to create the requested object for you. This shifts developers away from the language’s well-understood API and onto the framework’s API. Second, this kind of code is not easily understood by IDEs. While special features could be added to the IDE to support this framework, it does not inherently know what kind of object is being returned by the $dic->get(..) method call.
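
To make the wiring behind a call like $dic->get('UserRepository') less abstract, here is a deliberately tiny container sketch. It is not the API of any particular framework: Container, set() and the three example classes are assumptions, and real containers add definitions, scopes and much more.

```php
<?php
// Example classes, invented for the sketch.
class DbConfig { public $dsn = 'sqlite::memory:'; }
class DbAdapter
{
    public $config;
    public function __construct(DbConfig $config) { $this->config = $config; }
}
class UserRepository
{
    public $adapter;
    public function __construct(DbAdapter $adapter) { $this->adapter = $adapter; }
}

class Container
{
    private $factories = array();
    private $instances = array();

    // Register how to build a service; the closure receives the container.
    public function set($name, Closure $factory)
    {
        $this->factories[$name] = $factory;
    }

    // Build the service on first request, then hand back the same instance.
    public function get($name)
    {
        if (!isset($this->instances[$name])) {
            if (!isset($this->factories[$name])) {
                throw new InvalidArgumentException("No definition for '$name'");
            }
            $factory = $this->factories[$name];
            $this->instances[$name] = $factory($this);
        }
        return $this->instances[$name];
    }
}

$dic = new Container();
$dic->set('DbConfig', function () { return new DbConfig(); });
$dic->set('DbAdapter', function ($c) { return new DbAdapter($c->get('DbConfig')); });
$dic->set('UserRepository', function ($c) { return new UserRepository($c->get('DbAdapter')); });

/** @var UserRepository $userRepository */ // docblock hint for IDEs that read such annotations
$userRepository = $dic->get('UserRepository');
```

The docblock on the last statement is one common workaround for the IDE criticism: the return type of get() is unknowable statically, so the developer states it by hand.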

Summary

While dependency management frameworks have clear drop-in benefits, there exist a few considerations that have unknown or unexplored consequences. For example, if the benefit is such that all dependencies are managed, and all a developer has to do is configure it, does that encourage deeper object graphs when creating classes and class dependencies? If so, what is the performance impact of these deep object graphs, particularly on the PHP platform? What are the memory implications of such object graphs, and what are the speed implications of them? Furthermore, if one needed to debug an object that has been generated by a dependency management framework, is that easily possible?

At the end of the day, whether or not to use a dependency management framework is a matter of cost versus benefit. In order to make an informed decision, a developer should consider a few scenarios. First, one should know what code might look like with and without this new framework. This will give an indication of the cost/benefit at the code level: does it actually save lines of code and developer headaches? Secondly, one should consider how much added knowledge a developer or a team of developers needs in order to understand this framework. Lastly, one should consider what kind of performance impact implementing this new framework has on the application’s throughput.

PHP Component and Library API Design Overview

January 18th, 2011 by Ralph Schindler

There’s been lots of change in the PHP community over the past few years. PHP now has namespaces. More PHP developers are using an IDE. More PHP developers are pulling inspiration from the Java, C#/.NET, and Ruby communities. And even more PHP developers are embracing the object-oriented and, ironically, the functional nature (closures) of PHP. All these changes make for interesting code. What has also happened is that better and more readable code is being produced by this ever growing PHP community. It’s been a long time since “PHP application” meant a series of transaction scripts as a mix of SQL, CSS, JS, with some PHP sprinkled in, and a couple of classes for good measure. Of course, that still exists, but you no longer need to go to the ends of the earth to find non-spaghetti code that is understandable within a few minutes.

For the most part, all of these changes are good changes. The number of good/senior/expert level PHP developers is ever increasing, and more and more “enterprise grade” frameworks and libraries are being produced. That said, with all of these new changes, the one area which is still fairly inconsistent from project to project is the naming conventions employed inside PHP 5.3 projects that utilize namespaces. This article will attempt to describe what an API is, how names and object-oriented features affect an API, and how various decisions affect the consumers of a particular API.

What Is An API?

Before we jump into naming, it’s important to have a common understanding of the actual problem area. When we talk about names, we are really talking about the API. An API is a particular set of rules and specifications that a developer can follow to access and make use of the services and resources provided by another particular software program, component or library. Put another way, it is an interface between various software pieces and facilitates their interaction, similar to the way the user interface facilitates interaction between humans and computers.

For PHP 4 / procedural based libraries, the API is defined by the functions that are declared for usage in that library. It is further described by the global names and global state that the library utilizes to do its job. Typically, APIs based on purely function based libraries are far simpler to understand.

Object-oriented APIs are a bit more complex. When you build an object-oriented library or component, you are typically designing two APIs at the same time, whether or not you know it. This is the nature of object-oriented languages when you employ the use of abstract classes and interfaces in your design.

The first API, the more common of the two, I call the Consumption API. This is the API that answers the question: “how do people consume this thing.” The answer to this question is generally situated around the great majority of use cases that were identified by the author of the software component/library. In PHP, consumption might look like this:

$foo = new SomeCompany\FooComponent\FooComponent($options);
$foo->setAdapter(new SomeCompany\FooComponent\Adapter\SomeAdapter($adapterOptions));
$interestingResult = $foo->doSomethingInteresting();

As you can see, no declarative code was required to fulfill the most common use case that was identified as a need for this component’s existence. The above API is defined by the totality of all the public (concrete) classes, their public properties and public methods. By examining these elements, a good API design should allow a developer to deduce how the component works without examining any documentation. When that is possible, the API has become the documentation as well as the “story” behind how the component/library is to be used.

Not all use cases are accounted for in generic components and generic libraries. As developers, we attempt to create generic libraries and components that will solve the majority of problems of the majority of the community. We cannot envision all use cases or even edge cases behind a particular component. That does not mean, though, that the outlying use cases are unimportant or should be unaccounted for. These use cases are handled by the secondary API: the “Extension API”.

The Extension API answers the question: “since this component does 90% of what I want, how can I extend it to fulfill the last few of my needs.” Clearly, it makes sense to leverage tools that do most of what you need especially if they can be extended in ways that are outside of the out-of-the-box feature-set. Object-oriented/class based code is particularly well suited to extension through the principle of overriding polymorphism.

The primary tool behind overriding polymorphism is method overriding. For this to be possible, base types, or the types that are shipped with the component/library you are extending, will be overridden to fulfill this new behavior that is your specialized use case. Consider the following code example:

namespace MyCompany\FooComponent\Adapter; // My Component
use SomeCompany\FooComponent\Adapter\SomeAdapter; // Consumed Component

// extend the provided Component with my special use case
class MyAdapter extends SomeAdapter
{
    protected function _someWorkToBeDone()
    {
        // do something special that fulfills our use case
        return parent::_someWorkToBeDone();  // protected method on parent class
    }
}

As you can see here, we’ve extended the functionality of the base adapter from the shipped component/library with our own functionality. This is possible since the base adapter tucked away the business logic we needed to alter inside a protected method. This is what allows us to rely on overriding polymorphism to extend code to suit more specific needs. This “Extension API” can therefore be defined by the totality of all protected members of a class: methods and properties that can be utilized in child classes. These protected methods are not all that important or even useful in the documented and de-facto use cases of a component, but become extremely important when extending.
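
To make the split between the two APIs concrete, here is a minimal sketch of what the shipped base adapter might look like. The body of SomeAdapter (including the public doWork() method) is an assumption, since the article only shows the child class, and namespaces are omitted so the snippet runs as a single file.

```php
<?php
// Hypothetical base adapter: its body is assumed, not taken from a real library.
class SomeAdapter
{
    // Consumption API: the public entry point that consumers call.
    public function doWork()
    {
        return strtoupper($this->_someWorkToBeDone());
    }

    // Extension API: a protected hook that child classes may override.
    protected function _someWorkToBeDone()
    {
        return 'base work';
    }
}

class MyAdapter extends SomeAdapter
{
    protected function _someWorkToBeDone()
    {
        // specialize first, then defer to the parent implementation
        return 'special + ' . parent::_someWorkToBeDone();
    }
}

$base = new SomeAdapter();
$mine = new MyAdapter();
echo $base->doWork(), "\n";
echo $mine->doWork(), "\n";
```

Consumers only ever call doWork(); the protected hook is invisible to them, yet it is the seam that lets MyAdapter change the result without patching the base class.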

API Philosophy

It’s hard to quantify the importance of any one aspect of a codebase’s API over another without first talking about the general philosophy. In the land of 1000 frameworks and libraries, being well written or poorly written divides the great majority of them. Of what is left of the (generally regarded) well written ones, philosophy divides the rest.

There exist two common philosophical “goals” that most libraries/components generally subscribe to that, depending on your perspective, might be contradictory. For argument’s sake, let’s assume that each is as important as the other. The first: “easy to use”. A component’s likability among developers is greatly determined by how easy it is to use, whether it’s intuitive, and whether it fulfills the majority of one’s needs. The other: “easy to extend”. The majority of the time, a component is written for some well known use cases. Generally, that will suit the majority of the needs of any one developer, but there are always some unknown use cases. A component’s ability to deliver a mostly working solution while allowing the developer to extend it for the unknown is what determines how easy it is to extend said component.

More often than not, ease of use and extensibility live at two ends of the spectrum. Things that are easy to use are generally hard to extend, and things that are simple to extend are generally harder to use. This is the case because accommodating one usually comes at the expense of the other.

Getting back to philosophy and the example at hand, ease of use and extensibility are equally important. The goal, in terms of API design, is to be able to accommodate each equally and strike a balance between the two so that each goal is represented in the API.

Basic Tips And Tricks For Better APIs

The list of tips and tricks for building better component APIs could get fairly long, so this article will attempt to cover some of the more “basic” ideas.

Adopt A Common Namespace & Class Naming Scheme

While it is true that the PHP platform has no built-in packaging, or file based import mechanism… the PHP autoloader with the help of some common conventions can get you 99% of the way there. Large projects like Zend Framework, Symfony, PHPUnit, and PEAR have all settled on a pretty simple and common naming scheme based on the PEAR naming standards. By utilizing this naming scheme, your code will be instantly familiar to developers who already have knowledge of this scheme in other projects. The benefit here is that developers will know exactly where to find classes inside the filesystem.

namespace MyCompany\MyComponent;
class Foo {
    // will be found relative to the include_path, or some path
    // managed by an autoloader at
    // MyCompany/MyComponent/Foo.php, pretty simple eh?
}
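
As a rough sketch of how an autoloader can turn that convention into a file path, the snippet below maps a namespaced class name to a relative path. The helper name classToPath() and the "library" base directory are illustrative assumptions, not part of any framework, and a real PSR-0 loader handles more cases (for example, underscores in class names).

```php
<?php
// Hypothetical helper: converts a fully qualified class name into a relative file path.
function classToPath($class)
{
    // strip a leading backslash, then swap namespace separators for directory separators
    return str_replace('\\', DIRECTORY_SEPARATOR, ltrim($class, '\\')) . '.php';
}

// Register an autoloader that looks below an assumed "library" directory.
spl_autoload_register(function ($class) {
    $file = __DIR__ . DIRECTORY_SEPARATOR . 'library' . DIRECTORY_SEPARATOR . classToPath($class);
    if (is_file($file)) {
        require $file;
    }
});

echo classToPath('MyCompany\MyComponent\Foo'), "\n";
```

Because the mapping is purely mechanical, a developer can locate a class on disk from its name alone, which is the familiarity benefit described above.
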

Avoid Doing Too Much In the Constructor

There are lots of places on the web that discuss this, so I’ll link to them here and not go into too much detail. I’ve seen it called a “unified constructor”, but that’s not what we are talking about here, or at least, that is not the goal. The goal is to allow the consumer to give as much or as little information about the identity of the object at instantiation time. The common signature that I like for this is the following:

class Foo
{
    public function __construct($options = null)
    {
        if (is_array($options)) {
            $this->setOptions($options);
        } elseif (is_string($options)) {
            $this->setValueThatIsDocumentAndWellKnown($options);
        }
    }
}

Generally, the call to setOptions() will in turn call various setters if they exist. What is important is that at construction/instantiation time a consumer is not required to fulfill all of the class’s requirements. Why is this important? It reverses the order in which dependencies are required to be interacted with. Let’s examine this in code:

// Example 1
// assuming: class Foo { __construct(A $a, B $b, C $c) {} }
$a = new A($aOption1, $aOption2);
$b = new B();
$c = new C($cOption, $a);

$foo = new Foo($a, $b, $c); // and finally
$foo->doSomethingInteresting();

/** OR ALTERNATIVELY **/

// Example 2
// assuming: class Foo { __construct($options = null) {} }
$foo = new Foo(array(
    'a' => ($a = new A($aOption1, $aOption2)),
    'b' => new B(),
    'c' => new C($cOption, $a)
    ));
$foo->doSomethingInteresting();

// Example 3
// or better:
$foo = new Foo();
$a = new A($aOption1, $aOption2);
$foo->setA($a)
    ->setB(new B())
    ->setC(new C($cOption, $a));
$foo->doSomethingInteresting();

The difference is that in Example 1, even though our target use case is handled by class Foo, we are forced to interact with the dependencies first. Conversely, examples 2 and 3 show that our target object Foo is created up front, and dependencies are handled after instantiation. If code clarity is a goal, reading the code top down in examples 2 and 3 makes more sense than in example 1 since the API has allowed the developer to code his use case in a top-down or story-like code block. Why do I like this pattern of usage? Simple: it highlights PHP’s loose nature and flexibility in its use case… but mostly because it’s more readable.
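
The setOptions() call mentioned earlier is never shown, so here is one way it might be written. This is only a sketch: the key-to-setter naming rule ("a" maps to setA()) and the getters are assumptions made for illustration.

```php
<?php
class Foo
{
    protected $a;
    protected $b;

    public function __construct($options = null)
    {
        if (is_array($options)) {
            $this->setOptions($options);
        }
    }

    // Assumed implementation: route each option key to a matching setter, if one exists.
    public function setOptions(array $options)
    {
        foreach ($options as $key => $value) {
            $setter = 'set' . ucfirst($key);
            if (method_exists($this, $setter)) {
                $this->$setter($value);
            }
        }
        return $this; // fluent, so calls can be chained
    }

    public function setA($a) { $this->a = $a; return $this; }
    public function setB($b) { $this->b = $b; return $this; }
    public function getA() { return $this->a; }
    public function getB() { return $this->b; }
}

// Both styles end up in the same state; unknown keys are silently skipped here.
$viaOptions = new Foo(array('a' => 'first', 'b' => 'second', 'unknown' => 'ignored'));
$viaSetters = new Foo();
$viaSetters->setA('first')->setB('second');
```

Whether unknown keys should be skipped or rejected is a design decision; skipping keeps the constructor forgiving, while rejecting them would surface typos earlier.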

Avoid final And private

This one speaks to extensibility. Unless you are attempting to restrict a user from utilizing some kind of use case, there is little gain in marking members as final or private. Sooner or later, someone somewhere will need to override a method you’ve implemented for some obscure use case. A better approach is to provide them with a codebase that will meet most of their needs and can be extended to fulfill the rest if they are outside the original scope. That way, they are not forced to patch your codebase.
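
A small sketch of why private gets in the way of extension. All class names here are made up for the illustration; the point is that a private method is bound to the class that declares it, so a child class cannot replace it.

```php
<?php
class LockedBase
{
    public function run()
    {
        return $this->work();
    }

    // private: child classes cannot replace this behavior
    private function work()
    {
        return 'base';
    }
}

class OpenBase
{
    public function run()
    {
        return $this->work();
    }

    // protected: child classes may override
    protected function work()
    {
        return 'base';
    }
}

class LockedChild extends LockedBase
{
    // A new, unrelated method as far as LockedBase::run() is concerned.
    protected function work()
    {
        return 'child';
    }
}

class OpenChild extends OpenBase
{
    protected function work()
    {
        return 'child';
    }
}

$locked = new LockedChild();
$open = new OpenChild();
echo $locked->run(), "\n"; // still uses LockedBase::work()
echo $open->run(), "\n";   // uses OpenChild::work()
```

The author of LockedChild has no way to change what run() does short of copying or patching the base class, which is the situation this tip tries to avoid.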

Summary

This is by far not an exhaustive list. As more of the larger projects move to using namespaces, closures and the other PHP 5.3 features, we’ll start to see a few more best-practices emerge as they relate to API design. In the mean time, this overview will serve as a springboard for a few discussions on API design moving forward with ZF2 and PHP 5.3 component development that is currently on-going.

Exception Best Practices in PHP 5.3

September 15th, 2010 by Ralph Schindler

Every new feature added to the PHP runtime creates an exponential number of ways developers can use and abuse that new feature-set. However, it’s not until developers have had that chance that some agreed-upon good usage and bad usage cases start to emerge. Once they do emerge, we can finally start to classify them as best or worst practices.

Exception handling in PHP is not a new feature by any stretch. In this article, we’ll discuss two new features in PHP 5.3 based around exceptions. The first is nested exceptions and the second is a new set of exception types offered by the SPL extension (which is now a core extension of the PHP runtime). Both of these new features have found their way into the book of best practices and deserve to be examined in detail.

Special note: some of these features have existed in PHP < 5.3 or are at least capable of being implemented in PHP < 5.3. When this article mentions PHP 5.3, it is not in the strictest sense of the PHP runtime. Instead, it refers to code bases and projects that are adopting not only PHP 5.3 as a minimum version but also all of the best practices that have emerged in this new phase of development. This phase of development is highlighted by the “2.0” efforts of projects like Zend Framework, Symfony, Doctrine and PEAR, to name a select few.

Background

Previously in PHP 5.2, there was a single exception class Exception. Generally, speaking from a Zend Framework / PEAR coding standard perspective, this exception class became the root for all exceptions that might be thrown from within your library. For example, if you created a library for your company MyCompany, then you would, according to ZF/PEAR standards, have prefixed all code with MyCompany_. For this library, you might create a base exception for your library code: MyCompany_Exception, which extends the PHP class Exception and from which all your components might inherit, subclass, and throw. So, if you created a component MyCompany_Foo, it might have a base exception class called MyCompany_Foo_Exception that is expected to be thrown from within the MyCompany_Foo component. These exceptions can be caught by attempting to catch MyCompany_Foo_Exception, MyCompany_Exception, or simply Exception. This would allow 3 levels of granularity (or more, depending on how many times MyCompany_Foo_Exception was subclassed) to consumers of this component, who can then handle the exception in whatever way they deem fit.
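
The three levels of granularity can be sketched as follows. The class bodies and the doSomething() method are assumptions; only the class names come from the description above.

```php
<?php
// PEAR/ZF1-style prefixed names, as described above
class MyCompany_Exception extends Exception {}
class MyCompany_Foo_Exception extends MyCompany_Exception {}

class MyCompany_Foo
{
    // hypothetical method that always fails, for demonstration only
    public function doSomething()
    {
        throw new MyCompany_Foo_Exception('Foo failed');
    }
}

$foo = new MyCompany_Foo();
$caughtBy = array();

// The same exception can be caught at the component, library, or root level.
try {
    $foo->doSomething();
} catch (MyCompany_Foo_Exception $e) {
    $caughtBy[] = 'component';
}

try {
    $foo->doSomething();
} catch (MyCompany_Exception $e) {
    $caughtBy[] = 'library';
}

try {
    $foo->doSomething();
} catch (Exception $e) {
    $caughtBy[] = 'root';
}

echo implode(', ', $caughtBy), "\n";
```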

New Feature: Nesting

In PHP 5.3, the base exception class now handles nesting. What is nesting? Nesting is the ability to catch a particular exception and create a new exception object, to be thrown, that holds a reference to the original exception. This gives the caller access both to the exception of the more well known type thrown from within the consumed library and to the exception that originated this exceptional behavior.

Why is this useful? Typically, this is most useful in code that consumes other code that throws exceptions of its own type. This might be code that utilizes the adapter pattern to wrap 3rd party code to deliver some kind of adaptable functionality, or simply code that utilizes some exception throwing PHP extension.

For example, the component Zend_Db uses the adapter pattern to wrap specific PHP extensions in order to create a database abstraction layer. In one adapter, Zend_Db wraps PDO, and PDO throws its own exception, PDOException. Zend_Db needs to catch these PDO specific exceptions and re-throw them as the expected and known type of Zend_Db_Exception. This gives developers the assurance that Zend_Db will always throw exceptions of type Zend_Db_Exception (so it can be caught), but they will also have access to the original PDOException that was thrown in case it is needed.

The following is an example of how a fictitious database adapter might implement nested exceptions:


class MyCompany_Database
{
    /**
     * @var PDO object setup during construction
     */
    protected $_pdoResource = null;

    /**
     * @throws MyCompany_Database_Exception
     * @return int
     */
    public function executeQuery($sql)
    {
        try {
            $numRows = $this->_pdoResource->exec($sql);
        } catch (PDOException $e) {
            // the exception code is an integer, so pass 0 rather than null
            throw new MyCompany_Database_Exception('Query was unexecutable', 0, $e);
        }
        return $numRows;
    }

}

To utilize a nested exception, you would call the getPrevious() method of the caught exception:


// $sql and $connectionParams assumed
try {
    $db = new MyCompany_Database('PDO', $connectionParams);
    $db->executeQuery($sql);
} catch (MyCompany_Database_Exception $e) {
    echo 'General Error: ' . $e->getMessage() . "\n";
    $pdoException = $e->getPrevious();
    echo 'PDO Specific error: ' . $pdoException->getMessage() . "\n";
}
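
Exceptions can be nested more than one level deep, and getPrevious() returns null when there is nothing further down the chain. The sketch below walks the whole chain. To stay runnable without a database, a plain RuntimeException stands in for the PDOException; failingQuery() and exceptionChainMessages() are hypothetical helpers.

```php
<?php
// Hypothetical library exception used only for this illustration.
class MyCompany_Database_Exception extends Exception {}

function failingQuery()
{
    try {
        // stands in for a low-level failure such as a PDOException
        throw new RuntimeException('low-level driver failure');
    } catch (RuntimeException $e) {
        throw new MyCompany_Database_Exception('Query was unexecutable', 0, $e);
    }
}

// Collect "type: message" for every exception in the chain, outermost first.
function exceptionChainMessages($e)
{
    $messages = array();
    while ($e !== null) {
        $messages[] = get_class($e) . ': ' . $e->getMessage();
        $e = $e->getPrevious();
    }
    return $messages;
}

$messages = array();
try {
    failingQuery();
} catch (MyCompany_Database_Exception $e) {
    $messages = exceptionChainMessages($e);
}
echo implode("\n", $messages), "\n";
```

Checking for null before using the previous exception is worthwhile in real code, since not every library exception will have been created with one.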

Most recent PHP extensions have OO interfaces. As such, those APIs tend to lean on throwing exceptions instead of raising errors. A short list of exception throwing extensions in PHP includes PDO, DOM, Mysqli, Phar, Soap and SQLite.

New Feature: New Core Exception Types

Also in PHP 5.3 development we are shining a light on some new and interesting Exception types. These exceptions have been in place since PHP 5.2.x, but it was not until recently, with the “re-evaluation” of exception best practices, that they started gaining some limelight. They are implemented in the SPL extension and are listed on the manual pages located here. Since these new exception types are part of core PHP as part of SPL, they can be used by anyone who targets PHP 5.3 as the minimum runtime for their code. While this might seem less important when writing application layer code, the way we adopt and use these new exception types becomes even more important when we are writing and consuming library code.

So why new exception types in general? Previously, developers attempted to give more meaning to their exceptions by putting more information into the message of the exception. While this is good, it has a few drawbacks. One is that you cannot catch an exception based on a message. This can be a problem if you know a set of code is throwing the same exception type with various messages for various exceptional conditions that can be handled differently. For example, consider an authentication class whose $auth->authenticate() method throws the same type of exception (let’s assume Exception) with different messages for two specific failures: one where the authentication server cannot be reached, and one for a failed authentication attempt. In this case (never mind that using exceptions might not be the best way to handle authentication responses), it would require string parsing the message to handle those two scenarios differently.

The solution to this is clearly some way to codify exceptions so that they can be easily interrogated when trying to discern how to react to this exceptional situation. The first response libraries have had is to use the $code property of the Exception base class. The other is to create multiple types, or new exception classes, that can be thrown to describe the behavior. Both of these approaches have the same simple drawback. Neither has emerged as a best practice, and so neither is considered a standard. Each project attempting to replicate this solution might therefore do so with small variations that force the consumer to go back to the documentation to understand the library specific solution that was created. Now with the new types in the SPL, otherwise known as the Standard PHP Library, developers can utilize these new types in the same way in their projects and the projects they are consuming since a best practice for these new types has emerged.

The second drawback of the detailed message approach is that it makes understanding the exceptional situation harder for non-English or limited-English speaking developers. This might slow down some developers when trying to decipher what an exception message is trying to convey. As many developers as there are writing exceptions, there are equally as many variations in how they will describe that situation in the message since there is no standard for conformity or for codification.
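
Returning to the authentication example, here is a sketch of how two distinct types remove the need to parse messages. The Authenticator class, its hard-coded password, and the choice of RuntimeException and UnexpectedValueException for the two failures are all assumptions made for illustration.

```php
<?php
// Hypothetical authenticator: $serverUp and the fixed password are illustrative stand-ins.
class Authenticator
{
    protected $serverUp;

    public function __construct($serverUp)
    {
        $this->serverUp = $serverUp;
    }

    public function authenticate($password)
    {
        if (!$this->serverUp) {
            throw new RuntimeException('Authentication server cannot be reached');
        }
        if ($password !== 'secret') {
            throw new UnexpectedValueException('Credentials were rejected');
        }
        return true;
    }
}

// The caller branches on the exception type; the message is never inspected.
function tryLogin(Authenticator $auth, $password)
{
    try {
        $auth->authenticate($password);
        return 'ok';
    } catch (UnexpectedValueException $e) {
        return 'bad-credentials'; // the more specific type must be caught first
    } catch (RuntimeException $e) {
        return 'retry-later';
    }
}

echo tryLogin(new Authenticator(true), 'nope'), "\n";
```

Since UnexpectedValueException is a subclass of RuntimeException, the order of the catch blocks matters: listing RuntimeException first would swallow both cases.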

So How Do I Use Them, Give Me The Dirty Details?

There are a total of 13 new exceptions in the SPL. Two of them can be considered “base” types: LogicException and RuntimeException; both extend the PHP Exception class. The remainder of the exceptions can thus be broken down into three logical groups: the dynamic call group, the logic group and the runtime group.

The dynamic call group contains the exceptions BadFunctionCallException and BadMethodCallException. BadMethodCallException is a subclass of BadFunctionCallException which in turn is a subclass of LogicException. That means that these exceptions can be caught by either their direct type, LogicException, or simply Exception. When do you use these? Generally, these should be used when an exceptional situation arises as a result of an unresolvable __call() during a method or when a callback cannot find a valid function to call (or better put, when something is not is_callable()).

For example:


// OO variant
class Foo
{
    public function __call($method, $args)
    {
        switch ($method) {
            case 'doBar': /* ... */ break;
            default:
                throw new BadMethodCallException('Method ' . $method . ' is not callable by this object');
        }
    }

}

// procedural variant
function foo($bar, $baz) {
    $func = 'do' . $baz;
    if (!is_callable($func)) {
        throw new BadFunctionCallException('Function ' . $func . ' is not callable');
    }
}

While the most direct examples are inside __call() and anywhere near something that will call_user_func(), this group of exceptions is also useful when developing any kind of API where dynamic method call and function call lookups are utilized. An example of this would be a SOAP or XML-RPC client/server that is capable of issuing and/or interpreting method requests.

The second group is the logic group. This group consists of DomainException, InvalidArgumentException, LengthException, and OutOfRangeException. These exceptions are subclasses of LogicException, which is in turn a subclass of the PHP Exception class. You use these exceptions when there is an exceptional situation that arises from either a mutation of state or as a result of bad method or function parameters. To get a better understanding of this, we will first look at the last group of exceptions.

The final group is the runtime group. It consists of OutOfBoundsException, OverflowException, RangeException, UnderflowException, and UnexpectedValueException. These exceptions are subclasses of RuntimeException, which is in turn a subclass of the PHP Exception class. These exceptions should be used when an exceptional situation arises during the “runtime” of a function or method call.

How do the logic group and the runtime group work together? If you look at the anatomy of an object, one of two things is generally happening. First, the object will be tracking and mutating state. This means the object is generally not doing anything (yet); it might have configuration passed to it; it might be setting up properties (via setters and getters); or, it might be getting references to other objects. Second, when the object is not tracking and mutating state, it is operating – doing what it was designed to do. This is the object’s runtime. For instance, during the object’s lifetime, it might be created, passed a configuration object, then it might have setFoo($foo), setBar($bar) called. During these times any kind of LogicException should be raised. In addition, when the object is asked to do something, with parameters, for example $object->doSomething($someVariation); during the first few lines when it interrogates that $someVariation variable, it would throw a LogicException. After it is done interrogating $someVariation, and it goes on about doing its job of doSomething(), this is considered its “runtime” and in this code it would throw RuntimeException types.

To better understand, we’ll look at this concept in code:


class Foo
{
    protected $number = 0;
    protected $bar = null;

    public function __construct($options)
    {
        /** this area throws LogicException types **/
    }

    public function setNumber($number)
    {
        /** this method throws LogicException types **/
    }

    public function setBar(Bar $bar)
    {
        /** this method throws LogicException types **/
    }

    public function doSomething($differentNumber)
    {
        if ($differentNumber != $expectedCondition) {
            /** this area throws LogicException types **/
        }

        /**
         * From here on down, this method throws
         * RuntimeException types
         */
    }

}

Now that this concept is understood, what does this do for a consumer of this code base? The caller can be sure that anytime they are mutating the state of an object, they can catch exceptions with the most specific type, for example InvalidArgumentException or LengthException, and at least LogicException. By having this level of granularity, and multiple types involved, they can catch the exception minimally with LogicException, but also get greater understanding of what went wrong via the actual type of the exception. This same concept applies for the runtime group of exceptions as well: more specific types can be thrown, and either the specific or the less specific type can be caught. This offers a greater deal of knowledge about the situation and granularity of control to the caller.
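
From the consumer’s side, this could look like the sketch below. BoundedStack and classify() are hypothetical names invented for the example; the point is only that a bad setter argument surfaces as a LogicException subtype while a failure during operation surfaces as a RuntimeException subtype.

```php
<?php
// Hypothetical bounded stack used only to illustrate the two groups.
class BoundedStack
{
    protected $limit = 1;
    protected $items = array();

    public function setLimit($limit)
    {
        // state mutation with a bad argument: logic group
        if (!is_int($limit) || $limit < 1) {
            throw new InvalidArgumentException('Limit must be a positive integer');
        }
        $this->limit = $limit;
        return $this;
    }

    public function push($item)
    {
        // the object is now "operating": runtime group
        if (count($this->items) >= $this->limit) {
            throw new OverflowException('Stack is full');
        }
        $this->items[] = $item;
        return $this;
    }
}

// Report which group an operation's failure belongs to.
function classify($operation)
{
    try {
        call_user_func($operation);
        return 'no exception';
    } catch (LogicException $e) {
        return 'logic: ' . get_class($e);
    } catch (RuntimeException $e) {
        return 'runtime: ' . get_class($e);
    }
}

$stack = new BoundedStack();
$badSetup = classify(function () use ($stack) { $stack->setLimit('many'); });
$overflow = classify(function () use ($stack) { $stack->push(1)->push(2); });
echo $badSetup, "\n", $overflow, "\n";
```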

Below is a table of the information you might find of interest concerning these SPL exceptions
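
The inheritance relationships among these exceptions can also be listed at runtime. A minimal sketch using class_parents(), with the 13 class names typed in by hand:

```php
<?php
$splExceptions = array(
    'LogicException', 'BadFunctionCallException', 'BadMethodCallException',
    'DomainException', 'InvalidArgumentException', 'LengthException', 'OutOfRangeException',
    'RuntimeException', 'OutOfBoundsException', 'OverflowException', 'RangeException',
    'UnderflowException', 'UnexpectedValueException',
);

$hierarchy = array();
foreach ($splExceptions as $class) {
    // class_parents() lists the ancestors, nearest parent first
    $hierarchy[$class] = array_values(class_parents($class));
    echo $class, ' extends ', implode(' extends ', $hierarchy[$class]), "\n";
}
```

Any type that appears in a class’s ancestor list can be used in a catch block to catch that class.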

Best Practices In Library Code

Since the advent of these new exception types in PHP 5.3, a new best practice for library code has also emerged. While it is most beneficial to get a standard specialized exception type like InvalidArgumentException or RuntimeException, it would also be useful to catch component level exceptions. You can read a more in-depth discussion of the concepts on the ZF2 wiki or the PEAR2 wiki.

The long and short of this, in addition to the best practices listed above, is that there should be a component level type that can be caught for any exception that emanates. This is accomplished by using what is known as a Marker Interface. By creating a component level marker interface, real exception types inside a given component can extend the SPL exception types and be caught by any number of class types at runtime. Let’s examine the following code:


// usage of bracket syntax for brevity
namespace MyCompany\Component {

    interface Exception
    {}

    class UnexpectedValueException
        extends \UnexpectedValueException
        implements Exception
    {}

    class Component
    {
        public static function doSomething()
        {
            if ($somethingExceptionalHappens) {
                throw new UnexpectedValueException('Something bad happened');
            }
        }
    }

}

Assuming the above code, if one were to execute MyCompany\Component\Component::doSomething(), the exception that is emitted from the doSomething() method can be caught by any of the following types: PHP’s Exception, SPL’s UnexpectedValueException, SPL’s RuntimeException, the component’s MyCompany\Component\UnexpectedValueException, or the component’s MyCompany\Component\Exception. This affords the caller any number of opportunities to catch an exception that emanates from a given component within your library. Furthermore, by analyzing the types that make up the exception, more semantic meaning can be given to the exceptional situation that just occurred.
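
The claim can be checked with a short sketch. Namespaces are dropped here so the snippet runs as a single file, which means the MyComponent_ prefixed names are stand-ins for the namespaced ones above, and doSomething() is reduced to a plain function that always throws.

```php
<?php
// Stand-in for MyCompany\Component\Exception (the marker interface).
interface MyComponent_Exception {}

// Stand-in for MyCompany\Component\UnexpectedValueException.
class MyComponent_UnexpectedValueException
    extends UnexpectedValueException
    implements MyComponent_Exception
{}

function doSomething()
{
    throw new MyComponent_UnexpectedValueException('Something bad happened');
}

// Every type in this list should match the thrown exception.
$types = array(
    'Exception', 'RuntimeException', 'UnexpectedValueException',
    'MyComponent_UnexpectedValueException', 'MyComponent_Exception',
);
$catchableAs = array();
foreach ($types as $type) {
    try {
        doSomething();
    } catch (Exception $e) {
        $catchableAs[$type] = ($e instanceof $type);
    }
}

// Catching by the marker interface alone also works.
$viaMarker = null;
try {
    doSomething();
} catch (MyComponent_Exception $e) {
    $viaMarker = get_class($e);
}
echo $viaMarker, "\n";
```

Catching the marker interface answers "did this come from the component?", while catching the SPL type answers "what kind of problem was it?"; a caller can pick whichever question matters.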

Summary

In summary, this article should help guide you in creating and throwing more meaningful exceptions in a standards based and best practices way by negating the emphasis of the exception message and putting more emphasis on the exception type. If you’d like to carry on the discussion of these concepts feel free to comment here, on the PHP documentation pages, or in the ZF2 wiki comments section for the Exception proposal linked above.

PHPundamentals Series: A Background on Statics (Part 1 on Statics)

May 6th, 2010 by Ralph Schindler

Just beyond reading the title, you’ve more than likely come to this article as the curious yet uninformed, the mad and raving lunatic, or as an enlightened one. Static class members (from here on called simply, “statics”) in PHP conjure both the best and worst in developers for a variety of reasons. In part 1 of this series of articles on statics, we’ll explore some background to get a better understanding of statics in PHP.

Some Static Background And Understanding

Before we can move into the arguments that surround statics, we first need to understand what they are in the context of PHP. The core of the PHP language and runtime can draw some pretty big corollaries from the Java/JVM and C#/.NET language platforms. The biggest, and most important for the purposes of this article, is PHP’s object model. Like Java and .NET, PHP follows a class-based, single-inheritance, multiple-interface model, a tenet described by the grandfather of OO languages: Smalltalk. Of course, PHP applies its own “perspective” when it comes to the actual implementation details in that of typing, casting, mixed-paradigm usage, and so on; but the foundation for the object model is clearly defined.

That said, it is easy for the PHP community to draw comparisons and, more importantly, “borrow” best practices from both the Java and .NET communities. We certainly have borrowed our fair share with regards to development time tools, infrastructure tools and design patterns. Over the past 5 to 7 years, there has been an increasing adoption of best practices and patterns from the enterprise Java community, particularly in the form of two major texts: GoF and PoEAA. The GoF (Gang of Four) text primarily discusses best practices in the form of code structure and reuse: factory, singleton, adapter, composite, facade, iterator and observer to name a few. PoEAA (Patterns of Enterprise Application Architecture), on the other hand, attempts to solve higher order problems, particularly architectural problems at the application layer: MVC, Page Controller, Front Controller, Domain Model, Table and Row Gateway, and so on. While the examples are primarily executed in Java, they are structurally similar when implemented in PHP, so much so that PHP developers can read the Java examples as pseudo-code. This is what makes these patterns so applicable and thus popular in the PHP community.

Since we now know where these usage patterns originated, we should have a look at the target language platform: PHP. The key concept which delineates the PHP platform from the JVM and .NET platforms, is that PHP by default assumes a shared-nothing architecture. What does this mean? It means out of the box, PHP is not a persistent application platform. PHP’s runtime is built around the notion of primarily solving the web problem. In turn, since the web is request driven, you might say that an application written in PHP is also request driven. Put another way, the scope of your application is bound to a single request. The shared-nothing aspect means that the state of the application is built-up and torn-down upon the start and completion of each request to your application. Conversely, Java and .NET offer a persistent application stack which means the application’s state exists separate from the requests that come in via the web server. So, in PHP, the many requests each contain a single running instance of your application. In Java/.NET, the single application running handles the many requests.

Statics in Analogies

Still don’t get it? Let’s talk in a couple of analogies. Let’s assume we’ve built a basic application with the “out-of-the-box” technologies offered; one built on top of PHP and the other built on top of Java (or .NET, you can choose.) With your Java/.NET application, if a request is never received from your web server, the application is indeed still running. In PHP, if a request is never received from your web server, the application has NEVER run. The runtime of a Java/.NET application might be hours or days, whereas the runtime of a PHP application is as long as it takes to service the request. This analogy’s mileage may vary, and it is surely intended for demonstrative purposes. You could inject any number of monkey wrenches into it, but for all intents and purposes, it’s correct and it works.

Understanding the full scope of an application’s runtime state is the most important aspect of understanding the role of static class members in OO programming. Static class members live as long as the application runtime is valid and alive. What this means is that any class member state that has been set during any operation during the application’s runtime will persist until the application ceases to exist. Looking back at our main platform differences, we can see that in the Java/.NET platform, static members created in the scope of an application layer will be around until someone pushes the “shutdown” button on that application. This could mean a static member or static state is persisted for hours, days, or even longer. Like these persistent application stacks, PHP will destroy any static members and state at the end of the application’s lifecycle. Unlike these persistent application stacks, the application lifecycle ends with the completion of a web request. This means that static members and static state in PHP, for the average web application, sticks around for seconds or less and is only valid in the context of a single web request.
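
A tiny sketch of what that lifetime means in practice. HitCounter is a made-up class; the behavior described in the note assumes the usual one-script-run-per-request setup discussed above.

```php
<?php
class HitCounter
{
    // static state lives only as long as this script run (one request)
    protected static $hits = 0;

    public static function hit()
    {
        return ++self::$hits;
    }
}

$first = HitCounter::hit();
$second = HitCounter::hit();
echo "hits during this request: $second\n";
```

Within a single request the counter accumulates, but the next request starts again from zero, so this class could never serve as a site-wide hit counter on its own. In a persistent Java/.NET application the equivalent static field would keep counting across requests.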

Statics in Pictures

Still don’t get it? Let’s have a look at a few images to better explain these concepts.

The following images will attempt to explain the various layers of a web application, one from the perspective of the JVM/.NET platform, the other from the perspective of the PHP platform. (For all intents and purposes, the PHP platform could also be any scripting language executed by an apache module or fastcgi.)

The green layer is the web server layer, this is the process that will attach to port 80 and listen for requests. The blue layer represents the application process itself. This layer is responsible for global application state and class-based static state. The orange layer is a request which comes in from the web, this is typically what we’ve called a page request. Inside of each web request is the yellow layer, which represents the page-lifecycle. In terms of the application, this is where all of the request specific application routines happen including page startup and business logic.

Contrasted against …

The most important thing to take away from these images, particularly with respect to understanding statics, is the blue layer, or the layer that best represents the scope of globals and static members. This is the heart of what is meant by a “shared-nothing” architecture. It is this key difference that affects how we architect the code for our web applications.

In the next article in this series, we’ll have a look at PHP’s application architecture in greater detail and how it solves problems that might arise from a shared-nothing style architecture, why this architecture is arguably better for the web and cloud based services, but most importantly, how statics fit into this paradigm.

Database Abstraction Layers Must Live!

July 15th, 2009 by Ralph Schindler

I come preaching true hope, against the fallacies.

I’ve heard the arguments for and against database abstraction layers (DALs) time and time again. I must say first, I agree with them all, both sides, equally. Interestingly, I can put the vocal proponents of each side of the argument in one of two boxes: a programmer guy box, or a database guy box. For some unknown reason though, they never seem to see eye to eye.

Honestly though, I like to put myself in the middle of that argument. I see both sides. I think fine tuning an application’s core business with vendor specific features is tremendously important; after all, that is why there are so many competing database vendors. Generally speaking of database driven projects, I feel like planning to use a specific vendor up front, knowing its pros and cons, and tailoring an application to the chosen database’s strengths can only help in the long run. Also, I feel that building a database model first, before any code, offers more performance and scalability advantages than code first development does.

That said, I also see value in using a database as a simple data-store when the actual database is not a key component of the overall application. That’s right, it is completely valid to say that the data-storage & database component of an application sometimes is not the key component; a database guy probably will never agree with you there. Just as there are programmers who swear by this code first, database later mantra, there are database developers that will swear by the database first, code later mantra.

The fact is, each project is unique. It’s this uniqueness of projects and their execution that ultimately shapes the perspectives of developers as well as the tools they write and consume. To say that one mantra is clearly a better choice over another is simply being ignorant.

The Use Case of Abstraction Layers

To be honest, I don’t really buy the “I might switch database vendors at some point” argument either, as Jeremy Zawodny points out. For larger projects (on the scale of the facebooks, the twitters, etc), switching the database underneath after a project has been in production is a monumental task, regardless of whether you have an abstraction layer or not. Chances are, you used some of the database specific features, not to mention, you now have a large set of mission critical data that also has to be ported. Long story short, it’s never as easy as swapping the abstraction layer’s database adapter out.

What I will buy though, is that there are some problems that fall in the thicker end of the Pareto Principle that can be solved with a database abstraction layer. For the uninitiated, the Pareto Principle is effectively the 80/20 rule. When applying this term to software use cases, the 80% use case is the majority of use cases. These use cases are generally not that interesting in terms of database interaction. To give it a label, we can call these the CRUD, BREAD, or <<insert your favorite terminology here>> operations. That is not to say that these operations are not important, but they are not special. In fact, they are so un-special that we can just about apply a standard query syntax (SQL 92) to them, and expect that the query is both portable between databases and common across applications that wish to use them.

This is where database abstraction fits in. As a developer, you’ll come across this problem time and time again. A large portion of an application is CRUD screens, and the smaller, more interesting part of your application is your reporting screens. With an abstraction layer, we are able to code against a unified API as well as have a layer that will produce consistent and vendor compatible queries. This allows us to build more specialized data access layers (patterns) for multiple database vendors with great ease. You want Table Gateway- done, you want Row Gateway- done, you want Active Record- done. Each can be implemented to tackle the 80% part of the 80/20 rule when applied to the database centric business code of an application.

The Slow Path & The Fast Path

When I talk about this 80/20 rule in terms of the applications we write, I like to further refine the terminology so that it is easier to visualize. The most prominent terms that help developers visualize the 80/20 rule in their application are the slow path of your application and the fast path of your application. Each of these terms has a set of characteristics that sets it apart from the other:

Slow Path:

  • Performance is not of primary importance
  • Has an interactive nature
  • Validation and verification of data are of high priority
  • Application to data-store interactions are fairly trivial
  • Does not comprise the application’s core business logic

Fast Path:

  • Performance is of importance
  • Limited interactive nature, information flow is fairly static (non-interactive)
  • Flow of information consists of already verified and validated data (originates from the database)
  • Application to data-store interaction can become complex (JOINs, SUB-SELECTS, VIEWS)
  • Is the core business of the application

To get a better understanding of how the terms are applied, let’s look at a typical web application. Generally speaking, there are a few web based forms that users interact with. These forms are the entry point of a code path that does not get a lot of throughput. This is generally because forms are submitted by people, and people can only type and submit forms so fast. In addition to this being a less traveled code path, it also has a few checks along the way: validation of data, and verification of data. Typically, the problems of verification and validation of data are not too unique to the application being executed. In fact, the web forms, validation and verification problems have been solved over and over again by various libraries.

On the other side of the equation, there is the aggregation and merging of the stored data (which inevitably came from the aforementioned web forms.) Since the unique aggregation and processing of this data is the core business of said application, it stands to reason that this code path will be more heavily traveled by users. This is the fast path. The problems solved in this code path are generally unique, and since they are unique, it’s hard to find an off the shelf solution to these problems.

Since this is where the money is to be made, it also stands to reason that developers should concentrate their efforts in the fast path of their application. This means they should solve the slow path problems of their application with existing tried and tested solutions- this includes generic forms solutions, validation and verification libraries and yes, database abstraction layers.

Getting Cozy With Zend_Db, a Database Abstraction Layer

Now that we’ve made a use case for DALs, what would one look like? Well, I’ll use Zend Framework’s Zend_Db as my example.

The connection code:

$dbAdapter = Zend_Db::factory(array(
    'adapter' => 'Pdo_Mysql', // could be Pdo_Sqlite, Mysqli, Pdo_Mysql, Db2, or even Oracle
    'params' => array(
        'username' => 'test_user',
        'password' => 'test_pwd',
        'dbname' => 'test'
        )
    ));

You’ll note that since this factory takes a standardized array, it makes it trivial to swap out various connection information for different adapters.

Simple queries:

$data = array(
    'name'        => 'Remember the Milk',
    'description' => '2% Milk',
    'due_on'      => '2009-07-15',
    );
$dbAdapter->insert('todo_list', $data); // insert that data

// or
$lastInsertId = $dbAdapter->lastInsertId('todo_list');
$dbAdapter->update('todo_list', array('completed' => 'YES'), 'id = ' . $lastInsertId);

$dbAdapter->delete('todo_list', 'id = ' . $lastInsertId);

Here you’ll notice the generic and abstracted nature of this API. Since there are several tasks in database interaction that are consistent across the board, those such as INSERT, UPDATE and DELETE, it makes sense that we can create a generic API for handling such interactions. These interactions (INSERT, UPDATE and DELETE) represent the mutation methods of a database and as such, represent the most predominant way of getting data into a system.

For all intents and purposes though, simple SELECTs are fairly standardized too. They are standardized enough as to compliment the INSERT, UPDATE, and DELETE abstractions so that we can find actual rows to do these mutation operations.

Now that we have a simple and consistent API for doing simple SELECTs, INSERTs, UPDATEs, and DELETEs; we can implement something a little more interesting: the table & row gateway:

Zend_Db_Table_Abstract::setDefaultAdapter($dbAdapter);
$userTable = new Zend_Db_Table('user'); // ZF 1.9 feature
// find() returns a rowset, so take the first row: the user with id 5 (primary key)
$userRow = $userTable->find(5)->current();
echo $userRow->username;

Immediately, you should see the inherent value in the above example. Rudimentary and common tasks can now be handled with a consistent and simple API. But what happens when you’ve started using this DAL, and you want to use a vendor specific feature? Well..

// assuming what you want is really REPLACE or INSERT IGNORE from mysql
$dbAdapter->query('INSERT IGNORE INTO configuration (name, value) VALUES (?, ?)', array($name, $value));

// OR
$dbAdapter->query('REPLACE INTO configuration (name, value) VALUES (?, ?)', array($name, $value));

As you can see, the query method of our database adapter will allow us to pass custom SQL into the database thus taking advantage of vendor specific features.

What if you want to combine both paradigms for ultimate flexibility?


// assuming Zend_Db_Table_Row, with a FriendshipReference rule
$friendRowset = $currentUserRow->findDependentRowset('User', 'FriendshipReference');

// collect friend id's (cast to int, since they are concatenated into SQL below)
$friendIds = array();
foreach ($friendRowset as $friendRow) {
    $friendIds[] = (int) $friendRow->related_user_id;
}

$inClause = ' IN (' . implode(',', $friendIds) . ')';

$select = $dbAdapter->select();
$select
    ->from('user', array(
        'user_id',
        'related_user_id',
        'became_friends_on'
        ))
    ->where('user_id ' . $inClause);

// interact with driver directly
$mysqli = $dbAdapter->getConnection();
$mysqli->query('CREATE TEMPORARY TABLE friend ('
        . ' `user_id` int(11) NOT NULL,'
        . ' `related_user_id` int(11) NOT NULL,'
        . ' `became_friends_on` DATE NOT NULL'
        . ' ) ENGINE=MEMORY;'
    );
$mysqli->query('INSERT INTO friend ' . (string) $select);

// query new friend view
$friendTable = new Zend_Db_Table('friend');
$rows = $friendTable->fetchAll(
    'became_friends_on > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)',
    'became_friends_on'
    );

While the above example is “a bit out there”, it does show that even with a DAL, if it’s flexible enough, you can code as close to or as far away from the database as you like. Ultimately the mantra here is: let’s get the job done in the most effective, efficient and sound way possible.

Conclusions

Simply put, a database abstraction layer is just another tool in the toolbox. You don’t have to completely change your paradigm of programming, nor do you have to apply an all-or-none approach to using a DAL. When applied correctly, you can build out the slow path of your application in little to no time, while leaving extra time for developing and fine-tuning the fast path of your application. And to keep code from becoming unruly, simply apply some best-practices code organization to your project.

PHP: Environments, Libraries, and Applications – Oh My!

May 24th, 2009 by Ralph Schindler

Over the past 10 years or so, I’ve worked with many different code bases and libraries. Originally, the “libraries” were my own because in my earlier programming days, I had a bad case of “NCH” syndrome. That’s “Not Coded Here” syndrome for the uninitiated. As time went on, there were some solutions that I needed for a simple project and had neither the time nor the patience to develop a custom library for. That’s when I started relying on others’ experience and code to get me through projects.

The first “library” I remember using was px.sklar.com by David Sklar. There were some great components in there that were worth integrating into projects, but I hesitate to call it a true library since it’s a repository of both reusable components and complete solutions/applications. Moving on into the 21st century, a more “official” PHP library was being born: the PEAR project. The first component I really started depending on for many projects was the Spreadsheet_Excel_Writer. PEAR is not without issues of its own, but that’s a topic for a separate article.

A Little History

My earliest PHP applications were fairly simple: a PHP page that would interact with a database and render some html. Looking back at them, they all look like oodles of hacks and spaghetti code. Of course this was 1999ish, so it was OK because after all, it got the job done. As projects grew larger, so did a desire for better organization. This new wave of applications I was writing at the time was the first divergence from Model 1 applications, and came with the introduction of the second library I started using.

Smarty (which used to be part of the PHP Project), was a library I came to depend on in every project. The single greatest aspect of Smarty from a code organization standpoint was that it separated scripts into “business logic” scripts and “presentation logic” scripts. If an application was a soup of code, Smarty was the tool which divided out the presentation specific code, or what we’d call the ‘view’ in the MVC paradigm, from the business specific code, or what we’d call the controller and model in the MVC paradigm. This was the first step many took towards what is known in the JSP world as Model 2 programming.

So why this history wrapped in with a little personal experience? Well, I’d say the path I have followed is pretty typical of programmers that use scripting languages to build applications, specifically web-applications. That said, as the technologies we’ve used have evolved and grown, we tend to move towards solutions that offer a sense of best practices, better code organization, and most importantly, a reduced time to market.

What does that have to do with you? Well, I’ve seen my share of PHP centric projects come and go. In addition to those projects, I’ve kept a watchful eye on projects in other communities such as the Ruby, Perl, Java and .NET communities. From them, we’ve borrowed concepts, ideas and tools to create better solutions for the PHP community. With that, I’ll continue on with explaining several of the most common facets of any PHP project. If this seems basic at first, it’s actually laying the groundwork for a few more in-depth articles down the line.

What is an Environment?

In PHP, the environment is the set of resources, capabilities and settings for immediate use within the lifespan of any one php process. I know that’s a very general statement, but let’s explore that a bit. On most systems, you’ll find a php.ini file. This ini file generally sets values for the php process to initialize with when it starts up. Some of these can be modified by the SAPI (command line layer, apache layer, etc), while others can be modified during runtime via ini_set(), and others cannot be modified at all.

Each time a script is executed, it first inherits these php.ini values. This means, by default, if none have changed, a script is subject to the rules defined by the php.ini on the system. If these values (php.ini system values) are out of your control, this means that the script running has an ambiguous initial environment. This environment might have been defined by the system administrator or by the packager of the php distribution you are using.

If you are subject to an ambiguous environment setup, the chances are greater that your application will fail upon setup or during execution. At least one of these situations has come to plague a PHP developer at one time or another:

  • display_errors might be off, causing a WTF moment when an error arises.
  • error_reporting level is set to E_STRICT and the script was not written with this mode in mind, thus creating 100s of notices.
  • open_basedir was set and your script doesn’t have access to some resources it expects to have access to.

Those are just 3 of the more popular examples stemming from 3 different keys that can be set within a php.ini. To put it in a bigger perspective: there are 100s of these values. The point that most needs to be impressed is that any given php script or php application should either check the environment at script startup, or at the least document all of the environment prerequisites and assumptions the script or application makes. The ideal solution is to supply a script that will check the environment and report at installation time if the ini values are correct.
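As a rough sketch of what such a check could look like, the following compares a handful of ini values against what a script expects. The check_environment() function and the expected values are assumptions made up for this illustration; a real check would also need to account for the different ways a boolean setting can be spelled (“1”, “On”, and so on).

```php
<?php
// Hypothetical helper; the expected values below are assumptions chosen
// for illustration, not requirements of any particular application.
function check_environment(array $expected)
{
    $problems = array();
    foreach ($expected as $key => $wanted) {
        $actual = ini_get($key); // returns false for an unknown setting
        if ($actual === false) {
            $problems[] = "$key is not a known ini setting";
        } elseif ((string) $actual !== (string) $wanted) {
            $problems[] = "$key is '$actual', expected '$wanted'";
        }
    }
    return $problems;
}

$problems = check_environment(array(
    'display_errors' => '1',
    'open_basedir'   => '',
));

foreach ($problems as $problem) {
    echo 'Environment problem: ', $problem, "\n";
}
```

An installer could run a check like this once and refuse to continue until the list of problems is empty.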

One of the more interesting environment variables in PHP, much like other languages and systems, is the common path. In PHP, the common path is called the include_path. The include_path just might be the most important php.ini based value to any script or project. During a PHP script’s runtime, the loading of files and components is generally checked against the paths defined within the include_path. This means that any scripts or classes (effectively any PHP code) can be located and loaded with a relative path, a path that is relative to any of the paths defined in the include_path.
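Here is a small sketch of working with the include_path at runtime. The directory and file names are placeholders assumed for the example; set_include_path(), get_include_path() and PATH_SEPARATOR are standard PHP.

```php
<?php
// '/path/to/library' is a placeholder directory, assumed for illustration.
$libraryPath = '/path/to/library';

// Prepend the library directory so it is searched before the existing paths.
set_include_path($libraryPath . PATH_SEPARATOR . get_include_path());

// A relative require such as the following would now be resolved against
// each include_path entry in order ('Some/Component.php' is hypothetical):
// require_once 'Some/Component.php';

echo get_include_path(), "\n";
```

Prepending rather than appending means your own copy of a library wins over one that happens to be installed system wide.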

The include_path is a pretty powerful thing. It makes it easier to bundle components and packages into “libraries”, and use them within projects. This helps facilitate DRY principles by encouraging good code reuse and solid library design. On the other hand, if you don’t properly manage the libraries that are on your include_path, this could pose some pretty significant problems down the line. More on that later though.

The general rule of thumb is this: take control of the php process’s environment as much as possible to ensure consistent behavior.
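In practice, taking control can be as simple as a few lines at the top of a bootstrap script. The following is only a sketch, and the values chosen are assumptions suited to a development machine rather than recommendations for every project.

```php
<?php
// A hypothetical bootstrap fragment; the values are illustrative assumptions.
error_reporting(E_ALL);           // report everything while developing
ini_set('display_errors', '1');   // show errors instead of silently hiding them

echo 'error_reporting: ', error_reporting(), "\n";
echo 'display_errors: ', ini_get('display_errors'), "\n";
```

Settings that cannot be changed at runtime still have to be handled in php.ini or the web server configuration.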

What is a Library?

It seems like library is a fairly generic term, but I want to add some specific meaning to it, at least in terms of PHP. A general definition of a library would effectively be a “collection of reusable code”; and that statement is true for all intents and purposes. For the purposes of this article, I’d like to take that a little further.

A library is a collection of components. While a library solves a less specific general problem, components solve a more specific general problem. Get it yet?

For demonstration purposes, I’ll use the Zend Framework, since I’m a little biased towards that one. The Zend Framework has a couple of libraries, the main one called the Standard Library. The ZF Standard Library solves a pretty general problem: “The PHP Application problem”. As you can see, that’s a fairly general (relatively speaking) problem it attempts to solve. This library is made up of several components that solve specific problems within the “PHP Application problem.” For example, Zend_View and Zend_Controller solve the “web application structure” problem. Zend_Form solves the “web forms” problem. So on and so forth. These are problems that can be solved with tried, tested, and true solutions. These solutions can generally be considered “best practices”. They are solved so that you can get on to solving the even more specific problems… those inside the “application”.

It’s worth noting that the definition of a library is also relative to the audience it’s targeted at. In our above example, the Zend Framework’s intended audience is all PHP developers. Your company, on the other hand, has a smaller target audience: its internal developers. Since that audience is a smaller and more concise group, their needs are more specific than those of the global developer community. That means that a company’s “library” might solve “more specific general problems” on a company wide scale. For example, a company might have 10 applications that use a single-sign-on system. Since those 10 applications within that company share the less specific problem of user sign on, that solution would be best fitted inside the company’s “library”.

In general, libraries solve problems that are generic enough for the entire intended audience, and each problem is solved in a component of the “library”. Everything else goes into your “application”.

What is an Application?

As hinted above in the section on libraries, an application too is defined by the problem it attempts to solve. An application is a collection of business specific code which solves a very specific business problem. Again, this sounds generic, but it can be further defined and explained.

A business problem is the most specific problem that can be solved with code; this is the application. It will be the sum of all target environments, target audiences, and target tasks that should be solved. These business problems have a very narrow focus. While applications can be further divided into specific areas of code, the whole of the application’s objective is to solve the business problem.

Depending on how complicated the business problem the application targets is, an application might be modular. If an application is modular, that implies that the application’s problem area can be divided into even more specific areas of code with specific responsibilities. Let’s take a community website for example. The site might include forums, user management, mail, calendaring and news. Each of these respective areas of the site could be considered modules of the main application or website. While this is a generic example, it does demonstrate a logical division of responsibility, which is ultimately the point of introducing modules into an application. Each project and business should evaluate their application and decide up front how granular the application’s problem is, and how best to further divide it. Doing this up front will alleviate many issues that could arise later as the code base starts to grow.

Beyond the modularity of an application, a further, more logical division and organization of code is generally applied. While there are several paradigms of application organization, we’ll focus on the MVC architecture (if you are not familiar with the MVC architecture it might be best to read the wikipedia article first before moving forward). Both an application’s module and a non-modular application can be organized into Models, Views, and Controllers, the main constituents of the MVC paradigm. Without getting too involved in what MVC is, one should know that:

  • The model represents the code base for solving the business problem at hand in a UI and environment agnostic way.
  • The controller represents the code base responsible for bridging a user’s interaction with the UI to the business model, and setting up new UI.
  • The view represents the code base responsible for creating the environment specific UI.

The above grouping of purposes is what is called a separation of concerns.
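To show that division in code, here is a deliberately tiny sketch. All three class names are hypothetical and exist only for this illustration; a real framework would add routing, request objects and much more.

```php
<?php
// All class names here are hypothetical and exist only for illustration.

// Model: business logic, with no knowledge of HTTP or HTML.
class TodoModel
{
    protected $items = array('Remember the Milk');

    public function fetchAll()
    {
        return $this->items;
    }
}

// View: turns data into environment specific output (HTML here).
class TodoView
{
    public function render(array $items)
    {
        $html = "<ul>\n";
        foreach ($items as $item) {
            $html .= '  <li>' . htmlspecialchars($item) . "</li>\n";
        }
        return $html . "</ul>\n";
    }
}

// Controller: bridges a user action to the model and hands data to the view.
class TodoController
{
    public function listAction()
    {
        $model = new TodoModel();
        $view  = new TodoView();
        return $view->render($model->fetchAll());
    }
}

$controller = new TodoController();
echo $controller->listAction();
```

Because the model knows nothing about HTML, the same TodoModel could sit behind a command line tool or a web service without changes.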

Recap

Here is a recap of the terms defined within this article:

  • An Environment is the sum of all resources, capabilities and settings that exist in a PHP process. This generally includes what extensions and ini settings are preset for the PHP process.
  • A Library is a collection of code that solves a less specific problem which is further defined by the library’s target audience and problem area.
  • A Component is a collection of code that solves a more specific problem within a library.
  • An Application is a collection of code that solves a specific business problem. Ideally, applications consume libraries and components to facilitate quicker and more standardized development.
  • A Module is a collection of code that solves a more specific atomic problem of the larger business problem. The sum of all modules within an application attempts to solve the larger business problem.
  • MVC is a way to group code within both a module and an application into a code base that facilitates a better separation of concerns.

PHPAustin Meetup Slides – Software Engineering In PHP

May 15th, 2009 by Ralph Schindler

On Tuesday, Josh Butts and I gave a presentation at the monthly Austin PHP Meetup titled “Software Engineering In PHP”.  Around 30 people were present and judging by the number of questions that were raised on each slide, the interest in the subject matter was fairly high.  In the end, it took around 2:15 to get through the 35 or so slides.
