Autoloading (Revisited)

September 19th, 2011 by Ralph Schindler

Upon the arrival of PHP 5.0, the ability to autoload classes was introduced. At the time, autoloading was such a new feature that it was hardly adopted. As such, many applications being ported from PHP 4 to PHP 5 still had lots of procedural code in them (code incapable of being autoloaded) and many class files with long ‘require_once’ lists. It wasn’t until years later that certain best practices emerged and the prolific usage of require_once/include_once throughout large bodies of code started drying up. Even after autoloading had been adopted by larger, more visible projects, a common pattern had yet to emerge. The PEAR project already had its one-class-per-file rule and a class-to-filesystem naming convention, but this was hardly the norm at the time, and as such, there were many different autoloading strategies in use.

As time passed, more and more projects went through rewrites, and the strategy most of them landed on was the one that came from the PEAR group. Fast-forward to today, and we see that this autoloading standard has been agreed upon by a large number of projects and has come to be named the “PSR-0 autoloading standard”.

What We’ve Learned

Having attained consistency in how our code utilizes autoloading, we’ve attempted to find the most efficient, performance-optimized way of executing our autoloading strategy. Matthew Weier O’Phinney has blogged about this in the past; it’s a good read if you have not already read it. To summarize, he found the following to be true:

  • disk-based maps of class names to filesystem locations are the fastest lookups
  • class filesystem paths that are absolute and do not rely on include_path are the fastest
  • lightweight autoload functions that use class maps directly are the fastest

For more information about the above generalizations, see Matthew’s blog post.

Nearly a year ago, in conjunction with his findings, Matthew also wrote a classmap generation tool. This tool produced a .classmap.php file that would reside in the directory containing the class files. The general idea was that a developer could use an automatic, mapping-based autoloader, like the PSR-0 autoloader, or could use this .classmap.php file to build a more performance-centric strategy for his/her autoloading needs.

This approach presents developers with two primary problems. First, dot files are generally hidden on a filesystem, which means this PHP data array is part of a code path that is hidden from most developers’ view of the codebase. This leads to moments of confusion when something related to the location of classes goes awry. The second problem is that this strategy assumes the consumer has some way of consuming the contents of this class-map file. ZF users could use one of the shipped Zend\Loader classes that are designed to use a class map. The problem is not necessarily for ZF users; it is that the approach promotes a strategy that is more ZF-specific than generic in nature.

The addition and swift adoption of namespace support in PHP 5.3 has also presented us with both a platform for standardization and a few challenges. Traditionally, when we thought of the PEAR naming convention, we assumed that for a given class (in prefix notation) Alpha_Beta_Gamma, there would be a single mapping of this class to a single place on the filesystem, namely some/path/Alpha/Beta/Gamma.php. This inherently presents no problems. What does present a problem is another project that utilizes part of this prefix, but in a different location. Assume that you want to use part of the prefix, for example the Alpha_Beta_ portion, with a different logical component/module/project within your organization. In this case, it might make sense for class Alpha_Beta_Gamma to live in one project on disk, and for Alpha_Beta_Omega to live somewhere completely different. Any number of situations could realistically present this problem, but the most apparent is an organization that wants a naming scheme such as MyCompany_MyDivisionWithinMyCompany_PerhapsSomeLogicalComponent_ClassName.

In any of these likely scenarios, a simple mapping rule that governs one class-name-to-filesystem autoloader will not work for another class that could conceivably live within the same project, without some kind of autoloader filter or filesystem munging. Either way, we can no longer assume that a simple map of class name to a single location on disk will suffice.
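To make the limitation concrete, here is a minimal sketch of the single mapping rule itself (PEAR/PSR-0 style), assuming PHP 7+. The function name psr0RelativePath() is hypothetical. It can only produce a path relative to one base directory; nothing in the rule can express that Alpha_Beta_Gamma and Alpha_Beta_Omega live under different base directories, which is exactly the gap described above.

```php
<?php
/**
 * Hypothetical helper: applies the one-class-to-one-file rule.
 * '/' is used as the separator to keep output predictable on every OS.
 */
function psr0RelativePath(string $class): string
{
    $class = ltrim($class, '\\');
    $path = '';

    // Namespace separators map to directories.
    $lastNsPos = strrpos($class, '\\');
    if ($lastNsPos !== false) {
        $namespace = substr($class, 0, $lastNsPos);
        $class = substr($class, $lastNsPos + 1);
        $path = str_replace('\\', '/', $namespace) . '/';
    }

    // Underscores in the class name itself also map to directories.
    return $path . str_replace('_', '/', $class) . '.php';
}

echo psr0RelativePath('Alpha_Beta_Gamma'), PHP_EOL;  // Alpha/Beta/Gamma.php
echo psr0RelativePath('Alpha_Beta_Omega'), PHP_EOL;  // Alpha/Beta/Omega.php
```

Both results are relative paths; the rule has to be paired with a single base directory, so two projects sharing the Alpha_Beta_ prefix cannot both be served by it.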

More and more, we are seeing this pattern emerge (this time with namespaces):

namespace VendorName\ComponentName {

    class SomeComponentClass {

    }

}

This class is then found inside its own logical project, with its own data files, web files, or test files in a project structure that looks similar to this:

path/to/VendorName_Component/
    src/
        VendorName/
            ComponentName/
                SomeComponentClass.php
    data/
        some-data-file.txt
    tests/
        phpunit.xml
        phpunit-bootstrap.php
        VendorName/
            ComponentName/
                SomeComponentClassTest.php
    docs/
        some-documentation-format.xml
    README.md

As you can imagine, any vendor/organization that is in the business of building software will more than likely have more than one project that both uses this kind of naming scheme and takes advantage of this project structure for developing and releasing its code. This being the case, unless the projects are merged with other code for the purposes of a consuming project, parts of the namespace will exist in two separate parts of the filesystem, something a specialized autoloader will need to take into consideration.
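A specialized autoloader for this situation might look like the following minimal sketch. It assumes PHP 7.1+ and purely namespaced class names (underscore handling is left out), and the class MultiPathAutoloader and its methods are hypothetical, not part of any library. Each namespace prefix can be registered with several base directories, and the first directory that actually contains the file wins.

```php
<?php
/**
 * Hypothetical autoloader: one namespace prefix, many base directories.
 */
final class MultiPathAutoloader
{
    /** @var array<string, string[]> namespace prefix => base directories */
    private $prefixes = [];

    public function addPath(string $prefix, string $baseDir): void
    {
        $this->prefixes[trim($prefix, '\\')][] = rtrim($baseDir, '/');
    }

    // Returns the first existing file for $class, or null if none matches.
    public function findFile(string $class): ?string
    {
        $class = ltrim($class, '\\');
        $relative = str_replace('\\', '/', $class) . '.php';
        foreach ($this->prefixes as $prefix => $dirs) {
            if (strpos($class, $prefix . '\\') !== 0) {
                continue; // class is not under this prefix
            }
            foreach ($dirs as $dir) {
                $file = $dir . '/' . $relative;
                if (is_file($file)) { // one stat() per candidate directory
                    return $file;
                }
            }
        }
        return null;
    }

    public function loadClass(string $class): bool
    {
        $file = $this->findFile($class);
        if ($file === null) {
            return false; // let the next autoloader in the queue try
        }
        require_once $file;
        return true;
    }

    public function register(): void
    {
        spl_autoload_register([$this, 'loadClass']);
    }
}
```

Note the is_file() probe: every extra directory costs a filesystem check, which is part of why an explicit class map is attractive.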

Ideally, we should find a solution that presents class-map-based autoloading as an easily identifiable code pattern: simple, expressive, compatible with common development practices, and taking advantage of the present-day PHP platform (namespaces and autoloading facilities).

And, What I’ve Found Is This …

And what I’ve found is that projects should present a few different options as to how they provide an “out-of-the-box” autoloading experience. Such a solution should offer the consumer a usage story with the most minimal of requirements when it comes to bootstrapping this 3rd party code. Let’s examine the following project structure (expanded from our example above):

path/to/VendorName_Component/
    src/
        VendorName/
            ComponentName/
                SomeComponentClass.php
    data/
        some-data-file.txt
    tests/
        phpunit.xml
        phpunit-bootstrap.php
        VendorName/
            ComponentName/
                SomeComponentClassTest.php
    docs/
        some-documentation-format.xml
    autoload_classmap.php
    autoload_function.php
    autoload_register.php
    README.md

What you’ll notice is the addition of three autoload_*.php files. Let’s have a look at what these files provide and the reasons for their existence. First, autoload_classmap.php:

<?php
return array(
    'VendorName\ComponentName\SomeComponentClass' => __DIR__ . '/src/VendorName/ComponentName/SomeComponentClass.php',
    /* .. other classes here .. */
);

This file provides the exact map of each class name to the location on disk where that class can be found. It takes advantage of PHP’s ability to return a value from an included file. A simple usage story for this file might be:

<?php
// ...
$classmapAutoloader = new MyClassMapAutoloader();
$classmapAutoloader->loadClassMap(include __DIR__ . '/vendors/VendorName_Component-1.5/autoload_classmap.php');
// ...
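MyClassMapAutoloader is only an illustrative name in the snippet above. Purely as a hypothetical sketch (PHP 7.1+), such a class needs little more than an array merge and an isset() lookup, because the included file has already done the work of locating every class:

```php
<?php
/**
 * Hypothetical class-map autoloader matching the usage story above.
 */
class MyClassMapAutoloader
{
    /** @var array<string, string> class name => absolute file path */
    private $map = [];

    // Merge a map, e.g. the array returned by including autoload_classmap.php.
    public function loadClassMap(array $map): void
    {
        $this->map = array_merge($this->map, $map);
    }

    // Unknown classes are ignored so other autoloaders in the queue can try.
    public function autoload(string $class): void
    {
        if (isset($this->map[$class])) {
            require_once $this->map[$class];
        }
    }

    public function register(): void
    {
        spl_autoload_register([$this, 'autoload']);
    }
}
```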

Let’s next look at the autoload_function.php file:

<?php
return function ($class) {
    static $classmap = null;
    if ($classmap === null) {
        $classmap = include __DIR__ . '/autoload_classmap.php';
    }
    if (!isset($classmap[$class])) {
        return false;
    }
    return include_once $classmap[$class];
};

This file provides a closure-based autoloader as its return value. The consumer can then inject this function into their own autoloader stack/queue, or directly into the autoloader queue provided by PHP:

<?php
// ...
spl_autoload_register(include __DIR__ . '/vendors/VendorName_Component-1.5/autoload_function.php');
// ... or ...
$autoloader = new MyFancyAutoloader();
$autoloader->registerAutoloaderFunction(include __DIR__ . '/vendors/VendorName_Component-1.5/autoload_function.php');

Either way, the consumer is provided with a callback that can be used, in a single line, to bootstrap this component’s autoloading needs.

Finally, the complete one-line solution can be found by using autoload_register.php directly:

<?php
// autoload_register.php
spl_autoload_register(include __DIR__ . '/autoload_function.php');

While the above is so trivial that one might ask why it is included at all, it does offer a single-line usage story:

<?php
// ...
require_once __DIR__ . '/vendors/VendorName_Component-1.5/autoload_register.php';

Why not do this in the first place? Well, this approach assumes the consumer does not care how the autoload function is loaded into PHP’s spl_autoload queue. One thing to keep in mind is that when spl_autoload_register() is called, autoloaders are placed at the end of the queue by default. This behavior can be changed by passing true as the third parameter ($prepend) of spl_autoload_register(). This type of performance optimization might be important when you know some autoloadable code will be used more often than other code, and thus you want the autoloader for that code to be consulted first. Another reason to leave registration to the user is that some autoloaders are generic enough to act as fallback autoloaders. For these kinds of autoloaders, it is important that they always be last in the queue, since they might throw an error or exception when they cannot find a class, as opposed to returning false and letting other autoloaders have an attempt at finding the requested class.
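The queue ordering described above can be observed directly. In this small sketch (PHP 7.1+), the closures $generic and $hot are hypothetical stand-ins for a fallback autoloader and a frequently used component autoloader; each one only records that it was consulted. Because neither defines the requested class, PHP keeps walking the queue, so both are called, and the prepended one goes first:

```php
<?php
// Records which autoloaders were consulted, in order.
$consulted = [];

$generic = function (string $class) use (&$consulted): void {
    $consulted[] = 'generic';
};
$hot = function (string $class) use (&$consulted): void {
    $consulted[] = 'hot';
};

spl_autoload_register($generic);          // appended to the end (default)
spl_autoload_register($hot, true, true);  // third argument true: prepended

class_exists('Some\Missing\ClassName');   // triggers the autoload queue

// $consulted is now ['hot', 'generic']
```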

Conclusion

The strategy described above is something to consider if you are creating reusable PHP components that you wish to provide, perhaps as Pyrus packages and/or PHP phar archives, for 3rd party consumption. This autoloading strategy provides an out-of-the-box usability experience in a minimal amount of code. It also plays nicely with other autoloaders, provides a solution that is opcode-cacheable, and, since it uses absolute paths (via __DIR__), minimizes the number of stat() calls to the filesystem your application will generate during its runtime.

Learning About Dependency Injection and PHP

May 18th, 2011 by Ralph Schindler

Over the past few years, a few concepts and programming patterns have muscled their way into the hearts and minds of PHP developers from other languages and programming communities. These concepts range from the MVC application architecture and various modeling techniques (think ActiveRecord and Data Mapper) to a pure shift in the way we think about application architectures, like aspect-oriented programming (AOP) and event-driven programming. Perhaps it’s because PHP has been adopted at an enterprise level, increasing the demand for what developers might call enterprise-quality programming patterns, or perhaps it’s simply because PHP’s ever-evolving object model makes new things possible. After all, who doesn’t like new shiny things? Whatever the reason, one of the newest concepts (at least over the past 3 years or so) to emerge as a heated topic of debate is how to manage object dependencies. Interestingly, the debate over how to manage dependencies is generally named after the solution its proponents offer: dependency injection (the abstract principle is actually called Inversion of Control).

In any circle of developers of the object-oriented persuasion, you’ll never hear an argument that dependency injection itself is bad. In these circles, it is generally accepted that injecting dependencies is the best way to go. Injecting object dependencies in PHP looks like this:


// constructor injection
$dependency = new MyRequiredDependency;
$consumer = new ThingThatRequiresMyDependency($dependency);

That’s basically it. There are many variations of this: setter injection, interface injection and call-time injection, in addition to the constructor injection shown above. These are all valid ways of injecting dependencies into the consuming object. Ultimately, the goal here is to avoid this:


class ThingThatHasAnExternalDependency
{
    public function __construct() {
        $this->dependency = new ARequiredDependency;
        // or
        $this->secondDependency = ARequiredDependency::getInstance();
    }
}

The above code is an example of a violation of the Hollywood Principle, which basically states: “Don’t call us, we’ll call you.”
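For contrast with the violating class, the constructor, setter and call-time variations mentioned earlier can be placed side by side. This is a hypothetical sketch (PHP 7.1+); apart from MyRequiredDependency and ThingThatRequiresMyDependency, the class names are invented for illustration. In every case the object is handed its dependency instead of reaching out for it with new or getInstance():

```php
<?php
class MyRequiredDependency {}

// Constructor injection: the dependency is required at creation time.
class ThingThatRequiresMyDependency
{
    private $dependency;

    public function __construct(MyRequiredDependency $dependency)
    {
        $this->dependency = $dependency;
    }

    public function getDependency(): MyRequiredDependency
    {
        return $this->dependency;
    }
}

// Setter injection: the dependency is supplied after construction.
class ThingWithOptionalDependency
{
    private $dependency; // null until a setter call provides it

    public function setDependency(MyRequiredDependency $dependency): void
    {
        $this->dependency = $dependency;
    }

    public function hasDependency(): bool
    {
        return $this->dependency !== null;
    }
}

// Call-time injection: the dependency is passed to the method that uses it.
class ThingThatUsesDependencyPerCall
{
    public function doWork(MyRequiredDependency $dependency): string
    {
        return get_class($dependency);
    }
}

$dependency = new MyRequiredDependency();
$consumer = new ThingThatRequiresMyDependency($dependency);
$optional = new ThingWithOptionalDependency();
$optional->setDependency($dependency);
$perCall = new ThingThatUsesDependencyPerCall();
```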

Yet this is not the heart of the argument. Perhaps it was 4-5 years ago in the PHP community, but it’s not anymore. The heart of the argument is not whether we should be doing it, but how we go about doing it.

This article is not about the intricacies and implementation details of DI containers and DI frameworks. It’s also not about the various ways and means of injecting dependencies into other objects, or which method might be better. In fact, this article has no opinion on whether injecting dependencies is even good for you or your application. This article is an exploration of how adopting any DI framework for PHP affects the lifecycle of a project: both the code and the developer, team or organization constructing it.

A Brief History of Dependency Management In PHP

It is important to know why PHP is as popular as it is; after all, it is this popularity that DI frameworks must fight against to win adoption inside PHP application frameworks. To understand PHP’s popularity, history, and evolution, let’s look at this code:

// these few lines actually represent 5 different web-centric "languages"!
include_once 'includes/config.php'; // ultimately there is a mysql_connect() call in here somewhere
include_once 'templates/header.php';
$rows = mysql_query('SELECT * FROM users'); // magically uses the mysql_connect() resource
while ($row = mysql_fetch_assoc($rows)) {
    echo '<div class="user-row"><a href="/delete-user.php" onclick="someJSFunction();">' . $row['username'] . '</a></div>';
}
include_once 'templates/footer.php';

From the beginning, we’ve been trained to think that our dependencies are magically managed. As you can see above, the mysql_query() function, while it will accept a connection resource, does not require it. In fact, if one is not supplied, it will use the last link opened by mysql_connect(). Assuming that the above script is part of a larger collection of PHP scripts, which we will call “the application”, it is important to note that even this script is pulling in its dependencies instead of having them injected. For all intents and purposes, config.php, header.php and footer.php are all dependencies of this script, and of every other script similar in nature to it, such as delete-user.php. To sum it up, if a new dependency is now required by the business logic portion of this application (i.e., the lines between the header and footer), it has to be introduced into every script in this application. This does not exactly adhere to the DRY principle.

But let’s take a step back and look at this snippet of code from an organizational perspective. To do this, we must first understand the various phases of the code’s lifecycle within any organization. For the purposes of this example, let’s assume that from idea to production, code will go through the following phases: development, build, deployment, and application start-up (in production). If this were a C/C++ or Java project, the code would have been written (developed), compiled (built), then packaged or handed to some deployment tool (deployed); it then would have been run (executed via some startup script, or by executing a binary). PHP, and Perl at the time, achieved all of the same objectives in fewer steps, making PHP a wildly popular platform for highly iterative web projects. This same application in PHP would have been coded in some text editor (developed) and FTP’d up to a production server (deployed). You’ll notice that it neither had to be built/compiled nor started on the server, since the target, Apache, was already running with PHP embedded into it. For all intents and purposes, a cheap and easy FTP tool was both the build and the deployment tool for this application’s lifecycle.

It was this simplicity that made PHP the popular choice for web applications. The simplicity of the PHP platform allowed two extremely important facets of development to emerge: building an application became approachable even to the novice, and, without all the cruft that came along with the application lifecycle, building and deploying applications in PHP increased PHP’s “fun-ness” factor.

While this style of building applications allowed for a proliferation of PHP applications to be developed, there was in fact a negative side to be revealed later in time. As applications quickly grew, their ability to be maintained decreased. We give them the name “Spaghetti code”, and for all the right reasons. Objects, if they were even being used, were generally wrappers around procedural functionality. So object dependency management wasn’t even a consideration for most developers. Looking back, perhaps it was this original simplicity that allowed developers to create applications without even having to know what a dependency was or how to find it. In any case, as these applications grew uncontrollably, maintaining them and hacking them started to lose the PHP fun factor exponentially.

A Brief History of DI Frameworks

As PHP developers started identifying the problems with their Model 1 applications, they started looking for solutions in other programming communities. At this time, the Java community was still heavily rooted in the enterprise/software development/software engineering world, and problems such as dependency management already had some interesting solutions. Most notably, there was the Spring Framework, whose primary facility for dependency management was a component called the IoC container, or Inversion of Control container. This container managed the full lifecycle of object creation using callbacks. This meant that you no longer had to use the “new” keyword (the same new keyword as in PHP). It also wired the dependencies for you at instantiation time, so you no longer had to concern yourself with how dependencies were injected, be it through the constructor, properties or setter methods. The Spring Framework was one of the first frameworks that encouraged the use of definition files to manage the knowledge required to wire all your dependencies together. True to form in the Java community, these definition files were written in XML.

As you might imagine, this is indeed a deviation from the philosophy that had made PHP so popular. PHP allowed you to write the most minimal amount of code to complete your application. In the Java/DI world, particularly with the Spring Framework, you had a much richer application lifecycle. Not only were you developing code for your application, but you were creating code about code to manage code. This is known as meta-programming. In addition to this meta-programming, you also now had the compilation phase required by the Java platform, which was generally tucked away inside your build-time tasks. Moreover, this application had to be deployed (there were generally tools for this too), and, for good measure, due to the platform, your application had to be started. Needless to say, this application lifecycle might seem heavier, for lack of a better term, to the average PHP developer.

Since then, several frameworks have cropped up that sport some kind of dependency management. Before this technique was picked up in PHP, they were all heavily rooted in the Java and .NET communities. A quick Google search will return a few notable names like PicoContainer, Spring.NET, Unity, Butterfly and Google Guice. These frameworks gained popularity because they attempt to ease some of the burdens that DI places upon the developer, whether by using reflection to create definitions or by adding an annotation system so that DI definitions can be written inside the code they are set to manage.

DI and PHP

To understand whether a dependency management framework is attainable for PHP, one should first understand how its counterparts in Java and .NET rely upon their respective platforms to do certain jobs. For a quick reference, see the images from this blog post. One of the more important facets to remember is that the expected application lifecycle of a Java/.NET application is much richer. You are expected to have build-time tasks. You are expected to have deployment tasks. And, generally, your application understands the difference between being in development, staging and production, so it can adjust how it runs accordingly. Moreover, the platform itself has facilities in place that aid the developer both at development time, with code generation, and in production.

PHP never expects or facilitates the usage of any kind of build-time tasks. PHP also does not have any kind of built-in annotation support (a meta-programming technique), nor does it have any kind of application scope or per-application memory space. What does this mean for someone who is creating a DI container? Let’s explore.

Development Time

Generally speaking, any time you are writing, altering or just shifting code around, you are in development mode, and your application should be running in a development environment. The structure of your application’s classes, functions and files within the filesystem is probably changing each time you click save. Dependency management systems require knowledge of your code in order to do their job effectively. This knowledge generally comes in the form of some kind of definition.

This definition can be created by hand by the developer, generated at runtime by application hooks, or generated with a special tool. If it is done by hand, the developer is required to explicitly map the various functions/methods that need to be called in order to inject a particular object dependency. The more dependencies you have, the more verbose this definition becomes.

A better route would be to generate this definition file; after all, the code you’ve written, if written correctly, will self-describe its dependencies. There are two options for generation: manual and automatic. An example of manual generation would be a developer giving a command-line tool the minimal information it needs to parse your code, figure out the dependency map for itself, and generate some kind of definition to be used at runtime. Minimal information might include seed information like where to find your classes or what filters to use when inspecting classes. Sometimes, these tools make use of special interfaces (also called interface injection) whose purpose is to describe the various dependencies of the class implementing said interface. Another approach is to use special annotations on classes and class methods that describe the various required and optional dependencies and how they are to be injected.
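As a rough illustration of code self-describing its dependencies, PHP’s Reflection API can read constructor type declarations and turn them into a definition array. This is a minimal sketch, assuming PHP 7.1+ (for ReflectionNamedType) and that every dependency is a class-typed constructor parameter; buildDefinition() and the three example classes are hypothetical, and a real tool would also have to handle setters, interfaces, annotations and scalar parameters.

```php
<?php
// Hypothetical example classes.
class DbConfig {}
class DbAdapter
{
    public function __construct(DbConfig $config) {}
}
class UserRepository
{
    public function __construct(DbAdapter $adapter) {}
}

/**
 * Builds a definition: class name => class names of its constructor arguments.
 */
function buildDefinition(array $classes): array
{
    $definition = [];
    foreach ($classes as $class) {
        $ctor = (new ReflectionClass($class))->getConstructor();
        $deps = [];
        if ($ctor !== null) {
            foreach ($ctor->getParameters() as $param) {
                $type = $param->getType();
                // Only class/interface types count as injectable dependencies.
                if ($type instanceof ReflectionNamedType && !$type->isBuiltin()) {
                    $deps[] = $type->getName();
                }
            }
        }
        $definition[$class] = $deps;
    }
    return $definition;
}

$definition = buildDefinition(['DbConfig', 'DbAdapter', 'UserRepository']);
// ['DbConfig' => [], 'DbAdapter' => ['DbConfig'], 'UserRepository' => ['DbAdapter']]
```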

The same techniques employed in the manual approach could also be put to use in an automatic approach. In the automatic approach, imagine that the same command-line tool was now a service of the application itself. While in development mode, it would run as often as need be to determine whether code changes have happened. If they have, the service would regenerate the dependency definition file so that the rest of the application can use it inside the DI container available at runtime.

There are a couple of concerns specific to PHP with regard to dependency management. Since PHP has a shared-nothing architecture with no application-level memory, this definition needs to be loaded, parsed and put into memory on each request. The larger the dependency tree you track, the larger the memory footprint of the dependency definition graph. Furthermore, since this definition has to be loaded on each request, if it is in a non-native format (meaning anything other than PHP code), there are costs to converting that format, be it XML, YAML, JSON, or INI, into the in-memory structure that the dependency management container requires. What’s more, the PHP platform does not keep track of file changes, so without some kind of user-land tracking, it is hard to know which files have changed during development. Thus, a dependency management system taking an automatic approach would have to rescan the filesystem for changes on each request during development, which has its own consequences.
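One hypothetical form of such user-land tracking compares source file modification times against the generated definition file. Note that it has to stat() every source file on every check, which is exactly the per-request cost described above. The function name definitionIsStale() is an assumption for illustration:

```php
<?php
/**
 * Hypothetical staleness check: true when the definition file is missing
 * or any .php file under $sourceDir was modified after it was generated.
 */
function definitionIsStale(string $sourceDir, string $definitionFile): bool
{
    if (!is_file($definitionFile)) {
        return true; // nothing generated yet
    }
    $generatedAt = filemtime($definitionFile);

    // Walks the whole tree: one stat() per file, on every call.
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($sourceDir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->getExtension() === 'php' && $file->getMTime() > $generatedAt) {
            return true;
        }
    }
    return false;
}
```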

Deployment Time

When one is done writing code and is ready to push this application into production, the act of pushing this application is called deployment. The mode for this application is now considered “production”. In production, you can be sure that the structure of your code is stable and will not change, thus your dependency graph is now safe from changes too. Since this is the case, there is no longer a need to keep updating and regenerating this dependency definition file like you were during development.

Even though the definition is no longer changing, there is still the concern of how expensive it is to load this definition on each request. Naturally, the cheapest form of definition would be a PHP array or structure describing the definition that can be loaded directly into memory. Other file types like XML, YAML, JSON, etc. first have to go through a parsing phase before they can be used. Parsing these files could be expensive, and could benefit from some kind of caching. Caching the definition in some way, shape or form would ensure minimal overhead per request when the application uses this dependency management container.
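Because a native PHP array is the cheapest format to load, a build or deploy step could export the definition as PHP source with var_export() and later load it with a plain include, which an opcode cache is then able to cache. A minimal sketch; writeDefinition() and loadDefinition() are hypothetical names:

```php
<?php
/**
 * Persist a definition array as native PHP source.
 */
function writeDefinition(string $file, array $definition): void
{
    $code = '<?php return ' . var_export($definition, true) . ';' . PHP_EOL;
    // Write to a temporary file first, then move it into place.
    $tmp = $file . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $code);
    rename($tmp, $file);
}

/**
 * Load it back: no XML/YAML/JSON parsing, just PHP's own include.
 */
function loadDefinition(string $file): array
{
    return include $file;
}
```

Writing to a temporary file and then calling rename() keeps concurrent requests from including a half-written file; on POSIX systems, rename() within the same filesystem replaces the target atomically.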

Other Observations & Criticisms

It is important to realize that dependency management solutions are, in every sense of the word, full frameworks. They require that you understand their philosophy as well as have a minimal understanding of the facilities they offer in order to use them effectively. To understand the true benefits of any framework, one must first know the pain points the framework is attempting to solve. Seeing the end result of a framework without knowing what it is facilitating might lead one to dismiss it as overkill or unintuitive. For example, take the following code (typical of dependency management systems):

$userRepository = $dic->get('UserRepository');

If you encounter this line of code without fully understanding the dependency injection container being used, you won’t be able to appreciate its usefulness. You could instantiate your Application\Model\UserRepository yourself, sure, but you’d also have to locate and inject the database adapter to use, and into that adapter you’d have to load and inject the configuration for the database connection. If you are doing this in multiple controller actions, there is a lot of repeated boilerplate code required to “wire” the UserRepository object. Internally, the DI container ($dic) is loading and consulting a definition, creating objects, injecting those objects, and returning the requested object fully wired and ready to use.
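To make that internal work concrete, here is a deliberately tiny, definition-driven container that behaves like $dic above. It is a hypothetical sketch (PHP 7.1+), not the API of any particular DI framework: the definition maps each class name to the class names of its constructor arguments, and get() resolves those recursively, sharing one instance per name. DbConfig, DbAdapter and UserRepository are invented stand-ins for the repository and its collaborators; there is no cycle detection and no support for scalar configuration values.

```php
<?php
// Hypothetical classes standing in for a real model layer.
class DbConfig
{
    public $dsn = 'sqlite::memory:';
}
class DbAdapter
{
    public $config;
    public function __construct(DbConfig $config) { $this->config = $config; }
}
class UserRepository
{
    public $adapter;
    public function __construct(DbAdapter $adapter) { $this->adapter = $adapter; }
}

class Container
{
    private $definition;
    private $instances = []; // shared instances, one per class name

    public function __construct(array $definition)
    {
        $this->definition = $definition;
    }

    public function get(string $name)
    {
        if (!isset($this->instances[$name])) {
            if (!array_key_exists($name, $this->definition)) {
                throw new InvalidArgumentException("No definition for $name");
            }
            // Resolve each dependency first, then inject them via the constructor.
            $args = array_map([$this, 'get'], $this->definition[$name]);
            $this->instances[$name] = new $name(...$args);
        }
        return $this->instances[$name];
    }
}

$dic = new Container([
    'DbConfig'       => [],
    'DbAdapter'      => ['DbConfig'],
    'UserRepository' => ['DbAdapter'],
]);
$userRepository = $dic->get('UserRepository');
```

Note that get() has no declared return type here, so nothing tells an editor that $userRepository is a UserRepository.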

The above code also demonstrates two common criticisms of dependency management frameworks, which are also criticisms of frameworks in general. First, by using the framework, you are moving further away from the facilities of the language or platform itself. Instead of using the “new” keyword to create a new object, you’ve asked another object to create the requested object for you. This shifts developers away from the language’s well-understood API and onto the framework’s API. Second, this kind of code is not easily understood by IDEs. While special features could be added to an IDE to support the framework, the IDE does not inherently know what kind of object is being returned by the $dic->get(..) method call.

Summary

While dependency management frameworks have clear drop-in benefits, there are a few considerations with unknown or unexplored consequences. For example, if the benefit is that all dependencies are managed and all a developer has to do is configure it, does that encourage deeper object graphs when creating classes and class dependencies? If so, what is the performance impact of these deep object graphs, particularly on the PHP platform? What are their memory and speed implications? Furthermore, if one needed to debug an object that has been generated by a dependency management framework, is that easily possible?

At the end of the day, whether or not to use a dependency management framework is a matter of cost versus benefit. To make an informed decision, a developer should consider a few scenarios. First, one should know what the code might look like with and without the framework. This will give an indication of the cost/benefit at the code level: does it actually save lines of code and developer headaches? Second, one should consider how much added knowledge a developer or team of developers needs in order to understand the framework. Lastly, one should consider what kind of performance impact the framework has on the application’s throughput.

PHP Component and Library API Design Overview

January 18th, 2011 by Ralph Schindler

There has been a lot of change in the PHP community over the past few years. PHP now has namespaces. More PHP developers are using an IDE. More PHP developers are pulling inspiration from the Java, C#/.NET, and Ruby communities. And even more PHP developers are embracing the object-oriented and, ironically, the functional (closures) nature of PHP. All these changes make for interesting code. What has also happened is that better and more readable code is being produced by this ever-growing PHP community. It’s been a long time since “PHP application” meant a series of transaction scripts mixing SQL, CSS and JS, with some PHP sprinkled in and a few classes for good measure. Of course, that still exists, but you no longer need to go to the ends of the earth to find non-spaghetti code that is understandable within a few minutes.

For the most part, all of these changes are good changes. The number of good/senior/expert-level PHP developers is ever increasing, and more and more “enterprise grade” frameworks and libraries are being produced. That said, with all of these changes, the one area that is still fairly inconsistent from project to project is the naming conventions employed inside PHP 5.3 projects that use namespaces. This article will attempt to describe what an API is, how names and object-oriented features affect an API, and how various decisions affect the consumers of a particular API.

What Is An API?

Before we jump into naming, it’s important to have a common understanding of the actual problem area. When we talk about names, we are really talking about the API. An API is a particular set of rules and specifications that a developer can follow to access and make use of the services and resources provided by another particular software program, component or library. Put another way, it is an interface between various software pieces and facilitates their interaction, similar to the way the user interface facilitates interaction between humans and computers.

For PHP 4 / procedural libraries, the API is defined by the functions that are declared for use in that library. It is further described by the global names and global state that the library utilizes to do its job. Typically, the APIs of purely function-based libraries are far simpler to understand.

Object-oriented API’s are a bit more complex. When you build an object-oriented library or component, you are typically designing two API’s at the same time, whether or not you know it. This is the nature of object-oriented languages when you employ the use of abstract classes and interfaces in your design.

The first API, the more common of the two, I call the Consumption API. This is the API that answers the question: “how do people consume this thing.” The answer to this question is generally situated around the great majority of use cases that were identified by the author of the software component/library. In PHP, consumption might look like this:

$foo = new SomeCompany\FooComponent\FooComponent($options);
$foo->setAdapter(new SomeCompany\FooComponent\Adapter\SomeAdapter($adapterOptions));
$interestingResult = $foo->doSomethingInteresting();

As you can see, no declarative code was required to fulfill the most common use case that was identified as a need for this component’s existence. The above API is defined by the totality of all the public (concrete) classes, their public properties and public methods. By examining these elements, a good API design should allow a developer to deduce how the component works without examining any documentation. When that is possible, the API has become the documentation as well as the “story” behind how the component/library is to be used.
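To make the consumption example concrete, here is a minimal sketch of what such a component could look like. FooComponent, FooAdapterInterface, SomeAdapter, the 'input' option and process() are all hypothetical, and namespaces are left out so the sketch runs as a single script. Everything the consumer touches is public, and that public surface is the Consumption API described above.

```php
<?php
// Hypothetical component, not a real library. Its public classes and
// public methods together form its Consumption API.
interface FooAdapterInterface
{
    public function process($input);
}

class SomeAdapter implements FooAdapterInterface
{
    public function process($input)
    {
        return strtoupper($input);
    }
}

class FooComponent
{
    protected $adapter = null;
    protected $input = '';

    public function __construct(array $options = array())
    {
        if (isset($options['input'])) {
            $this->input = (string) $options['input'];
        }
    }

    public function setAdapter(FooAdapterInterface $adapter)
    {
        $this->adapter = $adapter;
        return $this;
    }

    public function doSomethingInteresting()
    {
        if ($this->adapter === null) {
            throw new LogicException('An adapter must be set first');
        }
        return $this->adapter->process($this->input);
    }
}

// Consumption mirrors the three-line example above
$foo = new FooComponent(array('input' => 'hello'));
$foo->setAdapter(new SomeAdapter());
$interestingResult = $foo->doSomethingInteresting(); // 'HELLO'
```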

Not all use cases are accounted for in generic components and generic libraries. As developers, we attempt to create generic libraries and components that will solve the majority of problems for the majority of the community. We cannot envision all use cases, or even all edge cases, behind a particular component. That said, this doesn’t mean that the outlying use cases are unimportant or should go unaccounted for. These use cases are handled by the secondary API: the “Extension API”.

The Extension API answers the question: “since this component does 90% of what I want, how can I extend it to fulfill the last few of my needs?” Clearly, it makes sense to leverage tools that do most of what you need, especially if they can be extended beyond their out-of-the-box feature set. Object-oriented, class-based code is particularly well suited to extension through overriding polymorphism.

The primary tool behind overriding polymorphism is method overriding. For this to be possible, base types (the types that are shipped with the component/library you are extending) must expose methods that can be overridden to implement the new behavior your specialized use case requires. Consider the following code example:

namespace MyCompany\FooComponent\Adapter; // My Component
use SomeCompany\FooComponent\Adapter\SomeAdapter; // Consumed Component

// extend the provided Component with my special use case
class MyAdapter extends SomeAdapter
{
    protected function _someWorkToBeDone()
    {
        // do something special that fulfills our use case
        return parent::_someWorkToBeDone();  // protected method on parent class
    }
}

As you can see here, we’ve extended the functionality of the base adapter from the shipped component/library with our own functionality. This is possible since the base adapter tucked away the business logic we needed to alter inside a protected method. This is what allows us to rely on overriding polymorphism to extend code to suit more specific needs. This “Extension API” can therefore be defined by the totality of all protected members of a class: methods and properties that can be utilized in child classes. These protected methods are not all that important or even useful in the documented and de-facto use cases of a component, but become extremely important when extending.
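The same idea can be sketched as a runnable pair of classes. The base adapter here is a hypothetical stand-in for SomeCompany’s SomeAdapter: its public process() method (Consumption API) delegates to a protected hook (Extension API), and the subclass changes behavior by overriding only that hook. Namespaces are omitted so it runs as one script.

```php
<?php
// Hypothetical shipped adapter: process() is public (Consumption API),
// _someWorkToBeDone() is protected (Extension API).
class SomeAdapter
{
    public function process(array $values)
    {
        return $this->_someWorkToBeDone($values);
    }

    protected function _someWorkToBeDone(array $values)
    {
        return array_sum($values);
    }
}

// Our specialization: drop negative values, then reuse the parent's logic
class MyAdapter extends SomeAdapter
{
    protected function _someWorkToBeDone(array $values)
    {
        $positives = array_filter($values, function ($v) { return $v > 0; });
        return parent::_someWorkToBeDone($positives);
    }
}

$values = array(3, -2, 5);
$base = new SomeAdapter();
$mine = new MyAdapter();
echo $base->process($values), PHP_EOL; // 6
echo $mine->process($values), PHP_EOL; // 8
```

Consumers still call the same public process() method; only the protected hook differs between the two classes.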

API Philosophy

It’s hard to quantify the importance of any one aspect of a codebase’s API over another without first talking about general philosophy. In a land of 1,000 frameworks and libraries, the line between well written and poorly written divides the great majority of them. Among those generally regarded as well written, philosophy divides the rest.

There are two common philosophical “goals” that most libraries/components subscribe to and that, depending on your perspective, might be contradictory. For argument’s sake, let’s assume that each is as important as the other. The first is “easy to use”. A component’s likability among developers is largely determined by how easy it is to use, whether it’s intuitive, and whether it fulfills the majority of one’s needs. The second is “easy to extend”. Most of the time, a component is written for some well-known use cases. Generally, that will suit the majority of the needs of any one developer, but there are always some unknown use cases. A component’s ability to deliver a mostly working solution while allowing the developer to extend it for the unknown is what determines how easy it is to extend.

More often than not, ease of use and extensibility live at two ends of the spectrum. Things that are easy to use are generally hard to extend, and things that are simple to extend are generally harder to use, because accommodating one usually comes at the expense of the other.

Returning to philosophy and the example at hand, ease of use and extensibility are equally important. The goal, in terms of API design, is to accommodate both and strike a balance between the two, so that each goal is represented in the API.

Basic Tips And Tricks For Better APIs

The tips and tricks for building better component API’s could get fairly long, so this article will attempt to cover some of the more “basic” ideas.

Adopt A Common Namespace & Class Naming Scheme

While it is true that PHP has no built-in packaging or file-based import mechanism, the PHP autoloader, combined with a few common conventions, can get you 99% of the way there. Large projects like Zend Framework, Symfony, PHPUnit, and PEAR have all settled on a simple, common naming scheme based on the PEAR naming standards. By utilizing this naming scheme, your code will be instantly familiar to developers who already know it from other projects, and they will know exactly where to find classes in the filesystem.

namespace MyCompany\MyComponent;
class Foo {
    // will be found relative to the include_path, or some path
    // managed by an autoloader at
    // MyCompany/MyComponent/Foo.php, pretty simple eh?
}
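A minimal autoloader for this convention might look like the following sketch. It follows the PEAR-style rule described above (namespace separators, and underscores in the short class name, become directory separators, then .php is appended). The function name registerConventionAutoloader() and its $baseDir parameter are hypothetical, and a production loader would likely add more checks.

```php
<?php
// Sketch of a convention-based autoloader (hypothetical helper, not a library API):
//   MyCompany\MyComponent\Foo  -> $baseDir/MyCompany/MyComponent/Foo.php
//   MyCompany_MyComponent_Foo  -> $baseDir/MyCompany/MyComponent/Foo.php
function registerConventionAutoloader($baseDir)
{
    spl_autoload_register(function ($className) use ($baseDir) {
        $className = ltrim($className, '\\');
        $path = '';

        // namespace part: every "\" becomes a directory separator
        $lastNsPos = strrpos($className, '\\');
        if ($lastNsPos !== false) {
            $namespace = substr($className, 0, $lastNsPos);
            $className = substr($className, $lastNsPos + 1);
            $path = str_replace('\\', DIRECTORY_SEPARATOR, $namespace) . DIRECTORY_SEPARATOR;
        }

        // class part: PEAR-style underscores also become directory separators
        $path .= str_replace('_', DIRECTORY_SEPARATOR, $className) . '.php';

        $file = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR . $path;
        if (is_file($file)) {
            require $file;
        }
    });
}

// Usage (assumed layout): registerConventionAutoloader(__DIR__ . '/library');
```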

Avoid Doing Too Much In the Constructor

There are lots of places on the web that discuss this, so I’ll link to them here and not go into too much detail. I’ve seen it called a “unified constructor”, but that’s not what we are talking about here, or at least, that is not the goal. The goal is to allow the consumer to provide as much or as little information about the identity of the object as they like at instantiation time. The common signature that I like for this is the following:

class Foo
{
    public function __construct($options = null)
    {
        if (is_array($options)) {
            $this->setOptions($options);
        } elseif (is_string($options)) {
            $this->setValueThatIsDocumentedAndWellKnown($options);
        }
    }
}

Generally, the call to setOptions() will in turn call the various setters, if they exist. What is important is that at construction/instantiation time a consumer is not required to fulfill all of the class’s requirements. Why is this important? It reverses the order in which a consumer must interact with dependencies. Let’s examine this in code:
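One way setOptions() could dispatch to setters is sketched below, under the assumption that an option key such as 'retries' maps to a method named setRetries(). The name and retries properties are hypothetical, and setName() stands in for the single documented, well-known value that a string argument would set.

```php
<?php
class Foo
{
    protected $name = null;
    protected $retries = 3;

    public function __construct($options = null)
    {
        if (is_array($options)) {
            $this->setOptions($options);
        } elseif (is_string($options)) {
            // a lone string sets the single, well-known value
            $this->setName($options);
        }
    }

    // Assumed convention: option key 'retries' maps to setRetries(), and so on.
    // Keys without a matching setter are silently ignored in this sketch.
    public function setOptions(array $options)
    {
        foreach ($options as $key => $value) {
            $setter = 'set' . ucfirst($key);
            if (method_exists($this, $setter)) {
                $this->$setter($value);
            }
        }
        return $this;
    }

    public function setName($name)
    {
        $this->name = (string) $name;
        return $this;
    }

    public function setRetries($retries)
    {
        $this->retries = (int) $retries;
        return $this;
    }

    public function getName() { return $this->name; }
    public function getRetries() { return $this->retries; }
}

$configured = new Foo(array('name' => 'db', 'retries' => '5', 'unknown' => true));
$named = new Foo('cache');
$bare = new Foo();
```

Ignoring unknown keys keeps construction forgiving; throwing an InvalidArgumentException for them would be a stricter alternative.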

// Example 1
// assuming: class Foo { __construct(A $a, B $b, C $c) {} }
$a = new A($aOption1, $aOption2);
$b = new B();
$c = new C($cOption, $a);

$foo = new Foo($a, $b, $c); // and finally
$foo->doSomethingInteresting();

/** OR ALTERNATIVELY **/

// Example 2
// assuming: class Foo { __construct($options = null) {} }
$foo = new Foo(array(
    'a' => ($a = new A($aOption1, $aOption2)),
    'b' => new B(),
    'c' => new C($cOption, $a)
    ));
$foo->doSomethingInteresting();

// Example 3
// or better:
$foo = new Foo();
$a = new A($aOption1, $aOption2);
$foo->setA($a)
    ->setB(new B())
    ->setC(new C($cOption, $a));
$foo->doSomethingInteresting();

The difference is that in Example 1, even though our target use case is handled by class Foo, we are forced to interact with the dependencies first. Conversely, Examples 2 and 3 show our target object Foo being created up front, with dependencies handled as part of, or after, instantiation. If code clarity is a goal, reading Examples 2 and 3 top down makes more sense than reading Example 1, since the API lets the developer code the use case as a top-down, story-like block. Why do I like this pattern of usage? Simple: it highlights PHP’s loose nature and flexibility, but mostly, it’s more readable.
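Example 3 relies on setters that return $this. A runnable version, with hypothetical empty classes A, B and C standing in for real dependencies, could look like this. One trade-off of deferring dependencies is that a missing one is only discovered when the object is used, so this sketch checks for them in doSomethingInteresting().

```php
<?php
// Hypothetical dependencies
class A {}
class B {}
class C
{
    public $a;
    public function __construct(A $a) { $this->a = $a; }
}

class Foo
{
    protected $a, $b, $c;

    // Each setter returns $this so calls can be chained (fluent interface)
    public function setA(A $a) { $this->a = $a; return $this; }
    public function setB(B $b) { $this->b = $b; return $this; }
    public function setC(C $c) { $this->c = $c; return $this; }

    public function doSomethingInteresting()
    {
        // Dependencies are verified when they are needed, not at construction
        foreach (array('a', 'b', 'c') as $dependency) {
            if ($this->$dependency === null) {
                throw new LogicException("Dependency '$dependency' has not been set");
            }
        }
        return 'interesting';
    }
}

$foo = new Foo();
$a = new A();
$foo->setA($a)
    ->setB(new B())
    ->setC(new C($a));
echo $foo->doSomethingInteresting(), PHP_EOL;
```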

Avoid final And private

This one speaks to extensibility. Unless you are attempting to restrict a user from utilizing some kind of use case, there is little gain in marking members as final or private. Sooner or later, someone somewhere will need to override a method you’ve implemented for some obscure use case. A better approach is to provide them with a codebase that will meet most of their needs and can be extended to fulfill the rest if they are outside the original scope. That way, they are not forced to patch your codebase.
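The cost of private is easy to demonstrate. In the hypothetical sketch below, BaseFormatter::format() calls a private normalize() and a protected decorate(). A subclass can change decorate(), but redeclaring normalize() only creates an unrelated private method: calls made from inside BaseFormatter still use the parent’s version, so that behavior cannot be extended without patching the base class.

```php
<?php
class BaseFormatter
{
    public function format($text)
    {
        return $this->decorate($this->normalize($text));
    }

    // private: calls from inside BaseFormatter always resolve to this version
    private function normalize($text)
    {
        return trim($text);
    }

    // protected: part of the Extension API, subclasses may override it
    protected function decorate($text)
    {
        return '[' . $text . ']';
    }
}

class MyFormatter extends BaseFormatter
{
    // Does NOT override BaseFormatter::normalize(); it is a separate method
    private function normalize($text)
    {
        return strtolower(trim($text));
    }

    // Overrides BaseFormatter::decorate() as intended
    protected function decorate($text)
    {
        return '<' . $text . '>';
    }
}

$formatter = new MyFormatter();
echo $formatter->format('  Hello '), PHP_EOL; // <Hello>
```

The output keeps its capital H: the subclass’s lower-casing never runs, while its decorate() override does.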

Summary

This is by no means an exhaustive list. As more of the larger projects move to using namespaces, closures and the other PHP 5.3 features, we’ll start to see more best practices emerge as they relate to API design. In the meantime, this overview will serve as a springboard for discussions on API design as ZF2 and PHP 5.3 component development moves forward.

Composite Rowsets For Many-To-Many Relationships Via Zend_Db_Table

November 15th, 2010 by Ralph Schindler

One of the hardest problems to solve when developing an ORM of any complexity is deciding how to handle the retrieval of rows that satisfy a many-to-many relationship, also known as an M:N relationship. From the perspective of an object, there is no such thing as a many-to-many relationship. There are only two relationships an object understands. The first is the relationship of itself to another object, which is a one-to-one (1:1) relationship. The second is the relationship of itself to a group of other objects, or a one-to-many (1:N) relationship. It’s not until you look at the relationships of all objects in a system that the many-to-many relationship pattern emerges.

In relational database management systems (RDBMSs), rows and their relationships are modeled through the use of foreign keys and foreign key constraints between a left table and a right table. Foreign key constraints, by themselves, can only model 1:1 and 1:N relationships between rows. To model M:N relationships, database developers must get creative. By employing a “3rd party”, and by utilizing foreign keys that each model a 1:N relationship, database developers can model an M:N relationship. This 3rd party comes in the form of another table that may or may not have any data-model-specific information attached to it. This table is generally known as a junction table, but has also been known as a cross-reference table, bridge table, join table, map table, intersection table, linking table, many-to-many resolver, link table, or association table.
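The junction table idea can be illustrated without a database. In the sketch below, plain PHP arrays stand in for an artist table, a genre table and an artist/genre junction table (all hypothetical, with sample values), and the M:N lookup is resolved as two 1:N hops through the junction rows.

```php
<?php
// Left table (hypothetical sample data)
$artists = array(
    1 => array('id' => 1, 'name' => 'Foo Artist'),
    2 => array('id' => 2, 'name' => 'Bar Artist'),
);
// Right table
$genres = array(
    10 => array('id' => 10, 'name' => 'Rock & Roll'),
    11 => array('id' => 11, 'name' => 'Hiphop'),
);
// Junction table: each row holds two foreign keys (one 1:N link to each side)
// plus data that belongs to the relationship itself
$artistGenres = array(
    array('artist_id' => 1, 'genre_id' => 10, 'added_on' => '2010-11-10'),
    array('artist_id' => 1, 'genre_id' => 11, 'added_on' => '2010-11-11'),
    array('artist_id' => 2, 'genre_id' => 10, 'added_on' => '2010-11-12'),
);

// Resolve M:N as two 1:N hops: artist -> junction rows -> genres
function findGenresForArtist($artistId, array $genres, array $artistGenres)
{
    $result = array();
    foreach ($artistGenres as $link) {
        if ($link['artist_id'] === $artistId) {
            $result[] = $genres[$link['genre_id']];
        }
    }
    return $result;
}

$fooGenres = findGenresForArtist(1, $genres, $artistGenres);
// Genre 10 is linked to both artists, which is what makes this many-to-many
```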

Zend_Db_Table_Row And Junction Tables

Zend_Db_Table is a component in Zend Framework that implements the Table Data and Row Data Gateway patterns. In short, a row object attempts to create a single PHP object per actual row in the database table. Furthermore, Zend_Db_Table_Row objects can go as far as to describe, understand, and interrogate these various 1:1, 1:N and M:N relationships. This allows row objects to be able to find and return related row objects in the form of a rowset.

One of the primary tenets of Zend_Db_Table and Zend_Db_Table_Row is to be able to produce consistent row objects. This means that the properties of these row objects should be a complete and logical representation of how the row might look inside the table of the RDBMS.

Some time ago an issue (ZF-6232) was filed against Zend_Db_Table to report that columns from the junction table were being included in the resulting rowset’s row objects. This was causing issues for people who then attempted to save() the row object back to the database. If a developer mistakenly altered one of the junction table values that was accidentally included in the row, Zend_Db_Table_Row would throw an exception since the row object had more columns than the actual row in the database. Given that we want to create consistent, complete and logical row objects, a solution was devised to ensure that the junction table’s row information was not included in the resulting rowset’s rows. Consequently, this meant that anyone relying on this undocumented behavior would no longer be able to get data stored inside the junction table as part of the result set’s row object. This fix was incorporated into the 1.10.2 release.

Over the past several years of working on Zend Framework, I’ve noticed that the developer population at large is really good at finding undocumented and previously unthought-of use cases for Zend Framework components. These use cases, while sometimes “inventive” to say the least, are also sometimes blatant misuses of a component. Suffice it to say that these use cases are not captured in a unit test and consequently are not protected by backwards compatibility.

Relying on Zend_Db_Table_Row to include junction data is not only an unintended use case but also a misuse of the findManyToManyRowset() functionality provided by Zend_Db_Table_Row. That said, I do want to provide a solution for developers who relied on this behavior of Zend_Db_Table_Row in Zend Framework prior to 1.10.2.

A Solution

While the motivation for creating this class is based on providing a solution to developers who relied on utilizing junction table data in Zend_Db_Table_Row’s many-to-many rowsets, this same technique can be utilized with any ORM or database abstraction layer that handles many-to-many result sets.

Basically, I’ve created a single class that effectively takes the place of Zend_Db_Table_Row::findManyToManyRowset() for the purpose of creating an iterable rowset that allows access to both the target many-to-many rowset and the junction rowset. This solution is called a Composite Rowset. In this solution, both rowsets (iterators) are kept in sync with one another. This proves to be an ideal solution in a couple of ways. First, it produces consistent row objects that are explicitly tied to a row in the database. Second, creating this composite rowset costs only 2 queries: the original many-to-many query and a similar query to retrieve the junction rowset. This is ideal since previously, to get the junction data, findDependentRowset() would have had to be called on each row within the rowset produced by Zend_Db_Table_Row::findManyToManyRowset().
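The core of the composite idea, two result sets advanced in lockstep, can be sketched with SPL’s MultipleIterator and plain arrays. This is not the Gooey class or Zend_Db_Table code, and the row data is hypothetical; the sketch assumes both queries return their rows in matching order, so that position N of the junction rows describes position N of the target rows.

```php
<?php
// Hypothetical results of the two queries, in matching order
$genreRows = array(
    array('id' => 10, 'name' => 'Rock & Roll'),
    array('id' => 11, 'name' => 'Hiphop'),
);
$junctionRows = array(
    array('artist_id' => 1, 'genre_id' => 10, 'added_on' => '2010-11-10'),
    array('artist_id' => 1, 'genre_id' => 11, 'added_on' => '2010-11-11'),
);

// MultipleIterator advances every attached iterator together; MIT_KEYS_ASSOC
// labels each current value with the name given to attachIterator()
$composite = new MultipleIterator(MultipleIterator::MIT_NEED_ALL | MultipleIterator::MIT_KEYS_ASSOC);
$composite->attachIterator(new ArrayIterator($genreRows), 'genre');
$composite->attachIterator(new ArrayIterator($junctionRows), 'junction');

$lines = array();
foreach ($composite as $pair) {
    $lines[] = 'Genre ' . $pair['genre']['name'] . ' added on ' . $pair['junction']['added_on'];
}
echo implode(PHP_EOL, $lines), PHP_EOL;
```

A dedicated class such as the one below adds a rowset-style API (currentJunction(), toArray() and so on) on top of this basic lockstep iteration.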

The API for this Composite Rowset looks like this:


/**
 * @link https://github.com/gooeylabs/Gooey-PHP-5.2-Components/blob/master/library/Gooey/Db/Table/ManyToManyCompositeRowset.php
 */
class Gooey_Db_Table_ManyToManyCompositeRowset implements SeekableIterator, ArrayAccess, Countable
{

    public function __construct(Zend_Db_Table_Row_Abstract $row, $matchTableName, $junctionTableName, $matchRefRule = null);
    public function seek($position);
    public function current();
    public function currentJunction();
    public function next();
    public function rewind();
    public function key();
    public function valid();
    public function offsetSet($offset, $value);
    public function offsetGet($offset);
    public function offsetExists($offset);
    public function offsetUnset($offset);
    public function count();
    public function getRow($position, $seek = false);
    public function getJunctionRow($position, $seek = false);
    public function toArray();
    public function junctionRowsetToArray();
}

NOTE: Full class located here.

As you can see, the API mirrors that of Zend_Db_Table_Rowset to provide something that is immediately recognizable. Below is an example of sample usage. For this example, assume there is a typical artist/genre data model that demonstrates a many-to-many relationship. Inside the junction table, we are tracking the date that the relationship was created. The following example shows this usage:


$aTable = new ArtistTable();
$artist1 = $aTable->find(1)->current();
echo 'Artist: ' . $artist1->name . PHP_EOL;
// instead of $genres = $artist1->findManyToManyRowset('GenreTable', 'ArtistGenreTable');
$genres = new Gooey_Db_Table_ManyToManyCompositeRowset($artist1, 'GenreTable', 'ArtistGenreTable');

// iterate
foreach ($genres as $genre) {
    echo '  Genre ' . $genre->name . ' added on ' . $genres->currentJunction()->added_on . PHP_EOL;
}

/**
 * Sample Output:
 *
 *    Artist: Foo Artist
 *      Genre Rock & Roll added on 2010-11-10
 *      Genre Hiphop added on 2010-11-11
 *
 */

Where To Get It & Conclusions

This code is available on my GooeyLabs GitHub account, specifically inside the Gooey-PHP-5.2-Components repository. (Gooey is my namespace and moniker for my open source code contributions.) Hopefully, those who have had issues with the above-mentioned fix for Zend_Db_Table_Row::findManyToManyRowset() and junction table data will find value in this class.

Exception Best Practices in PHP 5.3

September 15th, 2010 by Ralph Schindler

Every new feature added to the PHP runtime creates an exponential number of ways developers can use and abuse that new feature set. However, it’s not until developers have had a chance to use it that agreed-upon good and bad usages start to emerge. Once they do emerge, we can finally start to classify them as best or worst practices.

Exception handling in PHP is not a new feature by any stretch. In this article, we’ll discuss two new features in PHP 5.3 based around exceptions. The first is nested exceptions and the second is a new set of exception types offered by the SPL extension (which is now a core extension of the PHP runtime). Both of these features have found their way into the book of best practices and deserve to be examined in detail.

Special note: some of these features have existed in PHP < 5.3 or are at least capable of being implemented in PHP < 5.3. When this article mentions PHP 5.3, it is not in the strictest sense of the PHP runtime. Instead, it refers to code bases and projects that are adopting not only PHP 5.3 as a minimum version but also all of the best practices that have emerged in this new phase of development. This phase of development is highlighted by the “2.0” efforts of projects like Zend Framework, Symfony, Doctrine and PEAR, to name a select few.

Background

Previously, in PHP 5.2, there was a single exception class: Exception. Generally speaking, from a Zend Framework / PEAR coding standard perspective, this exception class became the root for all exceptions that might be thrown from within your library. For example, if you created a library for your company MyCompany, then according to ZF/PEAR standards you would have prefixed all code with MyCompany_. For this library, you might create a base exception, MyCompany_Exception, which extends the PHP class Exception and which all of your components’ exceptions inherit from. So, if you created a component MyCompany_Foo, it might have a base exception class called MyCompany_Foo_Exception that is expected to be thrown from within the MyCompany_Foo component. These exceptions can be caught by catching MyCompany_Foo_Exception, MyCompany_Exception, or simply Exception. This gives consumers of this component three levels of granularity (or more, depending on how many times MyCompany_Foo_Exception was subclassed) at which to catch the exception and handle it as they see fit.
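The PEAR-style hierarchy described above can be sketched in a few lines. MyCompany_Foo::run() is a hypothetical method that always fails, so the example can show the same exception being caught at the middle (library) level while remaining an instance of all three types.

```php
<?php
// Library-level base exception, and a component-level exception beneath it
class MyCompany_Exception extends Exception {}
class MyCompany_Foo_Exception extends MyCompany_Exception {}

class MyCompany_Foo
{
    public function run()
    {
        // hypothetical failure so the example has something to catch
        throw new MyCompany_Foo_Exception('Foo could not run');
    }
}

$caught = null;
try {
    $foo = new MyCompany_Foo();
    $foo->run();
} catch (MyCompany_Exception $e) {
    // library-level catch: handles exceptions from any MyCompany_* component
    $caught = $e;
}

// The same object satisfies all three levels of granularity
var_dump($caught instanceof MyCompany_Foo_Exception); // bool(true)
var_dump($caught instanceof MyCompany_Exception);     // bool(true)
var_dump($caught instanceof Exception);               // bool(true)
```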

New Feature: Nesting

In PHP 5.3, the base exception class now handles nesting. What is nesting? Nesting is the ability to catch a particular exception and throw a new exception object that holds a reference to the original one. This gives the caller access both to the exception of the more well-known type thrown by the consumed library and to the exception that originated this exceptional behavior.

Why is this useful? Typically, this is most useful in code that consumes other code that throws exceptions of its own type. This might be code that utilizes the adapter pattern to wrap 3rd party code to deliver some kind of adaptable functionality, or simply code that utilizes some exception throwing PHP extension.

For example, the Zend_Db component uses the adapter pattern to wrap specific PHP extensions in order to create a database abstraction layer. One adapter wraps PDO, which throws its own exception type, PDOException. Zend_Db needs to catch these PDO-specific exceptions and re-throw them as the expected and known type Zend_Db_Exception. This gives developers the assurance that Zend_Db will always throw exceptions of type Zend_Db_Exception (so they can be caught reliably), while still giving them access to the original PDOException in case it is needed.

The following is an example of how a fictitious database adapter might implement nested exceptions:


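class MyCompany_Database_Exception extends Exception
{}

class MyCompany_Database
{
    /**
     * @var PDO object set up during construction; it should use
     *      PDO::ERRMODE_EXCEPTION (the default only as of PHP 8.0) so that
     *      exec() throws PDOException instead of returning false
     */
    protected $_pdoResource = null;

    /**
     * @throws MyCompany_Database_Exception
     * @return int
     */
    public function executeQuery($sql)
    {
        try {
            $numRows = $this->_pdoResource->exec($sql);
        } catch (PDOException $e) {
            // the original PDOException becomes the "previous" exception
            throw new MyCompany_Database_Exception('Query was unexecutable', 0, $e);
        }
        return $numRows;
    }

}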

To utilize a nested exception, you would call the getPrevious() method of the caught exception:


// $sql and $connectionParams assumed
try {
    $db = new MyCompany_Database('PDO', $connectionParams);
    $db->executeQuery($sql);
} catch (MyCompany_Database_Exception $e) {
    echo 'General Error: ' . $e->getMessage() . "\n";
    // getPrevious() returns null when no exception was nested
    $pdoException = $e->getPrevious();
    if ($pdoException !== null) {
        echo 'PDO Specific error: ' . $pdoException->getMessage() . "\n";
    }
}
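Because the snippets above need a live PDO connection, here is a self-contained sketch of the same nesting that runs without one. DriverException is a hypothetical stand-in for PDOException, and runDriverQuery() simply simulates a failing extension call; the key parts are the third argument to the Exception constructor and the matching getPrevious() call.

```php
<?php
// Hypothetical stand-ins: DriverException plays the role of PDOException,
// MyCompany_Database_Exception is the library's own documented type.
class DriverException extends Exception {}
class MyCompany_Database_Exception extends Exception {}

// Simulates an extension call that fails with its own exception type
function runDriverQuery($sql)
{
    throw new DriverException('simulated driver failure for: ' . $sql);
}

function executeQuery($sql)
{
    try {
        return runDriverQuery($sql);
    } catch (DriverException $e) {
        // arguments: message, code, previous ($e is nested inside the new exception)
        throw new MyCompany_Database_Exception('Query was unexecutable', 0, $e);
    }
}

$caught = null;
try {
    executeQuery('SELEC 1');
} catch (MyCompany_Database_Exception $e) {
    $caught = $e;
    echo 'General Error: ' . $e->getMessage() . "\n";
    $previous = $e->getPrevious();
    if ($previous !== null) {
        echo 'Driver error: ' . $previous->getMessage() . "\n";
    }
}
```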

Most recent PHP extensions have OO interfaces, and as such, their APIs tend to lean on throwing exceptions instead of raising errors. A short list of exception-throwing extensions in PHP includes PDO, DOM, Mysqli, Phar, Soap and SQLite.

New Feature: New Core Exception Types

Also in PHP 5.3 development, we are shining a light on some new and interesting exception types. These exceptions have been in place since PHP 5.1, but it is only recently, with the “re-evaluation” of exception best practices, that they have started to gain some limelight. They are implemented in the SPL extension and are listed in the SPL section of the PHP manual. Since these exception types are part of core PHP via SPL, they can be used by anyone who targets PHP 5.3 as the minimum runtime for their code. While this might seem less important when writing application-layer code, the way we adopt and use these exception types becomes even more important when we are writing and consuming library code.

So why new exception types in general? Previously, developers attempted to give more meaning to their exceptions by putting more information into the message of the exception. While this is good, it has a few drawbacks. One is that you cannot catch an exception based on its message. This can be a problem if you know a piece of code throws the same exception type with various messages for various exceptional conditions that could be handled differently. For example, consider an authentication class whose $auth->authenticate() throws the same type of exception (let’s assume Exception) for two specific failures, but with different messages: one where the authentication server cannot be reached, and one for a failed authentication attempt. In this case (never mind that using exceptions might not be the best way to handle authentication responses), handling those two scenarios differently would require parsing the message string.

The solution to this is clearly some way to codify exceptions so that they can be easily interrogated when deciding how to react to an exceptional situation. The first response libraries have had is to use the $code property of the Exception base class. The other is to create multiple types, or new exception classes, that can be thrown to describe the behavior. Both of these approaches share the same simple drawback: neither has emerged as a best practice, and as such, neither is considered a standard. Each project attempting this solution might do so with small variations that force the consumer back to the documentation to understand the library-specific solution. Now, with the new types in the SPL (the Standard PHP Library), developers can use these types in the same way in their own projects and in the projects they consume, since a best practice for these types has emerged.
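Here is a sketch of the multiple-types approach applied to the authentication example. The two exception classes, the authenticate() and login() functions and the hard-coded password are all hypothetical, and as noted above, exceptions may not be the best way to report authentication results; the point is only that the caller branches on type instead of parsing messages.

```php
<?php
// Hypothetical exception types a library could throw from authenticate()
class Auth_ServerUnreachableException extends RuntimeException {}
class Auth_InvalidCredentialsException extends RuntimeException {}

function authenticate($user, $password, $serverUp)
{
    if (!$serverUp) {
        throw new Auth_ServerUnreachableException('Authentication server cannot be reached');
    }
    if ($password !== 'secret') { // toy credential check
        throw new Auth_InvalidCredentialsException('Invalid credentials for ' . $user);
    }
    return true;
}

// The caller branches on exception type; the message text no longer matters
function login($user, $password, $serverUp)
{
    try {
        authenticate($user, $password, $serverUp);
        return 'welcome';
    } catch (Auth_ServerUnreachableException $e) {
        return 'try again later';
    } catch (Auth_InvalidCredentialsException $e) {
        return 'wrong password';
    }
}

echo login('ralph', 'guess', true), PHP_EOL;
```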

The second drawback of the detailed message approach is that it makes understanding the exceptional situation harder for non-English or limited-English speaking developers, and might slow them down when trying to decipher what an exception message is trying to convey. There are as many variations in how a situation is described in a message as there are developers writing exceptions, since there is no standard for conformity or codification.

So How Do I Use Them, Give Me The Dirty Details?

There are a total of 13 new exceptions in the SPL. Two of them can be considered “base” types: LogicException and RuntimeException; both extend the PHP Exception class. The remaining exceptions can be broken down into three logical groups: the dynamic call group, the logic group and the runtime group.

The dynamic call group contains the exceptions BadFunctionCallException and BadMethodCallException. BadMethodCallException is a subclass of BadFunctionCallException, which in turn is a subclass of LogicException. That means these exceptions can be caught by their direct type, by LogicException, or simply by Exception. When do you use these? Generally, they should be used when an exceptional situation arises from a method call that __call() cannot resolve, or when a callback cannot find a valid function to call (or better put, when something is not is_callable()).

For example:


// OO variant
class Foo
{
    public function __call($method, $args)
    {
        switch ($method) {
            case 'doBar': /* ... */ break;
            default:
                throw new BadMethodCallException('Method ' . $method . ' is not callable by this object');
        }
    }

}

// procedural variant
function foo($bar, $baz) {
    $func = 'do' . $baz;
    if (!is_callable($func)) {
        throw new BadFunctionCallException('Function ' . $func . ' is not callable');
    }
}

While the most direct uses are inside __call() and anywhere near something that will call call_user_func(), this group of exceptions is also useful when developing any kind of API where dynamic method and function lookups are utilized. An example of this would be a SOAP or XML-RPC client/server that is capable of issuing and/or interpreting method requests.
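For the XML-RPC/SOAP-style case, a hypothetical registry-based dispatcher might use the dynamic call group like this. RpcServer, register(), handle() and the 'math.add' method name are illustrative only; BadFunctionCallException flags a handler that is not callable, and BadMethodCallException flags a request for a method that was never registered.

```php
<?php
// Hypothetical RPC-style server that maps method names to callables
class RpcServer
{
    protected $handlers = array();

    public function register($name, $callback)
    {
        if (!is_callable($callback)) {
            throw new BadFunctionCallException('Handler for ' . $name . ' is not callable');
        }
        $this->handlers[$name] = $callback;
        return $this;
    }

    public function handle($method, array $params = array())
    {
        if (!isset($this->handlers[$method])) {
            throw new BadMethodCallException('Unknown method ' . $method);
        }
        return call_user_func_array($this->handlers[$method], $params);
    }
}

$server = new RpcServer();
$server->register('math.add', function ($a, $b) { return $a + $b; });
echo $server->handle('math.add', array(2, 3)), PHP_EOL; // 5
```

Because both exceptions extend LogicException, a caller that only cares that the request was malformed can catch that single base type.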

The second group is the logic group. This group consists of DomainException, InvalidArgumentException, LengthException, and OutOfRangeException. These exceptions are subclasses of LogicException, which is in turn a subclass of the PHP Exception class. You use these exceptions when an exceptional situation arises either from a mutation of state or from bad method or function parameters. To get a better understanding of this, we will first look at the last group of exceptions.

The final group is the runtime group. It consists of OutOfBoundsException, OverflowException, RangeException, UnderflowException, and UnexpectedValueException. These exceptions are subclasses of RuntimeException, which is in turn a subclass of the PHP Exception class. These exceptions should be used when an exceptional situation arises during the “runtime” of a function or method call.
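The groups described above can be checked against the runtime itself. This short sketch walks each SPL exception’s parent chain with get_parent_class(); the ancestry() helper is hypothetical. Interfaces such as PHP 7’s Throwable do not appear, since get_parent_class() only follows classes.

```php
<?php
// The 13 SPL exception classes discussed in this article
$splExceptions = array(
    'LogicException', 'BadFunctionCallException', 'BadMethodCallException',
    'DomainException', 'InvalidArgumentException', 'LengthException', 'OutOfRangeException',
    'RuntimeException', 'OutOfBoundsException', 'OverflowException', 'RangeException',
    'UnderflowException', 'UnexpectedValueException',
);

// Builds "Child -> Parent -> ... -> Exception" by following parent classes
function ancestry($class)
{
    $chain = array($class);
    while (($class = get_parent_class($class)) !== false) {
        $chain[] = $class;
    }
    return implode(' -> ', $chain);
}

foreach ($splExceptions as $class) {
    echo ancestry($class), PHP_EOL;
}
// e.g. "BadMethodCallException -> BadFunctionCallException -> LogicException -> Exception"
```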

How do the logic group and the runtime group work together? If you look at the anatomy of an object, one of two things is generally happening. First, the object may be tracking and mutating state. This means the object is generally not doing anything (yet): it might be receiving configuration, setting up properties (via setters and getters), or getting references to other objects. Second, when the object is not tracking and mutating state, it is operating, doing what it was designed to do. This is the object’s runtime. For instance, during its lifetime an object might be created, passed a configuration object, and then have setFoo($foo) and setBar($bar) called. During these phases, exceptional situations should raise a LogicException. In addition, when the object is asked to do something with parameters, for example $object->doSomething($someVariation), the first few lines that interrogate $someVariation would also throw a LogicException. Once it is done interrogating $someVariation and goes about its job of doSomething(), it is considered to be in its “runtime”, and that code would throw RuntimeExceptions.

To better understand, we’ll look at this concept in code:


class Foo
{
    protected $number = 0;
    protected $bar = null;

    public function __construct($options)
    {
        /** this area throws LogicException types **/
    }

    public function setNumber($number)
    {
        /** this method throws LogicException types **/
    }

    public function setBar(Bar $bar)
    {
        /** this method throws LogicException types **/
    }

    public function doSomething($differentNumber)
    {
        if (!is_int($differentNumber)) { // e.g. the argument is not what we expect
            /** this area throws LogicException types **/
        }

        /**
         * From here on down, this method throws
         * RuntimeException types
         */
    }

}

Now that this concept is understood, what does this do for a consumer of this code base? Any time callers mutate the state of an object, they can catch exceptions by the most specific type, for example InvalidArgumentException or LengthException, or at least by LogicException. With this level of granularity and multiple types involved, they can minimally catch LogicException, yet still learn more about what went wrong from the actual type of the exception. The same concept applies to the runtime group: more specific types can be thrown, and either the specific or the less specific type can be caught. This gives the caller more knowledge about the situation and finer-grained control.
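A concrete version of this split, using a hypothetical Counter class, might look like the following. Configuration and argument checks throw members of the logic group, while the failure that can only happen while the object is doing its work (exceeding its limit) throws OverflowException from the runtime group. Which SPL type fits each check is a judgment call in this sketch.

```php
<?php
// Hypothetical class following the state-vs-runtime split
class Counter
{
    protected $limit = null;
    protected $count = 0;

    public function setLimit($limit)
    {
        // state mutation: a bad argument is a LogicException subtype
        if (!is_int($limit) || $limit < 1) {
            throw new InvalidArgumentException('Limit must be a positive integer');
        }
        $this->limit = $limit;
        return $this;
    }

    public function increment($by)
    {
        // parameter and state interrogation: still the logic group
        if (!is_int($by)) {
            throw new InvalidArgumentException('Increment must be an integer');
        }
        if ($this->limit === null) {
            throw new LogicException('setLimit() must be called first');
        }
        // from here on the object is "running": failures are RuntimeException subtypes
        if ($this->count + $by > $this->limit) {
            throw new OverflowException('Counter would exceed its limit');
        }
        $this->count += $by;
        return $this->count;
    }
}

$counter = new Counter();
try {
    $counter->setLimit(-5);
} catch (LogicException $e) {
    // catches InvalidArgumentException too; get_class() reveals the specific type
    echo get_class($e), ': ', $e->getMessage(), PHP_EOL;
}
```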

Below is a table of information about these SPL exceptions that you might find of interest.
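The SPL hierarchy can also be confirmed from PHP itself. The following sketch groups each SPL exception under LogicException or RuntimeException and prints its direct parent, using only is_subclass_of() and get_parent_class(); the list of class names is typed in by hand.

```php
<?php
// The eleven SPL exception classes, listed by hand
$splExceptions = array(
    'BadFunctionCallException', 'BadMethodCallException', 'DomainException',
    'InvalidArgumentException', 'LengthException', 'OutOfRangeException',
    'OutOfBoundsException', 'OverflowException', 'RangeException',
    'UnderflowException', 'UnexpectedValueException',
);

// Group each class under the base type it descends from
$groups = array('LogicException' => array(), 'RuntimeException' => array());
foreach ($splExceptions as $class) {
    foreach (array_keys($groups) as $group) {
        if (is_subclass_of($class, $group)) {
            $groups[$group][] = $class . ' (extends ' . get_parent_class($class) . ')';
        }
    }
}

foreach ($groups as $group => $members) {
    echo $group, PHP_EOL;
    foreach ($members as $line) {
        echo '  ', $line, PHP_EOL;
    }
}
```

Note that BadMethodCallException sits one level deeper, under BadFunctionCallException, but is still caught as a LogicException.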

Best Practices In Library Code

Since the advent of these new exception types in PHP 5.3, a new best practice for library code has also emerged. While it is beneficial to catch a standard, specialized exception type like InvalidArgumentException or RuntimeException, it is also useful to be able to catch component-level exceptions. You can read a more in-depth discussion of the concepts on the ZF2 wiki or the PEAR2 wiki.

The long and short of this, in addition to the best practices listed above, is that there should be a component-level type that can be caught for any exception that emanates from a component. This is accomplished using what is known as a marker interface. By creating a component-level marker interface, real exception types inside a given component can extend the SPL exception types and still be caught by any number of types at runtime. Let’s examine the following code:


// usage of bracket syntax for brevity
namespace MyCompany\Component {

    interface Exception
    {}

    class UnexpectedValueException
        extends \UnexpectedValueException
        implements Exception
    {}

    class Component
    {
        public static function doSomething()
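        {
            // stand-in for a real failure condition detected while working
            $somethingExceptionalHappens = true;
            if ($somethingExceptionalHappens) {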
                throw new UnexpectedValueException('Something bad happened');
            }
        }
    }

}

Assuming the above code, if one were to execute MyCompany\Component\Component::doSomething(), the exception emitted from the doSomething() method can be caught by any of the following types: PHP’s Exception, SPL’s UnexpectedValueException, SPL’s RuntimeException, the component’s MyCompany\Component\UnexpectedValueException, or the component’s MyCompany\Component\Exception. This affords the caller any number of opportunities to catch an exception that emanates from a given component within your library. Furthermore, by analyzing the types that make up the exception, more semantic meaning can be given to the exceptional situation that just occurred.
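The claim can be checked directly. This sketch repeats the marker interface and component exception from the example, catches the thrown exception through the marker interface, and then tests it against each candidate type with instanceof. The catchableAs() helper is hypothetical, and LogicException is included as a negative control.

```php
<?php
namespace MyCompany\Component;

// Component-level marker interface (as in the example above)
interface Exception
{}

// Component exception: an SPL UnexpectedValueException carrying the marker
class UnexpectedValueException
    extends \UnexpectedValueException
    implements Exception
{}

// Hypothetical helper: which of the given type names does $e satisfy?
function catchableAs(\Exception $e, array $types)
{
    $matches = array();
    foreach ($types as $type) {
        if ($e instanceof $type) {
            $matches[] = $type;
        }
    }
    return $matches;
}

// Class names in strings are always fully qualified
$types = array(
    'Exception',
    'UnexpectedValueException',
    'RuntimeException',
    'LogicException',
    'MyCompany\Component\UnexpectedValueException',
    'MyCompany\Component\Exception',
);

$caughtAs = array();
try {
    throw new UnexpectedValueException('Something bad happened');
} catch (Exception $e) { // resolves to the marker interface in this namespace
    $caughtAs = catchableAs($e, $types);
}

foreach ($caughtAs as $type) {
    echo $type, PHP_EOL;
}
```

Every type listed in the paragraph above matches; only LogicException does not, since UnexpectedValueException belongs to the runtime group.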

Summary

In summary, this article should help guide you in creating and throwing more meaningful exceptions in a standards-based, best-practices way, by de-emphasizing the exception message and putting more emphasis on the exception type. If you’d like to carry on the discussion of these concepts, feel free to comment here, on the PHP documentation pages, or in the ZF2 wiki comments section for the Exception proposal linked above.

Compiling Gearman (or anything) for Zend Server CE on Snow Leopard

May 12th, 2010 by Ralph Schindler

The first thing you need to know about Mac OS X Snow Leopard on Macs and MacBook Pros is that the hardware is 64-bit capable. This does not necessarily mean you are running a 64-bit kernel; it simply means that the operating system is capable of executing x86_64 executables. We won’t go into the details of kernel architecture; you can read more about that here.

What is important, though, is that both x86_64 and i386 executables can run on Snow Leopard. It is not uncommon on OS X for executables (and libraries) to have multiple architectures compiled in. To see which architectures are inside a particular file, run something like this:

    /usr/local# file /usr/bin/php
    /usr/bin/php: Mach-O universal binary with 3 architectures
    /usr/bin/php (for architecture x86_64): Mach-O 64-bit executable x86_64
    /usr/bin/php (for architecture i386):   Mach-O executable i386
    /usr/bin/php (for architecture ppc7400):        Mach-O executable ppc

    /usr/local# file /usr/local/zend/apache2/bin/httpd
    /usr/local/zend/apache2/bin/httpd: Mach-O executable i386

This means that PHP (as supplied by Apple) has been compiled with three architectures inside. What does that mean? It means there are basically three versions of PHP compiled into a single binary, and when it is loaded into memory, only one of them is used at a time. To demonstrate, let’s take a common difference between 32-bit and 64-bit architectures: integer size. The 64-bit integer range is larger than the 32-bit range. The following demo runs different architectures from the same binary:

    /usr/local# arch -arch x86_64 /usr/bin/php -nr 'echo PHP_INT_MAX;'
    9223372036854775807

    /usr/local# arch -arch i386 /usr/bin/php -nr 'echo PHP_INT_MAX;'
    2147483647

Because PHP_INT_MAX differs between the two runs, we know the same binary was executed under two different architectures.
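The same check can be kept in a small script and run under each architecture, for example with arch -arch i386 /usr/bin/php -n intsize.php (the file name intsize.php is only an example). It relies on PHP_INT_SIZE, which is 4 on a 32-bit build and 8 on a 64-bit build.

```php
<?php
// Report the integer width of the PHP binary that is currently running.
// PHP_INT_SIZE is 4 under an i386 slice and 8 under an x86_64 slice.
$bits = PHP_INT_SIZE * 8;

// The maximum values from the demo above, as strings to avoid 32-bit overflow
$expectedMax = ($bits === 64) ? '9223372036854775807' : '2147483647';

printf("%d-bit integers, PHP_INT_MAX = %s\n", $bits, PHP_INT_MAX);
```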

The next important thing to understand is the nature of the PHP stack. PHP is generally regarded as a glue language. That might mean several things to different people, but here we will look at the statement strictly in the technical sense. PHP is made of the core language and features, but also a rich set of extensions. These extensions are typically written in C and interface with PHP through its C-level API (PHPAPI). Most of the really useful extensions are linked against libraries on your system; for example, the openssl functions are not actually implemented in PHP’s source code. The openssl extension is simply a wrapper that calls out to libssl.so (.dylib on Mac, .dll on Windows). This is what is meant by PHP being a glue language/platform.

Since PHP relies on existing compiled libraries, you also have to understand how things are compiled and linked. There are generally two options here: linking dynamically or linking statically. Either way, one thing remains true: you cannot mix architectures. If your apache/mod_php and/or php binary are i386 only, then all of the libraries on your system that they use must contain the i386 architecture. Likewise, if apache/mod_php and/or php binary are x86_64 only, then all of your libraries must contain the x86_64 architecture. If they don’t, you will get a message like this one:

PHP Warning:  PHP Startup: Unable to load dynamic library '/usr/local/zend/lib/php_extensions/gearman.so' - dlopen(/usr/local/zend/lib/php_extensions/gearman.so, 9): no suitable image found.  Did find:
/usr/local/zend/lib/php_extensions/gearman.so: mach-o, but wrong architecture in Unknown on line 0
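To catch this mismatch before PHP tries to load an extension, you can look at the same header information that `file` reports. The sketch below is a rough approximation under stated assumptions: it assumes 64-bit PHP, maps only a few CPU type constants, and relies on the Mach-O layout in which a universal binary starts with the big-endian magic 0xCAFEBABE, a slice count, and one 20-byte fat_arch record per slice (CPU type first), while a thin Intel binary starts with a little-endian 0xFEEDFACE or 0xFEEDFACF magic followed by its CPU type. Java class files share the 0xCAFEBABE magic, so only feed it Mach-O files.

```php
<?php
// Map a few Mach-O CPU type constants to the names `file` prints
function machoCpuName($cpuType)
{
    $names = array(
        7          => 'i386',   // CPU_TYPE_X86
        0x01000007 => 'x86_64', // CPU_TYPE_X86 | CPU_ARCH_ABI64
        18         => 'ppc',    // CPU_TYPE_POWERPC
        0x01000012 => 'ppc64',  // CPU_TYPE_POWERPC | CPU_ARCH_ABI64
    );
    return isset($names[$cpuType]) ? $names[$cpuType] : sprintf('unknown(0x%x)', $cpuType);
}

// Return the architectures recorded in the first bytes of a Mach-O file
function machoArchitectures($header)
{
    if (strlen($header) < 8) {
        return array();
    }

    // Universal ("fat") binary: big-endian magic, then the number of slices
    $fat = unpack('Nmagic/Ncount', substr($header, 0, 8));
    if ($fat['magic'] === 0xCAFEBABE) {
        $archs = array();
        for ($i = 0; $i < $fat['count']; $i++) {
            // each fat_arch entry is 20 bytes; its first field is the CPU type
            $entry = substr($header, 8 + $i * 20, 4);
            if (strlen($entry) < 4) {
                break; // header buffer too short
            }
            $cpu = unpack('Ntype', $entry);
            $archs[] = machoCpuName($cpu['type']);
        }
        return $archs;
    }

    // Thin binary built on Intel: little-endian header, CPU type follows the magic
    $thin = unpack('Vmagic/Vtype', substr($header, 0, 8));
    if ($thin['magic'] === 0xFEEDFACE || $thin['magic'] === 0xFEEDFACF) {
        return array(machoCpuName($thin['type']));
    }

    return array();
}

// Usage: read the start of an extension and compare with the server's architecture
// $header = file_get_contents('/usr/local/zend/lib/php_extensions/gearman.so', false, null, 0, 4096);
// print_r(machoArchitectures($header));
```

For the ZSCE httpd shown earlier, the result should contain only i386, so any extension you build for it needs an i386 slice.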

Now that we understand that executables and libraries can have multiple architectures, let’s get to the task at hand: making sure new extensions can run with Zend Server CE.

Zend Server CE for Mac (as of this writing) comes compiled as an i386 executable only. This includes the PHP binary, the PHP library, and the Apache binaries that ship with ZSCE. While ZSCE works great out of the box with all the provided extensions, you might find that you want some additional third-party PHP extensions compiled/linked into this stack. That’s where things get a little confusing; in this post, we’ll look at how to install the gearman extension.

PHP Extensions are basically wrappers around existing libraries, so generally, these extensions require the base library to already be on the system. In our case, we need “libgearman” compiled and on our system for us to be able to compile and use the PHP Gearman Extension.

At this point, I would generally instruct you to compile Gearman with multiple architectures and install it (--prefix=/usr/local). To compile for multiple architectures, simply set the following before running configure:

    export CFLAGS='-arch i386 -arch x86_64'

In the particular case of Gearman, this will not work as the Gearman makefile utilizes flags that are not compatible with multiple architecture targets. As such, we go to plan B.

Plan B is something I generally do to keep my system clean: building libraries statically. I have a personal rule of not keeping i386-only libraries installed in common places like /usr/lib or /usr/local/lib (in this case, /usr/local/lib/libgearman.dylib). So I’ll build Gearman statically and compile it into the PHP Gearman Extension. This allows me to remove the temporary Gearman installation afterwards, which has to be i386 only.

    # check to ensure we have a multi-arch libevent (if not go create it as
    # normal with CFLAGS="-arch i386 -arch x86_64" and install to /usr/local)

    /usr/local/src/gearmand-0.13# file /usr/local/lib/libevent.dylib
        /usr/local/lib/libevent.dylib: Mach-O universal binary with 2 architectures
        /usr/local/lib/libevent.dylib (for architecture i386):  Mach-O dynamically linked shared library i386
        /usr/local/lib/libevent.dylib (for architecture x86_64):        Mach-O 64-bit dynamically linked shared library x86_64

    # next compile gearman to a temp location

    /usr/local/src/gearmand-0.13# export "CFLAGS=-arch i386"
    /usr/local/src/gearmand-0.13# ./configure --disable-shared --prefix=/usr/local/gearman-tmp
    /usr/local/src/gearmand-0.13# make && make install
        [gearman installed now, this should only have static files]

    # ensure we only have a .a library file for gearman
    /usr/local/src/gearmand-0.13# ls /usr/local/gearman-tmp/lib/
        libgearman.a    libgearman.la   pkgconfig

    # make sure zend/bin is first on your PATH
    /usr/local/zend/tmp# echo $PATH
        /usr/local/zend/bin:/var/root/.bin:/usr/local/git/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
    /usr/local/zend/tmp# which phpize
        /usr/local/zend/bin/phpize

    # next, go to our zend server location, and pull down gearman extension
    /usr/local/src/gearmand-0.13# cd /usr/local/zend/tmp/
    /usr/local/zend/tmp# pecl download gearman-beta
        downloading gearman-0.7.0.tgz ...
        Starting to download gearman-0.7.0.tgz (29,258 bytes)
        .........done: 29,258 bytes
        File /usr/local/zend/tmp/gearman-0.7.0.tgz downloaded

    # next, unpack, phpize, and statically compile
    /usr/local/zend/tmp# tar zxf gearman-0.7.0.tgz
    /usr/local/zend/tmp# cd gearman-0.7.0
    /usr/local/zend/tmp/gearman-0.7.0# phpize
        Configuring for:
        PHP Api Version:         20090626
        Zend Module Api No:      20090626
        Zend Extension Api No:   220090626
    /usr/local/zend/tmp/gearman-0.7.0# ./configure --with-gearman=/usr/local/gearman-tmp/ --disable-shared
    /usr/local/zend/tmp/gearman-0.7.0# make
    /usr/local/zend/tmp/gearman-0.7.0# make install
        Installing shared extensions:     /usr/local/zend/lib/php_extensions/

    # Now go add extension=gearman.so to your php.ini file inside /usr/local/zend/etc/php.ini

    # Now go check that php will have gearman support
    /usr/local/zend# php -i | grep gearman
        gearman
        gearman support => enabled
        libgearman version => 0.13

    # Since we statically compiled it, we can remove our temp install of gearman
    /usr/local/zend# rm -Rf /usr/local/gearman-tmp/

At this point, you have a third-party PECL extension that is compiled and working with ZSCE on Mac OS X.
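The php -i check above covers the CLI binary. To confirm the extension from inside a script, for example one served through ZSCE’s Apache, a small check like the following can be used. The extensionStatus() helper is hypothetical; it uses extension_loaded() and phpversion(), and the version shown is whatever the extension itself reports (the PHP extension’s version, not the libgearman version).

```php
<?php
// Hypothetical helper: report whether an extension is loaded and its version
function extensionStatus($name)
{
    if (!extension_loaded($name)) {
        return $name . ': not loaded';
    }
    // phpversion() returns false if the extension reports no version
    $version = phpversion($name);
    return $name . ': loaded' . ($version !== false ? ' (version ' . $version . ')' : '');
}

echo extensionStatus('gearman'), PHP_EOL;
```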

PHPundamentals Series: A Background on Statics (Part 1 on Statics)

May 6th, 2010 by Ralph Schindler

Having read the title, you’ve more than likely come to this article as the curious yet uninformed, the mad and raving lunatic, or an enlightened one. Static class members (from here on called simply, “statics”) in PHP conjure both the best and worst in developers for a variety of reasons. In part 1 of this series of articles on statics, we’ll explore some background to get a better understanding of statics in PHP.

Some Static Background And Understanding

Before we can move into the arguments that surround statics, we first need to understand what they are in the context of PHP. The core of the PHP language and runtime draws some pretty big parallels with the Java/JVM and C#/.NET language platforms. The biggest, and most important for the purposes of this article, is PHP’s object model. Like Java and .NET, PHP follows a class-based, single-inheritance, multiple-interface model: a tenet described by the grandfather of OO languages, Smalltalk. Of course, PHP applies its own “perspective” to the actual implementation details in areas such as typing, casting, mixed-paradigm usage, and so on; but the foundation for the object model is clearly defined.

That said, it is easy for the PHP community to draw comparisons and, more importantly, “borrow” best practices from both the Java and .NET communities. We certainly have borrowed our fair share with regards to development time tools, infrastructure tools and design patterns. Over the past 5 to 7 years, there has been an increasing adoption of best practices and patterns from the enterprise Java community, particularly in the form of two major texts: GoF and PoEAA. The GoF (Gang of Four) text primarily discusses best practices in the form of code structure and reuse: factory, singleton, adapter, composite, facade, iterator and observer to name a few. PoEAA (Patterns of Enterprise Application Architecture), on the other hand, attempts to solve higher order problems, particularly architectural problems at the application layer: MVC, Page Controller, Front Controller, Domain Model, Table and Row Gateway, and so on. While the examples are primarily executed in Java, they are structurally similar when implemented in PHP, so much so that PHP developers can read the Java examples as pseudo-code. This is what makes these patterns so applicable and thus popular in the PHP community.

Since we now know where these usage patterns originated, we should have a look at the target language platform: PHP. The key concept which delineates the PHP platform from the JVM and .NET platforms, is that PHP by default assumes a shared-nothing architecture. What does this mean? It means out of the box, PHP is not a persistent application platform. PHP’s runtime is built around the notion of primarily solving the web problem. In turn, since the web is request driven, you might say that an application written in PHP is also request driven. Put another way, the scope of your application is bound to a single request. The shared-nothing aspect means that the state of the application is built-up and torn-down upon the start and completion of each request to your application. Conversely, Java and .NET offer a persistent application stack which means the application’s state exists separate from the requests that come in via the web server. So, in PHP, the many requests each contain a single running instance of your application. In Java/.NET, the single application running handles the many requests.

Statics in Analogies

Still don’t get it? Let’s talk in a couple of analogies. Let’s assume we’ve built a basic application with the “out-of-the-box” technologies offered; one built on top of PHP and the other built on top of Java (or .NET, you can choose). With your Java/.NET application, if a request is never received from your web server, the application is still running. In PHP, if a request is never received from your web server, the application has NEVER run. The runtime of a Java/.NET application might be hours or days, whereas the runtime of a PHP application is as long as it takes to service the request. Your mileage with this analogy may vary, and it is intended for demonstrative purposes only. You could throw any number of monkey wrenches into it, but for all intents and purposes, it’s correct and it works.

Understanding the full scope of an application’s runtime state is the most important aspect of understanding the role of static class members in OO programming. Static class members live as long as the application runtime is valid and alive. This means that any static state set during any operation in the application’s runtime persists until the application ceases to exist. Looking back at our main platform differences, we can see that on the Java/.NET platform, static members created in the scope of an application layer will be around until someone pushes the “shutdown” button on that application. This could mean a static member or static state persists for hours, days, or even longer. Like these persistent application stacks, PHP destroys all static members and state at the end of the application’s lifecycle. Unlike them, in PHP the application lifecycle ends with the completion of a web request. This means that static members and static state in PHP, for the average web application, stick around for seconds or less and are only valid in the context of a single web request.
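A tiny example makes the scope visible. The RequestCounter class below is hypothetical: its static property accumulates across calls within one request, but because PHP tears the application down at the end of each request, every new request (or every new CLI run) starts again from zero.

```php
<?php
// Hypothetical class: static state lives only as long as this request
class RequestCounter
{
    protected static $count = 0;

    public static function increment()
    {
        return ++self::$count;
    }
}

// Within one request, the static value persists between calls...
RequestCounter::increment();
RequestCounter::increment();
$final = RequestCounter::increment();

// ...but the next request begins with $count back at 0
echo 'Incremented ' . $final . ' times in this request', PHP_EOL;
```

Reloading a page that runs this script prints the same count every time; on a persistent Java/.NET stack, the equivalent static field would keep growing until the application is shut down.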

Statics in Pictures

Still don’t get it? Let’s have a look at a few images to better explain these concepts.

The following images will attempt to explain the various layers of a web application, one from the perspective of the JVM/.NET platform, the other from the perspective of the PHP platform. (For all intents and purposes, the PHP platform could also be any scripting language executed by an Apache module or FastCGI.)

The green layer is the web server layer; this is the process that attaches to port 80 and listens for requests. The blue layer represents the application process itself. This layer is responsible for global application state and class-based static state. The orange layer is a request which comes in from the web; this is typically what we’ve called a page request. Inside each web request is the yellow layer, which represents the page lifecycle. In terms of the application, this is where all of the request-specific application routines happen, including page startup and business logic.

Contrasted against …

The most important thing to take away from these images, particularly with respect to understanding statics, is the blue layer, or the layer that best represents the scope of globals and static members. This is the heart of what is meant by a “shared-nothing” architecture. It is this key difference that affects how we architect the code for our web applications.

In the next article in this series, we’ll have a look at PHP’s application architecture in greater detail and how it solves problems that might arise from a shared-nothing style architecture, why this architecture is arguably better for the web and cloud based services, but most importantly, how statics fit into this paradigm.

The Anatomy Of A Bug/Issue Reproduction Script

February 18th, 2010 by Ralph Schindler

“There is a problem with component Fooey-Bar-Bazzy, I think it’s related to Nanny-Nanny-Neener. Please Fix Now.” If you’ve written a bug/issue report like that in the past with no other details, shame on you! This may come as a shock, but as great as some developers might be, they cannot read minds. Aside from variances in coding standards and best practices, each developer has their own way of coding, their own custom working environment, and their own favorite tools. Some could argue these little intricacies are outside the realm of coding standards and best practices, and that they are the differences between good, great, and even terrible developers. Each developer has a different opinion on how particular applications, libraries of code, or even features of a particular project are expected to behave in practice. These varying expectations are why bugs/issues exist. No one developer producing code for mass consumption can anticipate every possible use case. Additionally, no one developer can replicate every environment surrounding every preconceived use case. There are simply not enough resources at hand, be it in the form of a variety of systems or simply the number of hours in a developer’s day.

With that in mind, I write this as a plea to all developers to be good to the maintainers of the code you use. In the simplest form of advice, I suggest that before you click submit on that bug/issue report form, ask yourself two questions: “Did I do enough due-diligence in determining if this is really a bug?” AND “If I got this bug report, would I be able to reproduce it, let alone understand it?” If the answer to both of those questions is YES, go ahead and click submit. If your answer is no, you’ve got some more work to do.

Some Tenets Of the Good Reproduction Script

In this short article, I’d like to outline a few details of what should go into a bug/issue report. These are some simple guidelines to consider when you write one. This list is by no means exhaustive, but if you at least consider the list below before clicking submit, you’ll make a code maintainer’s day. I promise.

  1. List Out All Assumptions Clearly

    PHP specifically is well known for being a “glue language”. This means that PHP generally sits between multiple pieces of software that are, of course, not PHP. Each of these pieces of software has its own set of configurations and environments that PHP is “gluing” together. That being the case, any non-PHP assumptions should be clearly listed in the reproduction script. This could include the database flavor and its settings, a PHP library component, or perhaps a specific version of an extension being used and the underlying unmanaged/C-based library your PHP environment is consuming.

  2. Use The Shortest Possible Use Case

    As tempting as it is to copy a script from your project and paste it into the bug/issue submission box, don’t do this. If you are truly invested in seeing the bug/issue fixed in a timely fashion, take the time to create a small reproduction script. In this script should be the absolute minimal amount of code to demonstrate to another human that there is indeed a problem that needs solving. By keeping the script minimal and short, you are also removing any other distractions from the script that otherwise might confuse the maintainer and prevent him from fully understanding the real problem.

  3. Use Generic Yet Meaningful Names

    It cannot be stressed enough that names which carry no meaning for the problem should be avoided at all costs. As mentioned above, you want as few distractions as possible in the use case. For example, supplying your database table of customers, with first_name, last_name, etc., has virtually nothing to do with the problem at hand. In cases where table and column names are ancillary to the actual problem, they should be generalized: a table named ‘foo’, and columns named ‘bar1’ and ‘bar2’. Unless …

    … the variable name can add context to the problem. What does this mean? $customer would be bad, but $faultyTableObject is good. The latter makes it easy for the maintainer to focus on the variable that needs to be tracked leading up to the problem.

  4. Document Both What You Expect, And The Actual Result

    Claiming something is broken without stating what you expect and what the actual result is offers next to nothing to the maintainer attempting to fix the problem. Generally speaking, most use cases that end up being bugs/issues fall outside the original preconceived use cases for the component. That said, the maintainer is going to need the context of the use case that you’ve found to be problematic. It also helps to point out any existing documentation that describes the better-defined use cases, and how your use case relates to and/or deviates from them.

  5. Make The Reproduction Script As Generic As Possible

    Perhaps this is redundant, but it’s important to know the minimal requirements for reproducing a bug/issue. You are not expected to be an expert on how to fix the actual problem, but you should do your own due-diligence in order to hand the problem off to the maintainer. It’s already been said to “List out all assumptions clearly”, but it is just as important to peel off any specific pieces of the problem that are not directly part of the problem.

    This concept can best be described by example. While MySQL is a widely available database platform, SQLite is widely known as the easiest to use and most portable database platform, at least in the PHP runtime. If you find a problem while using MySQL, but it’s clear it can be replicated using SQLite, use SQLite. SQLite is built into PHP by default, and in a single script you can create an in-memory database and its schema in just a few lines of code.

    Sometimes an issue cannot be described in a single script. This is OK. This would be the case if, for example, you found an issue in a larger system, like Zend Framework’s MVC layer. In this case, it makes sense to provide a minimal ZF project that demonstrates the issue. Again, use as few files and as little code as possible to demonstrate the issue. Also, in the spirit of using generic code, make all file system paths relative. This will help the maintainer get up and running with the problematic project in a minimal amount of time, with minimal configuration.
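Guidelines 3 and 5 can be combined into a skeleton for a self-contained reproduction script. This is a sketch, not a Zend Framework convention: it assumes the pdo_sqlite extension is available and that the project keeps its code in a library directory next to the script (the directory name is an assumption). Paths are derived from the script’s own location, the database lives in memory, and the schema uses generic names.

```php
<?php
// 1. Build paths relative to this file, never hard-coded.
//    The "library" directory name is an assumed project layout.
$libraryPath = dirname(__FILE__) . DIRECTORY_SEPARATOR . 'library';
set_include_path($libraryPath . PATH_SEPARATOR . get_include_path());

// 2. An in-memory SQLite database stands in for MySQL (requires pdo_sqlite).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// 3. Generic, problem-neutral schema and data: table foo, columns bar1 and bar2.
$db->exec('CREATE TABLE foo (id INTEGER PRIMARY KEY, bar1 VARCHAR(25), bar2 VARCHAR(25))');
$insert = $db->prepare('INSERT INTO foo (bar1, bar2) VALUES (?, ?)');
foreach (array(array('a', 'b'), array('c', 'd')) as $row) {
    $insert->execute($row);
}

// 4. The minimal code that triggers the problem would go here;
//    this query only proves the setup works.
$rows = $db->query('SELECT id, bar1, bar2 FROM foo ORDER BY id')->fetchAll(PDO::FETCH_ASSOC);
print_r($rows);
```

Because nothing depends on an absolute path or an external server, the maintainer can unpack and run it as-is.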

A Reproduction Script By Example

The following is a reproduction script I have written based on an issue (ZF-3709) provided to Zend Framework in our issue tracker. I chose this issue to write a reproduction for because it offers the ability to talk about how one might go about describing the environment, more specifically what the database should look like in order to replicate the problem.

(This script can also be found at http://gist.github.com/307396)

<?php

/**
 * This reproduction script shall accompany the issue reported at
 * http://framework.zend.com/issues/browse/ZF-3709
 *
 * Assumptions:
 *   Zend_Db_Table_* from trunk
 *   PHP Environment has SQLite with :memory: capabilities
 *
 * Result:
 *   This script should run without any assertions failing (empty output)
 */

// ensure that Zend Framework trunk is being tested against & classes are available
// set_include_path('/path/to/ZendFramework/library');
require_once 'Zend/Loader/Autoloader.php';
Zend_Loader_Autoloader::getInstance();

// setup the adapter, this uses SQLite so that its minimally invasive
// to anyone wishing to reproduce the issue on their local machine
$dbAdapter = Zend_Db::factory(
    'Pdo_Sqlite',
    array('dbname' => ':memory:')
    );

// ensure all tables have access to the adapter
Zend_Db_Table::setDefaultAdapter($dbAdapter);

// setup the database, classes, & assertion system
setup();

/**
 * BEGIN Reproduction Code
 */

// find a record that has a relationship to some bars through foo_to_bar
$fooTable = new Foo();
$fooRow = $fooTable->fetchRow('id = 2');
$fooIdOnesBars = $fooRow->findManyToManyRowset('Bar', 'FooToBar');

// the expected values for the next call
$expectedValues = array(
    array('id' => '2', 'name' => 'bravo'),
    array('id' => '3', 'name' => 'charlie')
    );

// when we loop through the rows, they should match the expected results above
foreach ($fooIdOnesBars as $index => $barRow) {
    // I'll use assert here to throw warnings when expected does not match actual
    $actualValue = $barRow->toArray();
    assert($expectedValues[$index] === $actualValue);
}

/**
 * END Reproduction Code
 *
 * Supporting code below
 */ 

// setup function
function setup() {
    setup_database();
    setup_classes();
    setup_assertions();
}

// This function will setup the proper database structure with test data
function setup_database() {
    global $dbAdapter;

    $conn = $dbAdapter->getConnection();
    $conn->query('
        CREATE TABLE foo (
            id INTEGER PRIMARY KEY,
            name VARCHAR(25)
            );
        ');

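    // bound parameters avoid relying on SQLite accepting "..." as a string literal
    $insertFoo = $conn->prepare('INSERT INTO foo (name) VALUES (?)');
    foreach (array('one', 'two', 'three', 'four') as $numberName) {
        $insertFoo->execute(array($numberName));
    }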

    $conn->query('
        CREATE TABLE bar (
            id INTEGER PRIMARY KEY,
            name VARCHAR(25));
        ');

    $insertBar = $conn->prepare('INSERT INTO bar (name) VALUES (?)');
    foreach (array('alpha', 'bravo', 'charlie', 'delta') as $word) {
        $insertBar->execute(array($word));
    }

    $conn->query('
        CREATE TABLE foo_to_bar (
            id INTEGER PRIMARY KEY,
            foo_id INTEGER,
            bar_id INTEGER,
            extra VARCHAR(20)
            );
        ');
    $datas = array(
        array('foo_id' => 2, 'bar_id' => 2, 'extra' => 'Two to Two'),
        array('foo_id' => 2, 'bar_id' => 3, 'extra' => 'Two to Three'),
        array('foo_id' => 3, 'bar_id' => 4, 'extra' => 'Three to Four'),
        );
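    foreach ($datas as $datum) {
        // bind the values rather than quoting them by hand
        $stmt = $conn->prepare('INSERT INTO foo_to_bar '
            . '(' . implode(', ', array_keys($datum)) . ')'
            . ' VALUES (' . implode(', ', array_fill(0, count($datum), '?')) . ')');
        $stmt->execute(array_values($datum));
    }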
}

// This function will define the proper Zend_Db_Tables and their relationships
function setup_classes() {

    class Foo extends Zend_Db_Table_Abstract
    {
        protected $_name = 'foo';
    }

    class Bar extends Zend_Db_Table_Abstract
    {
        protected $_name = 'bar';
    }

    class FooToBar extends Zend_Db_Table_Abstract
    {
        protected $_name = 'foo_to_bar';
        protected $_referenceMap = array(
            'Foo' => array(
                'columns' => 'foo_id',
                'refTableClass' => 'Foo',
                'refColumn' => 'id'
                ),
            'Bar' => array(
                'columns' => 'bar_id',
                'refTableClass' => 'Bar',
                'refColumn' => 'id'
                )
            );
    }

}

// assertion setup
function setup_assertions() {
    assert_options(ASSERT_ACTIVE, true);
    assert_options(ASSERT_WARNING, false);
    assert_options(ASSERT_CALLBACK, 'assert_failure');
}

// callback for assertion failures
function assert_failure() {
    global $expectedValues, $index, $actualValue;
    echo 'Was expecting an array that looked like:' . PHP_EOL;
    var_dump($expectedValues[$index]);
    echo 'But got array that looked like:' . PHP_EOL;
    var_dump($actualValue);
    echo PHP_EOL . PHP_EOL;
}

To the best of my ability, this script passes both of my earlier questions: “Yes, I did enough due-diligence in determining if this is really a bug.” AND “Yes, if I got this bug report, I would be able to reproduce it and understand it.”

A Few Considerations

The above script does not include unit tests, nor does it represent a patch to the existing framework. While that would be ideal, it sets the bar much too high for people to report worthwhile issues. The consumers of the code are not expected to be experts on the actual issue at hand, or even on how to write valid unit tests that fully exercise a feature or bug. Ultimately, as a code maintainer, I simply want to be able to see the issue you are attempting to describe.

If you’d like to go above and beyond the standard reproduction script, you might also consider pointing out lines of code that you feel might be problematic. This allows maintainers to set breakpoints at specific locations and really drill down into the offending code.

I hope this helps developers understand what is expected of them as they file issue reports on the open source code they use. By following these guidelines, you’ll be doing a service to the maintainer by making their life easier, and to yourself, since issues with reproduction scripts get a quicker turnaround than those that require in-depth research.

Dynamic Assertions for Zend_Acl in ZF

August 13th, 2009 by Ralph Schindler

In Zend Framework 1.9.1, Zend_Acl gets two major issues resolved and a simple API change that together make it possible to create a more robust, more expressive ACL definition with less code. ZF issues ZF-1721 and ZF-1722, each nearly two years old, have both been solved. Over the last two years, I’ve seen a variety of duplicate issues come into the issue tracker that stem from two fundamental flaws in Zend_Acl: “Zend_Acl::isAllowed does not support Role/Resource Inheritance down to Assertions” and “Zend_Acl assertions breaks when inheritance is required (ie DepthFirstSearch)”. In this article, we’ll explore the API changes that alleviate these two problems, and we’ll demonstrate how to leverage the Zend_Acl assertion system to create expressive, dynamic assertions that work with your application’s models.

Backwards Compatible API Changes

Before discussing the issues, let’s go over the API change and how it affects the component. Previously, the two methods a developer used to set up an ACL were add() and addRole(). Interestingly, add() was intended to imply addResource(). Since add() implied that you were adding a resource, it’s clear that this component was created from the perspective of resources as the primary actor, with roles and assertions as secondary actors.

The new API allows an ACL to be created using strings instead of having to use Zend_Acl_Role and Zend_Acl_Resource objects explicitly. To me, this is a pretty important step towards what I’d like to see in 2.0. In 2.0, I would ideally like to see addRole() and addResource() accept strings for types of roles and resources to query against, and accept objects for explicit role and resource objects to query against (even if they match an already registered type). Put simply, I would expect addRole('user') and addRole($userObjectForRalph) to behave differently if different permissions were registered for each. This would allow me to specify access for the user object ‘ralph’ separately from the ACLs for objects of role type ‘user’. The behavior could be further defined to either inherit from the type or override the type’s ACLs, depending on the desired effect. Ultimately, this would allow for a more dynamic experience with Zend_Acl.
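To make the 2.0 idea above more concrete, here is a minimal, self-contained sketch in plain PHP. It does not use Zend_Acl at all: `InstanceAwareAcl`, `RoleInstance` and `SimpleUser` are hypothetical names invented for illustration, and the sketch assumes the “override” behavior, where an instance-level rule (when present) wins over the rule for its role type, and otherwise the type’s rule is inherited.

```php
<?php
// Hypothetical sketch of the 2.0 idea described above; NOT the Zend_Acl API.
// A role can be registered as a type (a string) or as a specific instance.
interface RoleInstance
{
    public function getRoleType();    // e.g. 'user'
    public function getInstanceKey(); // e.g. 'user:ralph'
}

class InstanceAwareAcl
{
    private $typeRules = array();     // "type|privilege" => bool
    private $instanceRules = array(); // "instanceKey|privilege" => bool

    public function allow($role, $privilege)
    {
        $this->setRule($role, $privilege, true);
    }

    public function deny($role, $privilege)
    {
        $this->setRule($role, $privilege, false);
    }

    private function setRule($role, $privilege, $allowed)
    {
        if ($role instanceof RoleInstance) {
            $this->instanceRules[$role->getInstanceKey() . '|' . $privilege] = $allowed;
        } else {
            $this->typeRules[$role . '|' . $privilege] = $allowed;
        }
    }

    public function isAllowed($role, $privilege)
    {
        if ($role instanceof RoleInstance) {
            // An instance-specific rule overrides the rule for its type...
            $key = $role->getInstanceKey() . '|' . $privilege;
            if (array_key_exists($key, $this->instanceRules)) {
                return $this->instanceRules[$key];
            }
            // ...otherwise the instance inherits from its type.
            $role = $role->getRoleType();
        }
        $key = $role . '|' . $privilege;
        return array_key_exists($key, $this->typeRules) ? $this->typeRules[$key] : false;
    }
}

class SimpleUser implements RoleInstance
{
    private $name;

    public function __construct($name)
    {
        $this->name = $name;
    }

    public function getRoleType()
    {
        return 'user';
    }

    public function getInstanceKey()
    {
        return 'user:' . $this->name;
    }
}

$acl = new InstanceAwareAcl();
$acl->allow('user', 'comment');   // rule for every object of type 'user'

$ralph = new SimpleUser('ralph');
$bob   = new SimpleUser('bob');
$acl->allow($ralph, 'moderate');  // only ralph may moderate
$acl->deny($ralph, 'comment');    // ralph's own rule overrides the type rule

var_dump($acl->isAllowed($bob, 'comment'));    // bool(true), inherited from 'user'
var_dump($acl->isAllowed($bob, 'moderate'));   // bool(false)
var_dump($acl->isAllowed($ralph, 'moderate')); // bool(true)
var_dump($acl->isAllowed($ralph, 'comment'));  // bool(false), overridden
```

Whether instance rules should override or merely extend type rules is exactly the design choice left open above; this sketch picks override for brevity.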

Dynamic Assertions Example

In the following example, we’ll look at a common use case that is now possible with Zend_Acl. In plain English, developers want to design assertions that accept application models implementing the Resource or Role interface, and apply some dynamic or custom logic to decide whether the given role has access to the given resource. As mentioned previously, this was not possible before: while checking the ACL tree with a depth-first search, Zend_Acl lost the role and resource objects originally passed in, and only the originally registered role and resource objects were handed to the assertions. Well, that’s fixed now.
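The essence of the fix can be illustrated with a toy model (plain PHP, not Zend_Acl’s actual internals): the role inheritance tree is searched depth-first, but when a rule with an assertion is found on an inherited role, the assertion is called with the role and resource objects that were originally passed to isAllowed(), rather than with the registered role where the rule was found. `ToyAcl`, its methods, and the closure-based assertion are hypothetical simplifications.

```php
<?php
// Toy model for illustration only: not Zend_Acl's real internals.
// Roles are plain objects with a public "role" property naming their type.
class ToyAcl
{
    private $parents = array(); // roleId => list of parent roleIds
    private $rules = array();   // "roleId|privilege" => assertion callable, or null for a plain allow

    public function addRole($roleId, $parentId = null)
    {
        $this->parents[$roleId] = ($parentId === null) ? array() : array($parentId);
    }

    public function allow($roleId, $privilege, $assertion = null)
    {
        $this->rules[$roleId . '|' . $privilege] = $assertion;
    }

    public function isAllowed($role, $resource, $privilege)
    {
        // Start the depth-first search at the queried object's role type.
        return $this->search($role->role, $role, $resource, $privilege);
    }

    private function search($roleId, $originalRole, $resource, $privilege)
    {
        $key = $roleId . '|' . $privilege;
        if (array_key_exists($key, $this->rules)) {
            $assertion = $this->rules[$key];
            if ($assertion === null) {
                return true;
            }
            // The key point: the assertion receives the ORIGINAL objects passed to
            // isAllowed(), not the inherited role ($roleId) where the rule was found.
            return (bool) call_user_func($assertion, $originalRole, $resource);
        }
        $parents = isset($this->parents[$roleId]) ? $this->parents[$roleId] : array();
        foreach ($parents as $parentId) {
            if ($this->search($parentId, $originalRole, $resource, $privilege)) {
                return true;
            }
        }
        return false; // no matching rule anywhere in the inheritance chain
    }
}

$acl = new ToyAcl();
$acl->addRole('guest');
$acl->addRole('contributor', 'guest');
$acl->addRole('publisher', 'contributor');
$acl->allow('guest', 'view');
$acl->allow('contributor', 'modify', function ($user, $post) {
    // same rule as the assertion class below, in miniature
    return $user->role === 'publisher' || $post->ownerUserId === $user->id;
});

$post        = (object) array('ownerUserId' => 5);
$publisher   = (object) array('id' => 1, 'role' => 'publisher');
$contributor = (object) array('id' => 1, 'role' => 'contributor');

var_dump($acl->isAllowed($publisher, $post, 'modify'));   // bool(true)
var_dump($acl->isAllowed($contributor, $post, 'modify')); // bool(false)
var_dump($acl->isAllowed($contributor, $post, 'view'));   // bool(true), inherited from guest
```

Here the publisher has no ‘modify’ rule of its own; the search reaches the contributor rule, and the assertion still sees the publisher object, which is the behavior the example in this article relies on.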

For the purposes of this example, we’ll take a simple concept: a user should only be able to edit their own blog post. The user, in this case, is our application’s model for users; the actual class implements Zend_Acl_Role_Interface. We will also have a BlogPost model, which serves as the resource in question and thus implements Zend_Acl_Resource_Interface. Naturally, our system will be able to handle users of different role ‘types’, but our BlogPost will only be of a single resource type ‘blogPost’.

Note: the following code is for demonstration only. As such, some coding standards or conventions are not necessarily what you’d expect in proper object-oriented code or even in a Zend Framework MVC based application. Some of the code contains stray ‘echo’ statements so that the demonstration below better shows what it’s actually doing.

class User implements Zend_Acl_Role_Interface
{
    // using public members here for brevity in this article
    public $id = null;
    public $role = 'guest';

    public function getRoleId()
    {
        return $this->role;
    }
}

class BlogPost implements Zend_Acl_Resource_Interface
{
    public $id          = null;
    public $ownerUserId = null;

    public function getResourceId()
    {
        return 'blogPost';
    }
}

Next, we’ll create the dynamic assertion. We would generally expect this assertion to be called when a User requests to modify a BlogPost. This assertion ensures that the BlogPost’s owner id (the id of the user who owns said BlogPost) is the same as the provided User object’s id, unless the user is a publisher, who can always modify a post. If it is, pass; if not, fail. A fairly common use case, right? Here is what our assertion should look like, with a few inline comments:

class UserCanModifyBlogPostAssertion implements Zend_Acl_Assert_Interface
{
    /**
     * This assertion should receive the actual User and BlogPost objects.
     *
     * @param Zend_Acl $acl
     * @param Zend_Acl_Role_Interface $user
     * @param Zend_Acl_Resource_Interface $blogPost
     * @param $privilege
     * @return bool
     */
    public function assert(Zend_Acl $acl, Zend_Acl_Role_Interface $user = null, Zend_Acl_Resource_Interface $blogPost = null, $privilege = null)
    {
        echo ' == Checking the assertion ==' . PHP_EOL; // only here for the purposes of this article

        if (!$user instanceof User) {
            throw new InvalidArgumentException(__METHOD__ . ' expects the role to be an instance of User');
        }

        if (!$blogPost instanceof BlogPost) {
            throw new InvalidArgumentException(__METHOD__ . ' expects the resource to be an instance of BlogPost');
        }

        // if role is publisher, he can always modify a post
        if ($user->getRoleId() == 'publisher') {
            return true;
        }

        // check to ensure that everyone else is only modifying their own post
        if ($user->id != null && $blogPost->ownerUserId == $user->id) {
            return true;
        } else {
            return false;
        }
    }
}

Note: Assertions, like ACLs, can (and most likely should) be treated as application models. As such, if you are using the Zend Framework MVC application structure, you might want to name this one something like Default_Model_Acl_UserCanModifyBlogPostAssertion, which would live in application/models/Acl/UserCanModifyBlogPostAssertion.php. Likewise, the User class would actually be Default_Model_User, and BlogPost might be Default_Model_BlogPost.
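As a rough illustration of that naming convention, the hypothetical helper below maps a Default_Model_* class name onto its file path. It only mimics the convention described above (the prefix Default_Model_ maps to application/models/, and the remaining underscores become directory separators); it is not Zend Framework’s autoloader code.

```php
<?php
// Hypothetical helper that mimics the naming convention described above;
// it is NOT Zend Framework's autoloader implementation.
function classToPath($className, $namespace = 'Default', $basePath = 'application')
{
    $prefix = $namespace . '_Model_';
    if (strpos($className, $prefix) !== 0) {
        return null; // not a model class of this module
    }
    // e.g. "Acl_UserCanModifyBlogPostAssertion"
    $relative = substr($className, strlen($prefix));
    // underscores in the remainder become directory separators
    return $basePath . '/models/' . str_replace('_', '/', $relative) . '.php';
}

echo classToPath('Default_Model_Acl_UserCanModifyBlogPostAssertion'), PHP_EOL;
// application/models/Acl/UserCanModifyBlogPostAssertion.php
echo classToPath('Default_Model_User'), PHP_EOL;
// application/models/User.php
```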

Now that we have our models set up for our ACL to interact with, it’s time to write the ACL definition itself. For the purposes of this exercise, we won’t assume that the ACL itself is a model; our consuming script below will simply interact with it. In a Zend Framework MVC application, one might find the ACL defined as a model within your application, depending on your needs.

$acl = new Zend_Acl();

// setup the various roles in our system
$acl->addRole('guest');
$acl->addRole('contributor', 'guest');
$acl->addRole('publisher', 'contributor');

// add the resources
$acl->addResource('blogPost');

// add privileges to role and resource combinations
$acl->allow('guest', 'blogPost', 'view');
$acl->allow('contributor', 'blogPost', 'contribute');
$acl->allow('contributor', 'blogPost', 'modify', new UserCanModifyBlogPostAssertion());
$acl->allow('publisher', 'blogPost', 'publish');

The above code has produced a fully defined ACL object (at least for the purposes of this article) that we can now start interacting with, as we do in the following examples. The User and BlogPost objects use public properties for brevity and illustration, but you can assume that these object properties might be populated and persisted via a Zend_Db_Table row, a web service, or some other data persistence layer.

$user = new User();
$post = new BlogPost();

// some default values
$user->id = 1;
$post->ownerUserId = 1;

/**
 * Demonstrate guest Privileges
 */
echo 'Demonstrating ' . $user->role . ' privileges' . PHP_EOL
    . '------------------------------------------'
    . PHP_EOL . PHP_EOL;

echo 'Can user (' . $user->role . ') view?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'view') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL; 

echo 'Can user (' . $user->role . ') contribute?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'contribute') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

echo 'Can user (' . $user->role . ') modify?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'modify') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

echo 'Can user (' . $user->role . ') publish?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'publish') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

/**
 * Demonstrate contributor Privileges
 */

$user->role = 'contributor';

echo 'Demonstrating ' . $user->role . ' privileges' . PHP_EOL
    . '------------------------------------------'
    . PHP_EOL . PHP_EOL;

echo 'Can user (' . $user->role . ') view?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'view') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL; 

echo 'Can user (' . $user->role . ') contribute?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'contribute') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

$post->ownerUserId = 5;

// the following two examples should demonstrate the assertion being checked

echo 'Can user (' . $user->role . ') modify someone elses blogPost?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'modify') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

$post->ownerUserId = 1;

echo 'Can user (' . $user->role . ') modify own blogPost?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'modify') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

echo 'Can user (' . $user->role . ') publish?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'publish') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

/**
 * Demonstrate publisher Privileges
 */

$user->role = 'publisher';

echo 'Demonstrating ' . $user->role . ' privileges' . PHP_EOL
    . '------------------------------------------'
    . PHP_EOL . PHP_EOL;

echo 'Can user (' . $user->role . ') view?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'view') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL; 

echo 'Can user (' . $user->role . ') contribute?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'contribute') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

$post->ownerUserId = 5;

echo 'Can user (' . $user->role . ') modify someone elses blogPost?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'modify') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

$post->ownerUserId = 1;

echo 'Can user (' . $user->role . ') modify own blogPost?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'modify') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

echo 'Can user (' . $user->role . ') publish?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'publish') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

Once you have all of that in place, running the script produces these results:

/home/ralph/test-script/$ php acl-inheritance.php

Demonstrating guest privileges
------------------------------------------

Can user (guest) view?
yes

Can user (guest) contribute?
no

Can user (guest) modify?
no

Can user (guest) publish?
no

Demonstrating contributor privileges
------------------------------------------

Can user (contributor) view?
yes

Can user (contributor) contribute?
yes

 == Checking the assertion ==
Can user (contributor) modify someone elses blogPost?
no

 == Checking the assertion ==
Can user (contributor) modify own blogPost?
yes

Can user (contributor) publish?
no

Demonstrating publisher privileges
------------------------------------------

Can user (publisher) view?
yes

Can user (publisher) contribute?
yes

 == Checking the assertion ==
Can user (publisher) modify someone elses blogPost?
yes

 == Checking the assertion ==
Can user (publisher) modify own blogPost?
yes

Can user (publisher) publish?
yes

Conclusion

Zend_Acl can now be used to build concise, dynamic and expressive ACL systems. The assertion system in Zend_Acl can now be leveraged out of the box in ways that were not previously possible. While the User/BlogPost example is on the simple side, you can use this article as a starting point for thinking about how such a system could be leveraged in your own projects, wherever dynamic assertions would simplify existing controller or model code.

Database Abstraction Layers Must Live!

July 15th, 2009 by Ralph Schindler

I come preaching true hope, against the fallacies.

I’ve heard the arguments for and against database abstraction layers (DALs) time and time again. I must say first, I agree with them all, both sides, equally. Interestingly, I can put the vocal proponents of each side of the argument in one of two boxes: a programmer guy box, or a database guy box. For some unknown reason though, they never seem to see eye to eye.

Honestly though, I like to put myself in the middle of that argument. I see both sides. I think fine-tuning an application’s core business with vendor-specific features is tremendously important; after all, that is why there are so many competing database vendors. Generally speaking, for database-driven projects, I feel that planning to use a specific vendor up front, knowing its pros and cons, and tailoring an application to the chosen database’s strengths can only help in the long run. I also feel that building a database model first, before any code, offers more performance and scalability advantages than code-first development does.

That said, I also see value in using a database as a simple data-store when the actual database is not a key component of the overall application. That’s right, it is completely valid to say that the data-storage & database component of an application sometimes is not the key component; a database guy probably will never agree with you there. Just as there are programmers who swear by this code first, database later mantra, there are database developers that will swear by the database first, code later mantra.

The fact is, each project is unique. It’s this uniqueness of projects and their execution that ultimately shapes the perspectives of developers as well as the tools they write and consume. To say that one mantra is clearly a better choice over another is simply being ignorant.

The Use Case of Abstraction Layers

To be honest, I don’t really buy the “I might switch database vendors at some point” argument either, as Jeremy Zawodny points out. For larger projects (on the scale of Facebook, Twitter, etc.), switching the database underneath a project after it has been in production is a monumental task, regardless of whether you have an abstraction layer. Chances are you used some of the database-specific features, not to mention that you now have a large set of mission-critical data that also has to be ported. Long story short, it’s never as easy as swapping out the abstraction layer’s database adapter.

What I will buy, though, is that there are some problems falling in the thicker end of the Pareto Principle that can be solved with a database abstraction layer. For the uninitiated, the Pareto Principle is effectively the 80/20 rule. When the term is applied to software use cases, the 80% represents the majority of use cases. These use cases are generally not that interesting in terms of database interaction. To give them a label, we can call these the CRUD, BREAD, or <<insert your favorite terminology here>> operations. That is not to say that these operations are not important, but they are not special. In fact, they are so un-special that we can just about apply a standard query syntax (SQL-92) to them, and expect the queries to be both portable between databases and common across the applications that use them.

This is where database abstraction fits in. As a developer, you’ll come across this problem time and time again: a large portion of an application consists of CRUD screens, while the smaller, more interesting part of your application is its reporting screens. With an abstraction layer, we are able to code against a unified API and also have a layer that produces consistent, vendor-compatible queries. This allows us to build more specialized data access layers (patterns) for multiple database vendors with great ease. You want a Table Gateway? Done. A Row Gateway? Done. Active Record? Done. Each can be implemented to tackle the 80% side of the 80/20 rule as it applies to the database-centric business code of an application.

The Slow Path & The Fast Path

When I talk about this 80/20 rule in terms of the applications we write, I like to further refine the terminology so that it is easier to visualize. The terms that best help developers visualize the 80/20 rule in their application are the slow path and the fast path of the application. Each of these terms has a set of characteristics that sets it apart from the other:

Slow Path:

  • Performance is not of primary importance
  • Has an interactive nature
  • Validation and verification of data are of high priority
  • Application to data-store interactions are fairly trivial
  • Does not comprise the application’s core business logic

Fast Path:

  • Performance is of importance
  • Limited interactive nature, information flow is fairly static (non-interactive)
  • Flow of information consists of already verified and validated data (originates from the database)
  • Application to data-store interaction can become complex (JOINs, SUB-SELECTS, VIEWS)
  • Is the core business of the application

To get a better understanding of how the terms are applied, let’s look at a typical web application. Generally speaking, there are a few web based forms that users interact with. These forms are the entry point of a code path that does not get a lot of throughput, because forms are submitted by people, and people can only type and submit forms so fast. In addition to being a less traveled code path, it also has a few checks along the way: validation of data and verification of data. Typically, the problems of data verification and validation are not very specific to the application being executed. In fact, web forms, validation and verification problems have been solved over and over again by various libraries.

On the other side of the equation, there is the aggregation and merging of the stored data (which inevitably came from the aforementioned web forms). Since the unique aggregation and processing of this data is the core business of said application, it stands to reason that this code path will be more heavily traveled by users. This is the fast path. The problems solved in this code path are generally unique, and since they are unique, it’s hard to find an off-the-shelf solution to them.

Since this is where the money is to be made, it also stands to reason that developers should concentrate their efforts in the fast path of their application. This means they should solve the slow path problems of their application with existing tried and tested solutions- this includes generic forms solutions, validation and verification libraries and yes, database abstraction layers.

Getting Cozy With Zend_Db, a Database Abstraction Layer

Now that we’ve made a case for DALs, what would one look like? Well, I’ll use Zend Framework’s Zend_Db as my example.

The connection code:

$dbAdapter = Zend_Db::factory(array(
    'adapter' => 'Pdo_Mysql', // could be Pdo_Sqlite, Mysqli, Pdo_Mysql, Db2, or even Oracle
    'params' => array(
        'username' => 'test_user',
        'password' => 'test_pwd',
        'dbname' => 'test'
        )
    ));

You’ll note that since this factory takes a standardized array, it makes it trivial to swap out various connection information for different adapters.
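To illustrate why a standardized array makes swapping adapters trivial, here is a hypothetical helper that turns the same array shape into a PDO DSN for a few drivers. It is not Zend_Db::factory() or its internals; the `host` default of `localhost` and the set of supported adapter names are assumptions of this sketch.

```php
<?php
// Hypothetical helper: NOT Zend_Db::factory(). It shows that one standardized
// config array can drive several PDO drivers.
function buildPdoDsn(array $config)
{
    $params = $config['params'];
    $host = isset($params['host']) ? $params['host'] : 'localhost'; // assumed default

    switch ($config['adapter']) {
        case 'Pdo_Mysql':
            return 'mysql:host=' . $host . ';dbname=' . $params['dbname'];
        case 'Pdo_Pgsql':
            return 'pgsql:host=' . $host . ';dbname=' . $params['dbname'];
        case 'Pdo_Sqlite':
            // for SQLite, dbname is a file path (or ':memory:'); no host is used
            return 'sqlite:' . $params['dbname'];
        default:
            throw new InvalidArgumentException('Unsupported adapter: ' . $config['adapter']);
    }
}

$config = array(
    'adapter' => 'Pdo_Mysql',
    'params'  => array('username' => 'test_user', 'password' => 'test_pwd', 'dbname' => 'test'),
);
echo buildPdoDsn($config), PHP_EOL; // mysql:host=localhost;dbname=test

// swapping databases only means changing the array
$config['adapter'] = 'Pdo_Sqlite';
$config['params']['dbname'] = ':memory:';
echo buildPdoDsn($config), PHP_EOL; // sqlite::memory:
```

The resulting DSN, together with the username and password from the same array, is what you would pass to `new PDO($dsn, $username, $password)`.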

Simple queries:

$data = array(
    'name'        => 'Remember the Milk',
    'description' => '2% Milk',
    'due_on'      => '2009-07-15',
    );
$dbAdapter->insert('todo_list', $data); // insert that data

// then update, and finally delete, the row we just inserted
$lastInsertId = $dbAdapter->lastInsertId('todo_list');
$dbAdapter->update('todo_list', array('completed' => 'YES'), $dbAdapter->quoteInto('id = ?', $lastInsertId));

$dbAdapter->delete('todo_list', $dbAdapter->quoteInto('id = ?', $lastInsertId));

Here you’ll notice the generic and abstracted nature of this API. Since there are several tasks in database interaction that are consistent across the board, those such as INSERT, UPDATE and DELETE, it makes sense that we can create a generic API for handling such interactions. These interactions (INSERT, UPDATE and DELETE) represent the mutation methods of a database and as such, represent the most predominant way of getting data into a system.
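The idea behind such a generic mutation API can be sketched in a few lines: associative arrays become parameterized SQL with `?` placeholders, and the values travel separately as bind parameters. The function names below are hypothetical and this is not Zend_Db’s implementation; for brevity it also skips identifier quoting, which a real DAL would do per vendor.

```php
<?php
// Hypothetical sketch of a generic mutation API; NOT Zend_Db's implementation.
// Each function returns array($sql, $bindValues).
function buildInsert($table, array $data)
{
    $columns = array_keys($data);
    $placeholders = array_fill(0, count($columns), '?');
    $sql = 'INSERT INTO ' . $table
         . ' (' . implode(', ', $columns) . ')'
         . ' VALUES (' . implode(', ', $placeholders) . ')';
    return array($sql, array_values($data));
}

function buildUpdate($table, array $data, $whereSql, array $whereBind = array())
{
    $sets = array();
    foreach (array_keys($data) as $column) {
        $sets[] = $column . ' = ?';
    }
    $sql = 'UPDATE ' . $table . ' SET ' . implode(', ', $sets) . ' WHERE ' . $whereSql;
    // SET values come first, then the WHERE values, matching placeholder order
    return array($sql, array_merge(array_values($data), $whereBind));
}

function buildDelete($table, $whereSql, array $whereBind = array())
{
    return array('DELETE FROM ' . $table . ' WHERE ' . $whereSql, $whereBind);
}

list($sql, $bind) = buildInsert('todo_list', array('name' => 'Remember the Milk', 'due_on' => '2009-07-15'));
echo $sql, PHP_EOL; // INSERT INTO todo_list (name, due_on) VALUES (?, ?)

list($sql, $bind) = buildUpdate('todo_list', array('completed' => 'YES'), 'id = ?', array(42));
echo $sql, PHP_EOL; // UPDATE todo_list SET completed = ? WHERE id = ?
```

The SQL and bind array can then be handed to any driver that supports positional placeholders, for example via PDOStatement::execute().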

For all intents and purposes, though, simple SELECTs are fairly standardized too. They are standardized enough to complement the INSERT, UPDATE, and DELETE abstractions, so that we can find the actual rows to perform these mutation operations on.

Now that we have a simple and consistent API for doing simple SELECTs, INSERTs, UPDATEs, and DELETEs; we can implement something a little more interesting: the table & row gateway:

Zend_Db_Table_Abstract::setDefaultAdapter($dbAdapter);
$userTable = new Zend_Db_Table('user'); // ZF 1.9 feature
$userRow = $userTable->find(5)->current(); // find() returns a rowset; current() gives the row with primary key 5
echo $userRow->username;

Immediately, you should see the inherent value in the above example. Rudimentary and common tasks can now be handled with a consistent and simple API. But what happens when you’ve started using this DAL and you want to use a vendor-specific feature? Well:

// assuming what you want is really REPLACE or INSERT IGNORE from mysql
$dbAdapter->query('INSERT IGNORE INTO configuration (name, value) VALUES (?, ?)', array($name, $value));

// OR
$dbAdapter->query('REPLACE INTO configuration (name, value) VALUES (?, ?)', array($name, $value));

As you can see, the query method of our database adapter will allow us to pass custom SQL into the database thus taking advantage of vendor specific features.
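If you need a vendor-specific statement but still want to support more than one database, one option is to pick the SQL per driver. The hypothetical helper below does this for “insert, skipping duplicates”, which MySQL spells INSERT IGNORE and SQLite spells INSERT OR IGNORE. The driver name could come from PDO, e.g. `$pdo->getAttribute(PDO::ATTR_DRIVER_NAME)`, which returns values such as 'mysql' or 'sqlite'.

```php
<?php
// Hypothetical helper: choose vendor-specific "insert, skip duplicates" SQL.
function insertIgnoreSql($driver, $table, array $columns)
{
    $cols  = implode(', ', $columns);
    $marks = implode(', ', array_fill(0, count($columns), '?'));

    switch ($driver) {
        case 'mysql':
            return "INSERT IGNORE INTO $table ($cols) VALUES ($marks)";
        case 'sqlite':
            return "INSERT OR IGNORE INTO $table ($cols) VALUES ($marks)";
        default:
            // no equivalent known to this sketch; fail loudly rather than guess
            throw new InvalidArgumentException("No INSERT IGNORE equivalent known for $driver");
    }
}

echo insertIgnoreSql('mysql', 'configuration', array('name', 'value')), PHP_EOL;
// INSERT IGNORE INTO configuration (name, value) VALUES (?, ?)
echo insertIgnoreSql('sqlite', 'configuration', array('name', 'value')), PHP_EOL;
// INSERT OR IGNORE INTO configuration (name, value) VALUES (?, ?)
```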

What if you want to combine both paradigms for ultimate flexibility?


// assuming Zend_Db_Table_Row, with a FriendshipReference rule
$friendRowset = $currentUserRow->findDependentRowset('User', 'FriendshipReference');

// collect friend ids (cast to int, since they are concatenated into SQL below)
$friendIds = array();
foreach ($friendRowset as $friendRow) {
    $friendIds[] = (int) $friendRow->related_user_id;
}

$inClause = ' IN (' . implode(',', $friendIds) . ')';

$select = $dbAdapter->select();
$select
    ->from('user', array(
        'user_id',
        'related_user_id',
        'became_friends_on'
        ))
    ->where('user_id ' . $inClause);

// interact with the driver directly (a mysqli or PDO object, depending on the adapter)
$mysqli = $dbAdapter->getConnection();
$mysqli->query('CREATE TEMPORARY TABLE friend ('
        . ' `user_id` int(11) NOT NULL,'
        . ' `related_user_id` int(11) NOT NULL,'
        . ' `became_friends_on` DATE NOT NULL,'
        . ' PRIMARY KEY (`user_id`, `related_user_id`)' // Zend_Db_Table requires a primary key
        . ' ) ENGINE=MEMORY;'
    );
$mysqli->query('INSERT INTO friend ' . (string) $select);

// query the new temporary friend table
$friendTable = new Zend_Db_Table('friend');
$rows = $friendTable->fetchAll(
    'became_friends_on > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)',
    'became_friends_on'
    );

While the above example is “a bit out there”, it does show that even with a DAL, if it’s flexible enough, you can code as close to or as far away from the database as you like. Ultimately the mantra here is: let’s get the job done in the most effective, efficient and sound way possible.
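One refinement to the example above: the friend ids are concatenated straight into the IN clause. A common alternative is to generate one `?` placeholder per value and bind the ids separately, as in this hypothetical helper (not part of Zend_Db). It also guards against an empty list, since `IN ()` is not valid SQL.

```php
<?php
// Hypothetical helper: build "col IN (?, ?, ...)" plus bind values, so the ids
// are never concatenated into the SQL string.
function inClause($column, array $values)
{
    if (count($values) === 0) {
        // "IN ()" is invalid SQL; use a condition that matches nothing instead
        return array('1 = 0', array());
    }
    $marks = implode(', ', array_fill(0, count($values), '?'));
    return array($column . ' IN (' . $marks . ')', array_values($values));
}

list($where, $bind) = inClause('user_id', array(3, 7, 12));
echo $where, PHP_EOL; // user_id IN (?, ?, ?)
```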

Conclusions

Simply put, a database abstraction layer is just another tool in the toolbox. You don’t have to completely change your paradigm of programming, nor do you have to apply an all-or-none approach to using a DAL. When applied correctly, you can build out the slow path of your application in little to no time, while leaving extra time for developing and fine-tuning the fast path of your application. And to keep code from becoming unruly, simply apply some best-practices code organization to your project.