PHP Constructor Best Practices And The Prototype Pattern

March 9th, 2012 by Ralph Schindler

If your knowledge of constructors ends with “the place where I put my object initialization code,” read on. While that is mostly what a constructor is, the way a developer crafts a class constructor greatly shapes the initial API of that class, which ultimately affects its usability and extensibility. After all, the constructor is the first impression a class makes.

Constructors, in their current form, have been in PHP since 5.0.0. Prior to 5.0, PHP loosely followed the C++ style, in which a method with the same name as the class acted as the class constructor. PHP 5 brought us the __construct() “magic method,” which greatly formalized the object initialization routine.

Before jumping into some of the topics covered in this post, there are a few things you might want to be familiar with. First, be familiar with the SOLID principles, particularly the S (single responsibility principle), the L (Liskov substitution principle, commonly referred to as the LSP), and the D (dependency inversion principle). More to the point of the latter, review a previous post on Dependency Injection in PHP for background on dependency injection specific to PHP.

The Constructor Signature

In PHP, you create a constructor by adding a method called __construct() to your class. __construct() is an instance method and, as such, is not marked static. For all intents and purposes, though, you can think of __construct() as a special kind of static object factory, one that always returns an object of the type requested via the new keyword.

class Foo {
    public function __construct() {
    }
}

$object = new Foo();

In the above code, upon executing new Foo(), PHP internally creates a new object from scratch, executes __construct() in the Foo class, and assigns this object to the variable $object. Pretty standard stuff. What’s important to know here is that before new Foo(), the object did not exist. This fact alone makes the constructor completely different from any other instance method and, without getting into the gritty details, it is also what excuses __construct() from the LSP rules that apply to other instance methods.

This means that all of the following are legal:

class Foo {
    public function __construct() {
    }
}

class Bar extends Foo {
    public function __construct(ArrayObject $arrayObj, $number = 0) {
        /* do stuff with $arrayObj and $number */
    }
}

class Baz extends Bar {
    public function __construct(Bar $bar) {
        // yes, this is the proxy pattern
    }
}

With E_STRICT enabled, the above will not produce a warning. Yet if you renamed all of the __construct() methods to some other common name, PHP 5 would produce an E_STRICT warning like:

Strict standards: Declaration of Bar::somemethod() should be compatible with that of Foo::somemethod()

Why is this the case? Simply put, the LSP refers to subtypes of a particular type, and before __construct() runs, no object of that type exists yet. The rule simply cannot apply to something that does not exist. For a more detailed response, go here.

What you should take away from this is that, as a best practice, each concrete class should have a constructor whose signature best represents how a consumer should fully instantiate that particular object. In some cases where inheritance is involved, “borrowing” the parent’s constructor is acceptable and useful. Furthermore, when you subclass a particular type, your new type is encouraged, when appropriate, to have its own constructor that makes the most sense for the new subtype.
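As a minimal sketch of the difference (all class names here are hypothetical), one subclass below borrows its parent's constructor unchanged while another defines a constructor suited to its own needs and delegates to `parent::__construct()`:

```php
<?php
// Hypothetical base type whose constructor takes one value.
class Formatter
{
    protected $locale;

    public function __construct(string $locale)
    {
        $this->locale = $locale;
    }

    public function getLocale(): string
    {
        return $this->locale;
    }
}

// "Borrows" the parent constructor: it declares no __construct() of its own.
class UpperCaseFormatter extends Formatter
{
    public function format(string $text): string
    {
        return strtoupper($text);
    }
}

// Defines its own constructor with an extra required parameter,
// then hands the shared part to the parent constructor.
class PrefixFormatter extends Formatter
{
    protected $prefix;

    public function __construct(string $prefix, string $locale = 'en_US')
    {
        $this->prefix = $prefix;
        parent::__construct($locale);
    }

    public function format(string $text): string
    {
        return $this->prefix . $text;
    }
}

$upper = new UpperCaseFormatter('de_DE'); // same signature as Formatter
$prefixed = new PrefixFormatter('>> ');   // signature suited to the subtype
```

`PrefixFormatter` adds a required parameter, something an ordinary overriding method could not do without triggering the signature-compatibility complaint described above.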

At this point, it should be noted that most other languages do not allow constructors to be marked final, to be abstract, or to be static (see the note on static above). Moreover, constructors should not appear in interfaces. In PHP, none of these rules apply, and all of the above are possible. For the reasons listed above, a developer trying to use PHP’s OO model in a SOLID way should avoid marking constructors final, making them abstract, and putting them in interfaces. In PHP 5.4, it is also worth knowing that putting a constructor in an interface breaks the common expectation that subtypes can create their own constructors, in favor of enforcing a particular method signature.

Constructor Overloading

PHP does not have method overloading, and this also applies to constructors: a class can only have one constructor. Since this is the case, PHP developers sometimes loosen a method’s signature in order to accommodate multiple use cases. This is done by removing or reducing the types enforced in the constructor’s signature so that the consumer can pass in a wider variety of types.

This is acceptable practice when done appropriately. What does appropriately mean? That is, of course, very much subjective. Generally speaking, the differences between the supported signatures should be kept to a minimum, and meaning should still be communicated through the names of the parameters. For example, take this constructor:

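interface DriverInterface {
}

class Db {
    protected $driver;

    /**
     * @param string|array|DriverInterface $driver
     */
    public function __construct($driver) {
        if (is_string($driver)) {
            $driver = $this->createDriverFromString($driver);
        } elseif (is_array($driver)) {
            $driver = $this->createDriverFromArray($driver);
        }

        if (!$driver instanceof DriverInterface) {
            throw new InvalidArgumentException('$driver must be a string, an array or a DriverInterface');
        }

        $this->driver = $driver;
    }

    // createDriverFromString() and createDriverFromArray() omitted for brevity
}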

The above signature __construct($driver) technically supports 3 effective signatures:

__construct(/* string */ $driver);
__construct(/* array */ $driver);
__construct(DriverInterface $driver);

The actual signature has not changed, but it represents all 3 effective signatures, which can be further described in the PHP DocBlock.
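A runnable version of the Db example might look like the sketch below. The post omits the driver-creation helpers, so `SimpleDriver`, `getDsn()`, `getDriver()`, and the `'type'`/`'host'`/`'dbname'` array keys are hypothetical stand-ins; the point is only that all three effective signatures end up as the same kind of DriverInterface.

```php
<?php
interface DriverInterface
{
    public function getDsn(): string;
}

// Hypothetical concrete driver used only for this sketch.
class SimpleDriver implements DriverInterface
{
    private $dsn;

    public function __construct(string $dsn)
    {
        $this->dsn = $dsn;
    }

    public function getDsn(): string
    {
        return $this->dsn;
    }
}

class Db
{
    private $driver;

    /**
     * @param string|array|DriverInterface $driver
     */
    public function __construct($driver)
    {
        // Normalize the two "convenience" forms into a DriverInterface.
        if (is_string($driver)) {
            $driver = $this->createDriverFromString($driver);
        } elseif (is_array($driver)) {
            $driver = $this->createDriverFromArray($driver);
        }

        if (!$driver instanceof DriverInterface) {
            throw new InvalidArgumentException('Expected a DSN string, an options array or a DriverInterface');
        }
        $this->driver = $driver;
    }

    public function getDriver(): DriverInterface
    {
        return $this->driver;
    }

    private function createDriverFromString(string $dsn): DriverInterface
    {
        return new SimpleDriver($dsn);
    }

    // Assumes hypothetical 'type', 'host' and 'dbname' keys.
    private function createDriverFromArray(array $options): DriverInterface
    {
        return new SimpleDriver(sprintf('%s:host=%s;dbname=%s', $options['type'], $options['host'], $options['dbname']));
    }
}

$fromString = new Db('mysql:host=localhost;dbname=app');
$fromArray = new Db(['type' => 'mysql', 'host' => 'localhost', 'dbname' => 'app']);
$fromDriver = new Db(new SimpleDriver('mysql:host=localhost;dbname=app'));
```

Anything else, such as an integer, fails the final `instanceof` check and raises an `InvalidArgumentException`.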

Constructor Injection

At this point in the PHP community and in PHP-centric developer circles, it is generally accepted that injecting your dependencies is a best practice. How developers go about injecting these dependencies is still very much debated and, in part, up to personal and/or team preference.

There are several methods of dependency injection; interface injection, setter injection and constructor injection are the primary forms. For the purposes of this post, constructor injection is our primary candidate for discussion.

In short, constructor injection is a pattern of injecting all of your required dependencies into a constructor. These dependencies are usually other objects, often called services. The primary benefit of constructor injection is that after you instantiate the target object, generally, it is in the complete “ready state,” meaning that it is ready to do real work. A typical constructor signature sporting constructor injection looks like this:

class UserMapper {
}

class UserRepository {
    protected $userMapper;

    public function __construct(UserMapper $userMapper) {
        $this->userMapper = $userMapper;
    }
}

The above example clearly demonstrates that before a developer can use a UserRepository object, they must first inject it with a UserMapper object.
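A slightly fuller sketch of the same idea, assuming a hypothetical `findAll()` method on the mapper that returns canned data instead of querying a database, and a hypothetical `getUsernames()` method on the repository:

```php
<?php
// Hypothetical mapper: a real one would query a database.
class UserMapper
{
    public function findAll(): array
    {
        return [['id' => 1, 'username' => 'ralph']];
    }
}

class UserRepository
{
    private $userMapper;

    // The required dependency is part of the signature, so a
    // UserRepository cannot be created without one.
    public function __construct(UserMapper $userMapper)
    {
        $this->userMapper = $userMapper;
    }

    public function getUsernames(): array
    {
        return array_column($this->userMapper->findAll(), 'username');
    }
}

// Once constructed, the repository is in its "ready state".
$repository = new UserRepository(new UserMapper());
$names = $repository->getUsernames();
```

In PHP 7.1 and later, calling `new UserRepository()` without the mapper throws an `ArgumentCountError`, so a half-initialized repository can never be created.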

In PHP, while in recent times we’ve started favoring dependency injection (which can add some complexity to code), we have traditionally gravitated towards code that is easy to write and easy to use. Practicing good dependency injection can be tedious at times and, in many cases, an object’s dependencies can be stubbed with a sensible default. This practice is also known by the name Poka-Yoke. It allows us to develop an API that supports explicit injection of dependencies while promoting ease of use in the common, majority use cases. Consider the following code:

interface UserMapperInterface {
}

class UserMapper implements UserMapperInterface {
}

class UserRepository {
    protected $userMapper;
    public function __construct(UserMapperInterface $userMapper = null) {
        $this->userMapper = ($userMapper) ?: new UserMapper;
    }
}

While the UserRepository allows you to inject its UserMapper dependency, it will, if one was not provided, instantiate a sensible default UserMapper for you. The benefit is that in the most common use cases, usage is a one-step affair (just instantiate the UserRepository), while in unit testing scenarios, or wherever you want to inject an alternate UserMapper implementation, that can still be done through the constructor.
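The sketch below shows both paths, assuming a hypothetical `findAll()` method on the interface and a hypothetical `InMemoryUserMapper` test double. It uses PHP 7.1's explicit nullable type (`?UserMapperInterface`) rather than the implicitly nullable `= null` form:

```php
<?php
interface UserMapperInterface
{
    public function findAll(): array;
}

// Default implementation (hypothetical data source).
class UserMapper implements UserMapperInterface
{
    public function findAll(): array
    {
        return ['from-database'];
    }
}

// Hypothetical stub, e.g. for unit tests.
class InMemoryUserMapper implements UserMapperInterface
{
    private $users;

    public function __construct(array $users)
    {
        $this->users = $users;
    }

    public function findAll(): array
    {
        return $this->users;
    }
}

class UserRepository
{
    protected $userMapper;

    public function __construct(?UserMapperInterface $userMapper = null)
    {
        // Fall back to a sensible default when nothing is injected.
        $this->userMapper = $userMapper ?? new UserMapper();
    }

    public function getUsers(): array
    {
        return $this->userMapper->findAll();
    }
}

$default = new UserRepository();                                  // one-step usage
$stubbed = new UserRepository(new InMemoryUserMapper(['alice'])); // explicit injection
```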

Dynamic Class Instantiation

Generally speaking, the following code, while legal, should be used very seldom, and only when other possible instantiation patterns have been exhausted:

$obj = new $className();
if (!$obj instanceof SomeBaseType) {
    throw new \UnexpectedValueException($className . ' is not a SomeBaseType');
}

Why is this a bad pattern? First, it assumes up front that the constructor signature has no required parameters. While this may hold for the object types this factory already knows about, it might not be true of a consumer’s subtype of the base type in question. This pattern should never be used for objects that have dependencies, or in situations where a subtype might conceivably have dependencies, because it takes away the subtype’s ability to practice constructor injection.

Another problem is that instead of managing an object, or a list of objects, you are now managing a class name, or list of class names in addition to an object or list of objects. Instead, one could simply manage the objects.

If, on the other hand, you know this particular object type is no more than a value object (or similar), with no chance of it needing dependencies in subtypes, you can then cautiously use this instantiation pattern.
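If you do use it for value objects, one cautious variant is to validate the class name before calling new. This is a minimal sketch with a hypothetical `Color` hierarchy and a hypothetical `createColor()` function; `is_a()` with its third argument set to `true` accepts a class-name string:

```php
<?php
// Hypothetical value-object hierarchy: no constructor arguments, no services.
class Color
{
    public $hex = '#000000';
}

class Red extends Color
{
    public $hex = '#ff0000';
}

// Instantiates a value object by class name, checking the type
// before calling new instead of after.
function createColor(string $className): Color
{
    if (!is_a($className, Color::class, true)) {
        throw new InvalidArgumentException($className . ' is not a Color');
    }
    return new $className();
}

$red = createColor(Red::class);
```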

Prototype Pattern

So how does one create an unlimited number of objects of a particular type, with dependencies intact, each with slight variations? Enter the prototype pattern. This is an important pattern to keep handy when you know that you’ll have objects that need to be replicated in some way and that also have service dependencies that need to be injected.

To draw a parallel, this is similar to how JavaScript handles its object model. To sum up prototyping in JavaScript: functions and properties are defined once per prototype rather than once per object. The new keyword instructs the engine/runtime to create a new object linked to that prototype, which is then assigned to a variable for further specification and interaction.

This is similar to what the Prototype Pattern does in an object-oriented inheritance model. Up front, you create a prototypical instance. This instance will have all its dependencies injected and any shared configuration and/or values set up. Then, instead of calling new again, a factory (or the consumer) will call clone on that object (PHP makes a shallow copy), creating a new object from the original prototypical one. This newly cloned object can then be further specified, injected with the variations that make it unique, and interacted with as a unique object.
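Because PHP's clone is shallow, object properties of the prototype (such as injected services) are shared by every clone, while scalar and array properties are copied. The hypothetical sketch below shows that, and uses the `__clone()` hook to give each clone its own copy of one mutable object property while still sharing the service:

```php
<?php
// Hypothetical shared service.
class Connection
{
}

class Row
{
    public $connection; // service dependency: shared between clones
    public $tags;       // per-instance mutable object
    public $data = [];  // arrays are copied on clone

    public function __construct(Connection $connection)
    {
        $this->connection = $connection;
        $this->tags = new ArrayObject();
    }

    // Runs on the new copy after the shallow clone; here it gives each
    // clone its own ArrayObject while leaving $connection shared.
    public function __clone()
    {
        $this->tags = clone $this->tags;
    }
}

$prototype = new Row(new Connection());
$a = clone $prototype;
$b = clone $prototype;
$a->data = ['id' => 1];
$a->tags[] = 'first';
```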

Let’s consider the following example involving a database connection and the Row Gateway pattern. We want to iterate over a dataset from a database and, during iteration, present each row as a RowGateway object. One way of handling this would be to get the array of data from the database, then during iteration create a new RowGateway from scratch, injecting the database connection:

class DbAdapter {

    public function fetchAllFromTable($table) {
        // query $table and return its rows as an array of associative arrays
        $arrayOfData = array();
        return $arrayOfData;
    }

}

class RowGateway {

    protected $dbAdapter;
    protected $tableName;
    protected $data;

    public function __construct(DbAdapter $dbAdapter, $tableName, $data) {
        $this->dbAdapter = $dbAdapter;
        $this->tableName = $tableName;
        $this->data = $data;
    }

    /**
     * These methods require access to the database adapter
     * to fulfill their duties
     */
    public function save() {}
    public function delete() {}
    public function refresh() {}
}

class UserRepository {

    protected $dbAdapter;

    public function __construct(DbAdapter $dbAdapter) {
        $this->dbAdapter = $dbAdapter;
    }

    public function getUsers() {
        $rows = array();
        foreach ($this->dbAdapter->fetchAllFromTable('user') as $rowData) {
            $rows[] = new RowGateway($this->dbAdapter, 'user', $rowData);
        }
        return $rows;
    }
}

A UserRepository will be constructed with a database adapter object. It will then query the database, which returns an array of all the rows that satisfied that query. For each row of data, it will create a fresh RowGateway from scratch, injecting all the dependencies, configuration and the row data.

At first glance, you might ask: “what if I have a specialized version of RowGateway I want to use?” Instead of hard-coding the RowGateway class, this can be handled by using the Dynamic Class Instantiation pattern described above:

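class UserRepository {

    protected $dbAdapter;
    protected $rowClass;

    public function __construct(DbAdapter $dbAdapter, $rowClass = 'RowGateway') {
        $this->dbAdapter = $dbAdapter;
        $this->rowClass = $rowClass;
    }

    public function getUsers() {
        $rows = array();
        $rowClass = $this->rowClass;
        foreach ($this->dbAdapter->fetchAllFromTable('user') as $rowData) {
            // assumes every subtype shares RowGateway's constructor signature
            $row = new $rowClass($this->dbAdapter, 'user', $rowData);
            if (!$row instanceof RowGateway) {
                throw new UnexpectedValueException($rowClass . ' is not a RowGateway');
            }
            $rows[] = $row;
        }
        return $rows;
    }
}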

This partially solves the problem in that we can now use our own specialized class for the RowGateway implementation, but this too has its own set of limitations. First, we are incorrectly assuming that the constructor signature of a RowGateway subtype is exactly the same as the base type’s. This means that if a subtype has additional dependencies, that class will need to do the static dance (for example, static registry or singleton lookups) in order to locate and consume the dependencies it needs for its specialized functionality. By making this assumption about the class’s constructor signature, we’re limiting the consumer’s ability to practice polymorphism in the subtypes they might need to create.

For example, if a consumer wanted to be able to have a RowGateway object that wrote data to one specific database, but refreshed its data from a different database, how might one be able to inject two different DbAdapters into a RowGateway object to achieve this end result?

The answer is to use the Prototype Pattern, which in practice (in pseudo-code) looks like this:

class DbAdapter {
    // same as before
}

class RowGateway {

    protected $dbAdapter;
    protected $tableName;
    protected $data;

    public function __construct(DbAdapter $dbAdapter, $tableName) {
        $this->dbAdapter = $dbAdapter;
        $this->tableName = $tableName;
    }

    public function initialize($data) {
        $this->data = $data;
    }

    /**
     * These methods require access to the database adapter
     * to fulfill their duties
     */
    public function save() {}
    public function delete() {}
    public function refresh() {}

}

class UserRepository {

    protected $dbAdapter;
    protected $rowGatewayPrototype;

    public function __construct(DbAdapter $dbAdapter, RowGateway $rowGatewayPrototype = null) {
        $this->dbAdapter = $dbAdapter;
        // use the injected prototype, or fall back to a default RowGateway
        $this->rowGatewayPrototype = ($rowGatewayPrototype) ?: new RowGateway($this->dbAdapter, 'user');
    }

    public function getUsers() {
        $rows = array();
        foreach ($this->dbAdapter->fetchAllFromTable('user') as $rowData) {
            // clone the prototype (dependencies included), then add the row-specific data
            $row = clone $this->rowGatewayPrototype;
            $row->initialize($rowData);
            $rows[] = $row;
        }
        return $rows;
    }

}

By using a prototypical instance as the base for all future instances, we now allow the consumer the ability to extend this base implementation using sound object-oriented/polymorphic best-practices to achieve their end result. So, assuming our above example of the read/write adapter, a consumer can write:

class ReadWriteRowGateway extends RowGateway {
    protected $readDbAdapter;

    public function __construct(DbAdapter $readDbAdapter, DbAdapter $writeDbAdapter, $tableName) {
        $this->readDbAdapter = $readDbAdapter;
        parent::__construct($writeDbAdapter, $tableName);
    }

    public function refresh() {
        // utilize $this->readDbAdapter instead of $this->dbAdapter in RowGateway base implementation
    }
}

// usage:
$userRepository = new UserRepository(
    $dbAdapter,
    new ReadWriteRowGateway($readDbAdapter, $writeDbAdapter, 'user')
);
$users = $userRepository->getUsers();
$user = $users[0]; // instance of ReadWriteRowGateway with a specific row of data from the db
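To see the pieces working end to end, here is a self-contained sketch. The in-memory `DbAdapter` constructor that takes canned rows, and the `get()` and `getReadAdapter()` accessors, are hypothetical additions so the result can be inspected; the rest follows the example above.

```php
<?php
// Stand-in adapter returning canned rows; a real one would query a database.
class DbAdapter
{
    private $rows;

    public function __construct(array $rows = [])
    {
        $this->rows = $rows;
    }

    public function fetchAllFromTable(string $table): array
    {
        return $this->rows;
    }
}

class RowGateway
{
    protected $dbAdapter;
    protected $tableName;
    protected $data = [];

    public function __construct(DbAdapter $dbAdapter, string $tableName)
    {
        $this->dbAdapter = $dbAdapter;
        $this->tableName = $tableName;
    }

    public function initialize(array $data): void
    {
        $this->data = $data;
    }

    public function get(string $column)
    {
        return $this->data[$column] ?? null;
    }
}

class ReadWriteRowGateway extends RowGateway
{
    protected $readDbAdapter;

    public function __construct(DbAdapter $readDbAdapter, DbAdapter $writeDbAdapter, string $tableName)
    {
        $this->readDbAdapter = $readDbAdapter;
        parent::__construct($writeDbAdapter, $tableName);
    }

    public function getReadAdapter(): DbAdapter
    {
        return $this->readDbAdapter;
    }
}

class UserRepository
{
    private $dbAdapter;
    private $rowGatewayPrototype;

    public function __construct(DbAdapter $dbAdapter, ?RowGateway $rowGatewayPrototype = null)
    {
        $this->dbAdapter = $dbAdapter;
        $this->rowGatewayPrototype = $rowGatewayPrototype ?? new RowGateway($dbAdapter, 'user');
    }

    public function getUsers(): array
    {
        $rows = [];
        foreach ($this->dbAdapter->fetchAllFromTable('user') as $rowData) {
            $row = clone $this->rowGatewayPrototype; // dependencies come along with the clone
            $row->initialize($rowData);              // only the row data varies
            $rows[] = $row;
        }
        return $rows;
    }
}

$readDbAdapter = new DbAdapter();
$writeDbAdapter = new DbAdapter();
$dbAdapter = new DbAdapter([['id' => 1, 'username' => 'alice'], ['id' => 2, 'username' => 'bob']]);
$userRepository = new UserRepository($dbAdapter, new ReadWriteRowGateway($readDbAdapter, $writeDbAdapter, 'user'));
$users = $userRepository->getUsers();
```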

Parting Words

Be nice to people who want to consume and extend your code. A constructor is more than just a place for initialization code. How you craft your constructors, the patterns you use for their signatures, and how you expect new instances of objects to be obtained greatly affect the ability of consumers to extend your code without having to jump through too many hoops to achieve their specialized use cases. It is always better to fall back on SOLID object-oriented practices than to limit someone’s possibilities by forcing them into coding patterns that require reading in-depth documentation on how the original author expects someone to extend their code.

Learning About Dependency Injection and PHP

May 18th, 2011 by Ralph Schindler

Over the past few years, a few concepts and programming patterns have muscled their way into the hearts and minds of PHP developers from other languages and programming communities. These concepts range from the MVC application architecture and various modeling techniques (think ActiveRecord and Data Mapper) to a pure shift in the way we think about application architectures, like aspect-oriented programming (AOP) and event-driven programming. Perhaps it’s because PHP has been adopted at an enterprise level, increasing the demand for what developers might call enterprise-quality programming patterns, or perhaps it’s simply because PHP’s ever-evolving object model makes new things possible. After all, who doesn’t like new shiny things? Whatever the reason, one of the newest concepts (at least over the past 3 years or so) to emerge as a heated topic of debate is how to manage object dependencies. Interestingly, the debate over how to manage dependencies is generally named after the solution its proponents offer: dependency injection (the abstract principle is actually called Inversion of Control).

In any circle of developers of the object-oriented persuasion, you’ll never hear an argument that dependency injection itself is bad. In these circles, it is generally accepted that injecting dependencies is the best way to go. Injecting object dependencies in PHP looks like this:

// constructor injection
$dependency = new MyRequiredDependency;
$consumer = new ThingThatRequiresMyDependency($dependency);

That’s basically it. There are many variations of this: setter injection, interface injection, call time injection, in addition to the above mentioned constructor injection. These are all valid ways of injecting the dependencies into the consuming object. Ultimately, the goal here is to avoid this:


class ThingThatHasAnExternalDependency
{
    public function __construct() {
        $this->dependency = new ARequiredDependency;
        // or
        $this->secondDependency = ARequiredDependency::getInstance();
    }
}

The above code is an example of a violation of the Hollywood Principle, which basically states: “Don’t call us, we’ll call you.”
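By contrast, the injection variations mentioned earlier can be sketched as follows. The interface name `MyRequiredDependencyAwareInterface` and the method names are hypothetical; they only illustrate the shape of constructor, setter, interface, and call-time injection in one class.

```php
<?php
class MyRequiredDependency
{
}

// Interface injection: the interface declares how the dependency is injected.
interface MyRequiredDependencyAwareInterface
{
    public function setMyRequiredDependency(MyRequiredDependency $dependency): void;
}

class ThingThatRequiresMyDependency implements MyRequiredDependencyAwareInterface
{
    private $dependency;

    // Constructor injection (optional here so the other styles can be shown).
    public function __construct(?MyRequiredDependency $dependency = null)
    {
        $this->dependency = $dependency;
    }

    // Setter injection; this also satisfies the "aware" interface.
    public function setMyRequiredDependency(MyRequiredDependency $dependency): void
    {
        $this->dependency = $dependency;
    }

    public function getDependency(): ?MyRequiredDependency
    {
        return $this->dependency;
    }

    // Call-time injection: the dependency is passed to the method that needs it.
    public function doWork(MyRequiredDependency $dependency): string
    {
        return get_class($dependency);
    }
}

$dependency = new MyRequiredDependency();

$viaConstructor = new ThingThatRequiresMyDependency($dependency);

$viaSetter = new ThingThatRequiresMyDependency();
$viaSetter->setMyRequiredDependency($dependency);

// Something outside the object (e.g. a container) can detect the interface and call the setter.
$viaInterface = new ThingThatRequiresMyDependency();
if ($viaInterface instanceof MyRequiredDependencyAwareInterface) {
    $viaInterface->setMyRequiredDependency($dependency);
}
```

In every case the object is handed its dependency rather than creating or fetching it itself.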

Yet, this is not the heart of the argument. Perhaps it was 4-5 years ago in the PHP community, but it’s not anymore. The heart of the argument is not should we be doing it, but how do we go about doing it.

This article is not about the intricacies and implementation details of DI containers and DI frameworks. It’s also not about the various ways and means of injecting dependencies into other objects, or which method might be better. In fact, this article has no opinion on whether injecting dependencies is even good for you or your application. This article is an exploration of how adopting any DI framework for PHP affects the lifecycle of a project, both the code and the developer, team or organization that is building it.

A Brief History of Dependency Management In PHP

It is important to know why PHP is as popular as it is; after all, it is this popularity that DI frameworks must contend with to gain adoption inside PHP applications. To understand PHP’s popularity, history, and evolution, let’s look at this code:

// these few lines actually represent 5 different web-centric "languages"!
include_once 'includes/config.php'; // ultimately there is a mysql_connect() call in here somewhere
include_once 'templates/header.php';
$result = mysql_query('SELECT * FROM users'); // magically uses the mysql_connect() resource
while ($row = mysql_fetch_assoc($result)) {
    echo '<div class="user-row"><a href="/delete-user.php" onclick="someJSFunction();">' . $row['username'] . '</a></div>';
}
include_once 'templates/footer.php';

From the beginning, we’ve been trained to think that our dependencies are magically managed. As you can see above, the mysql_query() function, while it will accept a connection resource, does not require one. In fact, if one is not supplied, it will use the last link opened by mysql_connect() in the PHP runtime. Assuming that this script, like the delete-user.php script it links to, is part of a larger collection of PHP scripts, which we will call “the application,” it is important to note that even this script pulls in its dependencies instead of having them injected. For all intents and purposes, config.php, header.php and footer.php are all dependencies of this script, much like they are of other similar scripts such as delete-user.php. To sum it up, if a new dependency is now required by the business logic portion of this application (i.e., the lines between the header and footer), it has to be introduced to every script in the application. This does not exactly adhere to the DRY principle.

But let’s take a step back and look at this snippet of code from the organizational perspective. To do this, we must first understand the various phases of the code’s lifecycle within any organization. For the purposes of this example, let’s assume that from idea to production, code goes through the following phases: development, build, deployment, and application start-up (in production). If this were a C/C++ or Java project, the code would have been written (developed), compiled (built), then packaged or handed to some deployment tool’s process (deployed); it then would have been run (executed via some startup script, or by executing a binary). PHP, and Perl at the time, achieved all of the same objectives in fewer steps, making them wildly popular platforms for highly iterative web projects. This same application in PHP would have been coded in some text editor (developed) and FTP’d up to a production server (deployed). You’ll notice that it neither had to be built/compiled nor started on the server, since the target, Apache, was already running with PHP embedded into it. For all intents and purposes, a cheap and easy FTP tool was both the build and deployment tool for this application’s lifecycle.

It was this simplicity that made PHP the popular choice for web applications. This popularity was attained because the simplicity of the PHP platform allowed for two extremely important facets of development to emerge: the idea of building an application became approachable to even the novice individual, and without all the cruft that came along with the application lifecycle, building and deploying applications in PHP increased PHP’s “fun-ness” factor.

While this style of building applications allowed for a proliferation of PHP applications to be developed, there was in fact a negative side to be revealed later in time. As applications quickly grew, their ability to be maintained decreased. We give them the name “Spaghetti code”, and for all the right reasons. Objects, if they were even being used, were generally wrappers around procedural functionality. So object dependency management wasn’t even a consideration for most developers. Looking back, perhaps it was this original simplicity that allowed developers to create applications without even having to know what a dependency was or how to find it. In any case, as these applications grew uncontrollably, maintaining them and hacking them started to lose the PHP fun factor exponentially.

A Brief History of DI Frameworks

As PHP developers started identifying the problems with their Model 1 applications, they started looking for solutions in other programming communities. At this time, the Java community was still heavily rooted in the enterprise/software development/software engineering world, and problems such as dependency management already had some interesting solutions. Most notably, there was the Spring Framework, whose primary facility for dependency management was a component called the IoC Container, or Inversion of Control container. This container managed the full lifecycle of object creation using callbacks. This meant that you no longer had to use the “new” keyword (the same new keyword as in PHP). It also wired the dependencies for you at instantiation time, so you no longer had to concern yourself with how dependencies were injected, be it through the constructor, properties or setter methods. The Spring Framework was one of the first frameworks that encouraged the use of definition files to manage the knowledge required to wire all your dependencies together. True to form in the Java community, these definition files were written in XML.

As it might seem, this is indeed a deviation from the philosophy that had made PHP so popular. PHP allowed you to write the minimal amount of code needed to complete your application. In the Java/DI world, particularly with the Spring Framework, you had a much richer application lifecycle. Not only were you developing code for your application, but you were also creating code about code to manage code. This is known as meta-programming. In addition to this meta-programming, you also had the compilation phase required by the Java platform, which was generally tucked away inside your build-time tasks. Moreover, the application had to be deployed (there were generally tools for this too), and (for good measure), due to the platform, your application had to be started. Needless to say, this application lifecycle might seem heavier, for lack of a better term, to the average PHP developer.

Since then, several frameworks have cropped up that sport some kind of dependency management. Before this technique was picked up in PHP, they were all heavily rooted in the Java and .NET communities. A quick Google search will return a few notable names like PicoContainer, Spring.NET, Unity, Butterfly and Google Guice. These frameworks gain popularity because they attempt to ease some of the burdens that DI places upon the developer, whether by using reflection to create definitions or by adding an annotation system so that DI definitions can be written inside the code they manage.

DI and PHP

To understand how attainable a dependency management framework for PHP is, one should first understand how its counterparts in Java and .NET rely upon their respective platforms to do certain jobs. For a quick reference, see the images from this blog post. One of the more important facets to remember is that the expected application lifecycle of a Java/.NET application is much richer. You are expected to have build-time tasks. You are expected to have deployment tasks. And, generally, your application understands the difference between being in development, staging and production, so it can adjust how it runs accordingly. Moreover, the platform itself has facilities in place that aid the developer both at development time, with code generation, and in production.

PHP never expects or facilitates the usage of any kind of build-time tasks. PHP also does not have any kind of built-in annotation support (a meta-programming technique), nor does it have any kind of application scope or per-application memory space. What does this mean for someone who is creating a DI container? Let’s explore.

Development Time

Generally speaking, any time you are writing, altering or just shifting code around, you are in development mode, and your application should be running in a development environment. The structure of your application’s classes, functions and files within the filesystem probably changes each time you click save. Dependency management systems require knowledge of your code in order to do their job effectively. This knowledge generally comes in the form of some kind of definition.

This definition can be created by hand by the developer, generated at runtime by some application hooks, or generated with a special tool. If it is done by hand, the developer is required to explicitly map the various functions/methods that need to be called in order to inject a particular object dependency. The more dependencies you have, the more verbose this definition might become.

A better route would be to generate this definition file; after all, the code you’ve written, if written correctly, will self-describe its dependencies. There are two options for generation: manual and automatic. An example of manual generation would be a developer giving a command line tool the minimal information it needs to parse your code, figure out the dependency map for itself, and generate some kind of definition to be used at runtime. Minimal information might include some kind of seed information, like where to find your classes or which filters to use when inspecting classes. Sometimes these tools make use of special interfaces (as in interface injection) whose purpose is to describe the various dependencies of the class implementing them. Another approach is to use special annotations on classes and class methods that describe the various required and optional dependencies and how they are to be injected.

The same techniques employed in this manual approach could also be put to use in an automatic approach. In the automatic approach, imagine that the same command line tool was now a service of the application itself. While in development mode, it would run as often as needed to determine whether code changes have happened. If they have, the service would regenerate the dependency definition file so that the rest of the application can use it inside the DI container available at runtime.
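As an illustration of generating a definition from code that self-describes its dependencies, PHP's Reflection API can read constructor type declarations. This is a minimal sketch, not how any particular DI framework works; `buildDefinition()`, the sample classes, and the array shape it returns are hypothetical, and it assumes PHP 7.1+ (`ReflectionNamedType`):

```php
<?php
// Hypothetical classes to inspect.
class Config
{
}

class DbAdapter
{
    public function __construct(Config $config) {}
}

class UserRepository
{
    public function __construct(DbAdapter $dbAdapter, string $table = 'user') {}
}

// Builds a definition map: class name => list of constructor parameters,
// using only what the type declarations already say.
function buildDefinition(array $classNames): array
{
    $definition = [];
    foreach ($classNames as $className) {
        $constructor = (new ReflectionClass($className))->getConstructor();
        $params = [];
        if ($constructor !== null) {
            foreach ($constructor->getParameters() as $param) {
                $type = $param->getType();
                $params[] = [
                    'name'     => $param->getName(),
                    // Only single, non-builtin class types are treated as injectable services.
                    'class'    => ($type instanceof ReflectionNamedType && !$type->isBuiltin()) ? $type->getName() : null,
                    'optional' => $param->isOptional(),
                ];
            }
        }
        $definition[$className] = $params;
    }
    return $definition;
}

$definition = buildDefinition([Config::class, DbAdapter::class, UserRepository::class]);
```

A manual tool would run something like this on demand; an automatic one would rerun it whenever it detects changed files.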

There are a couple of concerns specific to PHP with regard to dependency management. Since PHP is a shared-nothing architecture with no application-level memory, this definition needs to be loaded, parsed and put into memory on each request. The larger the dependency tree you track, the larger the memory footprint of the dependency definition graph. Furthermore, since this definition has to be loaded on each request, if it is in a non-native format (meaning anything other than PHP code), there are costs in converting that format, be it XML, YAML, JSON, or INI, into the in-memory structure the dependency management container requires. What’s more, the PHP platform does not keep track of file changes, so without some kind of user-land tracking, it is hard to know which files have changed during development. Thus, a dependency management system taking the automatic approach would have to rescan the filesystem for changes on each request during development, which has its own consequences.
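One form such user-land tracking could take is recording each file's modification time and comparing it on the next pass. `detectChangedFiles()` is a hypothetical helper; in a shared-nothing setup the `$snapshot` array would itself have to be persisted between requests (for example to a file), deleted files are not handled, and every call still has to stat every tracked file, which is exactly the per-request cost described above.

```php
<?php
// Reports files whose mtime differs from the last snapshot, then updates the snapshot.
function detectChangedFiles(array $files, array &$snapshot): array
{
    clearstatcache(); // filemtime() results are cached by PHP within a request
    $changed = [];
    foreach ($files as $file) {
        $mtime = filemtime($file);
        if (!isset($snapshot[$file]) || $snapshot[$file] !== $mtime) {
            $changed[] = $file;
            $snapshot[$file] = $mtime;
        }
    }
    return $changed;
}

$file = tempnam(sys_get_temp_dir(), 'di');
$snapshot = [];
$first = detectChangedFiles([$file], $snapshot);  // new file: reported
$second = detectChangedFiles([$file], $snapshot); // unchanged: not reported
touch($file, time() + 10);                        // simulate an edit
$third = detectChangedFiles([$file], $snapshot);  // reported again
unlink($file);
```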

Deployment Time

When you are done writing code and ready to push the application into production, the act of pushing it is called deployment, and the application is now considered to be in “production” mode. In production, you can be sure that the structure of your code is stable and will not change, so your dependency graph is safe from changes too. Since this is the case, there is no longer a need to keep updating and regenerating the dependency definition file as you did during development.

Even though the definition is no longer changing, there is still the concern of how expensive it is to load this definition on each request. Naturally, the cheapest form of definition is a PHP array or structure that can be loaded directly into memory. Other file types like XML, YAML and JSON first have to go through a parsing phase before they can be used. Parsing these files can be expensive, and can benefit from some kind of caching. Caching the definition in some way, shape or form ensures minimal per-request overhead when the application uses the dependency management container.
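One way to get the “cheapest form” described above is to compile the definition to a plain PHP file once, at deployment time, and include it on each request. This is a minimal sketch assuming PHP 7.1 or later; writeDefinitionCache(), loadDefinitionCache() and the cache file name are hypothetical. Because the cache is ordinary PHP, there is no parsing step in user-land, and an opcode cache can keep the compiled file in memory.

```php
<?php
/**
 * Write a dependency definition to disk as PHP code that returns an array.
 */
function writeDefinitionCache(string $file, array $definition): void
{
    $code = '<?php return ' . var_export($definition, true) . ';' . PHP_EOL;
    // Write to a temporary file, then rename it into place, so a concurrent
    // request never includes a half-written file (rename is atomic on POSIX
    // systems when both paths are on the same filesystem)
    $tmp = $file . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $code);
    rename($tmp, $file);
}

/**
 * Load the cached definition; include returns the file's return value.
 */
function loadDefinitionCache(string $file): array
{
    return include $file;
}

$definition = [
    'UserRepository' => ['DbAdapter'],
    'DbAdapter'      => ['DbConfig'],
    'DbConfig'       => [],
];

$cacheFile = sys_get_temp_dir() . '/di-definition.php';
writeDefinitionCache($cacheFile, $definition);
var_dump(loadDefinitionCache($cacheFile) === $definition); // bool(true)
```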

Other Observations & Criticisms

It is important to realize that dependency management solutions are, in every sense of the word, full frameworks. To use them effectively, you need to understand both their philosophy and, at least minimally, the facilities they offer. To understand the true benefits of any framework, one must first know the pain points the framework is attempting to solve. Seeing the end result of a framework without knowing what it is facilitating might lead one to dismiss it as overkill or unintuitive. For example, take the following code (typical of dependency management systems):

$userRepository = $dic->get('UserRepository');

If you encounter this line of code without fully understanding the dependency injection container being used, you won’t appreciate its usefulness. Sure, you could instantiate your Application\Model\UserRepository yourself, but you’d also have to locate the database adapter and inject it, and before that, load the configuration for the database connection and inject it into the adapter. If you do this in multiple controller actions, a lot of repeated boilerplate code is required to “wire” the UserRepository object. Internally, the DiC object loads and consults a definition, creates objects, injects those objects, and returns the requested object fully wired and ready to use.
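To make “loads and consults a definition, creates objects, injects those objects” concrete, here is a deliberately tiny container sketch. It is not any particular library’s container: Container, DbConfig, DbAdapter and UserRepository are hypothetical, the definition is a hand-written array of the shape used earlier, there is no cycle detection, and it assumes PHP 5.6 or later for argument unpacking.

```php
<?php
// Hypothetical classes standing in for the ones named in the text.
class DbConfig {}

class DbAdapter
{
    public $config;
    public function __construct(DbConfig $config) { $this->config = $config; }
}

class UserRepository
{
    public $adapter;
    public function __construct(DbAdapter $adapter) { $this->adapter = $adapter; }
}

/**
 * Consults a definition (class => constructor dependencies), builds the
 * dependencies recursively, injects them, and shares one instance per class.
 */
class Container
{
    private $definition;
    private $instances = array();

    public function __construct(array $definition)
    {
        $this->definition = $definition;
    }

    public function get($name)
    {
        if (!isset($this->instances[$name])) {
            if (!array_key_exists($name, $this->definition)) {
                throw new InvalidArgumentException("No definition for $name");
            }
            // Resolve each dependency first, then pass them to the constructor
            $args = array_map(array($this, 'get'), $this->definition[$name]);
            $this->instances[$name] = new $name(...$args);
        }
        return $this->instances[$name];
    }
}

$dic = new Container(array(
    'UserRepository' => array('DbAdapter'),
    'DbAdapter'      => array('DbConfig'),
    'DbConfig'       => array(),
));

// The @var docblock tells IDEs what type $dic->get() returns here
/** @var UserRepository $userRepository */
$userRepository = $dic->get('UserRepository');
echo get_class($userRepository->adapter->config), PHP_EOL; // DbConfig
```

The @var docblock is a common convention IDEs use to infer types, a partial workaround for the IDE criticism that follows; it has to be maintained by hand.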

The above code also demonstrates two common criticisms of dependency management frameworks, which are also criticisms of frameworks in general. First, by using the framework, you move further away from the facilities of the language or platform itself. Instead of using the “new” keyword to create an object, you ask another object to create it for you, which shifts developers away from the language’s well-understood API and onto the framework’s API. Second, this kind of code is not easily understood by IDEs. While special features could be added to an IDE to support the framework, it does not inherently know what kind of object is returned by the $dic->get(..) method call.

Summary

While dependency management frameworks have clear drop-in benefits, there are a few considerations whose consequences are unknown or unexplored. For example, if all dependencies are managed and all a developer has to do is configure them, does that encourage deeper object graphs when designing classes and their dependencies? If so, what is the performance impact of these deep object graphs, particularly on the PHP platform? What are their memory and speed implications? Furthermore, if one needed to debug an object that was generated by a dependency management framework, would that be easily possible?

At the end of the day, whether or not to use a dependency management framework is a matter of cost versus benefit. To make an informed decision, a developer should consider a few things. First, know what the code might look like with and without the framework. This gives an indication of the cost/benefit at the code level: does it actually save lines of code and developer headaches? Second, consider how much added knowledge a developer, or a team of developers, needs in order to understand the framework. Lastly, consider what performance impact the framework has on the application’s throughput.

Compiling Gearman (or anything) for Zend Server CE on Snow Leopard

May 12th, 2010 by Ralph Schindler

The first thing you need to know about Mac OS X Snow Leopard on Macs and MacBook Pros is that this hardware is 64-bit capable. This does not necessarily mean you are running a 64-bit kernel; it simply means that the operating system is capable of executing x86_64 (64-bit) executables. We won’t go into the details of kernel architecture; you can read more about that here.

What is important, though, is that both x86_64 and i386 executables can run on Snow Leopard. It is not uncommon on OS X for executables (and libraries) to have multiple architectures compiled in. To see which architectures are inside a particular file, run something like this:

    /usr/local# file /usr/bin/php
    /usr/bin/php: Mach-O universal binary with 3 architectures
    /usr/bin/php (for architecture x86_64): Mach-O 64-bit executable x86_64
    /usr/bin/php (for architecture i386):   Mach-O executable i386
    /usr/bin/php (for architecture ppc7400):        Mach-O executable ppc

    /usr/local# file /usr/local/zend/apache2/bin/httpd
    /usr/local/zend/apache2/bin/httpd: Mach-O executable i386

This means that PHP (as supplied by Apple) has been compiled with three architectures inside. What does that mean? There are basically three versions of PHP compiled into a single binary, and when it is loaded into memory, only one of them is used at a time. To demonstrate, let’s take a common difference between 32-bit and 64-bit architectures: integer size. The 64-bit integer range is larger than the 32-bit range. The following demo runs different architectures from the same binary:

    /usr/local# arch -arch x86_64 /usr/bin/php -nr 'echo PHP_INT_MAX;'
    9223372036854775807

    /usr/local# arch -arch i386 /usr/bin/php -nr 'echo PHP_INT_MAX;'
    2147483647

Since PHP’s maximum integer size differs between the two architectures, the two results confirm that the same binary ran as a different architecture each time.
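If you would rather check from inside a script than compare PHP_INT_MAX values by eye, the PHP_INT_SIZE constant reports the same thing directly. A small sketch; the file name arch-check.php is just an example, and you could run it under each architecture with something like arch -arch i386 /usr/bin/php arch-check.php.

```php
<?php
// PHP_INT_SIZE is the integer size in bytes: 8 when this PHP process runs as
// x86_64 and 4 when it runs as i386 (on the Mac builds discussed here)
$bits = PHP_INT_SIZE * 8;
printf("This PHP process is %d-bit; PHP_INT_MAX = %d\n", $bits, PHP_INT_MAX);
```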

The next important thing to understand is the nature of the PHP stack. PHP is generally regarded as a glue language. That might mean different things to different people, but we will look at this statement strictly in the technical sense. PHP is made up of the core language and its features, plus a rich set of extensions. These extensions are typically written in C and interface with PHP’s C-level API (PHPAPI). Most of the really useful extensions are linked against libraries on your system. For example, the openssl functions are not actually implemented in PHP’s source code; the openssl extension is simply a wrapper that calls out to libssl.so (.dylib on Mac, .dll on Windows). This is what is meant by PHP being a glue language/platform.

Since PHP relies on existing compiled libraries, you also have to understand how things are compiled and linked. There are generally two options here: linking dynamically or compiling statically. Either way, one thing remains true: you cannot mix architectures. If your apache/mod_php and/or php binary are i386 only, then every library on your system that they use must contain the i386 architecture. Likewise, if your apache/mod_php and/or php binary are x86_64 only, then all of your libraries must contain the x86_64 architecture. Otherwise, you will get a message like this:

PHP Warning:  PHP Startup: Unable to load dynamic library '/usr/local/zend/lib/php_extensions/gearman.so' - dlopen(/usr/local/zend/lib/php_extensions/gearman.so, 9): no suitable image found.  Did find:
/usr/local/zend/lib/php_extensions/gearman.so: mach-o, but wrong architecture in Unknown on line 0

Now that we understand that executables and libraries can have multiple architectures, let’s get to the task at hand: making sure new extensions can run with Zend Server CE.

Zend Server CE for Mac (as of this writing) comes compiled as an i386 executable only. This includes the PHP binary, the PHP library, and the Apache binaries that ship with ZSCE. While ZSCE works great out of the box with all the provided extensions, you might want some additional third-party PHP extensions compiled or linked into this stack. That’s where things get a little confusing, and in this post we’ll look at how to install the gearman extension.

PHP Extensions are basically wrappers around existing libraries, so generally, these extensions require the base library to already be on the system. In our case, we need “libgearman” compiled and on our system for us to be able to compile and use the PHP Gearman Extension.

At this point, I would generally instruct you to compile Gearman with multiple architectures and install it (--prefix=/usr/local). (Note: to compile for multiple architectures, simply do the following):

    export CFLAGS='-arch i386 -arch x86_64'

In the particular case of Gearman, this will not work as the Gearman makefile utilizes flags that are not compatible with multiple architecture targets. As such, we go to plan B.

Plan B is something I generally do to keep my system clean: statically building libraries. I have a personal rule of not keeping i386 only libraries installed in common places like /usr/lib or /usr/local/lib, in this case /usr/local/lib/libgearman.dylib. Since this is the case, I’ll build Gearman statically, compile it into the PHP Gearman Extension, and this will allow me to remove the temporary Gearman installation which will have to be i386 only.

    # check to ensure we have a multi-arch libevent (if not go create it as
    # normal with CFLAGS="-arch i386 -arch x86_64" and install to /usr/local)

    /usr/local/src/gearmand-0.13# file /usr/local/lib/libevent.dylib
        /usr/local/lib/libevent.dylib: Mach-O universal binary with 2 architectures
        /usr/local/lib/libevent.dylib (for architecture i386):  Mach-O dynamically linked shared library i386
        /usr/local/lib/libevent.dylib (for architecture x86_64):        Mach-O 64-bit dynamically linked shared library x86_64

    # next compile gearman to a temp location

    /usr/local/src/gearmand-0.13# export "CFLAGS=-arch i386"
    /usr/local/src/gearmand-0.13# ./configure --disable-shared --prefix=/usr/local/gearman-tmp
    /usr/local/src/gearmand-0.13# make && make install
        [gearman installed now, this should only have static files]

    # ensure we only have a .a library file for gearman
    /usr/local/src/gearmand-0.13# ls /usr/local/gearman-tmp/lib/
        libgearman.a    libgearman.la   pkgconfig

    # make sure zend/bin is first on your PATH
    /usr/local/zend/tmp# echo $PATH
        /usr/local/zend/bin:/var/root/.bin:/usr/local/git/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
    /usr/local/zend/tmp# which phpize
        /usr/local/zend/bin/phpize

    # next, go to our zend server location, and pull down gearman extension
    /usr/local/src/gearmand-0.13# cd /usr/local/zend/tmp/
    /usr/local/zend/tmp# pecl download gearman-beta
        downloading gearman-0.7.0.tgz ...
        Starting to download gearman-0.7.0.tgz (29,258 bytes)
        .........done: 29,258 bytes
    File /usr/local/zend/tmp/gearman-0.7.0.tgz downloaded

    # next, unpack, phpize, and statically compile
    /usr/local/zend/tmp# tar zxf gearman-0.7.0.tgz
    /usr/local/zend/tmp# cd gearman-0.7.0
    /usr/local/zend/tmp/gearman-0.7.0# phpize
        Configuring for:
        PHP Api Version:         20090626
        Zend Module Api No:      20090626
        Zend Extension Api No:   220090626
    /usr/local/zend/tmp/gearman-0.7.0# ./configure --with-gearman=/usr/local/gearman-tmp/ --disable-shared
    /usr/local/zend/tmp/gearman-0.7.0# make
    /usr/local/zend/tmp/gearman-0.7.0# make install
        Installing shared extensions:     /usr/local/zend/lib/php_extensions/

    # Now go add extension=gearman.so to your php.ini file inside /usr/local/zend/etc/php.ini

    # Now go check that php will have gearman support
    /usr/local/zend# php -i | grep gearman
        gearman
        gearman support => enabled
        libgearman version => 0.13

    # Since we statically compiled it, we can remove our temp install of gearman
    /usr/local/zend# rm -Rf /usr/local/gearman-tmp/

At this point, you have a third-party PECL extension that is compiled and working with ZSCE on Mac OS X.
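Keep in mind that php -i only reports on the CLI binary. To confirm that a given PHP process (for example, one served by Zend Server’s Apache) actually loaded the extension, a small script along these lines can be dropped into the document root; the check itself uses only the standard extension_loaded() and phpversion() functions.

```php
<?php
// Report whether the gearman extension is loaded in *this* PHP process
if (extension_loaded('gearman')) {
    // phpversion() with an extension name returns that extension's version
    echo 'gearman extension loaded, version ', phpversion('gearman'), PHP_EOL;
} else {
    echo 'gearman extension NOT loaded', PHP_EOL;
}
```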

Dynamic Assertions for Zend_Acl in ZF

August 13th, 2009 by Ralph Schindler

In Zend Framework 1.9.1, Zend_Acl has two major issues resolved, plus a simple API change, which together make it possible to create a more robust, more expressive ACL definition with less code. ZF issues ZF-1721 and ZF-1722, each nearly two years old, have both been solved. Over the last two years, I’ve seen a variety of duplicate issues come into the issue tracker, all stemming from two fundamental flaws in Zend_Acl: “Zend_Acl::isAllowed does not support Role/Resource Inheritance down to Assertions” and “Zend_Acl assertions breaks when inheritance is required (ie DepthFirstSearch)”. In this article, we’ll explore the API changes that alleviate these two problems, and we’ll demonstrate how to leverage the Zend_Acl assertion system to create expressive, dynamic assertions that work with your application’s models.

Backwards Compatible API Changes

Before discussing the issues, let’s go over the API change and how it affects the component. Previously, the two methods a developer used to set up an ACL were add() and addRole(). Interestingly, add() was intended to imply addResource(). Since add() implied that you were adding a resource, it’s clear that this component was created from the perspective of resources as the primary actor, with roles and assertions as secondary actors.

The new API allows for the creation of an ACL using strings instead of having to use Zend_Acl_Role and Zend_Acl_Resource objects explicitly. To me, this is an important step towards what I’d like to see in 2.0. In 2.0, I would ideally like to see addRole() and addResource() accept strings for types of roles and resources to query against, and accept objects for explicit role and resource objects to query against (even if they match an already registered type). Put simply, I would expect addRole('user') and addRole($userObjectForRalph) to have different behaviors if different permissions were registered for each. This would allow me to specify access for the user object ‘ralph’ separately from the ACLs for objects of role type ‘user’. The behavior could be further defined to either inherit from the type or override the type’s ACLs, depending on the desired effect. Ultimately, this would allow for a more dynamic experience with Zend_Acl.

Dynamic Assertions Example

In the following example, we’ll look at a common use case that is now possible with Zend_Acl. In plain English, what developers want is to design assertions that accept application models implementing the Resource or Role interface, and to apply some dynamic or custom logic to decide whether the given role has access to the given resource. As mentioned previously, this was not possible because, while checking the ACL tree with a depth-first search, the role and resource objects passed in by the caller were lost, and only the originally registered objects were passed to the assertions. Well, that’s fixed now.

For the purposes of this example, we’ll take a simple concept: a user should only be able to edit their own blog post. The user in this case is our application’s model for users, and the class implements Zend_Acl_Role_Interface. We will also have a BlogPost model which serves as the resource in question, and thus implements Zend_Acl_Resource_Interface. Naturally, our system will be able to handle users of different role ‘types’, but our BlogPost will only be of a single resource type, ‘blogPost’.

Note: the following code is for demonstration only. As such, some coding standards or conventions are not necessarily what you’d expect in proper object-oriented code or in a Zend Framework MVC based application. Some of the code contains rogue ‘echo’ statements so that the demonstration below is more expressive of what it’s actually doing.

class User implements Zend_Acl_Role_Interface
{
    // using public members here for brevity in this article
    public $id = null;
    public $role = 'guest';

    public function getRoleId()
    {
        return $this->role;
    }
}

class BlogPost implements Zend_Acl_Resource_Interface
{
    public $id          = null;
    public $ownerUserId = null;

    public function getResourceId()
    {
        return 'blogPost';
    }
}

Next, we’ll create the dynamic assertion. We would expect this assertion to be called when a User requests to modify a BlogPost. It ensures that the BlogPost’s owner id (the id of the user who owns that BlogPost) is the same as the provided User object’s id. If it is, pass; if not, fail. Fairly common use case, right? Here is what our assertion looks like, with a few inline comments:

class UserCanModifyBlogPostAssertion implements Zend_Acl_Assert_Interface
{
    /**
     * This assertion should receive the actual User and BlogPost objects.
     *
     * @param Zend_Acl $acl
     * @param Zend_Acl_Role_Interface $user
     * @param Zend_Acl_Resource_Interface $blogPost
     * @param string $privilege
     * @return bool
     */
    public function assert(Zend_Acl $acl, Zend_Acl_Role_Interface $user = null, Zend_Acl_Resource_Interface $blogPost = null, $privilege = null)
    {
        echo ' == Checking the assertion ==' . PHP_EOL; // only here for the purposes of the article

        // __METHOD__ already includes the class name (Class::method)
        if (!$user instanceof User) {
            throw new InvalidArgumentException(__METHOD__ . ' expects the role to be an instance of User');
        }

        if (!$blogPost instanceof BlogPost) {
            throw new InvalidArgumentException(__METHOD__ . ' expects the resource to be an instance of BlogPost');
        }

        // a publisher can always modify a post
        if ($user->getRoleId() == 'publisher') {
            return true;
        }

        // everyone else may only modify their own post
        return ($user->id != null && $blogPost->ownerUserId == $user->id);
    }
}

Note: Assertions, like ACLs, can (and most likely should) be treated as application models. As such, if you are using the Zend Framework MVC application structure, you might name this one Default_Model_Acl_UserCanModifyBlogPostAssertion, and it would live in application/models/Acl/UserCanModifyBlogPostAssertion.php. Likewise, the User class would actually be Default_Model_User, and BlogPost might be Default_Model_BlogPost.

Now that we have our models set up for our ACL to interact with, it’s time to write the ACL definition itself. For the purposes of this exercise, we won’t treat the ACL itself as a model; the consuming script below will simply interact with it. In a Zend Framework MVC application, depending on your needs, you might define the ACL as a model within your application.

$acl = new Zend_Acl();

// setup the various roles in our system
$acl->addRole('guest');
$acl->addRole('contributor', 'guest');
$acl->addRole('publisher', 'contributor');

// add the resources
$acl->addResource('blogPost');

// add privileges to role and resource combinations
$acl->allow('guest', 'blogPost', 'view');
$acl->allow('contributor', 'blogPost', 'contribute');
$acl->allow('contributor', 'blogPost', 'modify', new UserCanModifyBlogPostAssertion());
$acl->allow('publisher', 'blogPost', 'publish');

The above code produces a fully defined ACL object, at least for the purposes of this article, that we can now start interacting with, as the following examples do. The User and BlogPost objects use public properties for brevity and illustration, but you can assume that these properties might be populated and persisted via a Zend_Db_Table row, a web service, or some other persistence layer.

$user = new User();
$post = new BlogPost();

// some default values
$user->id = 1;
$post->ownerUserId = 1;

/**
 * Demonstrate guest Privileges
 */
echo 'Demonstrating ' . $user->role . ' privileges' . PHP_EOL
    . '------------------------------------------'
    . PHP_EOL . PHP_EOL;

echo 'Can user (' . $user->role . ') view?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'view') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL; 

echo 'Can user (' . $user->role . ') contribute?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'contribute') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

echo 'Can user (' . $user->role . ') modify?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'modify') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

echo 'Can user (' . $user->role . ') publish?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'publish') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

/**
 * Demonstrate contributor Privileges
 */

$user->role = 'contributor';

echo 'Demonstrating ' . $user->role . ' privileges' . PHP_EOL
    . '------------------------------------------'
    . PHP_EOL . PHP_EOL;

echo 'Can user (' . $user->role . ') view?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'view') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL; 

echo 'Can user (' . $user->role . ') contribute?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'contribute') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

$post->ownerUserId = 5;

// the following two examples should demonstrate the assertion being checked

echo 'Can user (' . $user->role . ') modify someone elses blogPost?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'modify') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

$post->ownerUserId = 1;

echo 'Can user (' . $user->role . ') modify own blogPost?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'modify') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

echo 'Can user (' . $user->role . ') publish?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'publish') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

/**
 * Demonstrate publisher Privileges
 */

$user->role = 'publisher';

echo 'Demonstrating ' . $user->role . ' privileges' . PHP_EOL
    . '------------------------------------------'
    . PHP_EOL . PHP_EOL;

echo 'Can user (' . $user->role . ') view?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'view') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL; 

echo 'Can user (' . $user->role . ') contribute?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'contribute') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

$post->ownerUserId = 5;

echo 'Can user (' . $user->role . ') modify someone elses blogPost?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'modify') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

$post->ownerUserId = 1;

echo 'Can user (' . $user->role . ') modify own blogPost?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'modify') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

echo 'Can user (' . $user->role . ') publish?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'publish') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

Once you have all of that in place, a run of the script produces these results:

/home/ralph/test-script/$ php acl-inheritance.php

Demonstrating guest privileges
------------------------------------------

Can user (guest) view?
yes

Can user (guest) contribute?
no

Can user (guest) modify?
no

Can user (guest) publish?
no

Demonstrating contributor privileges
------------------------------------------

Can user (contributor) view?
yes

Can user (contributor) contribute?
yes

 == Checking the assertion ==
Can user (contributor) modify someone elses blogPost?
no

 == Checking the assertion ==
Can user (contributor) modify own blogPost?
yes

Can user (contributor) publish?
no

Demonstrating publisher privileges
------------------------------------------

Can user (publisher) view?
yes

Can user (publisher) contribute?
yes

 == Checking the assertion ==
Can user (publisher) modify someone elses blogPost?
yes

 == Checking the assertion ==
Can user (publisher) modify own blogPost?
yes

Can user (publisher) publish?
yes
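The output above depends on Zend_Acl walking the role inheritance chain (publisher, then contributor, then guest) while still handing the original User and BlogPost objects to the assertion, which is the behavior the 1.9.1 fix restored. To make that mechanism concrete without Zend Framework, here is a hypothetical, heavily simplified sketch. MiniAcl is made up for illustration and is not Zend_Acl’s implementation or API: it only supports allow rules, and it assumes the role object has a role property and the resource object has a type property. It requires PHP 7.1 or later.

```php
<?php
/**
 * Hypothetical allow-only ACL illustrating a depth-first walk over role
 * parents that passes the caller's original objects to assertions.
 */
class MiniAcl
{
    private $parents = []; // role id => list of parent role ids
    private $rules = [];   // "role|resource|privilege" => true or assertion callable

    public function addRole(string $role, array $parents = []): void
    {
        $this->parents[$role] = $parents;
    }

    public function allow(string $role, string $resource, string $privilege, ?callable $assert = null): void
    {
        $this->rules["$role|$resource|$privilege"] = $assert ?? true;
    }

    // $user needs a ->role property, $resource a ->type property (assumed shapes)
    public function isAllowed($user, $resource, string $privilege): bool
    {
        return $this->check($user->role, $user, $resource, $privilege);
    }

    private function check(string $role, $user, $resource, string $privilege): bool
    {
        $key = "$role|{$resource->type}|$privilege";
        if (isset($this->rules[$key])) {
            $rule = $this->rules[$key];
            // The assertion always gets the caller's objects, not the role id
            if ($rule === true || $rule($user, $resource)) {
                return true;
            }
            // A failed assertion means this rule does not apply; keep searching
        }
        foreach ($this->parents[$role] ?? [] as $parent) { // depth-first
            if ($this->check($parent, $user, $resource, $privilege)) {
                return true;
            }
        }
        return false;
    }
}

$acl = new MiniAcl();
$acl->addRole('guest');
$acl->addRole('contributor', ['guest']);
$acl->addRole('publisher', ['contributor']);
$acl->allow('guest', 'blogPost', 'view');
$acl->allow('contributor', 'blogPost', 'modify', function ($user, $post) {
    return $user->role === 'publisher' || $post->ownerUserId === $user->id;
});

$user = (object) ['id' => 1, 'role' => 'contributor'];
$post = (object) ['type' => 'blogPost', 'ownerUserId' => 5];
var_dump($acl->isAllowed($user, $post, 'modify')); // bool(false)
$post->ownerUserId = 1;
var_dump($acl->isAllowed($user, $post, 'modify')); // bool(true)
```

Because the publisher role has no modify rule of its own, the check falls through to contributor’s rule, and the assertion still sees the publisher’s User object; that is what lets the role check inside the assertion work.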

Conclusion

Zend_Acl can now be used to build concise, dynamic and expressive ACL systems, and its assertion system can be leveraged out of the box in ways that were not possible before. While the User/BlogPost example is on the simple side, you can use this article to start thinking about how such a system can be leveraged in your own projects, where dynamic assertions could simplify controller or model code that is already in place.

Database Abstraction Layers Must Live!

July 15th, 2009 by Ralph Schindler

I come preaching true hope, against the fallacies.

I’ve heard the arguments for and against database abstraction layers (DALs) time and time again. I must say first, I agree with them all, both sides, equally. Interestingly, I can put the vocal proponents of each side of the argument in one of two boxes: a programmer guy box, or a database guy box. For some unknown reason though, they never seem to see eye to eye.

Honestly though, I like to put myself in the middle of that argument. I see both sides. I think fine tuning an application’s core business with vendor specific features is tremendously important; after all, that is why there are so many competing database vendors. Generally speaking, for database driven projects, I feel that planning to use a specific vendor up front, knowing its pros and cons, and tailoring an application to the chosen database’s strengths can only help in the long run. I also feel that building a database model first, before any code, offers more performance and scalability advantages than code-first development does.

That said, I also see value in using a database as a simple data-store when the actual database is not a key component of the overall application. That’s right, it is completely valid to say that the data-storage & database component of an application sometimes is not the key component; a database guy probably will never agree with you there. Just as there are programmers who swear by this code first, database later mantra, there are database developers that will swear by the database first, code later mantra.

The fact is, each project is unique. It’s this uniqueness of projects and their execution that ultimately shapes the perspectives of developers, as well as the tools they write and consume. To say that one mantra is clearly better than the other is simply ignorant.

The Use Case of Abstraction Layers

To be honest, I don’t really buy the “I might switch database vendors at some point” argument either, as Jeremy Zawodny points out. For larger projects (on the scale of Facebook or Twitter), switching the database underneath a project after it has been in production is a monumental task, regardless of whether you have an abstraction layer. Chances are you used some database specific features, not to mention that you now have a large set of mission critical data that also has to be ported. Long story short, it’s never as easy as swapping out the abstraction layer’s database adapter.

What I will buy, though, is that some problems falling in the thicker end of the Pareto Principle can be solved with a database abstraction layer. For the uninitiated, the Pareto Principle is effectively the 80/20 rule. Applied to software use cases, the 80% is the majority of use cases. These use cases are generally not that interesting in terms of database interaction. To give them a label, we can call them the CRUD, BREAD, or <<insert your favorite terminology here>> operations. That is not to say that these operations are not important, but they are not special. In fact, they are so unspecial that we can just about apply a standard query syntax (SQL-92) to them, and expect the queries to be both portable between databases and common across the applications that use them.

This is where database abstraction fits in. As a developer, you’ll come across this problem time and time again: a large portion of an application is made up of CRUD screens, while the smaller, more interesting part of your application is its reporting screens. With an abstraction layer, we can code against a unified API and have a layer that produces consistent, vendor compatible queries. This allows us to build more specialized data access layers (patterns) for multiple database vendors with great ease. You want Table Gateway? Done. You want Row Gateway? Done. You want Active Record? Done. Each can be implemented to tackle the 80% part of the 80/20 rule when applied to the database centric business code of an application.

The Slow Path & The Fast Path

When I talk about this 80/20 rule in terms of the applications we write, I like to further refine the terminology so that it is easier to visualize. The terms that best help developers visualize the 80/20 rule in their applications are the slow path and the fast path. Each has a set of characteristics that sets it apart from the other:

Slow Path:

  • Performance is not of primary importance
  • Has an interactive nature
  • Validation and verification of data are of high priority
  • Application to data-store interactions are fairly trivial
  • Does not comprise the application’s core business logic

Fast Path:

  • Performance is of importance
  • Limited interactive nature, information flow is fairly static (non-interactive)
  • Flow of information consists of already verified and validated data (originates from the database)
  • Application to data-store interaction can become complex (JOINs, SUB-SELECTS, VIEWS)
  • Is the core business of the application

To get a better understanding of how the terms are applied, let’s look at a typical web application. Generally speaking, there are a few web based forms that users interact with. These forms are the entry point of a code path that does not get a lot of throughput, because forms are submitted by people, and people can only type and submit forms so fast. In addition to being less traveled, this code path also has a few checks along the way: validation of data and verification of data. Typically, these validation and verification problems are not particularly unique to the application being executed. In fact, web forms, validation and verification have been solved over and over again by various libraries.

On the other side of the equation, there is the aggregation and merging of the stored data (which inevitably came from the aforementioned web forms). Since the unique aggregation and processing of this data is the core business of the application, it stands to reason that this code path will be more heavily traveled by users. This is the fast path. The problems solved in this code path are generally unique, and since they are unique, it’s hard to find off-the-shelf solutions to them.

Since this is where the money is made, it also stands to reason that developers should concentrate their efforts on the fast path of their application. This means solving the slow path problems of their application with existing, tried and tested solutions: generic forms solutions, validation and verification libraries and, yes, database abstraction layers.

Getting Cozy With Zend_Db, a Database Abstraction Layer

Now that we’ve made a case for DALs, what would one look like? I’ll use Zend Framework’s Zend_Db as my example.

The connection code:

$dbAdapter = Zend_Db::factory(array(
    'adapter' => 'Pdo_Mysql', // could also be Pdo_Sqlite, Pdo_Pgsql, Mysqli, Db2, or even Oracle
    'params' => array(
        'username' => 'test_user',
        'password' => 'test_pwd',
        'dbname' => 'test'
        )
    ));

You’ll note that since this factory takes a standardized array, it makes it trivial to swap out various connection information for different adapters.

Simple queries:

$data = array(
    'name'        => 'Remember the Milk',
    'description' => '2% Milk',
    'due_on'      => '2009-07-15',
    );
$dbAdapter->insert('todo_list', $data); // insert that data

// then fetch the new row's id, and update and delete that same row
$lastInsertId = $dbAdapter->lastInsertId('todo_list');
$where = $dbAdapter->quoteInto('id = ?', $lastInsertId); // quote the value instead of concatenating it
$dbAdapter->update('todo_list', array('completed' => 'YES'), $where);

$dbAdapter->delete('todo_list', $where);

Here you’ll notice the generic, abstracted nature of this API. Since several database tasks, such as INSERT, UPDATE and DELETE, are consistent across the board, it makes sense to create a generic API for handling them. These interactions represent the mutation methods of a database and, as such, the predominant way of getting data into a system.
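To see why these mutation operations abstract so well, consider how little is needed to turn a column => value array into parameterized SQL. The sketch below is a hypothetical simplification, not Zend_Db’s implementation. The function names buildInsert(), buildUpdate() and buildDelete() are made up for illustration, and unlike a real adapter the sketch does not quote identifiers in a vendor-specific way.

```php
<?php
// Hypothetical helpers (not Zend_Db's code): each returns array($sql, $bind)
// with positional "?" placeholders, ready for a prepared statement.

function buildInsert($table, array $data)
{
    $columns      = array_keys($data);
    $placeholders = array_fill(0, count($columns), '?');
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES (%s)',
        $table,
        implode(', ', $columns),
        implode(', ', $placeholders)
    );
    return array($sql, array_values($data));
}

// $where is a raw condition; its own bind values follow the SET values.
function buildUpdate($table, array $data, $where, array $whereBind = array())
{
    $sets = array();
    foreach (array_keys($data) as $column) {
        $sets[] = $column . ' = ?';
    }
    $sql = sprintf('UPDATE %s SET %s WHERE %s', $table, implode(', ', $sets), $where);
    return array($sql, array_merge(array_values($data), $whereBind));
}

function buildDelete($table, $where, array $whereBind = array())
{
    return array(sprintf('DELETE FROM %s WHERE %s', $table, $where), $whereBind);
}

list($sql, $bind) = buildInsert('todo_list', array(
    'name'        => 'Remember the Milk',
    'description' => '2% Milk',
    'due_on'      => '2009-07-15',
));
echo $sql, PHP_EOL; // prints: INSERT INTO todo_list (name, description, due_on) VALUES (?, ?, ?)
```

Because every vendor accepts this same shape of statement, the only vendor-specific work left to an adapter is identifier quoting and executing the prepared statement.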

For all intents and purposes, simple SELECTs are fairly standardized too. They are standardized enough to complement the INSERT, UPDATE, and DELETE abstractions, so that we can find the actual rows to perform these mutation operations on.

Now that we have a simple and consistent API for doing simple SELECTs, INSERTs, UPDATEs, and DELETEs; we can implement something a little more interesting: the table & row gateway:

Zend_Db_Table_Abstract::setDefaultAdapter($dbAdapter);
$userTable = new Zend_Db_Table('user'); // ZF 1.9 feature
$userRow = $userTable->find(5)->current(); // find() returns a rowset; take the row with primary key 5
echo $userRow->username;

Immediately, you should see the inherent value in the above example: rudimentary and common tasks can now be handled with a consistent and simple API. But what happens when you’ve started using this DAL and you want to use a vendor-specific feature? Well, you can still write the SQL yourself:

// assuming what you want is really REPLACE or INSERT IGNORE from mysql
$dbAdapter->query('INSERT IGNORE INTO configuration (name, value) VALUES (?, ?)', array($name, $value));

// OR
$dbAdapter->query('REPLACE INTO configuration (name, value) VALUES (?, ?)', array($name, $value));

As you can see, the query() method of the database adapter lets us pass custom SQL (with bound parameters) to the database, taking advantage of vendor-specific features.

What if you want to combine both paradigms for ultimate flexibility?


// assuming Zend_Db_Table_Row, with a FriendshipReference rule
$friendRowset = $currentUserRow->findDependentRowset('User', 'FriendshipReference');

// collect friend ids (assumes the user has at least one friend; IN () is invalid SQL)
$friendIds = array();
foreach ($friendRowset as $friendRow) {
    $friendIds[] = $friendRow->related_user_id;
}

$select = $dbAdapter->select();
$select
    ->from('user', array(
        'user_id',
        'related_user_id',
        'became_friends_on'
        ))
    ->where('user_id IN (?)', $friendIds); // the adapter quotes and expands the array

// interact with the driver directly (a PDO or mysqli object, depending on the adapter)
$connection = $dbAdapter->getConnection();
$connection->query('CREATE TEMPORARY TABLE friend ('
        . ' `user_id` int(11) NOT NULL,'
        . ' `related_user_id` int(11) NOT NULL,'
        . ' `became_friends_on` DATE NOT NULL,'
        . ' PRIMARY KEY (`user_id`, `related_user_id`)' // Zend_Db_Table requires a primary key
        . ' ) ENGINE=MEMORY'
    );
$connection->query('INSERT INTO friend ' . (string) $select);

// query the new temporary friend table
$friendTable = new Zend_Db_Table('friend');
$rows = $friendTable->fetchAll(
    'became_friends_on > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)',
    'became_friends_on'
    );

While the above example is “a bit out there”, it does show that even with a DAL, if it’s flexible enough, you can code as close to or as far away from the database as you like. Ultimately, the mantra here is: let’s get the job done in the most effective, efficient and sound way possible.

Conclusions

Simply put, a database abstraction layer is just another tool in the toolbox. You don’t have to completely change your paradigm of programming, nor do you have to apply an all-or-none approach to using a DAL. When applied correctly, you can build out the slow path of your application in little to no time, while leaving extra time for developing and fine-tuning the fast path of your application. And to keep code from becoming unruly, simply apply some best-practices code organization to your project.

PHP: Environments, Libraries, and Applications – Oh My!

May 24th, 2009 by Ralph Schindler

Over the past 10 years or so, I’ve worked with many different code bases and libraries. Originally, the “libraries” were my own, because in my earlier programming days I had a bad case of “NCH” syndrome. That’s “Not Coded Here” syndrome for the uninitiated. As time went on, there were some solutions I needed for a simple project but had neither the time nor the patience to develop a custom library for. That’s when I started relying on others’ experience and code to get me through projects.

The first “library” I remember using was px.sklar.com by David Sklar. There were some great components in there that were worth integrating into projects, but I hesitate to call it a true library, since it’s a repository of both reusable components and complete solutions/applications. Moving into the 21st century, a more “official” PHP library was being born: the PEAR project. The first component I really started depending on for many projects was Spreadsheet_Excel_Writer. PEAR is not without issues of its own, but that’s a topic for a separate article.

A Little History

My earliest PHP applications were fairly simple: a PHP page that would interact with a database and render some HTML. Looking back at them, they all look like oodles of hacks and spaghetti code. Of course, this was 1999-ish, so it was OK; after all, it got the job done. As projects grew larger, so did the desire for better organization. The new wave of applications I was writing at the time was my first divergence from Model 1 applications, and it came with the introduction of the second library I started using.

Smarty (which used to be part of the PHP Project), was a library I came to depend on in every project. The single greatest aspect of Smarty from a code organization standpoint was that it separated scripts into “business logic” scripts and “presentation logic” scripts. If an application was a soup of code, Smarty was the tool which divided out the presentation specific code, or what we’d call the ‘view’ in the MVC paradigm, from the business specific code, or what we’d call the controller and model in the MVC paradigm. This was the first step many took towards what is known in the JSP world as Model 2 programming.

So why this history wrapped in with a little personal experience? Well, I’d say the path I have followed is pretty typical of programmers who use scripting languages to build applications, specifically web applications. As the technologies we use have evolved and grown, we tend to move towards solutions that offer a sense of best practices, better code organization and, most importantly, a reduced time to market.

What does that have to do with you? Well, I’ve seen my share of PHP-centric projects come and go. In addition to those projects, I’ve kept a watchful eye on projects in other communities such as the Ruby, Perl, Java and .NET communities. From them, we’ve borrowed concepts, ideas and tools to create better solutions for the PHP community. With that, I’ll continue by explaining several of the most common facets of any PHP project. If this seems basic at first, it’s actually laying the groundwork for a few more in-depth articles down the line.

What is an Environment?

In PHP, the environment is the set of resources, capabilities and settings available for immediate use within the lifespan of any one PHP process. I know that’s a very general statement, so let’s explore it a bit. On most systems, you’ll find a php.ini file. This ini file generally sets the values the PHP process initializes with when it starts up. Some of these can be modified by the SAPI (command line layer, Apache layer, etc.), others can be modified at runtime via ini_set(), and others cannot be modified at all.
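Which of these categories a setting falls into is its access level. As a sketch, assuming PHP 5.3 or later, ini_get_all() with details enabled reports that level as a bitmask, and a setting can be changed with ini_set() only if the mask includes INI_USER. The helper name isRuntimeModifiable() is hypothetical.

```php
<?php
// Hypothetical helper: reports whether an ini key may be changed with ini_set().
// ini_get_all(null, true) returns, per key, an array with 'global_value',
// 'local_value' and 'access' (a bitmask of INI_USER, INI_PERDIR, INI_SYSTEM).
function isRuntimeModifiable($key)
{
    $settings = ini_get_all(null, true);
    if (!isset($settings[$key])) {
        return false; // unknown key, or its extension is not loaded
    }
    return ($settings[$key]['access'] & INI_USER) === INI_USER;
}

foreach (array('display_errors', 'include_path', 'disable_functions') as $key) {
    printf("%-18s %s\n", $key, isRuntimeModifiable($key) ? 'runtime' : 'php.ini/SAPI only');
}
```

Settings such as disable_functions are system-level only, so a script that depends on them has to document the requirement rather than set it itself.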

Each time a script is executed, it first inherits these php.ini values. This means, by default, if none have changed, a script is subject to the rules defined by the php.ini on the system. If these values (php.ini system values) are out of your control, this means that the script running has an ambiguous initial environment. This environment might have been defined by the system administrator or by the packager of the php distribution you are using.

The more ambiguous your environment setup, the greater the chance your application will fail during setup or execution. At least one of these situations has plagued most PHP developers at one time or another:

  • display_errors might be off, causing a WTF moment when an error arises.
  • error_reporting is set to include E_STRICT, and the script was not written with E_STRICT in mind, producing hundreds of notices.
  • open_basedir is set, and your script doesn’t have access to resources it expects to have access to.

Those are just three of the more popular examples, stemming from three different keys that can be set within a php.ini. To put it in a bigger perspective: there are hundreds of these values. The most important point is that any given PHP script or application should either check the environment at startup or, at the very least, document all of the environment prerequisites and assumptions it makes. The ideal solution is to supply a script that checks the environment and reports at installation time whether the ini values are correct.
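An installation-time check along those lines might look like the following. It is a sketch under assumptions. The function name checkEnvironment() is hypothetical, and so are the particular version, extensions and ini flags passed to it. filter_var() with FILTER_VALIDATE_BOOLEAN normalizes ini strings such as "1", "On" or "" to true/false.

```php
<?php
// Hypothetical install-time check: returns a list of human-readable problems
// (an empty array means the environment matched every expectation given).
function checkEnvironment($minVersion, array $extensions, array $iniFlags)
{
    $problems = array();

    if (version_compare(PHP_VERSION, $minVersion, '<')) {
        $problems[] = "PHP $minVersion or newer is required, found " . PHP_VERSION;
    }

    foreach ($extensions as $extension) {
        if (!extension_loaded($extension)) {
            $problems[] = "Missing extension: $extension";
        }
    }

    // $iniFlags maps a boolean ini key to the value the application expects.
    // Unknown keys make ini_get() return false, which normalizes to false.
    foreach ($iniFlags as $key => $expected) {
        $actual = filter_var(ini_get($key), FILTER_VALIDATE_BOOLEAN);
        if ($actual !== $expected) {
            $problems[] = sprintf('%s should be %s', $key, $expected ? 'On' : 'Off');
        }
    }

    return $problems;
}

// Illustrative expectations only; each application states its own.
$problems = checkEnvironment('5.2.0', array('pdo', 'json'), array(
    'log_errors'     => true,
    'short_open_tag' => false,
));
foreach ($problems as $problem) {
    echo 'WARNING: ', $problem, PHP_EOL;
}
```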

One of the more interesting environment settings in PHP, much like in other languages and systems, is the common path. In PHP, the common path is called the include_path, and it just might be the most important php.ini value to any script or project. During a PHP script’s runtime, files and components loaded via include or require with a relative path are generally searched for in the paths defined within the include_path. This means that any script or class (effectively any PHP code) can be located and loaded with a path relative to any of the paths defined in the include_path.

The include_path is a pretty powerful thing. It makes it easier to bundle components and packages into “libraries” and use them within projects. This helps facilitate the DRY principle by encouraging good code reuse and solid library design. On the other hand, if you don’t properly manage the libraries on your include_path, this could pose some pretty significant problems down the line. More on that later, though.
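One common way to manage this is to set the include_path explicitly from the application rather than relying on whatever php.ini provides. Here is a minimal sketch, assuming a hypothetical library/ directory next to the script. stream_resolve_include_path() requires PHP 5.3.2 or later.

```php
<?php
// Sketch: prepend a project-local library directory to the include_path so
// relative includes are resolved there before anywhere else. The 'library'
// directory name is an assumption for illustration.
$libraryPath = __DIR__ . DIRECTORY_SEPARATOR . 'library';

set_include_path(implode(PATH_SEPARATOR, array(
    $libraryPath,          // searched first
    get_include_path(),    // then whatever php.ini (or the SAPI) configured
)));

// With that in place, a relative include such as
//     require_once 'Zend/Db.php';
// is looked up in $libraryPath first, then in each previously configured path.

// stream_resolve_include_path() shows where a relative path would resolve,
// or returns false if no include_path entry contains it.
$resolved = stream_resolve_include_path('Zend/Db.php');
echo $resolved === false ? "Zend/Db.php not found on include_path\n" : "$resolved\n";
```

Putting the project’s own library first means a bundled copy of a library wins over a different version installed system-wide, which is one way to avoid the management problems mentioned above.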

The general rule of thumb is this: take control of the php process’s environment as much as possible to ensure consistent behavior.
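In practice, taking control can be as simple as a bootstrap file that every entry point includes first and that sets the settings the application depends on. Here is a minimal sketch. The particular values are illustrative assumptions for a production-like setup, and only settings that are changeable at runtime (as these are) can be set this way.

```php
<?php
// Sketch of a bootstrap that states its environment explicitly instead of
// inheriting php.ini defaults. The values chosen here are assumptions.
error_reporting(E_ALL);            // report everything, including notices
ini_set('display_errors', '0');    // never show raw errors to end users...
ini_set('log_errors', '1');        // ...but make sure they are logged
date_default_timezone_set('UTC');  // avoid depending on the system timezone
```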

What is a Library?

It seems like library is a fairly generic term, but I want to add some specific meaning to it, at least in terms of PHP. A general definition of a library would be a “collection of reusable code”, and that statement is true for all intents and purposes. For this article, I’d like to take that a little further.

A library is a collection of components. While a library solves a less specific general problem, components solve a more specific general problem. Get it yet?

For demonstration purposes, I’ll use the Zend Framework, since I’m a little biased towards that one. The Zend Framework has a couple of libraries, the main one being the Standard Library. The ZF Standard Library solves a pretty general problem: “The PHP Application problem”. As you can see, that’s a fairly general (relatively speaking) problem to attempt to solve. This library is made up of several components that solve specific problems within the “PHP Application problem.” For example, Zend_View and Zend_Controller solve the “web application structure” problem, Zend_Form solves the “web forms” problem, and so on. These are problems that can be solved with tried, tested, and true solutions, which can generally be considered “best practices”. They are solved so that you can get on to solving the even more specific problems: those inside the “application”.
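Because ZF 1.x follows the PEAR naming convention, each component in the Standard Library corresponds to a directory (Zend/View, Zend/Controller, and so on). A class such as Zend_Db_Table lives in Zend/Db/Table.php relative to the include_path. Below is a minimal autoloader sketch built on that convention. The function name classToFile() is hypothetical, the closure syntax assumes PHP 5.3+, and ZF ships its own Zend_Loader_Autoloader, which this does not reproduce.

```php
<?php
// Hypothetical helper: maps a PEAR/ZF1-style class name to a relative file path,
// e.g. Zend_Db_Table => Zend/Db/Table.php (using the platform's separator).
function classToFile($className)
{
    return str_replace('_', DIRECTORY_SEPARATOR, $className) . '.php';
}

// Minimal autoloader: resolves the file against the include_path and loads it
// if it exists; otherwise it does nothing, so other registered autoloaders
// still get a chance.
spl_autoload_register(function ($className) {
    $file = stream_resolve_include_path(classToFile($className));
    if ($file !== false) {
        require_once $file;
    }
});

echo classToFile('Zend_Db_Table'), PHP_EOL; // Zend/Db/Table.php on Unix-like systems
```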

It’s worth noting that the definition of a library is also relative to the audience it’s targeted at. In our above example, the Zend Framework’s intended audience is all PHP developers. Your company, on the other hand, has a smaller target audience: its internal developers. Since that audience is a smaller and more focused group, their needs are more specific than those of the global developer community. That means that a company’s “library” might solve “more specific general problems” on a company-wide scale. For example, a company might have 10 applications that use a single sign-on system. Since those 10 applications share the less specific problem of user sign-on, that solution would best fit inside the company’s “library”.

In general, libraries solve problems that are generic enough for the entire intended audience, with each problem solved in a component of the “library”. Everything else goes into your “application”.

What is an Application?

As hinted above in the section on libraries, an application too is defined by the problem it attempts to solve. An application is a collection of business specific code which solves a very specific business problem. Again, this sounds generic, but it can be further defined and explained.

A business problem is the most specific problem that can be solved with code; this is the application. It will be the sum of all target environments, target audiences, and target tasks that should be solved. These business problems have a very narrow focus. While applications can be further divided into specific areas of code, the application as a whole exists to solve the business problem.

Depending on how complicated the business problem it targets is, an application might be modular. If an application is modular, its problem area can be divided into even more specific areas of code with specific responsibilities. Let’s take a community website as an example. The site might include forums, user management, mail, calendaring and news. Each of these areas of the site could be considered a module of the main application or website. While this is a generic example, it does demonstrate a logical division of responsibility, which is ultimately the point of introducing modules into an application. Each project and business should evaluate its application and decide up front how granular the application’s problem is, and how best to divide it further. Doing this up front will alleviate many issues that could arise later as the code base grows.

Beyond the modularity of an application, a further, more logical division and organization of code is generally applied. While there are several paradigms of application organization, we’ll focus on the MVC architecture (if you are not familiar with MVC, it might be best to read the Wikipedia article before moving forward). Both an application’s modules and a non-modular application can be organized into Models, Views, and Controllers, the main constituents of the MVC paradigm. Without getting too involved in what MVC is, one should know that:

  • The model represents the code base for solving the business problem at hand in a UI and environment agnostic way.
  • The controller represents the code base responsible for bridging a user’s interaction with the UI to the business model, and setting up new UI.
  • The view represents the code base responsible for creating the environment specific UI.

The above grouping of purposes is what is called a separation of concerns.
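To make the three roles concrete, here is a deliberately tiny sketch in plain PHP. The class and method names (TodoModel, TodoListView, TodoController, addAction) are hypothetical and not tied to any framework.

```php
<?php
// Model: solves the business problem; knows nothing about HTTP or HTML.
class TodoModel
{
    private $items = array();

    public function add($name)
    {
        $name = trim($name);
        if ($name === '') {
            throw new InvalidArgumentException('A todo needs a name');
        }
        $this->items[] = $name;
    }

    public function all()
    {
        return $this->items;
    }
}

// View: turns data into environment-specific output (HTML here).
class TodoListView
{
    public function render(array $items)
    {
        $html = "<ul>\n";
        foreach ($items as $item) {
            $html .= '  <li>' . htmlspecialchars($item, ENT_QUOTES, 'UTF-8') . "</li>\n";
        }
        return $html . "</ul>\n";
    }
}

// Controller: bridges the user's input to the model, then sets up the new UI.
class TodoController
{
    private $model;
    private $view;

    public function __construct(TodoModel $model, TodoListView $view)
    {
        $this->model = $model;
        $this->view  = $view;
    }

    public function addAction(array $request)
    {
        $this->model->add(isset($request['name']) ? $request['name'] : '');
        return $this->view->render($this->model->all());
    }
}

$controller = new TodoController(new TodoModel(), new TodoListView());
echo $controller->addAction(array('name' => 'Remember the Milk'));
```

The model could be reused unchanged behind a command-line or JSON view; only the view and controller know about the environment.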

Recap

Here is a recap of the terms defined within this article:

  • An Environment is the sum of all resources, capabilities and settings that exist in a PHP process. This generally includes what extensions and ini settings are preset for the PHP process.
  • A Library is a collection of code that solves a less specific problem, which is further defined by the library’s target audience and problem area.
  • A Component is a collection of code that solves a more specific problem within a library.
  • An Application is a collection of code that solves a specific business problem. Ideally, applications consume libraries and components to facilitate quicker and more standardized development.
  • A Module is a collection of code that solves a more specific atomic problem of the larger business problem. The sum of all modules within an application attempts to solve the larger business problem.
  • MVC is a way to group code within both a module and an application into a code base that facilitates a better separation of concerns.

PHPAustin Meetup Slides – Software Engineering In PHP

May 15th, 2009 by Ralph Schindler

On Tuesday, Josh Butts and I gave a presentation at the monthly Austin PHP Meetup titled “Software Engineering In PHP”. Around 30 people were present and, judging by the number of questions raised on each slide, interest in the subject matter was fairly high. In the end, it took around two hours and fifteen minutes to get through the 35 or so slides.
