The Anatomy Of A Bug/Issue Reproduction Script

February 18th, 2010 by Ralph Schindler

“There is a problem with component Fooey-Bar-Bazzy, I think it’s related to Nanny-Nanny-Neener. Please Fix Now.” If you’ve written a bug/issue report like that in the past, with no other details, shame on you! This may come as a shock, but as great as some developers might be, they cannot read minds. Each has their own way of coding, a custom working environment, and their own favorite tools, on top of variances in coding standards and best practices. Some could argue these little intricacies are outside the realm of coding standards and best practices, and that they are the differences between good, great, and even terrible developers. Each developer has a different opinion on how particular applications, libraries of code, or even features of a particular project are expected to behave in practice. These varying expectations are why bugs/issues exist. No one developer producing code for mass consumption can anticipate every possible use case. Additionally, no one developer can replicate every environment surrounding every preconceived use case. There are simply not enough resources at hand, be it in the form of a variety of systems or simply the number of hours in a developer’s day.

With that in mind, I write this as a plea to all developers to be good to the maintainers of the code you use. In its simplest form, my advice is this: before you click submit on that bug/issue report form, ask yourself two questions: “Did I do enough due diligence in determining if this is really a bug?” AND “If I got this bug report, would I be able to reproduce it, let alone understand it?” If the answer to both of those questions is YES, go ahead and click submit. If your answer is no, you’ve got some more work to do.

Some Tenets Of the Good Reproduction Script

In this short article, I’d like to outline a few details of what should go into a bug/issue report. These are some simple guidelines to consider when you write one. It should be noted that this list is by no means exhaustive, but if you at least consider the list below before clicking submit, you’ll make a code maintainer’s day. I promise.

  1. List Out All Assumptions Clearly

    PHP specifically is well known for being a “glue language”. What that means is that PHP generally sits between multiple pieces of software that are, of course, not PHP. Each of these pieces of software has its own configuration and environment that PHP is “gluing” together. That being the case, any assumptions about the non-PHP parts of the environment should be clearly listed in the reproduction script. This could include the database flavor and its settings, a PHP library component, or perhaps the specific version of an extension that is being used and the underlying unmanaged/C-based library your PHP environment is consuming.

  2. Use The Shortest Possible Use Case

    As tempting as it is to copy a script from your project and paste it into the bug/issue submission box, don’t do this. If you are truly invested in seeing the bug/issue fixed in a timely fashion, take the time to create a small reproduction script. In this script should be the absolute minimal amount of code to demonstrate to another human that there is indeed a problem that needs solving. By keeping the script minimal and short, you are also removing any other distractions from the script that otherwise might confuse the maintainer and prevent him from fully understanding the real problem.

  3. Use Generic Yet Meaningful Names

    It cannot be stressed enough that names which carry no meaning for the problem at hand should be avoided. As mentioned above, you want to have as few distractions as possible in the use case. For example, supplying your database table of customers, with first_name, last_name, etc., has virtually nothing to do with the problem at hand. In these cases, where table and column names are ancillary to the actual problem, they should be generalized: a table named ‘foo’, and columns named ‘bar1’ and ‘bar2’. Unless …

    … the variable name can add context to the problem. What does this mean? $customer would be bad, but $faultyTableObject is good. The latter naming makes it easy for the maintainer to focus on the variable that needs to be tracked leading up to the problem.

  4. Document Both What You Expect, And The Actual Result

    Claiming something is broken without stating what you expect and what the actual result is offers next to nothing to the maintainer attempting to fix the problem. Generally speaking, most use cases that end up being bugs/issues are outside of the original preconceived use cases for the component. That said, the maintainer is going to need the context of the use case that you’ve found to be problematic. It also helps to point out any existing documentation that describes the more well-defined use cases, and how your use case relates to and/or deviates from them.

  5. Make The Reproduction Script As Generic As Possible

    Perhaps this is redundant, but it’s important to know the minimal requirements for reproducing a bug/issue. You are not expected to be an expert on how to fix the actual problem, but you should do your own due-diligence in order to hand the problem off to the maintainer. It’s already been said to “List out all assumptions clearly”, but it is just as important to peel off any specific pieces of the problem that are not directly part of the problem.

    This concept can best be described by example. While MySQL is a widely available database platform, SQLite is widely known as the easiest to use and most portable database platform, at least in the PHP runtime. If you find a problem while using MySQL, but it’s clear it can be replicated using SQLite, use SQLite. SQLite is built into PHP by default, and in a single script, you can create a memory-based database and its schema in just a few lines of code.

    Sometimes an issue cannot be described in a single script. This is ok. This would be the case if, for example, you found an issue in a larger system, like Zend Framework’s MVC layer. In this case, it makes sense that you need to provide a minimal ZF project to demonstrate the issue. Again, use as few files and as little code as possible. Also, in the spirit of using generic code, be sure to make all file system paths relative. This will help the maintainer get up and running with the problematic project in a minimal amount of time, with minimal configuration.
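For the first tenet, one low-effort way to capture assumptions is to let PHP report them itself. The following is only a minimal sketch using built-in PHP functions; the function name describe_environment() is hypothetical, and the list of extensions passed to it is an assumption you would adjust to whatever your report actually touches:

```php
<?php

// Hypothetical helper (not part of any library): collects environment
// details worth pasting at the top of a bug/issue report.
function describe_environment(array $extensions)
{
    $report = array(
        'php_version' => PHP_VERSION,
        'os'          => PHP_OS,
        'sapi'        => PHP_SAPI,
    );

    foreach ($extensions as $extension) {
        // phpversion() returns false when the extension is not loaded
        $version = phpversion($extension);
        $report['ext_' . $extension] = ($version === false) ? 'not loaded' : $version;
    }

    // PDO drivers can only be listed when the PDO extension is loaded
    $report['pdo_drivers'] = extension_loaded('pdo')
        ? implode(', ', PDO::getAvailableDrivers())
        : 'PDO not loaded';

    return $report;
}

// The extension names here are examples only
foreach (describe_environment(array('pdo_sqlite', 'pdo_mysql')) as $key => $value) {
    echo $key . ': ' . $value . PHP_EOL;
}
```

Pasting that output into the report’s “Assumptions” section saves the maintainer a round of questions, though it does not replace listing settings (database configuration, for instance) that PHP cannot see.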

A Reproduction Script By Example

The following is a reproduction script I have written based on an issue (ZF-3709) provided to Zend Framework in our issue tracker. I chose this issue to write a reproduction for because it offers the ability to talk about how one might go about describing the environment, more specifically what the database should look like in order to replicate the problem.

(This script can also be found at http://gist.github.com/307396)

<?php

/**
 * This reproduction script shall accompany the issue reported at
 * http://framework.zend.com/issues/browse/ZF-3709
 *
 * Assumptions:
 *   Zend_Db_Table_* from trunk
 *   PHP Environment has SQLite with :memory: capabilities
 *
 * Result:
 *   This script should run without any assertions failing (empty output)
 */



// ensure that Zend Framework trunk is being tested against & classes are available
// set_include_path('/path/to/ZendFramework/library');
require_once 'Zend/Loader/Autoloader.php';
Zend_Loader_Autoloader::getInstance();

// set up the adapter; this uses SQLite so that it is minimally invasive
// to anyone wishing to reproduce the issue on their local machine
$dbAdapter = Zend_Db::factory(
    'Pdo_Sqlite',
    array('dbname' => ':memory:')
    );

// ensure all tables have access to the adapter
Zend_Db_Table::setDefaultAdapter($dbAdapter);

// setup the database, classes, & assertion system
setup();



/**
 * BEGIN Reproduction Code
 */



// find a record that has a relationship to some bars through foo_to_bar
$fooTable = new Foo();
$fooRow = $fooTable->fetchRow('id = 2');
$fooIdOnesBars = $fooRow->findManyToManyRowset('Bar', 'FooToBar');

// the expected values for the next call
$expectedValues = array(
    array('id' => '2', 'name' => 'bravo'),
    array('id' => '3', 'name' => 'charlie')
    );


// when we loop through the rows, they should match the expected results above
foreach ($fooIdOnesBars as $index => $barRow) {
    // I'll use assert here to throw warnings when expected does not match actual
    $actualValue = $barRow->toArray();
    assert($expectedValues[$index] === $actualValue);
}



/**
 * END Reproduction Code
 *
 * Supporting code below
 */


// setup function
function setup() {
    setup_database();
    setup_classes();
    setup_assertions();
}

// This function will setup the proper database structure with test data
function setup_database() {
    global $dbAdapter;

    $conn = $dbAdapter->getConnection();
    $conn->query('
        CREATE TABLE foo (
            id INTEGER PRIMARY KEY,
            name VARCHAR(25)
            );
        ');

    foreach (array('one', 'two', 'three', 'four') as $numberName) {
        $conn->query('INSERT INTO foo (name) VALUES ("' . $numberName . '");');
    }

    $conn->query('
        CREATE TABLE bar (
            id INTEGER PRIMARY KEY,
            name VARCHAR(25));
        ');

    foreach (array('alpha', 'bravo', 'charlie', 'delta') as $word) {
        $conn->query('INSERT INTO bar (name) VALUES ("' . $word . '");');
    }

    $conn->query('
        CREATE TABLE foo_to_bar (
            id INTEGER PRIMARY KEY,
            foo_id INTEGER,
            bar_id INTEGER,
            extra VARCHAR(20)
            );
        ');
    $datas = array(
        array('foo_id' => 2, 'bar_id' => 2, 'extra' => 'Two to Two'),
        array('foo_id' => 2, 'bar_id' => 3, 'extra' => 'Two to Three'),
        array('foo_id' => 3, 'bar_id' => 4, 'extra' => 'Three to Four'),
        );
    foreach ($datas as $datum) {
        $conn->query('INSERT INTO foo_to_bar '
            . '(' . implode(',', array_keys($datum)) . ')'
            . ' VALUES ("' . implode('", "', array_values($datum))
            . '");');
    }
}

// This function will define the proper Zend_Db_Tables and their relationships
function setup_classes() {

    class Foo extends Zend_Db_Table_Abstract
    {
        protected $_name = 'foo';
    }

    class Bar extends Zend_Db_Table_Abstract
    {
        protected $_name = 'bar';
    }

    class FooToBar extends Zend_Db_Table_Abstract
    {
        protected $_name = 'foo_to_bar';
        protected $_referenceMap = array(
            'Foo' => array(
                'columns' => 'foo_id',
                'refTableClass' => 'Foo',
                'refColumn' => 'id'
                ),
            'Bar' => array(
                'columns' => 'bar_id',
                'refTableClass' => 'Bar',
                'refColumn' => 'id'
                )
            );
    }

}

// assertion setup
function setup_assertions() {
    assert_options(ASSERT_ACTIVE, true);
    assert_options(ASSERT_WARNING, false);
    assert_options(ASSERT_CALLBACK, 'assert_failure');
}

// callback for assertion failures
function assert_failure() {
    global $expectedValues, $index, $actualValue;
    echo 'Was expecting an array that looked like:' . PHP_EOL;
    var_dump($expectedValues[$index]);
    echo 'But got array that looked like:' . PHP_EOL;
    var_dump($actualValue);
    echo PHP_EOL . PHP_EOL;
}

To the best of my ability, this script passes both of my earlier questions: “Yes, I did enough due diligence in determining if this is really a bug.” AND “Yes, if I got this bug report, I would be able to reproduce it and understand it.”

A Few Considerations

The above script does not have unit tests, nor does it represent a patch to the existing framework. While that would be ideal, it sets the bar much too high for people to report worthwhile issues. The consumers of the code are not expected to be experts on the actual issue at hand, or even on how to write valid unit tests that fully exercise a feature or bug. Ultimately, as a code maintainer, I simply want to be able to see the issue you are attempting to describe.

If you’d like to go above and beyond the standard reproduction script, you might also consider pointing out the lines of code that you feel might be problematic. That allows maintainers to set breakpoints at specific locations and really drill down into the offending code.

I hope this helps developers understand what is expected of them as they file issue reports on the open source code they use. By following these guidelines you’ll be doing a service to the maintainer by making their life easier, and to yourself as well, since issues with reproduction scripts get a quicker turnaround than those that require in-depth research.

Dynamic Assertions for Zend_Acl in ZF

August 13th, 2009 by Ralph Schindler

In Zend Framework 1.9.1, Zend_Acl gets two major issues resolved and a simple API change that together make it possible to create a more robust, more expressive ACL definition with less code. ZF issues ZF-1721 and ZF-1722, each nearly two years old, have both been solved. Over the last two years, I’ve seen a variety of duplicate issues come into the issue tracker, which stem from two fundamental flaws in Zend_Acl – “Zend_Acl::isAllowed does not support Role/Resource Inheritance down to Assertions” and “Zend_Acl assertions breaks when inheritance is required (ie DepthFirstSearch)”. In this article, we’ll explore the API changes that alleviate these two problems, and we’ll demonstrate how to leverage the Zend_Acl assertion system to create expressive, dynamic assertions that work with your application’s models.

Backwards Compatible API Changes

Before discussing the issues, let’s go over the API change and how it affects the component. Previously, the two methods a developer used to set up an ACL were add() and addRole(). Interestingly, add() was intended to imply addResource(). Since add() implied that you were adding a resource, it’s clear that this component was created from the perspective of resources as the primary actor, with roles and assertions as secondary actors.

The new API allows for the creation of an ACL by using strings instead of having to use Zend_Acl_Role and Zend_Acl_Resource objects explicitly. To me, this is a pretty important step towards what I’d like to see in 2.0. In 2.0, I would ideally like to see addRole() and addResource() accept strings for types of roles and resources to query against, and accept objects for explicit role and resource objects to query against (even if they match an already registered type). To put it simply, I would expect addRole('user') and addRole($userObjectForRalph) to have different behaviors if different permissions were registered for each. This would allow me to specify access for the user object ‘ralph’ separately from the ACLs for objects of role type ‘user’. The behavior can be further defined to either inherit from the type, or override the type’s ACLs, depending on the desired effect. Ultimately, this would allow for a more dynamic experience with Zend_Acl.

Dynamic Assertions Example

In the following example, we’ll have a look at a common use case that is now possible in Zend_Acl. In plain English, developers want to design assertions that accept application models implementing the Resource or Role interface, and apply some dynamic or custom logic to assess whether or not the given role has access to the given resource. As mentioned previously, this was not possible because, in the process of checking the ACL tree using a depth-first search, the calling resource and role were lost, and only the originally registered objects were being passed into the assertions. Well, that’s fixed now.
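To make the idea concrete before turning to Zend_Acl itself, here is a framework-free sketch of the behavior described above. It is not Zend_Acl and does not reflect its internals: the TinyAcl class, its single-parent role chain, and the roleId property are all assumptions made for illustration. The only point it shows is that a rule can be found under an inherited role id while the assertion still receives the objects originally passed to isAllowed():

```php
<?php

// Toy ACL for illustration only, NOT Zend_Acl. Rules are keyed by role id,
// but assertions always receive the original role and resource objects.
class TinyAcl
{
    private $parents = array(); // role id => parent role id (or null)
    private $rules   = array(); // role id => privilege => true or a callable

    public function addRole($roleId, $parentId = null)
    {
        $this->parents[$roleId] = $parentId;
    }

    public function allow($roleId, $privilege, $assertion = null)
    {
        $this->rules[$roleId][$privilege] = ($assertion === null) ? true : $assertion;
    }

    public function isAllowed($role, $resource, $privilege)
    {
        // walk from the role's own id up through its ancestors
        for ($id = $role->roleId;
             $id !== null;
             $id = isset($this->parents[$id]) ? $this->parents[$id] : null) {
            if (!isset($this->rules[$id][$privilege])) {
                continue;
            }
            $rule = $this->rules[$id][$privilege];
            // pass the ORIGINAL objects, not the id the rule was found under
            return ($rule === true) ? true : (bool) call_user_func($rule, $role, $resource);
        }
        return false; // nothing matched: deny by default
    }
}

$acl = new TinyAcl();
$acl->addRole('guest');
$acl->addRole('contributor', 'guest');
$acl->addRole('publisher', 'contributor');
$acl->allow('guest', 'view');
$acl->allow('contributor', 'modify', function ($user, $post) {
    return $user->id !== null && $user->id === $post->ownerUserId;
});

// plain objects standing in for application models
$user = (object) array('roleId' => 'publisher', 'id' => 1);
$post = (object) array('ownerUserId' => 1);

var_dump($acl->isAllowed($user, $post, 'view'));   // rule inherited from guest
var_dump($acl->isAllowed($user, $post, 'modify')); // inherited rule; assertion sees $user and $post
```

Zend_Acl supports multiple parents per role and resource inheritance as well, so treat this purely as a mental model for why the assertion needs the calling objects.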

For the purposes of this example, we’ll take a simple concept: a user should be able to edit only their own blog posts. The user, in this case, would be our application’s model for users. The actual class will implement Zend_Acl_Role_Interface. We will also have a BlogPost model which will serve as the resource in question, thus implementing Zend_Acl_Resource_Interface. Naturally, our system will be able to handle users of different role ‘types’, but our BlogPost will only be of a single resource type, ‘blogPost’.

Note: the following code is for demonstration only. As such, some coding standards or conventions are not necessarily what you’d expect in proper object-oriented code or even a Zend Framework MVC based application. Some of the code might contain rogue ‘echo’ statements so that the demonstration below will be more expressive of what it’s actually doing.

class User implements Zend_Acl_Role_Interface
{
    // using public members here for brevity in this article
    public $id = null;
    public $role = 'guest';

    public function getRoleId()
    {
        return $this->role;
    }
}

class BlogPost implements Zend_Acl_Resource_Interface
{
    public $id          = null;
    public $ownerUserId = null;

    public function getResourceId()
    {
        return 'blogPost';
    }
}

Next, we’ll create the dynamic assertion. We generally would expect this assertion to be called when a User requests to modify a BlogPost. This assertion will ensure that the BlogPost’s owner id (the id of the user that owns said BlogPost) is the same as the provided User object’s id. If it is, pass; if not, fail. Fairly common use case, right? Here is what our assertion should look like, with a few inline comments:

class UserCanModifyBlogPostAssertion implements Zend_Acl_Assert_Interface
{
    /**
     * This assertion should receive the actual User and BlogPost objects.
     *
     * @param Zend_Acl $acl
     * @param Zend_Acl_Role_Interface $user
     * @param Zend_Acl_Resource_Interface $blogPost
     * @param $privilege
     * @return bool
     */
    public function assert(Zend_Acl $acl, Zend_Acl_Role_Interface $user = null, Zend_Acl_Resource_Interface $blogPost = null, $privilege = null)
    {
        echo ' == Checking the assertion ==' . PHP_EOL; // only here for the purposes of article

        if (!$user instanceof User) {
            // __METHOD__ already expands to ClassName::methodName
            throw new InvalidArgumentException(__METHOD__ . ' expects the role to be an instance of User');
        }

        if (!$blogPost instanceof BlogPost) {
            throw new InvalidArgumentException(__METHOD__ . ' expects the resource to be an instance of BlogPost');
        }

        // a publisher can always modify a post
        if ($user->getRoleId() == 'publisher') {
            return true;
        }

        // everyone else may only modify their own post
        return ($user->id != null && $blogPost->ownerUserId == $user->id);
    }
}

Note: Assertions, as with ACLs, can be treated, and most likely should be treated, as application models. As such, if you are using the Zend Framework MVC application structure, you might want to name this one something like Default_Model_Acl_UserCanModifyBlogPostAssertion, and it would live in application/models/Acl/UserCanModifyBlogPostAssertion.php. Likewise, the User class would actually be Default_Model_User, and BlogPost might be Default_Model_BlogPost.

Now that we have our models set up for our ACL to interact with, it’s time to write the ACL definition itself. For the purposes of this exercise, we’ll not assume that the ACL itself is a model; our consuming script below will simply interact with it. In a Zend Framework MVC application, one might find the ACL defined as a model within the application, depending on your needs.

$acl = new Zend_Acl();

// setup the various roles in our system
$acl->addRole('guest');
$acl->addRole('contributor', 'guest');
$acl->addRole('publisher', 'contributor');

// add the resources
$acl->addResource('blogPost');

// add privileges to role and resource combinations
$acl->allow('guest', 'blogPost', 'view');
$acl->allow('contributor', 'blogPost', 'contribute');
$acl->allow('contributor', 'blogPost', 'modify', new UserCanModifyBlogPostAssertion());
$acl->allow('publisher', 'blogPost', 'publish');

The above code has produced a fully defined ACL object, at least for the purposes of this article, that we can now start interacting with. In the following examples, we’ll interact with this ACL object. The User and BlogPost objects utilize public properties for brevity and illustrative purposes, but you can assume that these object properties might be populated and persisted via a Zend_Db_Table row, a web service, or some other persistence layer.

$user = new User();
$post = new BlogPost();

// some default values
$user->id = 1;
$post->ownerUserId = 1;

/**
 * Demonstrate guest Privileges
 */
echo 'Demonstrating ' . $user->role . ' privileges' . PHP_EOL
    . '------------------------------------------'
    . PHP_EOL . PHP_EOL;

echo 'Can user (' . $user->role . ') view?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'view') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

echo 'Can user (' . $user->role . ') contribute?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'contribute') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

echo 'Can user (' . $user->role . ') modify?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'modify') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

echo 'Can user (' . $user->role . ') publish?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'publish') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

/**
 * Demonstrate contributor Privileges
 */

$user->role = 'contributor';

echo 'Demonstrating ' . $user->role . ' privileges' . PHP_EOL
    . '------------------------------------------'
    . PHP_EOL . PHP_EOL;

echo 'Can user (' . $user->role . ') view?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'view') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

echo 'Can user (' . $user->role . ') contribute?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'contribute') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

$post->ownerUserId = 5;


// the following two examples should demonstrate the assertion being checked

echo 'Can user (' . $user->role . ') modify someone elses blogPost?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'modify') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

$post->ownerUserId = 1;

echo 'Can user (' . $user->role . ') modify own blogPost?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'modify') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

echo 'Can user (' . $user->role . ') publish?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'publish') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

/**
 * Demonstrate publisher Privileges
 */

$user->role = 'publisher';

echo 'Demonstrating ' . $user->role . ' privileges' . PHP_EOL
    . '------------------------------------------'
    . PHP_EOL . PHP_EOL;

echo 'Can user (' . $user->role . ') view?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'view') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

echo 'Can user (' . $user->role . ') contribute?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'contribute') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

$post->ownerUserId = 5;

echo 'Can user (' . $user->role . ') modify someone elses blogPost?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'modify') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

$post->ownerUserId = 1;

echo 'Can user (' . $user->role . ') modify own blogPost?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'modify') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

echo 'Can user (' . $user->role . ') publish?' . PHP_EOL
    . ($acl->isAllowed($user, $post, 'publish') ? 'yes' : 'no') . PHP_EOL
    . PHP_EOL;

Once you have all of that in place, you can see that a run of such a script produces these results:

/home/ralph/test-script/$ php acl-inheritance.php

Demonstrating guest privileges
------------------------------------------

Can user (guest) view?
yes

Can user (guest) contribute?
no

Can user (guest) modify?
no

Can user (guest) publish?
no

Demonstrating contributor privileges
------------------------------------------

Can user (contributor) view?
yes

Can user (contributor) contribute?
yes

 == Checking the assertion ==
Can user (contributor) modify someone elses blogPost?
no

 == Checking the assertion ==
Can user (contributor) modify own blogPost?
yes

Can user (contributor) publish?
no

Demonstrating publisher privileges
------------------------------------------

Can user (publisher) view?
yes

Can user (publisher) contribute?
yes

 == Checking the assertion ==
Can user (publisher) modify someone elses blogPost?
yes

 == Checking the assertion ==
Can user (publisher) modify own blogPost?
yes

Can user (publisher) publish?
yes

Conclusion

Zend_Acl can now be used to make concise, dynamic and expressive ACL systems. The assertion system that is in place in Zend_Acl can be leveraged in ways never seen before out of the box. While the User/BlogPost example is on the simple side, you can use this article to start thinking about the different ways such a system can be leveraged in your own projects where dynamic assertions would simplify controller or model code that is already in place.

Database Abstraction Layers Must Live!

July 15th, 2009 by Ralph Schindler

I come preaching true hope, against the fallacies.

I’ve heard the arguments for and against database abstraction layers (DALs) time and time again. I must say first, I agree with them all, both sides, equally. Interestingly, I can put the vocal proponents of each side of the argument in one of two boxes: a programmer guy box, or a database guy box. For some unknown reason though, they never seem to see eye to eye.

Honestly though, I like to put myself in the middle of that argument. I see both sides. I think fine-tuning an application’s core business with vendor-specific features is tremendously important; after all, that is why there are so many competing database vendors. Speaking generally of database-driven projects, I feel that planning to use a specific vendor up front, knowing its pros and cons, and tailoring an application to the chosen database’s strengths can only help in the long run. Also, I feel that building a database model first, before any code, offers more performance and scalability advantages than code-first development does.

That said, I also see value in using a database as a simple data-store when the actual database is not a key component of the overall application. That’s right, it is completely valid to say that the data-storage & database component of an application sometimes is not the key component; a database guy probably will never agree with you there. Just as there are programmers who swear by this code first, database later mantra, there are database developers that will swear by the database first, code later mantra.

The fact is, each project is unique. It’s this uniqueness of projects and their execution that ultimately shapes the perspectives of developers as well as the tools they write and consume. To say that one mantra is clearly a better choice over another is simply being ignorant.

The Use Case of Abstraction Layers

To be honest, I don’t really buy the “I might switch database vendors at some point” argument either, as Jeremy Zawodny points out. For larger projects (on the scale of the facebooks, the twitters, etc), switching the database underneath after a project has been in production is a monumental task, regardless of whether you have an abstraction layer or not. Chances are, you used some of the database-specific features, not to mention that you now have a large set of mission-critical data that also has to be ported. Long story short, it’s never as easy as swapping out the abstraction layer’s database adapter.

What I will buy, though, is that there are some problems that fall in the thicker end of the Pareto Principle that can be solved with a database abstraction layer. For the uninitiated, the Pareto Principle is effectively the 80/20 rule. Applied to software, the 80% is the majority of use cases. These use cases are generally not that interesting in terms of database interaction. To give it a label, we can call these the CRUD, BREAD, or <<insert your favorite terminology here>> operations. That is not to say that these operations are not important, but they are not special. In fact, they are so un-special that we can just about apply a standard query syntax (SQL-92) to them, and expect that the query is both portable between databases and common across applications that wish to use them.

This is where database abstraction fits in. As a developer, you’ll come across this problem time and time again. A large portion of an application is CRUD screens, and the smaller, more interesting part of your application is your reporting screens. With an abstraction layer, we are able to code against a unified API as well as have a layer that will produce consistent and vendor-compatible queries. This allows us to build more specialized data access layers (patterns) for multiple database vendors with great ease. You want Table Gateway: done. You want Row Gateway: done. You want Active Record: done. Each can be implemented to tackle the 80% part of the 80/20 rule when applied to the database-centric business code of an application.

The Slow Path & The Fast Path

When I talk about this 80/20 rule in terms of the applications we write, I like to further refine the terminology so that it is easier to visualize. The terms I find most helpful are the slow path of your application and the fast path of your application. Each has a set of characteristics that sets it apart from the other:

Slow Path:

  • Performance is not of primary importance
  • Has an interactive nature
  • Validation and verification of data are of high priority
  • Application to data-store interactions are fairly trivial
  • Does not comprise the application’s core business logic

Fast Path:

  • Performance is of importance
  • Limited interactive nature, information flow is fairly static (non-interactive)
  • Flow of information consists of already verified and validated data (originates from the database)
  • Application to data-store interaction can become complex (JOINs, SUB-SELECTS, VIEWS)
  • Is the core business of the application

To get a better understanding of how the terms are applied, let’s look at a typical web application. Generally speaking, there are a few web-based forms that users interact with. These forms are the entry point of a code path that does not get a lot of throughput. This is generally because forms are submitted by people, and people can only type and submit forms so fast. In addition to being a less traveled code path, it also has a few checks along the way: validation of data, and verification of data. Typically, the problems of verification and validation of data are not too unique to the application being executed. In fact, the web forms, validation, and verification problems have been solved over and over again by various libraries.

On the other side of the equation, there is the aggregation and merging of the stored data (which inevitably came from the aforementioned web forms). Since the unique aggregation and processing of this data is the core business of said application, it stands to reason that this code path will be more heavily traveled by users. This is the fast path. The problems solved in this code path are generally unique, and since they are unique, it’s hard to find an off-the-shelf solution to them.

Since this is where the money is to be made, it also stands to reason that developers should concentrate their efforts in the fast path of their application. This means they should solve the slow path problems of their application with existing tried and tested solutions- this includes generic forms solutions, validation and verification libraries and yes, database abstraction layers.

Getting Cozy With Zend_Db, a Database Abstraction Layer

Now that we’ve made a case for DALs, what would one look like? Well, I’ll use Zend Framework’s Zend_Db as my example.

The connection code:

$dbAdapter = Zend_Db::factory(
    'Pdo_Mysql', // adapter name; could also be Pdo_Sqlite, Mysqli, Db2, or even Oracle
    array(
        'username' => 'test_user',
        'password' => 'test_pwd',
        'dbname'   => 'test'
        )
    );

You’ll note that since this factory takes a standardized array, it makes it trivial to swap out various connection information for different adapters.

Simple queries:

$data = array(
    'name'        => 'Remember the Milk',
    'description' => '2% Milk',
    'due_on'      => '2009-07-15',
    );
$dbAdapter->insert('todo_list', $data); // insert that data

// update the row we just inserted
$lastInsertId = $dbAdapter->lastInsertId('todo_list');
$dbAdapter->update('todo_list', array('completed' => 'YES'), 'id = ' . $lastInsertId);

// or delete it
$dbAdapter->delete('todo_list', 'id = ' . $lastInsertId);

Here you’ll notice the generic and abstracted nature of this API. Since several tasks in database interaction are consistent across the board, such as INSERT, UPDATE and DELETE, it makes sense that we can create a generic API for handling them. These interactions (INSERT, UPDATE and DELETE) are the mutation methods of a database and, as such, represent the predominant way of getting data into a system.
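To make the idea of a generic mutation API a little more concrete, here is a minimal sketch of how an insert() call could turn an array of column/value pairs into a parameterized statement. The function name buildInsertSql() is hypothetical and this is not how Zend_Db is implemented internally; it only shows why a plain array is enough to describe an INSERT regardless of the vendor.

```php
<?php
// Hypothetical helper (not part of Zend_Db): turn a table name and an
// array of column => value pairs into SQL with positional placeholders.
// Identifier quoting is left out to keep the sketch short.
function buildInsertSql($table, array $data)
{
    $columns      = array_keys($data);
    $placeholders = array_fill(0, count($columns), '?');

    $sql = 'INSERT INTO ' . $table
         . ' (' . implode(', ', $columns) . ')'
         . ' VALUES (' . implode(', ', $placeholders) . ')';

    // The values travel separately so the driver can bind them safely.
    return array($sql, array_values($data));
}

list($sql, $params) = buildInsertSql('todo_list', array(
    'name'   => 'Remember the Milk',
    'due_on' => '2009-07-15',
));
echo $sql, "\n"; // INSERT INTO todo_list (name, due_on) VALUES (?, ?)
```

A real adapter would also quote the table and column names for the specific database, which is exactly the kind of vendor detail a DAL hides from the calling code.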

For all intents and purposes though, simple SELECTs are fairly standardized too. They are standardized enough to complement the INSERT, UPDATE, and DELETE abstractions, so that we can find the actual rows these mutation operations should act on.

Now that we have a simple and consistent API for doing simple SELECTs, INSERTs, UPDATEs, and DELETEs, we can implement something a little more interesting: the table & row gateway:

Zend_Db_Table_Abstract::setDefaultAdapter($dbAdapter);
$userTable = new Zend_Db_Table('user'); // ZF 1.9 feature
$userRow = $userTable->find(5)->current(); // find() returns a rowset; take the row with primary key 5
echo $userRow->username;

Immediately, you should see the inherent value in the above example. Rudimentary and common tasks can now be handled with a consistent and simple API. But what happens when you’ve started using this DAL, and you want to use a vendor specific feature? Well..

// assuming what you want is really REPLACE or INSERT IGNORE from mysql
$dbAdapter->query('INSERT IGNORE INTO configuration (name, value) VALUES (?, ?)', array($name, $value));

// OR
$dbAdapter->query('REPLACE INTO configuration (name, value) VALUES (?, ?)', array($name, $value));

As you can see, the query method of our database adapter will allow us to pass custom SQL into the database thus taking advantage of vendor specific features.

What if you want to combine both paradigms for ultimate flexibility?


// assuming Zend_Db_Table_Row, with a FriendshipReference rule
$friendRowset = $currentUserRow->findDependentRowset('User', 'FriendshipReference');

// collect friend ids
$friendIds = array();
foreach ($friendRowset as $friendRow) {
    $friendIds[] = $friendRow->related_user_id;
}

$select = $dbAdapter->select();
$select
    ->from('user', array(
        'user_id',
        'related_user_id',
        'became_friends_on'
        ))
    ->where('user_id IN (?)', $friendIds); // the adapter quotes and joins the ids

// interact with driver directly
$mysqli = $dbAdapter->getConnection();
$mysqli->query('CREATE TEMPORARY TABLE friend ('
        . ' `user_id` int(11) NOT NULL,'
        . ' `related_user_id` int(11) NOT NULL,'
        . ' `became_friends_on` DATE NOT NULL'
        . ' ) ENGINE=MEMORY;'
    );
$mysqli->query('INSERT INTO friend ' . (string) $select);

// query new friend view
$friendTable = new Zend_Db_Table('friend');
$rows = $friendTable->fetchAll(
    'became_friends_on > DATE_SUB(CURDATE(), INTERVAL 6 MONTH)',
    'became_friends_on'
    );

While the above example is “a bit out there”, it does show that even with a DAL, if it’s flexible enough, you can code as close to or as far away from the database as you like. Ultimately the mantra here is: let’s get the job done in the most effective, efficient and sound way possible.

Conclusions

Simply put, a database abstraction layer is just another tool in the toolbox. You don’t have to completely change your paradigm of programming, nor do you have to apply an all-or-none approach to using a DAL. When applied correctly, you can build out the slow path of your application in little to no time, while leaving extra time for developing and fine-tuning the fast path of your application. And to keep code from becoming unruly, simply apply some best-practices code organization to your project.

PHP: Environments, Libraries, and Applications – Oh My!

May 24th, 2009 by Ralph Schindler

Over the past 10 years or so, I’ve worked with many different code bases and libraries. Originally, the “libraries” were my own because in my earlier programming days, I had a bad case of “NCH” syndrome. That’s “Not Coded Here” syndrome for the uninitiated. As time went on, there were some solutions that I needed for a simple project and had neither the time nor the patience to develop a custom library for. That’s when I started relying on others’ experience and code to get me through projects.

The first “library” I remember using was px.sklar.com by David Sklar. There were some great components in there that were worth integrating into projects, but I hesitate to call it a true library since it’s a repository of both reusable components and complete solutions/applications. Moving on into the 21st century, a more “official” PHP library was being born: the PEAR project. The first component I really started depending on for many projects was Spreadsheet_Excel_Writer. PEAR is not without issues of its own, but that’s a topic for a separate article.

A Little History

My earliest PHP applications were fairly simple: a PHP page that would interact with a database and render some HTML. Looking back at them, they all look like oodles of hacks and spaghetti code. Of course this was 1999ish, so it was OK because after all, it got the job done. As projects grew larger, so did a desire for better organization. This new wave of applications I was writing at the time was the first divergence from Model 1 applications, and came with the introduction of the second library I started using.

Smarty (which used to be part of the PHP Project) was a library I came to depend on in every project. The single greatest aspect of Smarty from a code organization standpoint was that it separated scripts into “business logic” scripts and “presentation logic” scripts. If an application was a soup of code, Smarty was the tool which divided out the presentation-specific code, or what we’d call the ‘view’ in the MVC paradigm, from the business-specific code, or what we’d call the controller and model in the MVC paradigm. This was the first step many took towards what is known in the JSP world as Model 2 programming.

So why this history wrapped in with a little personal experience? Well, I’d say the path I have followed is pretty typical of programmers that use scripting languages to build applications, specifically web applications. That said, as the technologies we use have evolved and grown, we tend to move towards solutions that offer a sense of best practices, better code organization, and most importantly, a reduced time to market.

What does that have to do with you? Well, I’ve seen my share of PHP-centric projects come and go. In addition to those projects, I’ve kept a watchful eye on projects in other communities such as the Ruby, Perl, Java and .NET communities. From them, we’ve borrowed concepts, ideas and tools to create better solutions for the PHP community. With that, I’ll continue on with explaining several of the most common facets of any PHP project. If this seems basic at first, it’s actually laying the groundwork for a few more in-depth articles down the line.

What is an Environment?

In PHP, the environment is the set of resources, capabilities and settings for immediate use within the lifespan of any one PHP process. I know that’s a very general statement, but let’s explore that a bit. On most systems, you’ll find a php.ini file. This ini file generally sets values for the PHP process to initialize with when it starts up. Some of these can be modified by the SAPI (command line layer, Apache layer, etc.), while others can be modified during runtime via ini_set(), and others cannot be modified at all.

Each time a script is executed, it first inherits these php.ini values. This means, by default, if none have changed, a script is subject to the rules defined by the php.ini on the system. If these values (php.ini system values) are out of your control, this means that the script running has an ambiguous initial environment. This environment might have been defined by the system administrator or by the packager of the php distribution you are using.

If you are subject to an ambiguous environment setup, there is a greater chance your application will fail upon setup or during execution. At least one of these situations has plagued most PHP developers at one time or another:

  • display_errors might be off, causing a WTF moment when an error arises.
  • error_reporting includes E_STRICT and the script was not written with that level in mind, thus creating hundreds of notices.
  • open_basedir was set and your script doesn’t have access to some resources it expects to have access to.

Those are just 3 of the more popular examples stemming from 3 different keys that can be set within a php.ini. To put it in a bigger perspective: there are hundreds of these values. The point that most needs to be impressed is that any given PHP script or application should either check the environment at script startup, or at the very least document all of the environment prerequisites and assumptions it makes. The ideal solution is to supply a script that will check the environment and report at installation time whether the ini values are correct.
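As a rough illustration of such a check script, the sketch below compares the ini values an application assumes against what the running process reports. The function name checkEnvironment() and the expected values are hypothetical, and the comparison is deliberately naive: it compares strings, so a php.ini that says "On" where the script expects "1" would be reported as a mismatch.

```php
<?php
// Hypothetical pre-flight check: compare the ini values an application
// assumes against what the running PHP process actually reports.
function checkEnvironment(array $expected)
{
    $problems = array();
    foreach ($expected as $key => $wanted) {
        $actual = ini_get($key); // false when the key does not exist
        if ($actual === false || (string) $actual !== (string) $wanted) {
            $problems[$key] = array('expected' => $wanted, 'actual' => $actual);
        }
    }
    return $problems;
}

// Example assumptions; the values are illustrative only.
$problems = checkEnvironment(array(
    'display_errors' => '1',
    'open_basedir'   => '',
));

foreach ($problems as $key => $info) {
    echo $key, ': expected ', var_export($info['expected'], true),
        ', found ', var_export($info['actual'], true), "\n";
}
```

Run at installation time, an empty result means the assumptions hold; anything else tells the person installing the application exactly which setting to look at.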

One of the more interesting environment variables in PHP, much like in other languages and systems, is the common path. In PHP, the common path is called the include_path. The include_path just might be the most important php.ini based value to any script or project. During a PHP script’s runtime, the files and components being loaded are generally resolved against the paths defined within the include_path. This means that any scripts or classes (effectively any PHP code) can be located and loaded with a relative path, a path that is relative to any of the paths defined in the include_path.

The include_path is a pretty powerful thing. It makes it easier to bundle components and packages into “libraries”, and use them within projects. This helps facilitate DRY principles by encouraging good code reuse and solid library design. On the other hand, if you don’t properly manage the libraries that are on your include_path, this could pose some pretty significant problems down the line. More on that later though.

The general rule of thumb is this: take control of the php process’s environment as much as possible to ensure consistent behavior.
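In practice, taking control often happens in a small bootstrap file that runs before anything else. The following is only a sketch under assumptions: the library directory is a hypothetical location, and the chosen error settings are the kind you might want in development rather than a recommendation for production.

```php
<?php
// A sketch of a bootstrap file that pins down the environment instead of
// trusting whatever php.ini happens to provide.
error_reporting(E_ALL);
ini_set('display_errors', '1'); // in production, prefer '0' plus logging

// Assumed (hypothetical) location of the libraries bundled with the project.
$libraryPath = __DIR__ . '/library';

// Put the project's own library directory first so it wins over any
// system-wide copy that is already on the include_path.
set_include_path(implode(PATH_SEPARATOR, array(
    $libraryPath,
    get_include_path(),
)));

echo get_include_path(), "\n";
```

Prepending rather than replacing keeps system-wide libraries reachable while making sure the versions shipped with the project are found first.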

What is a Library?

It seems like library is a fairly generic term, but I want to add some specific meaning to it, at least in terms of PHP. A general definition of a library would effectively be a “collection of reusable code”, and that statement is true for all intents and purposes. For the purposes of this article, I’d like to take that a little further.

A library is a collection of components. While a library solves a less specific general problem, components solve a more specific general problem. Get it yet?

For demonstration purposes, I’ll use the Zend Framework, since I’m a little biased towards that one. The Zend Framework has a couple of libraries, the main one called the Standard Library. The ZF Standard Library solves a pretty general problem: “The PHP Application problem”. As you can see, that’s a fairly general (relatively speaking) problem it attempts to solve. This library is made up of several components that solve specific problems within the “PHP Application problem.” For example, Zend_View and Zend_Controller solve the “web application structure” problem. Zend_Form solves the “web forms” problem. So on and so forth. These are problems that can be solved with tried, tested, and true solutions. These solutions can generally be considered “best practices”. They are solved so that you can get on to solving the even more specific problems: those inside the “application”.

It’s worth noting that the definition of a library is also relative to the audience it is targeted at. In our above example, the Zend Framework’s intended audience is all PHP developers. Your company, on the other hand, has a smaller target audience: its internal developers. Since that audience is a smaller and more focused group, their needs are more specific than those of the global developer community. That means that a company’s “library” might solve “more specific general problems” on a company-wide scale. For example, a company might have 10 applications that use a single-sign-on system. Since those 10 applications within that company share the less specific problem of user sign-on, that solution would be best fitted inside the company’s “library”.

In general, libraries solve problems that are generic enough for the entire intended audience, and each problem is solved in a component of the “library”. Everything else goes into your “application”.

What is an Application?

As hinted above in the section on libraries, an application too is defined by the problem it attempts to solve. An application is a collection of business specific code which solves a very specific business problem. Again, this sounds generic, but it can be further defined and explained.

A business problem is the most specific problem that can be solved with code; this is the application. It will be the sum of all target environments, target audiences, and target tasks that should be solved. These business problems have a very narrow focus. While applications can be further divided into specific areas of code, the whole of the application’s objective is to solve the business problem.

Depending on how complicated the business problem the application targets is, an application might be modular. If an application is modular, that implies that the application’s problem area can be divided into even more specific areas of code with specific responsibilities. Let’s take a community website for example. The site might include forums, user management, mail, calendaring and news. Each of these respective areas of the site could be considered modules of the main application or website. While this is a generic example, it does demonstrate a logical division of responsibility, which is ultimately the point of introducing modules into an application. Each project and business should evaluate their application and decide up front how granular the application’s problem is, and how best to further divide it. Doing this up front will alleviate many issues that could arise later as the code base starts to grow.

Beyond the modularity of an application, a further, more logical division and organization of code is generally applied. While there are several paradigms of application organization, we’ll focus on the MVC architecture (if you are not familiar with the MVC architecture it might be best to read the Wikipedia article first before moving forward). Both an application’s module and a non-modular application can be organized into Models, Views, and Controllers, the main constituents of the MVC paradigm. Without getting too involved in what MVC is, one should know that:

  • The model represents the code base for solving the business problem at hand in a UI and environment agnostic way.
  • The controller represents the code base responsible for bridging a user’s interaction with the UI to the business model, and setting up new UI.
  • The view represents the code base responsible for creating the environment specific UI.

The above grouping of purposes is what is called a separation of concerns.
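To see the three roles side by side, here is a deliberately tiny sketch in plain PHP. The class names are hypothetical and it does not use any framework; real MVC implementations such as Zend_Controller and Zend_View add routing, dispatching and templating on top of this basic split.

```php
<?php
// Model: business rules only, no knowledge of HTTP or HTML.
class GuestbookModel
{
    private $entries = array();

    public function addEntry($author, $comment)
    {
        $this->entries[] = array('author' => $author, 'comment' => $comment);
    }

    public function getEntries()
    {
        return $this->entries;
    }
}

// View: turns data into environment-specific output (HTML here).
class GuestbookView
{
    public function render(array $entries)
    {
        $html = "<ul>\n";
        foreach ($entries as $entry) {
            $html .= '<li>' . htmlspecialchars($entry['author']) . ': '
                   . htmlspecialchars($entry['comment']) . "</li>\n";
        }
        return $html . "</ul>\n";
    }
}

// Controller: maps user input onto the model, then sets up the new UI.
class GuestbookController
{
    private $model;
    private $view;

    public function __construct(GuestbookModel $model, GuestbookView $view)
    {
        $this->model = $model;
        $this->view  = $view;
    }

    public function signAction(array $input)
    {
        $this->model->addEntry($input['author'], $input['comment']);
        return $this->view->render($this->model->getEntries());
    }
}

$controller = new GuestbookController(new GuestbookModel(), new GuestbookView());
$output = $controller->signAction(array('author' => 'Ralph', 'comment' => 'Hello <world>'));
echo $output;
```

Because the model never touches HTML, the same GuestbookModel could sit behind a command line tool or a web service simply by swapping the view and controller.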

Recap

Here is a recap of the terms defined within this article:

  • An Environment is the sum of all resources, capabilities and settings that exist in a PHP process. This generally includes what extensions and ini settings are preset for the PHP process.
  • A Library is a collection of code that solves a less specific problem, which is further defined by the library’s target audience and problem area.
  • A Component is a collection of code that solves a more specific problem within a library.
  • An Application is a collection of code that solves a specific business problem. Ideally, applications consume libraries and components to facilitate quicker and more standardized development.
  • A Module is a collection of code that solves a more specific atomic problem of the larger business problem. The sum of all modules within an application attempts to solve the larger business problem.
  • MVC is a way to group code within both a module and an application into a code base that facilitates a better separation of concerns.

PHPAustin Meetup Slides – Software Engineering In PHP

May 15th, 2009 by Ralph Schindler

On Tuesday, Josh Butts and I gave a presentation at the monthly Austin PHP Meetup titled “Software Engineering In PHP”.  Around 30 people were present and judging by the number of questions that were raised on each slide, the interest in the subject matter was fairly high.  In the end, it took around 2:15 to get through the 35 or so slides.

New Zend Framework Quickstart Available

September 16th, 2008 by Ralph Schindler

With the release of Zend Framework version 1.6.1 comes a new Quickstart guide on the framework.zend.com website. This quickstart builds on all the great quickstart material provided by Aldemar Bernal, Bradley Holt, & Wil Sinclair on the wiki.

Over the past few weeks, Matthew and I have gone back and forth on building a simple Guestbook application that would demonstrate what we think a 1.6 ZF application should look like if it were to be created from scratch.

As with most applications, much of the structure of the quickstart application we are suggesting is indeed subjective. We anticipate it will cause some discussion over architecture and best-practices design, but then again, that is the point after all.

So, what does it highlight?

  • Zend_Controller & Zend_View – These elements are brought in from the original quickstart from the wiki. This includes building action controllers, views and also highlights error controller usage.
  • Zend_Config & Zend_Registry – Considering you will be building an application that will need to move from development to staging to production, we should have a config file that will support this process. Also, we will demonstrate the usage of an application registry.
  • Zend_Layout – Headers? Footers? Navigation? – That is where Zend_Layout shines when it comes to removing the common page elements from your view scripts. View scripts should be concise and only include information and logic about the action controller calling said view script.
  • And the biggie: Zend_Db_Table & The Model – In this simple guestbook example, we have built a simple Table Module based model for handling guestbook entries. For data access, this model will utilize Zend_Db_Table to access the guestbook entries table.

The quickstart is located at http://framework.zend.com/docs/quickstart/ and you can download the application (which is about 80% comments for your reading enjoyment), on the first page.

So, go download, read, build the application. Once you get a good handle on this quickstart, we will have a few addendums demonstrating new addons and features to this base quickstart application. Already planned is an addendum where I will walk through building this same model as a Domain Model using Zend’s Db_Table for data access.