Learning About Dependency Injection and PHP

May 18th, 2011

Over the past few years, a few concepts and programming patterns have muscled their way from other languages and programming communities into the hearts and minds of PHP developers. These concepts range from the MVC application architecture and various modeling techniques (think ActiveRecord and Data Mapper) to outright shifts in the way we think about application architecture, like aspect-oriented programming (AOP) and event-driven programming. Perhaps it’s because PHP has been adopted at an enterprise level, increasing the demand for what developers might call enterprise-quality programming patterns, or perhaps it’s simply because PHP’s ever-evolving object model makes new things possible. After all, who doesn’t like shiny new things? Whatever the reason, one of the newest concepts (at least over the past 3 years or so) to emerge as a heated topic of debate is how to manage object dependencies. Interestingly, the argument over how to manage dependencies is generally named after the solution its proponents offer: dependency injection (the abstract principle is actually called inversion of control).

In any circle of developers of the object-oriented persuasion, you’ll never hear an argument that dependency injection itself is bad. In these circles, it is generally accepted that injecting dependencies is the best way to go. Injecting object dependencies in PHP looks like this:
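The original listing is not preserved in this copy of the article. A minimal sketch of constructor injection, assuming PHP 7.4 or later and the hypothetical class names DbAdapter and UserRepository, might look like this:

```php
<?php
// Hypothetical dependency: a thin wrapper around connection details.
final class DbAdapter
{
    private string $dsn;

    public function __construct(string $dsn)
    {
        $this->dsn = $dsn;
    }

    public function getDsn(): string
    {
        return $this->dsn;
    }
}

// The consumer is handed its dependency instead of creating it.
final class UserRepository
{
    private DbAdapter $db;

    public function __construct(DbAdapter $db)
    {
        $this->db = $db;
    }

    public function getAdapter(): DbAdapter
    {
        return $this->db;
    }
}

// The calling code decides which adapter the repository works with.
$db = new DbAdapter('sqlite::memory:');
$repository = new UserRepository($db);
```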

That’s basically it. There are many variations of this: setter injection, interface injection and call-time injection, in addition to the above-mentioned constructor injection. These are all valid ways of injecting dependencies into the consuming object. Ultimately, the goal here is to avoid this:
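The original listing is not preserved here. A sketch of the pattern to avoid, assuming PHP 7.4 or later and hypothetical class names, is a consumer that reaches out and builds its own dependency:

```php
<?php
// Hypothetical dependency: a thin wrapper around connection details.
final class DbAdapter
{
    private string $dsn;

    public function __construct(string $dsn)
    {
        $this->dsn = $dsn;
    }

    public function getDsn(): string
    {
        return $this->dsn;
    }
}

final class UserRepository
{
    private DbAdapter $db;

    public function __construct()
    {
        // The repository decides for itself which adapter it talks to,
        // so callers (and tests) cannot swap in a different one.
        $this->db = new DbAdapter('sqlite::memory:');
    }

    public function getAdapter(): DbAdapter
    {
        return $this->db;
    }
}

$repository = new UserRepository();
```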

The above code is an example of a violation of the Hollywood Principle, which basically states: “Don’t call us, we’ll call you.”

Yet, this is not the heart of the argument. Perhaps it was 4-5 years ago in the PHP community, but it’s not anymore. The heart of the argument is not whether we should be doing it, but how we go about doing it.

This article is not about the intricacies and implementation details of DI containers and DI frameworks. It’s also not about the various ways and means of injecting dependencies into other objects, or which method might be better. In fact, this article has no opinion on whether injecting dependencies is even good for you or your application. This article is an exploration of how adopting any DI framework for PHP affects the lifecycle of a project: both the code and the developer, team or organization that is constructing it.

A Brief History of Dependency Management In PHP

It is important to know why PHP is as popular as it is; after all, it’s this popularity that DI frameworks fight against for adoption inside a PHP application framework. To understand PHP’s popularity, history, and evolution, let’s look at this code:
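The original listing is not preserved in this copy. The sketch below is a reconstruction of the style of the era, not the author’s code. Because ext/mysql was removed in PHP 7.0, it defines stand-in versions of mysql_connect() and mysql_query() that mimic only the optional link argument, and the includes are left as comments since those files are not available; the host, credentials and table name are assumptions.

```php
<?php
// delete-user.php: a reconstruction, not the original listing.

// Stand-ins for the removed ext/mysql functions. They only mimic the
// behavior under discussion: the link argument is optional and falls back
// to whichever connection was opened last.
if (!function_exists('mysql_connect')) {
    function mysql_connect($host = 'localhost', $user = '', $password = '')
    {
        $GLOBALS['__last_link'] = "link-to-$host";

        return $GLOBALS['__last_link'];
    }

    function mysql_query($sql, $link = null)
    {
        // No link passed in: silently use the last one opened.
        $link = $link ?? $GLOBALS['__last_link'];

        return "ran [$sql] on $link";
    }
}

// include 'config.php'; // would hold the credentials and open the connection
// include 'header.php'; // would print the page header
mysql_connect('db.example.test', 'app', 'secret');

$id = (int) ($_GET['id'] ?? 0);

// Note that no connection is passed in; the script never names its dependency.
$result = mysql_query("DELETE FROM users WHERE id = $id");

// include 'footer.php'; // would print the page footer
```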

From the beginning, we’ve been trained into thinking that our dependencies are magically managed. As you can see above, the mysql_query() function, while it will accept a connection resource, does not require one. In fact, if none is supplied, it will use the last connection opened by mysql_connect(). Assuming that the above-mentioned delete-user.php script is part of a larger collection of PHP scripts, which we will call “the application”, it is important to note that even this script itself is pulling in its dependencies instead of having them injected. For all intents and purposes, config.php, header.php and footer.php are all dependencies of this script, as they are of the other scripts similar in nature to delete-user.php. To sum it up, if a new dependency is required by the business logic portion of this application (i.e., the lines between the header and footer), it now has to be introduced into every script in the application. This does not exactly adhere to the DRY principle.

But, let’s take a step back and look at this snippet of code from the organizational perspective. To do this, we must first understand the various phases of the code’s lifecycle within any organization. For the purposes of this example, let’s assume that from idea to production, code will go through the following phases: development, build, deployment, and application start-up (in production). If this were a C/C++ or Java project, code would have been written (developed), it would have been compiled (built), then it would have been packaged or some deployment tool’s process invoked (deployed); it then would have been run (executed via some startup script, or by executing a binary). PHP, and Perl at the time, achieved all of the same objectives but in fewer steps, making it a wildly popular platform for highly iterative web projects. This same application in PHP would have been coded in some text editor (developed), and FTP’d up to a production server (deployed). You’ll notice that it had to be neither built/compiled nor started on the server, since the target, Apache, was already running with PHP embedded into it. For all intents and purposes, a cheap and easy FTP tool was both the build and deployment tool for this application’s lifecycle.

It was this simplicity that made PHP the popular choice for web applications. This popularity was attained because the simplicity of the PHP platform allowed for two extremely important facets of development to emerge: the idea of building an application became approachable to even the novice individual, and without all the cruft that came along with the application lifecycle, building and deploying applications in PHP increased PHP’s “fun-ness” factor.

While this style of building applications allowed for a proliferation of PHP applications to be developed, there was in fact a negative side that would only be revealed later. As applications quickly grew, their ability to be maintained decreased. We gave the result the name “spaghetti code”, and for all the right reasons. Objects, if they were even being used, were generally wrappers around procedural functionality, so object dependency management wasn’t even a consideration for most developers. Looking back, perhaps it was this original simplicity that allowed developers to create applications without even having to know what a dependency was or how to find it. In any case, as these applications grew uncontrollably, maintaining and hacking on them rapidly lost the PHP fun factor.

A Brief History of DI Frameworks

As PHP developers started identifying the problems with their Model 1 applications, they started looking for solutions in other programming communities. At this time, the Java community was still heavily rooted in the enterprise/software development/software engineering world, and problems such as dependency management already had some interesting solutions. Most notably, there was the Spring Framework, whose primary facility for dependency management was a component called the IoC Container, or the Inversion of Control container. This container managed the full lifecycle of object creation using callbacks. This meant that you no longer had to use the “new” keyword (the same new keyword as in PHP). Also, it wired the dependencies for you at instantiation time. This meant that you no longer had to concern yourself with how dependencies were injected, be it through the constructor, properties or setter methods. The Spring Framework was one of the first frameworks that encouraged the use of definition files to manage the knowledge required to wire all your dependencies together. True to form in the Java community, these definition files were created in XML.

As it might seem, this is indeed a deviation from the philosophy that had made PHP so popular. PHP allowed you to write the minimum amount of code needed to complete your application. In the Java/DI world, particularly with the Spring Framework, you had a much richer application lifecycle. Not only were you developing code for your application, but you were creating code about code to manage code. This is known as meta-programming. In addition to this meta-programming, you also now had the compilation phase required by the Java platform, which was generally tucked away inside your build-time tasks. Moreover, this application had to be deployed (there were generally tools for this too), and (for good measure), due to the platform, your application had to be started. Needless to say, this application lifecycle might seem heavier, for lack of a better term, to the average PHP developer.

Since then, several frameworks have cropped up that sport some kind of dependency management. Before this technique was picked up in PHP, they were all heavily rooted in the Java and .NET communities. A quick Google search will return notable names like PicoContainer, Spring.NET, Unity, Butterfly and google-guice, to name a few. These frameworks attain popularity because they attempt to ease some of the burdens that DI places upon the developer, whether by using reflection to create definitions or by adding an annotation system so that DI definitions can be written inside the code they are meant to manage.

DI and PHP

To understand the attainability of having a dependency management framework for PHP, one should first understand how the counterparts in Java and .NET rely upon their respective platforms to do certain jobs. For a quick reference, see the images from this blog post. One of the more important facets to remember is that the expected application lifecycle of a Java/.NET application is much richer. You are expected to have build-time tasks. You are expected to have deployment tasks. And, generally, your application understands the difference between being in development, staging and production, so it can adjust how it runs accordingly. Moreover, the platform itself has facilities in place that aid the developer both at development time, with code generation, and in production.

PHP never expects or facilitates the usage of any kind of build-time tasks. PHP also does not have any kind of built-in annotation support (a meta-programming technique), nor does it have any kind of application scope or per-application memory space. What does this mean for someone who is creating a DI container? Let’s explore.

Development Time

Generally speaking, any time you are writing, altering or just shifting code around, you are in development mode, and your application should be running in a development environment. The structure of your application’s classes, functions and files within the filesystem is probably changing each time you click save. Dependency management systems require knowledge of your code in order to effectively do their job. This knowledge generally comes in the form of some kind of definition.

This definition can be created by hand, by the developer, generated at runtime by some application hooks, or generated with the use of a special tool. If this is done by hand, a developer is required to explicitly map the various functions/methods that will need to be called in order to inject a particular object dependency. The more dependencies you have, the more verbose this definition might become.
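As an illustration of what a hand-written definition involves, here is a minimal sketch assuming PHP 7.4 or later. The array layout, the build() helper and all class names are hypothetical and do not follow any particular framework’s format.

```php
<?php
final class DbAdapter
{
    public string $dsn;

    public function __construct(string $dsn)
    {
        $this->dsn = $dsn;
    }
}

final class Logger
{
}

final class UserRepository
{
    public DbAdapter $db;
    public ?Logger $logger = null;

    public function __construct(DbAdapter $db)
    {
        $this->db = $db;
    }

    public function setLogger(Logger $logger): void
    {
        $this->logger = $logger;
    }
}

// The hand-written definition: every constructor argument and setter call
// is spelled out explicitly by the developer.
$definition = [
    'DbAdapter' => [
        'constructor' => [['value' => 'sqlite::memory:']],
    ],
    'Logger' => [],
    'UserRepository' => [
        'constructor' => [['service' => 'DbAdapter']],
        'setters' => ['setLogger' => [['service' => 'Logger']]],
    ],
];

// Hypothetical helper that follows the definition to wire up an object.
function build(string $name, array $definition): object
{
    $resolve = function (array $arg) use ($definition) {
        return array_key_exists('service', $arg)
            ? build($arg['service'], $definition)
            : $arg['value'];
    };

    $spec = $definition[$name];
    $object = new $name(...array_map($resolve, $spec['constructor'] ?? []));

    foreach ($spec['setters'] ?? [] as $method => $args) {
        $object->$method(...array_map($resolve, $args));
    }

    return $object;
}

$repository = build('UserRepository', $definition);
```

Even with three small classes the definition is already longer than the wiring code it replaces, which is the verbosity concern raised above.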

A better route would be to generate this definition file; after all, the code you’ve written, if written correctly, will self-describe its dependencies. There are two options for generation: manual and automatic. An example of manual generation would be a developer giving a command line tool the minimal information it needs to be able to parse your code, figure out the dependency map for itself, and generate some kind of definition to be used during runtime. Minimal information might include some kind of seed information, like where to find your classes or perhaps what filters to use when inspecting classes. Sometimes, these tools might make use of special interfaces (also called interface injection) whose purpose is to describe the various dependencies of the class implementing said interface. Another approach might be to utilize special annotations on classes and class methods that describe the various required and optional dependencies and how they are to be injected.

The same techniques employed in this manual approach could also be put to use in an automatic approach. In an automatic approach, imagine that the same command line tool from the manual approach is now a service of the application itself. While in development mode, it would run as often as needed in order to determine whether code changes have happened. If they have, the service would regenerate the dependency definition file so that the rest of the application can utilize the dependency definition inside the DI container available to the application during runtime.
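As a rough sketch of how such a tool might let the code describe itself, the following uses PHP’s reflection API to read constructor type declarations. It assumes PHP 7.4 or later; the describeConstructor() helper, the class names and the hard-coded class list (standing in for the seed information) are all hypothetical.

```php
<?php
final class DbAdapter
{
    public function __construct(string $dsn = 'sqlite::memory:')
    {
    }
}

final class UserRepository
{
    public function __construct(DbAdapter $db)
    {
    }
}

// Hypothetical helper: map constructor parameter names to the classes they require.
function describeConstructor(string $class): array
{
    $constructor = (new ReflectionClass($class))->getConstructor();
    if ($constructor === null) {
        return [];
    }

    $dependencies = [];
    foreach ($constructor->getParameters() as $parameter) {
        $type = $parameter->getType();
        // Only class-typed parameters are treated as object dependencies;
        // scalars such as the DSN string are skipped.
        if ($type instanceof ReflectionNamedType && !$type->isBuiltin()) {
            $dependencies[$parameter->getName()] = $type->getName();
        }
    }

    return $dependencies;
}

// Seed information: the classes to inspect. A real tool would scan directories.
$definition = [];
foreach (['DbAdapter', 'UserRepository'] as $class) {
    $definition[$class] = describeConstructor($class);
}
```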

There are a couple of concerns that are specific to PHP with regard to dependency management. Since PHP is a shared-nothing architecture with no application-level memory, this definition would need to be loaded, parsed and put into memory on each request. The larger the dependency tree that you track, the larger the memory footprint of the dependency definition graph. Furthermore, since this definition has to be loaded on each request, if it is in a non-native format (meaning anything other than PHP code), there are certain costs to converting this format, be it XML, YAML, JSON, or INI, to the in-memory structure that the dependency management container requires. What’s more, the PHP platform does not keep track of file changes. So without some kind of user-land tracking, it is hard to know which files have changed during development. Thus, your dependency management system, if it’s taking an automatic approach, would have to rescan the filesystem for changes upon each request during development, which has its own consequences.
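One possible form of user-land tracking is to compare file modification times against a fingerprint stored on disk. The sketch below assumes that approach; the function names and the stamp file are hypothetical, and it still pays the cost of one stat call per tracked file on every request.

```php
<?php
// Hypothetical helper: fingerprint a set of source files by path and mtime.
function sourceFingerprint(array $files): string
{
    $parts = [];
    foreach ($files as $file) {
        clearstatcache(true, $file); // PHP caches stat() results within a request
        $parts[] = $file . ':' . filemtime($file);
    }

    return md5(implode('|', $parts));
}

// Returns true (and records the new fingerprint) when any file has changed
// since the last call, which is the signal to regenerate the definition.
function definitionIsStale(array $files, string $stampFile): bool
{
    $current = sourceFingerprint($files);
    $previous = is_file($stampFile) ? file_get_contents($stampFile) : '';

    if ($current === $previous) {
        return false;
    }

    file_put_contents($stampFile, $current);

    return true;
}

// Usage with throwaway files; a real tool would scan the class directories.
$source = tempnam(sys_get_temp_dir(), 'src');
$stamp = tempnam(sys_get_temp_dir(), 'stp');

touch($source, 1000000000);
$first = definitionIsStale([$source], $stamp);  // nothing recorded yet
$second = definitionIsStale([$source], $stamp); // unchanged since last check
touch($source, 1000000060);
$third = definitionIsStale([$source], $stamp);  // mtime moved forward

unlink($source);
unlink($stamp);
```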

Deployment Time

When one is done writing code and is ready to push this application into production, the act of pushing this application is called deployment. The mode for this application is now considered “production”. In production, you can be sure that the structure of your code is stable and will not change, thus your dependency graph is now safe from changes too. Since this is the case, there is no longer a need to keep updating and regenerating this dependency definition file like you were during development.

Even though the definition is no longer changing, there still is the concern about how expensive it is to load this definition on each request. Naturally, the cheapest form of definition would be a PHP array or structure describing the definition that can then be loaded into memory. Other file types like XML, YAML, JSON, etc. first have to go through a parsing phase before they can be used. Parsing these files could be expensive, and could benefit from some kind of caching. Caching the definition in some way, shape or form would ensure there is minimal overhead per request when the application is using this dependency management container.
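One way such caching could work, sketched here under assumptions, is to parse the non-native format once at deployment time and write the result out as a plain PHP file that returns an array. The JSON layout and the cache file location are hypothetical.

```php
<?php
// A definition kept in a non-native format (JSON here, for illustration).
$json = '{"UserRepository": {"constructor": ["DbAdapter"]}, "DbAdapter": {"constructor": []}}';

// Deployment time: parse once and export the result as native PHP code.
$definition = json_decode($json, true);
$cacheFile = tempnam(sys_get_temp_dir(), 'di'); // hypothetical cache location
file_put_contents(
    $cacheFile,
    '<?php return ' . var_export($definition, true) . ';' . PHP_EOL,
    LOCK_EX
);

// Request time: no JSON parsing, just include the generated file.
$loaded = include $cacheFile;

unlink($cacheFile);
```

With an opcode cache in place, the compiled form of the generated file can typically be reused across requests, which is part of what makes the native format the cheapest option.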

Other Observations & Criticisms

It is important to realize that dependency management solutions are, in and of themselves, full frameworks in every sense of the word. They require that you both understand their philosophy and have at least a minimal understanding of the facilities they offer in order to use them effectively. To understand the true benefits of any framework, one must first know the pain points the framework is attempting to solve. Seeing the end result of a framework without knowing what it is facilitating might lead one to dismiss it as overkill or unintuitive. For example, take the following code (typical of dependency management systems):
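The original listing is not preserved in this copy; going by the surrounding text, it was a single $dic->get(...) call. The sketch below shows such a line (the last one) together with a deliberately tiny, hypothetical container so that it runs on its own, assuming PHP 7.4 or later. The DiC class, the service names and the closure-based definitions are assumptions, and the classes are declared without namespaces to keep everything in one file.

```php
<?php
final class DbAdapter
{
    public array $config;

    public function __construct(array $config)
    {
        $this->config = $config;
    }
}

final class UserRepository
{
    public DbAdapter $db;

    public function __construct(DbAdapter $db)
    {
        $this->db = $db;
    }
}

// Hypothetical minimal container: each name maps to a factory closure.
final class DiC
{
    /** @var array<string, callable> */
    private array $factories = [];

    public function set(string $name, callable $factory): void
    {
        $this->factories[$name] = $factory;
    }

    public function get(string $name)
    {
        if (!isset($this->factories[$name])) {
            throw new InvalidArgumentException("No definition for $name");
        }

        return ($this->factories[$name])($this);
    }
}

// Wiring done once, somewhere in the application bootstrap.
$dic = new DiC();
$dic->set('db.config', fn () => ['dsn' => 'sqlite::memory:']);
$dic->set('db.adapter', fn (DiC $dic) => new DbAdapter($dic->get('db.config')));
$dic->set(
    'Application\Model\UserRepository',
    fn (DiC $dic) => new UserRepository($dic->get('db.adapter'))
);

// The single line a controller action would contain:
$userRepository = $dic->get('Application\Model\UserRepository');
```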

If you encounter this line of code without fully understanding the dependency injection container being used, you wouldn’t be able to appreciate its usefulness. You could instantiate your Application\Model\UserRepository yourself, sure, but you’d also have to locate and inject the database adapter to use, and into that you’d have to load and inject the configuration for that database connection. If you are doing this in multiple controller actions, a lot of repeated boilerplate code is required to “wire” the UserRepository object. Internally, the DiC object is loading and consulting a definition, creating objects, injecting those objects, and returning the requested object, fully wired and ready to use.

The above code also demonstrates two common criticisms of dependency management frameworks, which are also criticisms of frameworks in general. By using this framework, you are moving further away from the facilities of the language or platform itself. Instead of using the “new” keyword to create a new object, you’ve asked another object to create the requested object for you. This shifts developers away from the language’s well-understood API and onto the framework’s API. Additionally, this kind of code is not easily understood by IDEs. While special features could be added to an IDE to support this framework, the IDE does not inherently know what kind of object is being returned by the $dic->get(..) method call.

Summary

While dependency management frameworks have clear drop-in benefits, there exist a few considerations that have unknown or unexplored consequences. For example, if the benefit is such that all dependencies are managed, and all a developer has to do is configure it, does that encourage deeper object graphs when creating classes and class dependencies? If so, what is the performance impact of these deep object graphs, particularly on the PHP platform? What are the memory implications of such object graphs, and what are the speed implications? Furthermore, if one needed to debug an object that has been generated by a dependency management framework, is that easily possible?

At the end of the day, whether or not to use a dependency management framework is a matter of cost versus benefit. In order to be able to make an informed decision, a developer should consider a few scenarios. First, one should know what code might look like with and without this new framework. This will give an indication of the cost/benefit at the code level: does it actually save lines of code and developer headaches? Secondly, one should consider how much added knowledge a developer or a team of developers needs in order to understand this framework. Lastly, one should consider what kind of performance impact implementing this new framework has on the application’s throughput.

Where am I?

You are currently viewing the archives for May, 2011 at Ralph Schindler.