2012-12-10

PaaS is not just about runtime: data services are the next differentiator


In general, Platform as a Service (PaaS) is developed by developers for developers. Of course they’re going to love it. It enables them to focus on the nuances of their applications rather than on the pointless day-to-day activities that so often take their time away from solving real problems. Non-developers point to the abstraction of the underlying infrastructure and dynamic resource allocation as some of the core benefits of PaaS. In short, we often view PaaS as a runtime execution engine that trivializes the complex aspects of application development and deployment.

The problem with that kind of view, however, is that it focuses primarily on the runtime aspects of the platform. This may be a result of some vendors treating data services as an external concern, strapped onto the platform as an add-on, almost as an afterthought. Heroku, for example, provides only Postgres as its one “native” data service, while OpenShift does slightly better, adding MySQL and a community-supported edition of MongoDB.

Everyone would agree that add-ons play an important role in the extensibility of any PaaS solution. I would argue, however, that as the “open” and “polyglot” aspects of PaaS become the de facto standard, a more holistic view of the entire application platform, including a diverse selection of native data services, is quickly becoming a major differentiator.

Today, for example, you would not choose a PaaS without support for the most common development frameworks, or without the ability to run unmodified in the public cloud and in private data centers. In the very same way, you should not choose a PaaS solution without integrated, native and diversified data service support.

As many of you know, I work for VMware, which initiated the open source PaaS solution called Cloud Foundry. Right now, Cloud Foundry delivers the richest selection of native data services on the market, including MySQL, PostgreSQL, MongoDB, RabbitMQ and a couple of different versions of Redis. These services deliver predictable, low-latency connectivity to your data whether your application is deployed to the public instance of Cloud Foundry operated by VMware, to an AWS instance operated by one of our ecosystem partners like AppFog, or to a private instance running out of your own data center. Whichever Cloud Foundry instance your application targets, the data service provisioned by Cloud Foundry will behave exactly the same.

However, it would be naïve to expect all necessary data services to always be available natively. For just these kinds of situations, Cloud Foundry provides an open source Service Broker (yes, a service extending a service), which delivers the very same provisioning characteristics to external or legacy services that are not currently offered by Cloud Foundry. The best part is that these services can be managed through the same API and benefit from the very same native integration into your application.
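
To make that “native integration” a little more concrete, here is a minimal sketch of how an application might look up the credentials of a bound data service. It assumes the platform exposes bound services to the application as JSON in the VCAP_SERVICES environment variable, which is the Cloud Foundry convention but is not described in this post. The service label, the instance name “orders-db” and every credential value below are made up for illustration, and find_credentials is a hypothetical helper, not part of any Cloud Foundry library.

```python
import json
import os

# Hypothetical sample payload in the general shape of VCAP_SERVICES:
# a JSON object keyed by service label, each holding a list of bound instances.
SAMPLE_VCAP_SERVICES = json.dumps({
    "postgresql-9.0": [
        {
            "name": "orders-db",  # made-up instance name
            "label": "postgresql-9.0",
            "credentials": {      # made-up connection details
                "host": "10.0.0.5",
                "port": 5432,
                "user": "demo",
                "password": "secret",
                "name": "orders",
            },
        }
    ]
})


def find_credentials(service_name, environ=None):
    """Return the credentials dict of the bound service called service_name.

    Returns None when no bound service has that name. The lookup is by
    instance name only, so the application code does not care whether the
    service is native or provisioned through a broker.
    """
    environ = os.environ if environ is None else environ
    services = json.loads(environ.get("VCAP_SERVICES", "{}"))
    for instances in services.values():
        for instance in instances:
            if instance.get("name") == service_name:
                return instance.get("credentials", {})
    return None


if __name__ == "__main__":
    # Outside the platform, fall back to the sample payload above.
    env = dict(os.environ)
    env.setdefault("VCAP_SERVICES", SAMPLE_VCAP_SERVICES)
    print(find_credentials("orders-db", env))
```

The point of looking services up by name rather than by host or vendor is that the application carries no provider-specific connection details; moving it to a different instance of the platform only requires a service with the same name to be bound there.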

In short, if application mobility is important to you, please view data services as an intrinsic part of your PaaS strategy. Add-ons are great and certainly appropriate in many cases; just make sure they don’t become your gateway drug, locking your application to a specific provider.

2 comments:

Mark Chmarny said...

A few days after writing this post, my colleagues challenged me that while assuring locality of data is certainly important, and should be aspired to, the reality of the shifting landscape in today’s enterprise makes this a utopian notion. As one who strives for pragmatism, I aim for a less unicorn-like approach to data provisioning. So, I wrote a response outlining the current opportunities in data flow automation. Make sure you give this one a read.

Navin R. Thadani said...

Mark, good post.

However, I feel that the PaaS industry is missing an important angle. The way most PaaS offerings are built today, they lend themselves very well to rapid prototyping and the like, but for anything beyond that, most developers/devops need to go back to IaaS and build their own stacks. This is largely due to the lack of customizability.

Ideally, the industry should have something in the middle between a PaaS and an IaaS, such as fully customizable, cloud-neutral blueprints. Say I have a blueprint whose starting point is a particular LB, app server and DB. Assume in this case that the DB is PostgreSQL, but I want to change that to Mongo. I just delete the PostgreSQL VM and drag and drop (or add via an API) a Mongo VM. If, for some reason, I want to tweak the network, I have a way to do that as well.

In addition, all my developers should be able to instantly clone the full multi-VM/distributed environment on demand. I should be able to hook it up to my CI system and run tests every time someone checks in code - on the full multi-VM instance. That way I catch issues before staging/prod.

All of this should be manageable via code and should be part of source control. That, in my mind, is an ideal developer solution. It allows ease of use (for prototyping) but also complete control and flexibility for the additional stages of development/deployment.

We have spent 18+ months building something like this at www.ravellosystems.com. Would like to get your thoughts.