2012-10-10

For Developers, Database Soon to Go the Route of the Garbage Collector


It used to be simple: there were only a few viable database providers. Most organizations made their RDBMS choice by selecting one of these providers, and they appointed a Database Administrator (DBA) to set and administer all its rules.

Sometime during the late 1990s, our approach to data started to shift. People began questioning whether traditional databases were designed to perform the tasks we required of them. The one-size-fits-all approach of “Web-enabling” traditional databases led to some spectacular failures under irregular access patterns, increasing access concurrency and geographically dispersed demand.

Now, in some cases, people tried to fight this shift by vertically scaling their existing databases, acquiring ever more powerful machines to compensate for the databases' inherent lack of scalability. All too soon, most realized there simply was no single machine powerful enough to address their scalability challenges. Others, perhaps seeing the futility of this effort, were unwilling to pay the price for yet another round of hardware upgrades.

Today, the landscape looks very different. Literally hundreds of databases are available on the market, and new ones are added almost every day. Many of these have grown organically to address a very specific use case (a pain point, as it were, where existing solutions were inadequate or too expensive). Many of the offerings (see the chart below) are based on Open Source Software (OSS), and almost all can run on commodity hardware.
Source: The 451 Group

Now, some could argue that this Wild West of data persistence will soon end due to consolidation. That’s how it always works, right?

In contrast to many other areas, database consolidation is not something we ought to expect anytime soon, at least not in a way with which we are familiar. Many of these OSS projects do not respond to traditional competitive pressures. They are managed and maintained by vibrant communities of volunteers, not driven by market capitalization opportunities. In many cases these solutions address a very specific use case and can’t easily be replaced by a next-in-line competitor without losing some very specific features.

The question then becomes “Do application developers really care what technology is used to persist their data?” I would argue that developers are very passionate about where their data lives (when it’s being used). They really do not care where that data sleeps (is stored). Increasingly, the place where data lives is memory, not storage.

It is for that very reason that I am convinced that databases will go the route of the garbage collector. Let me explain. Unless you are still writing apps in C (and there are some really good reasons why you should sometimes do this), garbage collection as a means of memory management is something totally abstracted from you as a developer. Yes, some platforms like Java and .NET do allow you to proactively trigger this process, but unless you go through some “special” steps, this is still only a request. The garbage collector will perform that task whenever it needs to.
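To make the "only a request" point concrete, here is a minimal Java sketch. The class and method names (`GcRequestDemo`, `requestCollection`) are made up for illustration; `System.gc()` and `WeakReference` are standard Java. Whether the object is actually reclaimed after the call depends on the JVM and its settings, so the program reports what happened rather than assuming a result.

```java
import java.lang.ref.WeakReference;

public class GcRequestDemo {
    /**
     * Asks the JVM to collect garbage, then reports whether the weakly
     * referenced object has been reclaimed. Illustrative helper only.
     */
    static boolean requestCollection(WeakReference<Object> ref) {
        System.gc(); // a suggestion to the JVM, not a command
        return ref.get() == null;
    }

    public static void main(String[] args) {
        // Nothing else holds this object, so it is eligible for collection.
        WeakReference<Object> ref = new WeakReference<>(new Object());
        boolean collected = requestCollection(ref);
        // The result may differ between JVMs, collectors and flags.
        System.out.println("collected after System.gc(): " + collected);
    }
}
```

Either way, the developer never frees memory by hand; the runtime decides when the work gets done.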

Like GC, the database, with all its unique complexities, will soon be abstracted from the developer. The developer will interact with a simple service already optimized with the most appropriate technology for that specific workload/data type combination.

You can already see this strategy being implemented in some forward-thinking organizations like Disney, which demonstrated its Data Management Platform (DMP) at the recent Cassandra Summit.

Disney's DMP
Source: Cassandra Summit
Disney's goals for DMP were to:

  • Hide operational complexity from their application developers
  • Abstract specific storage engines behind APIs – Focus on Semantics, not Technologies
  • Deliver a uniform security layer across all of their data stores
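As a rough illustration of those three goals, here is a small Java sketch. It is not Disney's design; every name in it (`StorageEngine`, `InMemoryEngine`, `DataService`, the `"session"` data type, the `"web-app"` caller) is hypothetical. It only shows the shape of the idea: developers call one service, a single access check guards every store, and the engine behind each data type can be swapped without touching application code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical engine contract; a real engine would wrap an actual data store. */
interface StorageEngine {
    void put(String key, String value);
    Optional<String> get(String key);
}

/** Stand-in engine that keeps everything in a map. */
class InMemoryEngine implements StorageEngine {
    private final Map<String, String> data = new HashMap<>();
    @Override public void put(String key, String value) { data.put(key, value); }
    @Override public Optional<String> get(String key) { return Optional.ofNullable(data.get(key)); }
}

/** The only API developers see: one access check, engine chosen per data type. */
class DataService {
    private final Map<String, StorageEngine> engineByType = new HashMap<>();
    private final Set<String> allowedCallers;

    DataService(Set<String> allowedCallers) { this.allowedCallers = allowedCallers; }

    /** Operators, not application developers, decide which engine serves a type. */
    void register(String dataType, StorageEngine engine) { engineByType.put(dataType, engine); }

    void save(String caller, String dataType, String key, String value) {
        engineFor(caller, dataType).put(key, value);
    }

    Optional<String> load(String caller, String dataType, String key) {
        return engineFor(caller, dataType).get(key);
    }

    /** Uniform security check followed by engine lookup. */
    private StorageEngine engineFor(String caller, String dataType) {
        if (!allowedCallers.contains(caller)) {
            throw new SecurityException("caller not allowed: " + caller);
        }
        StorageEngine engine = engineByType.get(dataType);
        if (engine == null) {
            throw new IllegalArgumentException("no engine for type: " + dataType);
        }
        return engine;
    }
}

public class DataServiceDemo {
    public static void main(String[] args) {
        DataService service = new DataService(Set.of("web-app"));
        service.register("session", new InMemoryEngine());
        service.save("web-app", "session", "u1", "logged-in");
        System.out.println(service.load("web-app", "session", "u1").orElse("missing")); // prints logged-in
    }
}
```

Replacing `InMemoryEngine` with a different implementation for the `"session"` type would change nothing in `main`, which is the point of focusing on semantics rather than technologies.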

Another inherent benefit of this kind of approach is that it creates a more centralized data catalog. The thinking here is that data has more value in a larger context and when its structure is known. Once data access is streamlined, it is easier to start overlaying additional value-add services at run time, such as analytics or notifications.
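One way to picture such an overlay is a wrapper around a shared write path. The Java sketch below assumes a hypothetical `DataWriter` interface through which all writes flow; `MapWriter` and `NotifyingWriter` are likewise invented names. Because every write passes through one interface, a notification service can be layered on without changing the underlying store.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical uniform write path shared by all data access. */
interface DataWriter {
    void write(String key, String value);
}

/** Stand-in store backed by a map. */
class MapWriter implements DataWriter {
    final Map<String, String> data = new HashMap<>();
    @Override public void write(String key, String value) { data.put(key, value); }
}

/** Wrapper that notifies listeners after each write; the store itself is untouched. */
class NotifyingWriter implements DataWriter {
    private final DataWriter delegate;
    private final List<BiConsumer<String, String>> listeners = new ArrayList<>();

    NotifyingWriter(DataWriter delegate) { this.delegate = delegate; }

    /** Registers a callback that receives the key and value of every write. */
    void onWrite(BiConsumer<String, String> listener) { listeners.add(listener); }

    @Override public void write(String key, String value) {
        delegate.write(key, value); // persist first
        for (BiConsumer<String, String> listener : listeners) {
            listener.accept(key, value); // then notify
        }
    }
}

public class OverlayDemo {
    public static void main(String[] args) {
        NotifyingWriter writer = new NotifyingWriter(new MapWriter());
        writer.onWrite((key, value) -> System.out.println("changed: " + key));
        writer.write("order-42", "shipped"); // prints changed: order-42
    }
}
```

An analytics overlay could follow the same pattern, for example by counting writes per key in a listener.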

If you are not convinced this is the future, just consider the alternative: managing hundreds of different types of persistence frameworks, each with its own replication strategy, storage format and API. Which would you prefer?



2 comments:

Mark Chmarny said...

Just finished reading Anant Jhingran's post, A Changed Landscape for Enterprise Data Integration - From ETL to API. Highly recommended as follow-up reading to this post.

Mark Chmarny said...

The above comment was missing the actual link. Here it is: http://blog.apigee.com/detail/from_etl_to_api_a_changed_landscape_for_enterprise_data_integration