Making the Switch to Facets

Overview

Making the switch to Facets from an RDBMS requires considerable thought, because an ODBMS (Object Database Management System) differs fundamentally from an RDBMS (Relational Database Management System). This series of articles, "Making the Switch to Facets", discusses these issues. Part 1, "Why an Object Database?", explains why moving to an Object Database is a good idea in the age of Java and Object Orientation. Part 2, "ODBMS Background", discusses how an ODBMS differs from an RDBMS and the changes in thinking needed to exploit an ODBMS efficiently. Parts 3 to 6 of "Making the Switch" cover some of the architectural and technical details that have to be handled when actually using Facets.


Making the Switch Part 1 - Why an Object Database?

For an OO (Object-Oriented) developer working in Java in the current decade, it is easy to forget that until very recently neither the OO paradigm nor OO languages were considered mainstream. Java has dramatically changed the landscape of development by becoming the first mainstream OO language. Devotees of other languages such as C++ and Smalltalk may dispute this statement, but it is the widespread adoption of Java that has convinced the majority of developers, architects and IT executives that Object Orientation is not only valid but vital in the struggle to develop sophisticated, flexible and yet maintainable enterprise systems.

It is worth considering, then, that most of the concepts associated with databases were developed when non-OO paradigms reigned supreme. The current crop of relational database management systems was designed with procedural programming in mind. If our languages have changed from being fundamentally procedural to being object-based, surely our storage mechanisms should change too?


Object Databases are easier to use than Relational Databases

In Java almost everything is an object. Only the primitive types such as int, float and char are not, and since Tiger (Release 1.5 of Java) autoboxing converts automatically between primitives and their wrapper classes, so in everyday code the distinction largely disappears. It therefore makes sense to store our Java objects as real objects, rather than breaking them down into their constituent parts (and then breaking those parts down in turn) before storage.
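
A short illustration of what autoboxing does and does not change: the compiler inserts the conversions between int and Integer for us, but the two remain distinct types, which shows up when a null wrapper is unboxed. The class and method names below are ours, chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class AutoboxingDemo {

    // Since Java 1.5 the compiler converts int to Integer automatically,
    // so primitives can be placed directly into object collections.
    public static List<Integer> boxAll(int... values) {
        List<Integer> boxed = new ArrayList<>();
        for (int v : values) {
            boxed.add(v); // autoboxing: compiled as Integer.valueOf(v)
        }
        return boxed;
    }

    public static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) { // auto-unboxing: compiled as intValue()
            total += v;
        }
        return total;
    }

    // The distinction still exists: unboxing a null reference fails at run time.
    public static boolean unboxingNullThrows() {
        Integer missing = null;
        try {
            int value = missing;       // throws NullPointerException
            System.out.println(value); // not reached
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(boxAll(1, 2, 3)));  // prints 6
        System.out.println(unboxingNullThrows()); // prints true
    }
}
```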

This is where an ODBMS really shines. Unlike an RDBMS, an ODBMS understands how objects work: it can take objects as they are and store them for later access. This means less effort for developers, architects and database administrators, since objects can be used and stored as is, without the complexity of Object-Relational mapping code, or of the transaction handling needed to commit changes atomically to multiple objects whose state is spread across many tables in an RDBMS.
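
To make "store the object as it is" concrete without guessing at the Facets API (which this article does not show), the sketch below uses Java's built-in serialization purely as a stand-in: a whole object graph, including a back-reference that forms a cycle, is written and read back with one call each and no mapping code. The Customer and Order classes are hypothetical, and serialization is not how Facets stores objects; it only illustrates the principle.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class GraphRoundTrip {

    // Hypothetical domain classes: each order points back at its customer.
    public static class Customer implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        List<Order> orders = new ArrayList<>();
        public Customer(String name) { this.name = name; }
    }

    public static class Order implements Serializable {
        private static final long serialVersionUID = 1L;
        String item;
        Customer customer; // back-reference, so the graph contains a cycle
        public Order(String item, Customer customer) { this.item = item; this.customer = customer; }
    }

    // Writes the entire graph reachable from root, then reads it back as a copy.
    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T roundTrip(T root) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(root);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Customer alice = new Customer("Alice");
        alice.orders.add(new Order("book", alice));
        alice.orders.add(new Order("lamp", alice));

        Customer copy = roundTrip(alice);
        System.out.println(copy.orders.size());                  // prints 2
        System.out.println(copy.orders.get(0).customer == copy); // prints true: cycle preserved
    }
}
```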


Object Databases require developers to write less code

Unlike an RDBMS, which requires an inordinate amount of work and input from the developer to save an object, an ODBMS can persist a Java object with very little assistance from the developer. An ODBMS therefore significantly improves developer efficiency: the developer can concentrate on the business logic and the business issues at hand, rather than writing what is in essence glue code to save and restore objects.
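
For comparison, here is the kind of hand-written glue the paragraph refers to: a sketch that flattens a hypothetical Person with several addresses into rows for two tables and rebuilds it afterwards. Rows are modelled as plain maps so the example runs without a database; with JDBC the same mapping would be spread across PreparedStatement and ResultSet calls. All class, table and column names are illustrative.

```java
import java.util.*;

public class RowMappingGlue {

    public static class Address {
        final String city;
        public Address(String city) { this.city = city; }
    }

    public static class Person {
        final long id;
        final String name;
        final List<Address> addresses = new ArrayList<>();
        public Person(long id, String name) { this.id = id; this.name = name; }
    }

    // Two "tables": each row maps a column name to a value.
    public static final List<Map<String, Object>> personTable = new ArrayList<>();
    public static final List<Map<String, Object>> addressTable = new ArrayList<>();

    // Save: decompose one object into rows in every table it touches.
    public static void save(Person p) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", p.id);
        row.put("name", p.name);
        personTable.add(row);
        for (Address a : p.addresses) {
            Map<String, Object> addressRow = new HashMap<>();
            addressRow.put("person_id", p.id); // foreign key back to the person
            addressRow.put("city", a.city);
            addressTable.add(addressRow);
        }
    }

    // Restore: find the parent row, then use its key to collect the child rows.
    public static Person load(long id) {
        for (Map<String, Object> row : personTable) {
            if ((Long) row.get("id") == id) {
                Person p = new Person(id, (String) row.get("name"));
                for (Map<String, Object> addressRow : addressTable) {
                    if ((Long) addressRow.get("person_id") == id) {
                        p.addresses.add(new Address((String) addressRow.get("city")));
                    }
                }
                return p;
            }
        }
        return null; // no such person
    }

    public static void main(String[] args) {
        Person p = new Person(1L, "Alice");
        p.addresses.add(new Address("Cape Town"));
        p.addresses.add(new Address("London"));
        save(p);
        Person back = load(1L);
        System.out.println(back.name + " " + back.addresses.size()); // prints "Alice 2"
    }
}
```

Every new field or association means touching both save and load, which is the maintenance burden an object database aims to remove.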


Object Databases simplify the use of complex data structures

Modern enterprise systems require complex data structures to manage the sophisticated functions they support, such as personalization, user customization, neural networks, Bayesian learning and image recognition. Developers who are forced to store their data in an RDBMS often have to fall back on relatively simplistic data structures, since relational databases do not directly support complex ones. Object databases, on the other hand, store even very complex structures (deeply nested collections, shared references, cycles) in a natural, easy-to-use manner. This encourages developers to design their data structures correctly, simplifies code, and improves performance, since there is no decomposition and re-composition overhead. For example, to store the state of a neural network in an RDBMS, each neuron's values have to be extracted and then stored in the various tables allocated for them, whereas an ODBMS simply saves the entire neural network as it is.
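
The neural network example can be made concrete. Assuming the network's state is just a three-dimensional array of weights indexed by layer, neuron and input, the sketch below shows the decomposition a relational schema would need (one row per weight) and the re-composition on load. An object store would keep the double[][][] as is. The row layout and names are hypothetical; real schemas vary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NetworkRows {

    // One relational row per weight; (layer, neuron, input) acts as the key.
    public static final class WeightRow {
        final int layer, neuron, input;
        final double value;
        WeightRow(int layer, int neuron, int input, double value) {
            this.layer = layer; this.neuron = neuron; this.input = input; this.value = value;
        }
    }

    // Decompose: walk every weight and emit a row for it.
    public static List<WeightRow> toRows(double[][][] weights) {
        List<WeightRow> rows = new ArrayList<>();
        for (int l = 0; l < weights.length; l++)
            for (int n = 0; n < weights[l].length; n++)
                for (int i = 0; i < weights[l][n].length; i++)
                    rows.add(new WeightRow(l, n, i, weights[l][n][i]));
        return rows;
    }

    // Re-compose: the shape (neurons, inputs per layer) must be known or stored separately.
    // Assumes every neuron in a layer has the same number of inputs.
    public static double[][][] fromRows(List<WeightRow> rows, int[][] shape) {
        double[][][] weights = new double[shape.length][][];
        for (int l = 0; l < shape.length; l++) {
            weights[l] = new double[shape[l][0]][shape[l][1]];
        }
        for (WeightRow r : rows) {
            weights[r.layer][r.neuron][r.input] = r.value;
        }
        return weights;
    }

    public static void main(String[] args) {
        double[][][] weights = {
            { {0.1, 0.2}, {0.3, 0.4}, {0.5, 0.6} }, // layer 0: 3 neurons, 2 inputs each
            { {0.7, 0.8, 0.9} }                       // layer 1: 1 neuron, 3 inputs
        };
        List<WeightRow> rows = toRows(weights);
        double[][][] rebuilt = fromRows(rows, new int[][] { {3, 2}, {1, 3} });
        System.out.println(rows.size());                         // prints 9
        System.out.println(Arrays.deepEquals(weights, rebuilt)); // prints true
    }
}
```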


Object Databases can often outperform Relational Databases

The fact that an ODBMS is aware of the structure of objects also yields considerable performance gains. Consider that a program using an RDBMS has to retrieve values from the columns of potentially dozens of tables, in some cases using complex joins, and then construct objects from those values. Often the objects contained by the primary object must also be loaded, and this can take considerable time, since these contained objects can only be constructed once the values needed to key them have been loaded as part of constructing the primary containing object.
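
The cost described here resembles the familiar "one query per object" pattern: because a child's key is only known once its parent's row has been read, a naive loader that fetches each object by key issues a separate round trip per object, and even with batching, each level of the tree must wait for the one above it. The sketch below counts both quantities for a hypothetical tree; it models query counts only, not real timings.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveTreeLoader {

    public static final class Node {
        final List<Node> children = new ArrayList<>();
        public Node add(Node child) { children.add(child); return this; }
    }

    // Naive loading: one query per object, since each object's row
    // is needed before its children's keys are known.
    public static int queriesToLoad(Node node) {
        int queries = 1;
        for (Node child : node.children) {
            queries += queriesToLoad(child);
        }
        return queries;
    }

    // Best case with one batched query per level: the levels are still
    // fetched one after another, so round trips equal the tree's depth.
    public static int sequentialRoundTrips(Node node) {
        int deepest = 0;
        for (Node child : node.children) {
            deepest = Math.max(deepest, sequentialRoundTrips(child));
        }
        return 1 + deepest;
    }

    public static void main(String[] args) {
        Node root = new Node();
        for (int i = 0; i < 3; i++) {
            root.add(new Node().add(new Node()).add(new Node()));
        }
        System.out.println(queriesToLoad(root));        // prints 10
        System.out.println(sequentialRoundTrips(root)); // prints 3
    }
}
```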

An ODBMS does not labor under such limitations. It stores objects as objects, in a form compatible with the underlying OO language, so it can simply load an entire object in one atomic operation, and potentially the tree of contained objects as well (depending on how the objects are clustered). This can yield large performance gains: if a tree of objects is laid out correctly, it can be loaded in its entirety with a single disk read, whereas an RDBMS may have to perform tens of reads to load the same tree completely.


A Short Digression on the Storage of Objects

When Facets persists an object or a tree of objects, it copies the contents of those objects into pages of memory that map directly to pages on disk in the Facets repository, and then writes these pages into the repository. Hence, when a group of related objects is persisted for the first time, the objects are very likely to occupy the same page (unless the object or collection is too large, in which case it is spread across many pages). When Facets later loads such an object, it determines which repository page contains the object and loads that entire page into memory. Pages are also held in a shared page cache, which makes this highly efficient, and Facets achieves a very high hit rate on that cache. As a result, when an object is loaded, there is a high probability that the entire contents of the object, and of the objects it in turn contains, are on that same page.
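
The paragraph mentions a shared page cache but not its eviction policy. Purely as an assumption, the sketch below uses least-recently-used eviction, built on LinkedHashMap in access order, and counts hits and misses so a hit rate can be observed. Page numbers stand in for repository disk pages, the 4096-byte page size is assumed, and readFromDisk is a hypothetical placeholder, not a Facets call.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PageCache {

    private final int capacity;
    private final Map<Integer, byte[]> pages;
    private int hits, misses;

    public PageCache(int capacity) {
        this.capacity = capacity;
        // accessOrder = true keeps the least recently used page first;
        // removeEldestEntry evicts it once the cache exceeds its capacity.
        this.pages = new LinkedHashMap<Integer, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
                return size() > PageCache.this.capacity;
            }
        };
    }

    public byte[] getPage(int pageNumber) {
        byte[] page = pages.get(pageNumber);
        if (page != null) {
            hits++;
            return page;
        }
        misses++;
        page = readFromDisk(pageNumber);
        pages.put(pageNumber, page);
        return page;
    }

    // Hypothetical placeholder for reading one page from the repository.
    private byte[] readFromDisk(int pageNumber) {
        return new byte[4096];
    }

    public int hits() { return hits; }
    public int misses() { return misses; }
    public boolean isCached(int pageNumber) { return pages.containsKey(pageNumber); }

    public static void main(String[] args) {
        PageCache cache = new PageCache(2);
        cache.getPage(1); // miss
        cache.getPage(2); // miss
        cache.getPage(1); // hit; page 2 is now least recently used
        cache.getPage(3); // miss; evicts page 2
        System.out.println(cache.hits() + " " + cache.misses()); // prints "1 3"
    }
}
```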

In an RDBMS, by contrast, the unit of storage is the table rather than the object, and each table's rows are spread across pages. To load a single object, an RDBMS may have to read dozens of pages, and considerably more for a tree of objects, since the rows in a table may be spread across multiple pages. Where each object in the tree is of a different type, the RDBMS has to load at least one page per object type, because rows from different tables are normally not collocated on the same page. In the ODBMS case there may be only one page load, and this is likely if the tree of objects is smaller than the page size and was persisted all at once. Facets has mechanisms for collocating objects on a page where possible.
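
A rough model makes this page arithmetic concrete. If a tree of objects is written together in one pass, it is packed onto contiguous pages, so the pages touched depend only on its total size; if each type lives in its own table, every type costs at least one page. The 4096-byte page size, the object sizes and both layout rules are assumptions for illustration, not Facets or RDBMS internals, and the per-table count is a best case.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PageCountModel {

    public static final int PAGE_SIZE = 4096; // assumed page size in bytes

    public static final class StoredObject {
        final String type;
        final int sizeBytes;
        public StoredObject(String type, int sizeBytes) { this.type = type; this.sizeBytes = sizeBytes; }
    }

    static int pagesFor(int bytes) {
        return (bytes + PAGE_SIZE - 1) / PAGE_SIZE; // ceiling division
    }

    // Clustered: the whole tree was written in one pass onto fresh, contiguous pages.
    public static int clusteredPages(List<StoredObject> tree) {
        int total = 0;
        for (StoredObject o : tree) total += o.sizeBytes;
        return pagesFor(total);
    }

    // Per-table: each type's rows sit in that type's own table. Assumes those rows
    // are contiguous within the table, so this is a lower bound.
    public static int perTablePages(List<StoredObject> tree) {
        Map<String, Integer> bytesPerType = new LinkedHashMap<>();
        for (StoredObject o : tree) bytesPerType.merge(o.type, o.sizeBytes, Integer::sum);
        int pages = 0;
        for (int bytes : bytesPerType.values()) pages += pagesFor(bytes);
        return pages;
    }

    public static void main(String[] args) {
        List<StoredObject> tree = Arrays.asList(
            new StoredObject("Order", 200), new StoredObject("Customer", 300),
            new StoredObject("Address", 150), new StoredObject("LineItem", 100),
            new StoredObject("LineItem", 100), new StoredObject("LineItem", 100));
        System.out.println(clusteredPages(tree)); // prints 1 (950 bytes in one page)
        System.out.println(perTablePages(tree));  // prints 4 (one page per type)
    }
}
```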


Object databases are a superior solution

The key points above outline the significant advantages of using an ODBMS when programming in an OO language:

  • ease of use,
  • simplicity of coding,
  • performance,
  • developer efficiency, and
  • focus on business logic rather than plumbing.

We will discuss a number of these matters later in this series of articles, and explain in some detail the whys and wherefores of making the switch from an RDBMS to the most natural form of storage for Java, a Java Object Database Management System.


Vincent Coetzee - March, 2005