Making the Switch to Facets
Overview
Making the switch to Facets from an RDBMS involves
considerable thought due to the fundamental differences between an ODBMS
(Object Database Management System) and an RDBMS (Relational Database
Management System). This series of articles, "Making the Switch to
Facets", discusses these issues. Part 1, "Why an Object Database?", explains
why moving to an Object Database is a good idea in the age of Java and
Object Orientation. Part 2, "ODBMS Background", discusses how an ODBMS
differs from an RDBMS and the changes in thinking that have to take place to
exploit an ODBMS efficiently. Parts 3 to 6 of "Making the Switch" cover some of
the architectural and technical details that have to be handled when actually using
Facets.
Making the Switch Part 1 - Why an Object Database?
As an OO (Object Oriented) developer using Java in the
current decade, it is easy to forget that until very recently neither the OO
paradigm nor OO languages were considered mainstream. Java has dramatically
changed the landscape of development by becoming the first mainstream OO
language. Devotees of other languages such as C++ and Smalltalk may dispute
this statement, but it has been the widespread adoption of Java that has
convinced the majority of developers, architects and IT executives that Object
Orientation is not only valid but also vital in the struggle to develop
sophisticated, flexible and yet maintainable enterprise systems.
It is worth considering, then, that most of the concepts
associated with databases were developed when non-OO paradigms reigned supreme.
The current crop of relational database management systems was designed with
the paradigm of procedural programming in mind. If our languages have changed
from being fundamentally procedural to being object based, surely our storage
mechanisms should too?
Object Databases are easier to use than Relational Databases
In Java almost everything is an object (only the primitive
types such as int, float, char and so on are not, and with the autoboxing
introduced in Tiger, Release 1.5 of Java, this distinction will, for most
practical purposes, vanish). It
therefore makes sense to store our Java objects as real objects and not break
them down into their constituent parts (and then break these constituent parts
down in turn) prior to storage.
This is where an ODBMS really shines. An ODBMS, unlike an
RDBMS, is intimately familiar with how objects work, and can take objects as
they are and store them away for later access. This translates into less effort
for developers, architects and database administrators since objects can be
used and stored as is, without the complexities of Object-Relational mapping
code and complex nested transactions to handle atomic commits to multiple
objects whose state is spread across many tables in an RDBMS.
Object Databases require developers to write less code
Unlike using an RDBMS, which requires an inordinate amount
of work and input from the developer to save an object, an ODBMS can simply
persist the Java object with very little assistance from the developer. This
means that an ODBMS significantly improves developer efficiency since the
developer can concentrate on the business logic and the business issues at
hand, rather than having to write what is, in essence, glue code to save and
restore objects.
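To make "very little assistance from the developer" concrete, here is a minimal
sketch of saving and restoring an object graph in a single call each. It does
not use the Facets API, which this article does not show. Standard Java
serialization serves purely as a stand-in for "hand the whole object to the
store", and the Customer, Address and WholeObjectStore classes are hypothetical
names invented for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain classes, invented for this illustration.
class Address implements Serializable {
    private static final long serialVersionUID = 1L;
    final String city;
    Address(String city) { this.city = city; }
}

class Customer implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    final List<Address> addresses = new ArrayList<>();
    Customer(String name) { this.name = name; }
}

// Stand-in for an object store (assumption: NOT the Facets API).
class WholeObjectStore {
    // Writes the customer and every object it references in one call.
    static byte[] save(Customer customer) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(customer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the whole graph back; no per-field mapping code is involved.
    static Customer load(byte[] stored) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(stored))) {
            return (Customer) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the sketch is what is absent: there is no code that names
individual fields, tables or keys. The contained Address objects travel with
the Customer because they are reachable from it.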
Object Databases simplify the use of complex data structures
Modern enterprise systems require complex data structures to
manage the sophisticated functions they support such as personalization, user
customization, neural networks, Bayesian learning, image recognition and so
forth. Developers often have to use relatively simplistic data structures if
they are forced to store their data in an RDBMS, since relational databases do
not directly support complex data structures. Object databases, on the other
hand, support and manage arbitrarily complex data structures in a
completely natural and easy-to-use manner. This encourages developers to design
their data structures correctly, allows for simplified code, and improves
performance, since there are no decomposition / re-composition overheads. For
example, to store the state of a neural network in an RDBMS, each of the values
from the neurons has to be extracted and then stored in the various tables
allocated for it, whereas an ODBMS just saves the entire neural network as it is.
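The decomposition / re-composition overhead in the neural network example can
be sketched as follows. This is an illustration under stated assumptions, not
code from any real system: the network is reduced to a weights[layer][neuron][input]
array, every neuron in a layer is assumed to have the same number of inputs,
and WeightRow and NetworkMapper are hypothetical names standing in for a
relational row and the mapping code a developer would have to write.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical relational row: one row per connection weight.
class WeightRow {
    final int layer;
    final int neuron;
    final int input;
    final double weight;

    WeightRow(int layer, int neuron, int input, double weight) {
        this.layer = layer;
        this.neuron = neuron;
        this.input = input;
        this.weight = weight;
    }
}

// The glue code needed only because the store cannot hold the network as is.
class NetworkMapper {
    // Decomposition: pull every value out of the network, one row each.
    static List<WeightRow> decompose(double[][][] weights) {
        List<WeightRow> rows = new ArrayList<>();
        for (int l = 0; l < weights.length; l++) {
            for (int n = 0; n < weights[l].length; n++) {
                for (int i = 0; i < weights[l][n].length; i++) {
                    rows.add(new WeightRow(l, n, i, weights[l][n][i]));
                }
            }
        }
        return rows;
    }

    // Re-composition: rebuild the structure from the rows.
    static double[][][] recompose(List<WeightRow> rows) {
        // First pass: discover the shape from the largest indices seen.
        int layers = 0;
        for (WeightRow r : rows) {
            layers = Math.max(layers, r.layer + 1);
        }
        int[] neurons = new int[layers];
        int[] inputs = new int[layers];
        for (WeightRow r : rows) {
            neurons[r.layer] = Math.max(neurons[r.layer], r.neuron + 1);
            inputs[r.layer] = Math.max(inputs[r.layer], r.input + 1);
        }
        double[][][] weights = new double[layers][][];
        for (int l = 0; l < layers; l++) {
            weights[l] = new double[neurons[l]][inputs[l]];
        }
        // Second pass: copy each stored value back into place.
        for (WeightRow r : rows) {
            weights[r.layer][r.neuron][r.input] = r.weight;
        }
        return weights;
    }
}
```

Every weight is touched once on the way out and twice on the way back in, and
none of this code has anything to do with what the network actually computes.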
Object Databases can often outperform Relational Databases
Because an ODBMS is aware of the structure of objects, it can also
deliver considerable performance gains. Consider for a moment that a
program using an RDBMS has to retrieve the various values from the columns of
potentially dozens of tables, in some cases using complex joins, and then
construct objects using these values. Often objects contained by the primary
object also need to be loaded. This can take a considerable amount of time,
since these contained objects can only be constructed once the values needed to
key them have been loaded as part of the construction of the primary containing
object.
An ODBMS does not labor under such limitations. It stores
objects as objects, and in a form compatible with the underlying OO language.
It can load the entire object in one atomic operation, and potentially
the tree of contained objects as well (depending on the clustering of the
objects). This can result in large performance gains: if a tree of objects
is correctly laid out and fits on a single page, it can be loaded in its
entirety in one disk read. An RDBMS, on the other hand, may have to perform
tens of reads to completely load a tree of objects.
A Short Digression on the Storage of Objects
When Facets persists an object or a tree of objects, it
copies the contents of those objects into pages of memory that map directly to
pages on the disk in the Facets repository. It then writes these pages into the
repository. Hence it is very likely that when a group of related objects is
persisted for the first time, the objects will occupy the same page (unless, of course,
the object or the collection is too large, in which case it will be spread
across many pages). When Facets later loads this object, it does so by
ascertaining the repository disk page that contains the object, and then
loading that entire page into memory (there is also page caching involved, so
this is highly efficient, and the Facets hit rate on the shared page cache is
very high). This means that when an object is loaded, there is a high
probability that its entire contents, and the objects it in turn
contains, are located on that same page.

Conversely, in an RDBMS, because the unit of storage is the
table rather than the object, tables are spread across pages. When RDBMS code
has to load an object, the RDBMS may have to load dozens of pages to retrieve
the contents of only one object, and may in fact even have to load considerably
more if it is a tree of objects, since the rows in a table may be spread across
multiple pages. In the case where a tree of objects needs to be loaded, and each
object in the tree is of a different type, then the RDBMS has to load at least
one page per object type, since contents of tables cannot be collocated on a
page. In the ODBMS case, it is possible (and likely, if the size of the tree of
objects is smaller than the page size, and if the tree was persisted at the
same time) that there could be only one page load. Facets has mechanisms for
enabling collocation of objects on a page where possible.
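The page arithmetic in this digression can be written down as a toy model. It
is only a sketch of the reasoning above, not a description of how Facets or any
particular RDBMS lays out its pages: StoredObject and PageLoadEstimate are
hypothetical names, object sizes are supplied by the caller, the clustered case
assumes the tree was persisted together and packed contiguously, and the table
case is the lower bound of one page per distinct object type.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical description of one persisted object.
class StoredObject {
    final String type;
    final int sizeInBytes;

    StoredObject(String type, int sizeInBytes) {
        this.type = type;
        this.sizeInBytes = sizeInBytes;
    }
}

class PageLoadEstimate {
    // Best case for clustered storage: the tree was persisted together,
    // so it occupies ceil(totalSize / pageSize) pages.
    static int clusteredPages(List<StoredObject> tree, int pageSize) {
        long total = 0;
        for (StoredObject o : tree) {
            total += o.sizeInBytes;
        }
        return (int) ((total + pageSize - 1) / pageSize);
    }

    // Lower bound for table storage: rows of different tables never share
    // a page, so at least one page is needed per distinct object type.
    static int minimumTablePages(List<StoredObject> tree) {
        Set<String> types = new HashSet<>();
        for (StoredObject o : tree) {
            types.add(o.type);
        }
        return types.size();
    }
}
```

For a tree of four objects of four different types totalling 1050 bytes, and an
assumed page size of 4096 bytes, clusteredPages gives 1 while minimumTablePages
gives 4. The table figure is a floor, not an estimate: rows spread across
several pages of the same table push it higher.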
Object Databases are a superior solution
The key points above outline the significant
advantages of using an ODBMS when programming in an OO language:
- ease of use,
- simplicity of coding,
- performance,
- developer efficiency, and
- focus on business logic rather than plumbing.
We will discuss a number of these matters later in this
series of articles, and explain in some detail the whys and wherefores of
making the switch from an RDBMS to the most natural form of storage for Java, a
Java Object Database Management System.
Vincent Coetzee - March, 2005