Persisting Collections of Objects

Managing Collections of Objects

We have briefly touched on collections in some of the sample code in previous modules, but we have not discussed them in any detail. Facets supports all of the normal java.util collection classes as well as arrays of any type, but this is not the only support that Facets provides for collections. The java.util collections are fine for relatively small collections, on the order of 10,000 objects or fewer. But how do we handle a collection so large that there is not enough memory to hold all of it at any one time? This is an area where Facets really shines: it has a number of specialized collections that scale into the hundreds of millions of objects. These collections can be found in the com.gemstone.persistence.collections package. Before we discuss these collections, we need to talk about the "Cu" collections.

Concurrent Update Classes

The Cu in the Cu class names is an abbreviation of Concurrent Update. These classes make use of patented GemStone technology to enable concurrent access to a collection from different threads and VMs. Large enterprise systems often require multiple processes to access, and potentially update, a single collection at the same time. Because these applications typically run on large multiprocessor machines, the chance of two updates actually occurring concurrently is very high. For example, in the Game Application that we have been using, different VMs need to add or remove players from the registered players collection. If we used a normal HashMap and two VMs attempted to update it at the same time without locking it, one or both of the updates would fail because of a write conflict. The CuHashMap, on the other hand, allows two or more transactions to add or delete items without generating a conflict, as long as the updates do not affect the same object. Let's spell this out more clearly:

Thread 1 can add Player A to the CuHashMap at the same time that Thread 2 adds Player B to the CuHashMap. In this case both transactions will succeed. However, if Thread 1 and Thread 2 both try to update Player B at the same time, one of the transactions will fail, since the CuHashMap only allows concurrent access to itself, not concurrent access to the objects it contains.
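
The first half of this scenario can be sketched in plain Java. Note that this is only an in-process analogy: java.util.concurrent.ConcurrentHashMap is used as a stand-in because it also lets two threads add different keys without an explicit lock. It is not a Facets class, it does not span VMs, and it does not take part in transactions. The class name ConcurrentAddSketch and the player keys are made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentAddSketch {
    // Runs two threads that each register a different player, then
    // returns the shared map once both threads have finished.
    static Map<String, String> registerConcurrently() {
        // Stand-in for a CuHashMap (assumption: NOT the Facets API).
        Map<String, String> players = new ConcurrentHashMap<String, String>();

        Thread t1 = new Thread(() -> players.put("A", "Player A"));
        Thread t2 = new Thread(() -> players.put("B", "Player B"));
        t1.start();
        t2.start();
        try {
            // Wait for both additions to complete.
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return players;
    }

    public static void main(String[] args) {
        Map<String, String> players = registerConcurrently();
        // Both additions survive because they touched different keys.
        System.out.println(players.size()); // prints 2
    }
}
```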

Concurrent update classes are one of the most powerful features of Facets, since they remove most of the need for complex and time-consuming locking of collections that are used by multiple processes. Using Cu classes neither entirely eliminates the need for locking nor completely prevents write conflicts, but it does significantly reduce the number of places where locking is required. It is therefore still important for your code to handle conflicts when committing updates to collections.
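
One common way to handle a commit conflict is to retry the unit of work a bounded number of times. The sketch below shows only the shape of such a loop. The Attempt interface and its applyAndCommit method are hypothetical placeholders, not part of the Facets API; in a real application the body would re-read the current state, reapply the update and call whatever commit operation your transaction API provides.

```java
class CommitRetrySketch {
    // Hypothetical unit of work (assumption, not a Facets type): applies
    // the changes and returns true if the commit succeeded, or false if
    // it failed because of a conflict.
    interface Attempt {
        boolean applyAndCommit();
    }

    // Retries the unit of work until it commits or maxAttempts is reached.
    // Returns the number of the attempt that succeeded.
    static int commitWithRetry(Attempt attempt, int maxAttempts) {
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.applyAndCommit()) {
                return i;
            }
            // Conflict: a real implementation would abort and refresh its
            // view of the shared objects here before trying again.
        }
        throw new IllegalStateException(
                "gave up after " + maxAttempts + " conflicting attempts");
    }

    public static void main(String[] args) {
        // Simulated conflicts: the first two attempts fail, the third succeeds.
        int[] calls = {0};
        int used = commitWithRetry(() -> ++calls[0] >= 3, 5);
        System.out.println("Committed on attempt " + used); // prints "Committed on attempt 3"
    }
}
```

Bounding the number of attempts keeps a persistently conflicting update from looping forever; what to do when the limit is reached (report, queue for later, or fall back to locking) is an application decision.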

The Facets Collection Classes

All of the Facets collection classes are optimised for very large collections (i.e. millions of objects) and transparently manage the loading and unloading of the required subcollections as needed. Some of the collections may need to be "tuned" occasionally when the number of objects in them grows beyond a certain size. We will now discuss the various classes found in com.gemstone.persistence. This tutorial provides only a high-level overview of these classes, so reading the detailed JavaDocs for them is strongly advised.

  • CuHashMap

    We have used CuHashMap a number of times in the examples to date. It is a sophisticated replacement for HashMap and behaves in a similar manner to the normal Java Hashtable.

  • CuHashMapStrict

    This is almost identical to CuHashMap, except that attempting to update the same key at the same time will cause a conflict. For example, thread 1 adding an object at key A while thread 2 adds an object at key A will cause a conflict.

  • CuQueue

    An implementation of the familiar queue, optimised for concurrent access. This can be really useful when adding tasks to a queue for one or more processor VMs.

  • ScalableArrayList

    This is a Facets list that is optimized for large collections. It would be used where a normal Java List implementation would be used.

  • ScalableHashMap

    This implements a HashMap that can handle large collections of objects. It should be used for large keyed collections unless there is a requirement for concurrent access, in which case CuHashMap should be used.

The Manager Pattern

Enterprise applications typically contain many large collections. A brute-force approach to managing these collections is to name each collection, store a reference to it in the Naming Service, and then access it directly. This approach often leads to problems as your application changes over time: more collections may need to be added, some may need to be changed or accessed differently, or the business rules for adding or removing objects may change. If the collections are accessed directly, managing these changes can be difficult, because every piece of code that directly mutates any of the collections has to be changed when their usage patterns change.

The Manager Pattern suggests the following approach in these cases. Create a single object that manages all the collections specific to a domain and ensure that all other code only accesses the collections using the operations provided by the manager object. In our example case, we created a GameManager object and embedded the collections associated with the game into it. We then created a single instance of the GameManager object and named and stored that. All access to the underlying collections is managed by the GameManager object, thus providing a single point at which code and behavior changes can be managed.
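
A minimal sketch of such a manager is shown below. It is not the GameManager from the earlier modules: the Player class, the method names and the use of a plain java.util.HashMap are all assumptions made to keep the example self-contained. In a Facets application the field could instead hold one of the Facets collections, and the single GameManager instance would be the object that is named and stored.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative domain object (assumption: a simplified stand-in).
class Player {
    private final String name;

    Player(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Owns the game's collections; other code never touches them directly.
class GameManager {
    // Plain HashMap used as a stand-in for a Facets collection.
    private final Map<String, Player> registeredPlayers =
            new HashMap<String, Player>();

    // Business rule lives in one place: a name may be registered only once.
    boolean registerPlayer(Player player) {
        if (registeredPlayers.containsKey(player.getName())) {
            return false;
        }
        registeredPlayers.put(player.getName(), player);
        return true;
    }

    // Returns the removed player, or null if no such player was registered.
    Player removePlayer(String name) {
        return registeredPlayers.remove(name);
    }

    Player findPlayer(String name) {
        return registeredPlayers.get(name);
    }

    int playerCount() {
        return registeredPlayers.size();
    }

    // Read-only view, so callers cannot bypass the manager's operations.
    Collection<Player> players() {
        return Collections.unmodifiableCollection(registeredPlayers.values());
    }
}
```

Because callers only see registerPlayer, removePlayer and the read-only view, a later change to the registration rules or to the underlying collection type touches GameManager alone.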

Best Practice Patterns for Collections

Best Practice : Define Collection Variables as Interfaces rather than Concrete Types

As applications change with time, the type of collection used to hold a set of objects may need to change. For example, there may initially be no requirement for concurrent access to a HashMap, but such a requirement may arise later. If variables are declared as ScalableHashMap, every one of those declarations must then be changed by hand from ScalableHashMap to CuHashMap, which can be a long-winded and complex process. Instead, declare the variable as the interface type, such as Map or List, then instantiate a specific implementation and store it in that abstractly typed variable. Changing the implementation at a later stage is then merely a matter of creating an instance of the new implementation type, populating it correctly and assigning it to the abstractly typed variable.
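
The following sketch illustrates the idea using only standard library classes, so that it runs without Facets. HashMap and ConcurrentHashMap stand in for ScalableHashMap and CuHashMap respectively; the ScoreBoard class and its methods are hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ScoreBoard {
    // Declared as the Map interface, not as a concrete class, so the
    // rest of this class never depends on the implementation type.
    private Map<String, Integer> scores = new HashMap<String, Integer>();

    void record(String player, int score) {
        scores.put(player, score);
    }

    Integer scoreOf(String player) {
        return scores.get(player);
    }

    // Changing the implementation: create the new instance, populate it,
    // and assign it to the abstractly typed variable. No other code changes.
    void switchToConcurrentImplementation() {
        Map<String, Integer> replacement = new ConcurrentHashMap<String, Integer>();
        replacement.putAll(scores);
        scores = replacement;
    }

    // Exposed only so the example can show which implementation is in use.
    String implementationName() {
        return scores.getClass().getSimpleName();
    }
}
```

A caller that records a score, calls switchToConcurrentImplementation() and then reads the score back sees the same value; only implementationName() reveals that the underlying class has changed.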

Best Practice : Use Cu collections intelligently

Because Cu collections perform a number of complex operations transparently for the programmer, they carry certain overheads. They should therefore be used only where concurrent access to a collection is an absolute requirement. If in doubt, ask GemStone or your local Facets expert whether they should be used.

©2005 GemStone Systems, Inc.