Archive for Neo

Storing Custom Fields in the Database

Throughout the years, I’ve worked on a number of apps which required user-customizable fields. For instance, consider a document-management system which has a some standard fields (document title, creation date), but also gives users the possibility of adding their own (department name, security code, etc). The question whether having such fields is a good idea is touched upon by Joel; in this post, I’d like to think about the available options for storing these fields in a relational database.

So far, I’ve come across four different patterns to store custom fields in a RDBMS:

  1. the meta-database
  2. the mutating table
  3. the fixed table
  4. the LOB

All of these have their advantages and disadvantages. I will discuss them seperately, using the document-management system as an example.

the Meta Database

This approach basically defines a database inside of the database. In its simplest form, it consists of two tables: a Field Definition table, and a Field Value table. The value table has a foreign key relationship with the definition table, and with the table containing the entities (i.e. the documents table). The following figure illustrates this:

Meta Database

When adding a custom field, you just add a row to the definitions table. When adding a document, you look up the defined custom fields, and add value rows for all definitions. When retrieving documents, you also retrieve all custom fields for that document.

Advantages

  • Fully normalized

Disadvantages

  • Requires complex queries to get to the fields values
  • Slow, because of the JOINs
  • The fields table become huge, because we add a row for each custom field

the Mutating Table

When using this approach, we store all custom fields in one Fields table. This table has a one-on-one relationship with the documents table, and it alters as we add or remove custom fields. Here’s a picture:

Mutating Table

The custom fields table initially starts out with just an id column, but as custom fields are defined, columns are added to this table by using ALTER TABLE statements. When custom fields are removed, we drop the column again.

Advantages

  • Simple to query: the data is just there to read
  • Works nicely with reporting tools

Disadvantages

  • Requires runtime permissions to alter the database
  • Does not work with ORMs which require a mapping to be defined upfront

the Fixed Table

This approach is very similar to the previous one: we still have one table for all fields. The major difference is that we don’t alter this table, but initialize it with a fixed number of nullable columns: 20 or 30 will do. The columns all have unimaginative names such as CustomField1, CustomField2, etc.; in another table, we map these column names to the field names as fields are defined. It looks something like this:

Fixed Table

Advantages

  • Simple to query
  • Works with ORM & reporting tools

Disadvantages

  • Fixed amount of fields

the LOB

Finally, there is the option of storing custom fields in some sort of Large Object. You can serialize some Hashtable object containing the fields and store that in a BLOB, or create an XML tree and store that in a CLOB. The LOB containing this data will just end up right next to the other document fields, in the document table:

LOB

Advantages

  • Fully extendable

Disadvantages

  • Hard or even impossible to query

Conclusions

I’ve used all of these four approaches, and they all have their pro’s and con’s. Overall, my favorite is the Fixed Table pattern, where we simply reserve some table space for 20 or so fields. Sure, this might be less flexible, since the amount of fields is fixed, but I don’t think that’s a big problem. Quoting Joel:

Here’s a common programmer thought pattern: there are only three numbers: 0, 1, and n. If n is allowed, all n’s are equally likely. […] Thus, for example, programmers tend to think that if your program allows you to open multiple documents, it must allow you to open infinitely many documents […]. A programmer would tend to look with disdain on a program which limited you to 20 open documents. What’s 20? Why 20? It’s not even a power of 2!

In the past, when I offered the choice to clients, explaining that they could choose between a solution which was easy to implement, but had an upper limit of 20 fields; or a solution which was more complex, and doesn’t work with their existing reporting tools, they all chose the simple solution.

And besides, 20 fields ought to be enough for anybody.

Comments (9)

Architecture vs. Infrastructure

This is my first post on Neo (.NET Entity Objects), a framework for .NET developers who want to write enterprise applications with an object-based domain model. I have been using Neo for little over a year now, and it has suited me perfectly for my .NET ORM needs. To further quote the website:

Neo includes tools that create an extensible object-based domain model as well as the database schema from a an abstract description of the model. At runtime, rich schema information is used to dynamically generate all SQL required for object persistence management. Being based on ADO.NET data sets, a Neo domain model is independent of the actual backing store and works equally well with databases and objects in an XML representation.

One of the nicest features features of Neo is that it encourages you to place the business logic in the domain objects. As developers, we have been thought good object-oriented practices such as data abstraction, information hiding, encapsulation, inheritance, polymorphism. It almost seems that we forgot about these principles when we started to make Enterprise software: in both J2EE land and .NET, I often see the following architecture:

  1. Data layer, which contains value objects with just properties,
  2. Business or service layer, with Manager or Service classes that manipulate the data objects
  3. Presentation layer, which calls the business layer

This not a object-oriented architecture (which combines data and process together), this is a functional architecture. Why should we give up on our OO-practices? Because we want to store something in a database? Shouldn’t that be just a matter of serialization…

As counter-argument, I am often told that enterprise-like features such as security, transactions, or auditing cannot be implemented when using a behavioral domain model. Thus, what you see is that the infrastructure of the project is often more important than the architecture; it seeps through the design. Martin Fowler calls this the Anemic Domain Model anti-pattern.

I think that this is wrong: the infrastructure should be a thin layer on top of the architecture, and never influence it. And to end this post on topic: Neo nudges you in the right direction: for every entity, it generates two classes: a value-containing base class, and a behavioral subclass that you are supposed to edit.

Comments (4)