Archive for the '.NET' Category

Storing Custom Fields in the Database

Thursday, January 24th, 2008

Throughout the years, I’ve worked on a number of apps which required user-customizable fields. For instance, consider a document-management system which has a some standard fields (document title, creation date), but also gives users the possibility of adding their own (department name, security code, etc). The question whether having such fields is a good idea is touched upon by Joel; in this post, I’d like to think about the available options for storing these fields in a relational database.

So far, I’ve come across four different patterns to store custom fields in a RDBMS:

  1. the meta-database
  2. the mutating table
  3. the fixed table
  4. the LOB

All of these have their advantages and disadvantages. I will discuss them seperately, using the document-management system as an example.

the Meta Database

This approach basically defines a database inside of the database. In its simplest form, it consists of two tables: a Field Definition table, and a Field Value table. The value table has a foreign key relationship with the definition table, and with the table containing the entities (i.e. the documents table). The following figure illustrates this:

Meta Database

When adding a custom field, you just add a row to the definitions table. When adding a document, you look up the defined custom fields, and add value rows for all definitions. When retrieving documents, you also retrieve all custom fields for that document.

Advantages

  • Fully normalized

Disadvantages

  • Requires complex queries to get to the fields values
  • Slow, because of the JOINs
  • The fields table become huge, because we add a row for each custom field

the Mutating Table

When using this approach, we store all custom fields in one Fields table. This table has a one-on-one relationship with the documents table, and it alters as we add or remove custom fields. Here’s a picture:

Mutating Table

The custom fields table initially starts out with just an id column, but as custom fields are defined, columns are added to this table by using ALTER TABLE statements. When custom fields are removed, we drop the column again.

Advantages

  • Simple to query: the data is just there to read
  • Works nicely with reporting tools

Disadvantages

  • Requires runtime permissions to alter the database
  • Does not work with ORMs which require a mapping to be defined upfront

the Fixed Table

This approach is very similar to the previous one: we still have one table for all fields. The major difference is that we don’t alter this table, but initialize it with a fixed number of nullable columns: 20 or 30 will do. The columns all have unimaginative names such as CustomField1, CustomField2, etc.; in another table, we map these column names to the field names as fields are defined. It looks something like this:

Fixed Table

Advantages

  • Simple to query
  • Works with ORM & reporting tools

Disadvantages

  • Fixed amount of fields

the LOB

Finally, there is the option of storing custom fields in some sort of Large Object. You can serialize some Hashtable object containing the fields and store that in a BLOB, or create an XML tree and store that in a CLOB. The LOB containing this data will just end up right next to the other document fields, in the document table:

LOB

Advantages

  • Fully extendable

Disadvantages

  • Hard or even impossible to query

Conclusions

I’ve used all of these four approaches, and they all have their pro’s and con’s. Overall, my favorite is the Fixed Table pattern, where we simply reserve some table space for 20 or so fields. Sure, this might be less flexible, since the amount of fields is fixed, but I don’t think that’s a big problem. Quoting Joel:

Here’s a common programmer thought pattern: there are only three numbers: 0, 1, and n. If n is allowed, all n’s are equally likely. […] Thus, for example, programmers tend to think that if your program allows you to open multiple documents, it must allow you to open infinitely many documents […]. A programmer would tend to look with disdain on a program which limited you to 20 open documents. What’s 20? Why 20? It’s not even a power of 2!

In the past, when I offered the choice to clients, explaining that they could choose between a solution which was easy to implement, but had an upper limit of 20 fields; or a solution which was more complex, and doesn’t work with their existing reporting tools, they all chose the simple solution.

And besides, 20 fields ought to be enough for anybody.

Conversions: All in a Day’s Work

Thursday, January 10th, 2008

Back in the day, when I was a C++ Developer on the Microsoft platform, it always struck me as weird that half of my code base consisted of String conversions1. The reason for all this conversion fun was that there were at least half a dozen string types in Microsoft C++: there is the plain char*, the Unicode wchar*, LPSTR and friends (LPCSTR, LPWSTR, LPCWSTR, LPTSTR, LPTSTR, LPCTSTR, and LPCTSTR), the MFC CString, the STL basic_string, the Visual Basic BSTR, and its C++ wrappers _bstr_t and CComBSTR. And I’m probably forgetting quite a few. Each library used a different string, so conversion was all in a day’s work.

Imagine my joy when I started programming in Java, some 10+ years ago. “Woohoo! Just one string type! And garbage collection too! No more conversions, I am going to be so productive! Finally, I will have time for a Real Life!

Fast-forward to the present, and I still don’t have that Real Life I wanted. I wonder what went wrong. As it turns out, conversion is still all in a day’s work. The only thing that has changed is the types we’re converting.

If you look at the classic architecture of an Enterprise Application, it will look something like this:

I am counting eight conversions in this picture. Granted, you can skip the Data Transfer Objects by using Domain Object directly, and the SQL can be generated by using an ORM tool. However, using an ORM tool does not mean that there is no conversion at all; it is just a higher-level abstraction. You still have to define the metadata (Hibernate mappings, JPA annotations, etc.) to do this conversion, of course.

In my experience, a typical Enterprise application consists of very little “Business Logic”2, but mostly Conversion Logic, so I suppose that conversions are still all in a day’s work.

Guess I’ll never have that Real Life.

<hr/>

1 - The other half consisted of reference counting and releasing. Makes you wonder what’s left.<br/> 2 - An most of the Business Logic is hardly logical.<br/>

A Fluent Interface for XML DOM APIs

Thursday, May 10th, 2007

The concept of fluent interfaces is certainly not new to me, but recently I’ve had the chance to bring it into practice by using EasyMock 2. EasyMock 2 uses the features of Java 5 to greatly simplify the programming model for mock objects.

This got me thinking about XML, and DOM in particular. Let’s say we want to create the following piece of XML using DOM:

<contacts>
   <contact>
      <name>John Doe</name>
      <phone type="home">555-12345</phone>
      <phone type="work">555-67890</phone>
   </contact>
</contacts>

Basically, the way to create this XML using the standard DOM API look something like:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.newDocument();
Element name = doc.createElement("name");
name.setTextContent("John Doe");
Element phone1 = doc.createElement("phone");
phone1.setAttribute("type", "home");
phone1.setTextContent("555-12345");
Element phone2 = doc.createElement("phone");
phone2.setAttribute("type", "work");
phone2.setTextContent("555-67890");
Element contact = doc.createElement("contact");
contact.appendChild(name);
contact.appendChild(phone1);
contact.appendChild(phone2);
Element contacts = doc.createElement("contacts");
contacts.appendChild(contact);
doc.appendChild(contacts);

Not really fluent in any way, and quite verbose as well.Things improve somewhat with JDOM, because it uses concrete classes rather than interfaces, so we don’t need the Document factory. Also, most methods conveniently return this, so you can chain them:

Document doc = new Document();
Element name = new Element("name").setText("John Doe");
Element phone1 = new Element("phone").setAttribute("type", "home").setText("555-12345");
Element phone2 = new Element("phone").setAttribute("type", "work").setText("555-67890");
Element contact = new Element("contact").addContent(name).addContent(phone1).addContent(phone2);
Element contacts = new Element("contacts").addContent(contact);
doc.addContent(contacts);

This is certainly less verbose, but is it fluent? Those two concepts don’t necessarily mean the same thing. An API can be very succinct, but that does not necessarily mean it is fluent. Quoting Martin Fowler: The [fluent] API is primarily designed to be readable and to flow. The example above doesn’t flow; the approach is quite linear, even though we are creating a hierarchal tree.

Compare this to the API of XLinq, the XML component of Microsoft’s LINQ effort1:

XElement contacts = 
   new XElement("contacts", 
      new XElement("contact", 
         new XElement("name", "John Doe"), 
         new XElement("phone", "555-12345",
new XAttribute("type", "home")), new XElement("phone", "555-67890", new XAttribute("type", "work")) ) );

Note that by indenting (and squinting a bit) the code to construct the XML tree shows the structure of the underlying XML. And this is due to the way the API was designed: clever use of .NET operator overloading, variable constructor arguments, and generics allow for this.

I really miss something like this in the Java space. So much, in fact, that I’m thinking about writing one myself. We don’t have operator overloading in Java, but we do have static imports. So that could mean we could end up with something along the lines of:

import static org.easydom.EasyDom.*;

Element contacts = element(name("contacts"), element(name("contact"), element(name("name"), "John Doe"), element(name("phone"), "555-1234", attribute(name("type"), "home")), element(name("phone"), "555-567890", attribute(name("type"), "work")) ) );

The idea is that the EasyDom object contains nothing but static methods, just like EasyMock does. The name() method, for instance, returns a QName, necessary to construct an element using element(), or an attribute using attribute(). I’m not sure about the name() requirement, dropping that would make it more consice. So far, so good.

There are two things stopping me from creating this EasyDom API:

  1. Does the world really need another DOM API? I thought about returning or wrapping W3C DOM, but that would mean that only the DOM creation API is fluent, and the end result still has weird behavior like using NodeList where a List<Element> would be nicer.
  2. Creating a DOM API is hard. Creating a fluent interface is even harder. I’ve got enough work already.
Leave a comment to change my mind :).

Update 20070514: Guillaume is right when he says that the Groovy Markup builder has a fluent interface. And on that note, Tareq Abed Rabbo has written a nice post about combining Groovy with Spring-WS. <hr/>

1 - If you don’t know anything about LINQ, read this. I wish we had something similar in the Java space.

Nobody cares about your stack trace!

Sunday, January 21st, 2007

Some Web service stacks offer an option to send any server-side stack trace over to the client, as part of a <faultDetail/> block. Axis 1 does it, and it seems like the JAX-WS reference also offers this feature. Presumably, this makes it easier to debug the service from the client side.

Let me explain why I think that there is a big difference between a SOAP Fault and an exception, and why I don’t offer this feature in Spring-WS:

Let’s say I call a public Web service to get a stock quote. Instead of the result, I might get a SOAP Fault as a reponse, which means something went wrong. In SOAP 1.1, a Fault contains four elements, the first two of which are required:

  1. a code (a qualified name)
  2. a string
  3. an actor, and
  4. a detail (which can contain any number of xml elements)
A Java or .NET Exception, on the other hand, has just two elements:
  1. a string message, and
  2. a stack trace
Notice the difference between the two? Of course, you can use the exception message as the fault string, and you can create some kind mapping between the exception class and the code, but that’s your mapping. There is no standard, interoperable way to write stack traces in faults, mostly because stack traces can only be found in Java, but not in C, for instance. And Web services are all about interoperability.

Back to the stock quote: imagine the Web service is not written in Java, but in C, which means we cannot send any stack trace as part of the fault. So what do we send in order to facilitate debugging on the client side? Do we attach the core dump ;)?

There’s a simple lesson here: it only makes sense to use Web service technology where the client isn’t interested to the language of the server. If you really want to see the stack trace on the client-side, use Java RMI. If you want to invoke a method on a class written in another language, use CORBA or Hessian. If you want to do XML messaging, use SOAP.

Domain Drivel

Tuesday, October 10th, 2006

Computer science tells us there are three ways to represent data:

  1. Object graphs (i.e. Java or C#)
  2. Relational data (RBMS)
  3. Hierarchal data (XML or HTML)

During the years, we’ve found out about the Impedance Mismatches between these three representations. At first, it seems pretty easy to convert a row in a database to an object. However, things are not so easy as they seem.

And yet, conversion between these representations is all we do in a standard Web application! Data is generally stored in relational databases, and converted to Java or .NET objects using some kind of ORM tool. Next, we use the objects to invoke business logic, and finally, we display them on an HTML page. I find this very amusing, it makes me feel we are doing something wrong here.

Recently, though the influence of books like Domain Driven Design, it has become fashionable to think that a rich Domain Model is one of the most important things there is. Well, that might be the case for you as an OO developer, but for the Enterprise the database is much more important. It is not without reason that a large number of databases outlive the applications built around them. And for the user of your Web application the most important part is the HTML UI he or she is looking at. Finally, in this SOA day and age, the OO business logic is probably not the only business logic that is there. Your application is part of a team of services, the business logic on the mainframe is just as important!

Now, I’m not saying that a rich domain model is not important. I’m just saying that — as with any solution — it is not a silver bullet. Use it when it gives an advantage, but don’t swear by it. In the grand scheme of things, your precious OO model is not as important as you think it is.