The .NET class System.Data.DataSet is an implementation of the Table Module Enterprise Pattern. As such, it represents (part of) an in-memory dump of a database, including tables, rows, relations, etc. The idea is that you manipulate this dump, and send an update to the database using a DataAdapter. Thus, in .NET the classic three tier enterprise application looks something like this:
- SQL Server
- Business Layer
- Web application
The second layer uses stored procedures in the first layer to store data in a dataset, which then gets passed on to the third layer. The data is manipulated, and send back to the second layer, which sends the changes back to the database. Seems to be ok, right?
Not quite. There are two things wrong with this picture. Firstly, the dataset that is being manipulated does not represent the domain of the application at all. Instead, it represents an object-approach to a RDBMS: first, you define a DataSet with DataTable containing DataColumns, then you can add new DataRows to the table, perform a query, or load it with data from the database:
DataSet dataSet = new DataSet("pubs");
DataTable titles = dataSet.Tables.Add("titles");
titles.Columns.Add("titleid", typeof(string));
titles.Columns.Add("title", typeof(string));
titles.Columns.Add("pubid", typeof(int));
titles.Columns.Add("price", typeof(double));
DataTable publishers =
dataSet.Tables.Add("publishers");
publishers.Columns.Add("pub_id", typeof(int));
publishers.Columns.Add("name", typeof(string));
dataSet.Relations.Add(titles.Columns["pubid],
publishers["pubid"]);
DataRow titleRow = titlesTable.NewRow();
titleRow["titleid"] = "aaop";
titleRow["title"] = "The Ancient Art of Programming";
titleRow["pubid"] = 34;
titleRow["price"] = 24.95;
...
You get the idea: tedious and error-prone (though you could replace all literal strings with constants).
The Typed DataSet is a strong-typed version of the above, and improves the situation a bit, though not much. You will end up with something like:
PubsDataSet pubs = new PubsDataSet(); TitleRow titleRow = pubs.Titles.NewTitlesRow(); titleRow.TitleId = "aaop"; titleRow.Title = "The Ancient Art of Programming"; titleRow.PubId = 34; titleRow.Price = 24.95; ...
Though the string-based indexers are now nice properties, we are still manipulating tables and rows. I don’t want to manipulate rows, I want to manipulate domain objects which are eventually stored in database rows and tables.
Another problem with the dataset is that the layout of the database is not being abstracted; in fact the layout is pushed across all three tiers. This means that, if you change the database, you will have to change the business layer and web layer too.
I guess that these two reasons make me such a big fan of ORM tools such as Neo or NHibernate.