
Solving a puzzle intended to expand the mind.

Sunday, July 31, 2005

Truly abstracting a persistence mechanism

The initial design that I used when I made the ancestor of norm was based upon designs by Scott Ambler. The initial intent of Ambler's designs was definitely to provide an abstraction around the logic of a relational database. What I want to do with norm is to abstract the very notion of a data structure. When we persist an object into a row in an RDB it is almost irrelevant that the data is persistent. Of course it's not truly irrelevant, else why bother at all? What I mean is that the persistence of the data store in an RDB is irrelevant - it's the concepts that are used that make ORM such a complicated enterprise. The notion of a relationship in the relational domain is subtly different from that used in the object domain. Therein lies the "object/relational impedance mismatch" that Ambler identifies in his hugely influential paper, "The Design of a Robust Persistence Layer for Relational Databases".

As you can see the persistence mechanism is deliberately kept very simple, since there is little in the way of overlap (conceptually) between the APIs for a flat file persistence store and a relational database. In fact the notion of connection means different things in each mechanism.
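To make the point about connections concrete, here is a minimal sketch of what such a deliberately small persistence interface might look like. Every name in it (`PersistenceMechanism`, `FlatFileStore`, `SqliteStore`, `round_trip`) is hypothetical and is not taken from norm. It only illustrates that "connecting" means reading a file in one store and opening a database session in the other.

```python
import json
import os
import sqlite3
import tempfile
from abc import ABC, abstractmethod


class PersistenceMechanism(ABC):
    """Hypothetical minimal store interface; not norm's actual API."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def disconnect(self) -> None: ...

    @abstractmethod
    def save(self, key: str, record: dict) -> None: ...

    @abstractmethod
    def load(self, key: str) -> dict: ...


class FlatFileStore(PersistenceMechanism):
    """'Connecting' means reading the whole JSON file into memory."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.records: dict = {}

    def connect(self) -> None:
        if os.path.exists(self.path):
            with open(self.path, encoding="utf-8") as handle:
                self.records = json.load(handle)

    def disconnect(self) -> None:
        # Nothing is durable until the file is rewritten here.
        with open(self.path, "w", encoding="utf-8") as handle:
            json.dump(self.records, handle)

    def save(self, key: str, record: dict) -> None:
        self.records[key] = record

    def load(self, key: str) -> dict:
        return self.records[key]


class SqliteStore(PersistenceMechanism):
    """'Connecting' means opening a database session."""

    def __init__(self, database: str = ":memory:") -> None:
        self.database = database
        self.connection = None

    def connect(self) -> None:
        self.connection = sqlite3.connect(self.database)
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS records "
            "(record_key TEXT PRIMARY KEY, body TEXT)"
        )

    def disconnect(self) -> None:
        self.connection.commit()
        self.connection.close()

    def save(self, key: str, record: dict) -> None:
        self.connection.execute(
            "INSERT OR REPLACE INTO records (record_key, body) VALUES (?, ?)",
            (key, json.dumps(record)),
        )

    def load(self, key: str) -> dict:
        row = self.connection.execute(
            "SELECT body FROM records WHERE record_key = ?", (key,)
        ).fetchone()
        return json.loads(row[0])


def round_trip(store: PersistenceMechanism, key: str, record: dict) -> dict:
    """Save and reload one record through whichever store is supplied."""
    store.connect()
    store.save(key, record)
    loaded = store.load(key)
    store.disconnect()
    return loaded
```

The calling code in `round_trip` never learns which store it is talking to, which is about as much overlap as the two APIs can share.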

So what I'm after is a way to bridge the gap between persistence stores so that mappings can be made between two different object models as easily as they can between object models and relational databases. What I'm wondering is whether Ambler's model is the right one to use for such an abstraction. My first task is to purge any domain pollution from the mapping system, the transaction manager and the command system.

My initial system was a very close parallel to Ambler's designs. But now I'm looking to diverge in the hope of a cleaner conceptual model. What most ORM systems do is to define an invertible function between the object domain and the relational domain. I propose to do the same thing, but I want to do it in a non-explicit way.
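As a rough illustration of what "invertible" means here, the sketch below pairs a function from object to row with its inverse. The `Person` class and the `to_row` / `from_row` names are made up for the example and do not come from norm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical entity in the object domain."""
    person_id: int
    full_name: str


def to_row(person: Person) -> tuple:
    """Object domain -> relational domain (one row as a tuple)."""
    return (person.person_id, person.full_name)


def from_row(row: tuple) -> Person:
    """Relational domain -> object domain; the inverse of to_row."""
    person_id, full_name = row
    return Person(person_id, full_name)
```

The mapping counts as invertible when going there and back changes nothing in either direction: `from_row(to_row(p)) == p` for every object, and `to_row(from_row(r)) == r` for every row.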

Normally the mapping is done by enumerating the domain set (the object domain normally), enumerating the range set (the relational model) and then defining the mappings between them. If you look closely at the mapping file formats for persistence mechanisms such as Torque, Hibernate, ObjectSpaces and norm's predecessor, they all followed this idiom using XML configuration files to define the mapping, and an in-memory model to serve as a runtime aid to the persistence broker to build its SQL commands.
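The enumerated idiom can be sketched in a few lines. The mapping is shown as a Python dictionary standing in for what would normally be loaded from an XML file. The table name, column names and `build_insert` helper are all assumptions made for illustration, not the format used by Torque, Hibernate, ObjectSpaces or norm.

```python
# Explicit, enumerated mapping: every attribute is paired with its column by hand.
PERSON_MAPPING = {
    "table": "tbl_person",
    "columns": {"person_id": "PersonID", "full_name": "FullName"},
}


def build_insert(mapping: dict, entity: dict) -> tuple:
    """Use the in-memory mapping model to build a parameterised INSERT."""
    attributes = list(mapping["columns"])
    column_list = ", ".join(mapping["columns"][name] for name in attributes)
    placeholders = ", ".join("?" for _ in attributes)
    sql = f"INSERT INTO {mapping['table']} ({column_list}) VALUES ({placeholders})"
    params = tuple(entity[name] for name in attributes)
    return sql, params


sql, params = build_insert(PERSON_MAPPING, {"person_id": 1, "full_name": "Ada"})
# sql    -> "INSERT INTO tbl_person (PersonID, FullName) VALUES (?, ?)"
# params -> (1, "Ada")
```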

That has to be the way of doing it ultimately, but I wonder whether we can't define the mapping in another way, rather like the difference between defining a set using an enumeration of all of its members or through the definition of a generative function that maps onto the set. i.e. Rather than say:

x = {2, 4, 6, 8}

we could say

x = {2i where i > 0 & i < 5}

It's definitely more complicated than explicitly enumerating the mappings, but might enable the easy solution of mappings in the case of inheritance where there are several solutions that all work.
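The same contrast can be sketched in code: first the two ways of defining the set, then a mapping produced by a rule instead of a hand-written list. The naming rule in `column_for` (snake_case attribute to PascalCase column) is purely an assumed convention for the example. Real schemas would need their own rule, or several.

```python
# The set defined by enumeration and by a generating rule.
enumerated = {2, 4, 6, 8}
generated = {2 * i for i in range(1, 5)}  # i > 0 and i < 5


def column_for(attribute: str) -> str:
    """Hypothetical naming rule: snake_case attribute -> PascalCase column."""
    return "".join(part.capitalize() for part in attribute.split("_"))


def generate_mapping(attributes: list) -> dict:
    """Derive the attribute-to-column mapping from the rule."""
    return {name: column_for(name) for name in attributes}


mapping = generate_mapping(["person_id", "full_name"])
# mapping -> {"person_id": "PersonId", "full_name": "FullName"}
```

Both definitions of the set are equal (`enumerated == generated`). The generated mapping only matches a hand-written one when the schema really follows the rule.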

To do these conceptual mappings we need to work out the key abstractions that define the mapping functions:

  • whole/part relationship
  • complex type
  • association
  • CRUD operation
  • is-a relationship

Each of these things is present in every representation that I am considering. They exist in RDBMSs, object databases, and XML documents (i.e. a flat file, kinda:) But how they are represented and realized differs vastly from one technology to the next.

I wonder, and that's all I've done so far, whether, if we defined how each underlying concept is laid out in each case, we could do the mapping by specifying how that meaning is projected onto the concepts of the problem domain. Maybe I could perform the mapping by saying that this group of concepts is achieved using this kind of mapping, and maybe the ORM can deduce the rest.
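One way to picture this is a sketch where the only thing declared is how the is-a relationship is realized, and the table layout is deduced from that choice. This is speculative: the `EntityType` class, the strategy names and `deduce_layout` are invented for the example, and it covers just one of the abstractions listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityType:
    """Hypothetical description of a class in the object domain."""
    name: str
    attributes: list
    parent: Optional["EntityType"] = None


def all_attributes(entity: EntityType) -> list:
    """Inherited attributes first, then the entity's own."""
    inherited = all_attributes(entity.parent) if entity.parent else []
    return inherited + entity.attributes


def root_of(entity: EntityType) -> EntityType:
    """Walk up the is-a chain to the top of the hierarchy."""
    return root_of(entity.parent) if entity.parent else entity


def table_per_class(entity: EntityType) -> dict:
    """The class gets its own table holding inherited and own attributes."""
    return {entity.name: all_attributes(entity)}


def single_table(entity: EntityType) -> dict:
    """The class is stored in the root's table, with a discriminator column."""
    return {root_of(entity).name: ["type"] + all_attributes(entity)}


# The only thing declared is which realization of "is-a" to use.
IS_A_STRATEGIES = {"table-per-class": table_per_class, "single-table": single_table}


def deduce_layout(entity: EntityType, is_a_strategy: str) -> dict:
    """Deduce the columns one entity needs from the chosen concept mapping."""
    return IS_A_STRATEGIES[is_a_strategy](entity)


person = EntityType("Person", ["id", "name"])
employee = EntityType("Employee", ["salary"], parent=person)
# deduce_layout(employee, "table-per-class") -> {"Employee": ["id", "name", "salary"]}
# deduce_layout(employee, "single-table")    -> {"Person": ["type", "id", "name", "salary"]}
```

Note that the names of tables and columns are still taken straight from the object model here, so the sketch sidesteps the naming problem instead of solving it.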

Of course, proper naming strategies in each domain dictate different names, and they are seldom held to consistently, so short of naming the attributes exhaustively there is no way of doing the mapping. So is it worth my time? Or am I just proposing a slight change of terminology so as not to give away the format of the persistence mechanism?

Tuesday, July 19, 2005

And we're off

I have set up the SourceForge project. You can find it here. I've also classified all of the work, and split it up into releases. Here's what will go into release one:

  • Configuration: Use native .NET configuration
  • Configuration: Remove existing config assembly
  • Installers: WIX installers
  • Runtime Control: Add transactional support from COM+
  • Runtime Control: Extend reverse engineering to examine SPs and create wrappers for them
  • Runtime Control: Configurable ID strategy
  • Runtime Control: Configurable transaction isolation policy
  • Templates: Move core templates into resource assembly
  • Testing: Create a proper test database
  • Runtime Control: Divide system between runtime and development projects
  • Runtime Control: Standardise all names to CamelCase

I think the highest priority is the configuration rework. Configuration in the previous system was way too complicated. What we need is a very simple, very reliable system that can easily be expanded to accommodate something like the config app block at a later date. As soon as that is done, the key task will be converting it from its current broken state to a working state, and then splitting the system up into runtime and development arms. I will also do some work towards creating WIX installers for the runtime and development systems, including an installer for packaging source releases, that will allow the easy setup of a development environment for new volunteers on the project. This is of course based on the "if you build it, they will come" model of open source development.

Saturday, July 16, 2005

First Task - What to do?

I've posted a whole bunch (well, 45) of bugs on the GotDotNet bug tracker for enhancements that nORM should not have to live too long without. Some of them ought to be fairly easy to deliver, like changing method names to have a clean and consistent format. Others are a bit more of a challenge - like adding persistence support for XML documents and reverse engineering an entity model from a schema file. I think that delivering these bits of functionality will take quite a while. If each of these things took an average of a week to do (which is optimistic!) it would take me about a year to deliver all of the enhancements. But by the end of that, the system will kick some serious arse! The BugList/WishList page is here. Feel free to make suggestions if you can think of anything I've missed.

Thursday, July 14, 2005

Welcome to nORM

This brand new blog is where I (Andrew Matthews) and any other volunteers that join the project will post on issues involved in the production of nORM. As the name suggests, nORM is another .NET ORM system. The open source market seems to be flooded with them, so why release another?

nORM started out life as a piece of infrastructure for a project I did a few years back when there wasn't anything worth having in the .NET market. I had just finished doing some Java development for a client using Torque and had found it such a labour-saving device that I didn't feel I could live without it. When I moved over to C# I couldn't find anything comparable, so I developed one based on the designs publicised by Scott Ambler. The ORM had a very powerful and extensible code generation system that allowed me to generate everything from SQL statements, entities and remoting specifications to basic datagrids for ASP.NET. All in all I was very impressed with what it did, but even more enthused with what was possible in a productised system.

In a subsequent project for Avanade and LloydsTSB share Registrars I was involved in choosing an ORM system that would see really _huge_ transaction rates. As an exercise, and not really expecting it to rate very highly, I included my old ORM system in the comparison, and was pleasantly surprised to find that rather than being a rank outsider, it came out just shy of the leaders in my comparisons. I didn't use it, because of the lack of support (I have just moved to the other side of the world!) but it made me realise that nORM could be a world-beating system that could, if productised, be an essential tool in any .NET developer's arsenal of APIs. The core code generation system is potentially usable in a variety of other applications as well - I used it to great effect in AabsDbc (http://www.sf.net/projects/aabsdbc).

So I decided that I don't have time to do it all by myself (I have twins on the way, and can't see myself having the brain-power to get much coding done in my spare time for a year or so), so I'm opening it up to the world, in the hope that there are people out there who DO have the time and interest to get into developing a good ORM system. God knows, there's plenty yet to do!