Archive for the ‘DDD’ Category

Transitory Domain Objects

Saturday, May 30th, 2009 by Sebastian Markbåge

A common problem with DDD is the injection of services into your domain model. Sometimes your domain relies on external services to do its job. You could handle that by injecting your services directly into your entities using NHibernate Interceptors or the ObjectStateManager in Entity Framework v4.

There are many design issues with the POCOness of Entities when you keep references to external services within the Entities themselves. The reference itself is (usually) infrastructure and not really a persistence concern.

Double Dispatch, Specifications and Services

The double dispatch pattern seems to be a popular approach. A better solution seems to be to move the logic in front of the Entities. Usually people seem to solve this by moving logic to services or even specifications.

Moving domain logic to services is a big no-no. That’s a gateway to anemic domain models and bloated service implementations. Services should be a last resort for external concerns and should probably have a solid anti-corruption layer.

The double dispatch pattern is painful and ugly, and it introduces lots of references to services where the ubiquitous language doesn’t dictate them.

The specification pattern is particularly ugly because that’s (usually) not how a domain expert would refer to the issue. We are violating the ubiquitous language.

Transitory Domain Objects

Recently I’ve started introducing unpersisted classes to my domain models. If you think about it, many domain models have transitory terms and concerns that are not really persisted.

Imagine that your domain model consists of an archive of home photography. Let’s call them Photos. Now, you want to work with a couple of them. You pick out all the ones that have a red lavish hue and start organizing them, labeling them or performing other operations on them. Now you have a set of Photos.

You could claim that it is a UI or Controller concern. Given the right bounded context, that set of photos IS A Domain Concern! Your domain could have domain specific restrictions and operations occurring on those sets of photos. You can think about them as a workspace or extended units of work.

Now this set isn’t persisted. It’s not an entity, it’s not a value object. Your entities can’t refer to it. This transitory logic lies in front of your entities. Since it’s transitory it also means that it can contain references to repositories and external services. It makes reference management much easier.

Now we can change out our specification and double dispatch patterns:

var reddishPhotoList = photoRepository.Find(
  new HueSpecification(colorDetectorService, Color.Red)
);
foreach(var photo in reddishPhotoList){
  //checks...
  photo.MarkWithMetaData("RED", metaDataService);
  //constraints...
}

To something more domain specific:

var photoSet = new PhotoSet(photoRepository, colorDetectorService, metaDataService);
photoSet.UsingOnly(Color.Red).MarkWithMetaData("RED");

We now have a domain object that we can easily pass around our application.
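To make the idea a bit more concrete, here is a minimal sketch of what such a PhotoSet could look like. Everything in it is an assumption made for illustration: the Hue enum (standing in for Color), the three service interfaces, their method names and the in-memory stubs are hypothetical and not taken from a real project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Color so the sketch has no extra dependencies.
public enum Hue { Red, Green, Blue }

// The entity stays free of service references; it only records results.
public class Photo
{
    private readonly List<string> labels = new List<string>();
    public Photo(Hue dominantHue) { DominantHue = dominantHue; }
    public Hue DominantHue { get; private set; }
    public IEnumerable<string> Labels { get { return labels; } }
    public void AddLabel(string label) { labels.Add(label); }
}

// Assumed service contracts; the names and signatures are invented.
public interface IPhotoRepository { IEnumerable<Photo> FindAll(); }
public interface IColorDetectorService { bool HasHue(Photo photo, Hue hue); }
public interface IMetaDataService { bool IsKnownLabel(string label); }

// Transitory domain object: never persisted, so it may hold services.
public class PhotoSet
{
    private readonly IPhotoRepository repository;
    private readonly IColorDetectorService colorDetector;
    private readonly IMetaDataService metaData;
    private readonly Func<Photo, bool> filter;

    public PhotoSet(IPhotoRepository repository, IColorDetectorService colorDetector, IMetaDataService metaData)
        : this(repository, colorDetector, metaData, photo => true) { }

    private PhotoSet(IPhotoRepository repository, IColorDetectorService colorDetector, IMetaDataService metaData, Func<Photo, bool> filter)
    {
        this.repository = repository;
        this.colorDetector = colorDetector;
        this.metaData = metaData;
        this.filter = filter;
    }

    // Narrows the set; returns a new PhotoSet and leaves this one unchanged.
    public PhotoSet UsingOnly(Hue hue)
    {
        Func<Photo, bool> current = filter;
        return new PhotoSet(repository, colorDetector, metaData,
            photo => current(photo) && colorDetector.HasHue(photo, hue));
    }

    // Domain constraint and operation live here instead of in the entity.
    public void MarkWithMetaData(string label)
    {
        if (!metaData.IsKnownLabel(label))
            throw new ArgumentException("Unknown label.", "label");
        foreach (Photo photo in repository.FindAll().Where(filter))
            photo.AddLabel(label);
    }
}

// In-memory stubs, only here so the sketch can be exercised.
public class InMemoryPhotoRepository : IPhotoRepository
{
    private readonly List<Photo> photos;
    public InMemoryPhotoRepository(IEnumerable<Photo> photos) { this.photos = photos.ToList(); }
    public IEnumerable<Photo> FindAll() { return photos; }
}

public class DominantHueDetector : IColorDetectorService
{
    public bool HasHue(Photo photo, Hue hue) { return photo.DominantHue == hue; }
}

public class AnyLabelMetaData : IMetaDataService
{
    public bool IsKnownLabel(string label) { return !string.IsNullOrEmpty(label); }
}
```

Note that Photo never sees a service here. The set does the filtering and the constraint checking, and the entity only receives the final label.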

When you think about it you’re probably already using this pattern either as helpers or as “services”. But making the clear distinction that this is 1) a domain concern and 2) temporary makes it easier to place your logic and apply constraints.

Achieving pure POCO is a pain from an infrastructure perspective but it’s worth it once it’s in place. I should be able to pass it to and from Db4O without any infrastructure concerns. Then you have a clear and solid domain model.

Large Object Storage for NHibernate – Part 2 – Storage Options

Sunday, March 29th, 2009 by Sebastian Markbåge

This is part 2 of a series describing Large Object Storage (BLOB) in a Domain Driven fashion. Be sure to read Part 1 about the new base classes introduced by this project.

Physical Storage Considerations

So, what are your options for storing large data objects in your relational database? This is actually not an easy problem to solve, because a relational database is designed for small pieces of well structured data. Making a table, row or column too large will cause various problems with fragmenting, indexing and table scans.

Because of this, vendors have implemented data columns that store large data separately from the rest of the row. This typically means they can’t be used for indices or searches. They’re still internal to the RDBMS and are fully covered by ACID transactions and backup procedures. You would typically keep the large object data on the same discs as the actual database itself. This can limit your overall performance and scalability. Additionally, the vendor API might support streaming for reading, while not supporting streaming for writing.

To remedy this situation vendors have come up with various ways of storing your data externally to your database (typically in a file system) while storing references to your data in the database and allowing you to access it and manage access control through your RDBMS. This typically means that operations on these files are not covered by ACID transactions [1] and backup procedures. External storage allows you to save disc space, since multiple rows and tables can share a reference to the same data file in a true denormalized fashion. Content-addressable storage (CAS) solutions are especially suitable for this kind of storage. You can use external storage with or without RDBMS integration.

So to sum up your physical storage options:

  • In-table RDBMS storage
  • Out-of-table RDBMS storage
  • External storage using RDBMS integration
  • External storage using NHibernate client

Because of these issues, various vendors have implemented more than one solution and there isn’t a consistent best practice for working with large data. You have to choose the storage solution that is most appropriate for your particular requirements.

[1] In this series I will only cover complete replacements of data rather than changes of data. This is done by exchanging one blob object for another as described in Part 1. Therefore ACID transactions on individual data changes aren’t going to be important for external storage. The entire blob will be written to storage. If the entire transaction succeeds, the reference will be changed. Otherwise the reference will remain at the old data.

Data Transfer Considerations

In-row data is usually sent with the rest of the query result. This means that it is not typically viable for streaming because the entire row is always read into memory.

For out-of-table storage some vendors don’t send large object data with the rest of the row result. That means that it can be requested and streamed in pieces. However, because ADO.NET doesn’t define a standard API for this, it is usually done in vendor specific implementations. Some vendors require the data reader to remain open while reading the stream. This makes it unsuitable for NHibernate since we would like to work with our entities in a disconnected fashion. So in this case, we would have to query that row and column again to open the data connection when needed (lazy loading).

When external storage is used, only a reference to the data is sent with the query result. The actual data transfer is usually done over a protocol completely separate from the RDBMS connection. Sometimes it isn’t even communicating with the same machine as the database. This makes it a very scalable solution. It will also allow us to open that connection and stream the data without querying the row and column of the database again.

Because of the inconsistent ways of accessing the data, we will be addressing this at the client level in a vendor specific fashion. More on this in Part 3 – NHibernate Mappings.

Small In-Table Data Types

All RDBMSs have small in-row data types for binary and text data, such as VARBINARY(size) or VARCHAR(size). These are typically limited to around 4000-8000 bytes of data and are therefore not suitable for large objects. You would typically just map these to memory using byte[] and string. If you currently only have small amounts of data but expect it to scale, you can start off using one of these small data types and map it to Blob and Clob objects and then scale as you need it.

Large object storage options are highly vendor specific. I’ll cover a few common vendors.

Microsoft SQL Server

VARBINARY(MAX), VARCHAR(MAX) – These types are used to store in-table binary and text data at up to 2 GB. The practical performance and scalability limitations involved in storing data in-table usually mean that you want to keep data in these columns to a few MB. Using the .WRITE clause of the UPDATE statement in SQL Server you can write changes to the database in chunks (the older UPDATETEXT command only works with the deprecated TEXT, NTEXT and IMAGE types).

XML(DOCUMENT), XML(CONTENT) – You can use the XML data type to store up to 2 GB of XML data per column. The same practical limitations as for VARBINARY and VARCHAR apply. You can specify either DOCUMENT or CONTENT to indicate whether the data has to be a full XML document or may be an XML fragment. The Xlob base class allows for both complete documents and fragments.

IMAGE, TEXT and NTEXT – These data types are now deprecated and will be removed in future versions of SQL Server. Use VARCHAR(MAX) or VARBINARY(MAX) instead.

FILESTREAM – If your data is more than 1 MB on average, you should consider the new FILESTREAM storage introduced in SQL Server 2008. It stores data out-of-table in the NTFS file system. The data size is limited only by the local NTFS file system. Each table that uses a FILESTREAM column must also have a UNIQUEIDENTIFIER column marked as ROWGUIDCOL. The FILESTREAM column is completely integrated with SQL Server, its backup facilities and its client software.

Microsoft SQL Remote Blob Storage (RBS) – Microsoft has introduced a new plug-in API for external storage used together with SQL Server. This will allow any storage solution provider to hook into Microsoft’s common API. It’s installed both on the server and the client. The server handles garbage collecting and manages the references to various BLOBs in the storage solution. This is a flexible and highly scalable solution and it integrates nicely into the SQL Server product. If you want to leave the external storage API to the client, read on to External Storage.

Oracle

BLOB, CLOB, NCLOB and XMLType – These are out-of-table data types for storing binary, text and XML data up to 4 GB. Oracle 10g and above supports up to 8 terabytes of storage depending on your CHUNK setting for the table. NCLOB stores text data in a Unicode national character set.

Oracle’s LOB types are all stored out-of-table and referenced using LOB locators. This makes them suitable for the disconnected environment used by NHibernate.

Oracle allows for XML operations to take place on the server which, in the future, could be used to speed up operations of an XmlReader generated by an Xlob.

LONG and LONG RAW – These data types are now deprecated. They can store 2 GB of data. Use CLOB or BLOB instead.

BFILE – Oracle has a reference type that points to files on the local file system. You can read these files (up to 4 GB) via the Oracle API. You can’t write to them though. You can change the reference to another file in the file system. So if you create a reference to an existing file using Blob.Create("filepath"), the NHibernate mappings will be able to change out the reference to the new file. You can also open up a directory where NHibernate can store new files. In both cases, both the Oracle server and client will need access to this directory. BFILEs are an external storage solution. Oracle doesn’t handle write transactions, garbage collection of files or backup procedures.

PostgreSQL

BYTEA, TEXT and XML – Used for in-table binary, text and XML data respectively. Current APIs don’t support streaming of these types. They will have to be read into memory all at once.

TOAST – PostgreSQL normally stores its data in pages of 8 kB, which doesn’t allow the above data types to be very large. Using TOAST, large columns are automatically stored out-of-table. It also has mechanisms for compressing data and trying to fit it into rows if possible. TOAST isn’t its own data type but can be used to expand BYTEA, TEXT and XML columns to a maximum of 1 GB.

Large Objects – PostgreSQL supports the notion of Large Objects. These are stored out of table but within the management of the RDBMS itself. Each new object is given its own ID and it is this ID that is referenced in the data tables. These objects are read and manipulated using a special API. Each object can be referenced several times and across tables. As far as I know it is not garbage-collected nor handled by backup solutions. So this solution can be compared to other external solutions even though it is managed by PostgreSQL itself. Since this solution shares objects for the entire database you will have to incorporate your own custom garbage collecting solution.

Large Objects are useful when you need to store data larger than 1 GB. The documented limit is 2 GB but in practice you can store files of several GB depending on the file system. The Large Object API will also allow you to stream the data instead of reading it all into memory. Therefore this is the preferred solution for storing large data on PostgreSQL.

MySQL

BLOB and TEXT – These columns are used to store binary and text data up to 4 GB. MySQL doesn’t have a column for XML data. TEXT is the recommended column type for XML. These columns are stored out-of-table but MySQL doesn’t support streaming of data. This means that each object will have to be read into memory in its entirety.

PrimeBase Technologies are currently working on a Blob streaming infrastructure over HTTP to be integrated into MySQL. It uses their XT storage engine.

For other storage engines, you will need to look to external storage.

External Storage

If you prefer to decouple your large object storage solution from the database you can use a completely external storage solution. In this case, you would store a reference to the data blob in your relational table, usually as a fixed length binary or GUID/UUID. The data is stored in a completely external solution with no communication with the database. This makes this solution completely vendor independent and highly scalable.

The NHibernate.Lob client handles the communication with both the external storage solution and the database. Your client should at certain intervals (nightly?) let NHibernate.Lob scan all mapped tables for external references. It will then garbage collect the data blobs in the external storage that are no longer referenced.

The NHibernate.Lob project includes a common API for external storage solutions for use with NHibernate. Included is also a file-system based CAS storage option to get you started. High-end CAS solutions such as EMC’s Centera or Caringo’s CAStor are very suitable for this kind of storage if you have extreme scalability or accessibility needs. They’re also useful if you need to comply with local regulations that require you to never delete data.
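To give a feel for what a file-system based CAS with garbage collection involves, here is a rough sketch. It is not the implementation that ships with the project: the class name FileSystemCas, its methods and the choice of SHA-256 as the content address are all assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical content-addressable store: the file name is the content hash.
public class FileSystemCas
{
    private readonly string root;

    public FileSystemCas(string root)
    {
        this.root = root;
        Directory.CreateDirectory(root);
    }

    // Streams the input to a temp file while hashing it, then renames the
    // file to its hash. Identical content always yields the same key.
    public string Store(Stream input)
    {
        string tempPath = Path.Combine(root, Guid.NewGuid().ToString("N") + ".tmp");
        string key;
        using (SHA256 sha = SHA256.Create())
        {
            using (FileStream temp = File.Create(tempPath))
            {
                byte[] buffer = new byte[81920];
                int read;
                while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
                {
                    sha.TransformBlock(buffer, 0, read, null, 0);
                    temp.Write(buffer, 0, read);
                }
                sha.TransformFinalBlock(buffer, 0, 0);
            }
            key = BitConverter.ToString(sha.Hash).Replace("-", "").ToLowerInvariant();
        }

        string finalPath = Path.Combine(root, key);
        if (File.Exists(finalPath))
            File.Delete(tempPath);          // already stored, drop the copy
        else
            File.Move(tempPath, finalPath);
        return key;                         // this is what the table row stores
    }

    public Stream OpenRead(string key)
    {
        return File.OpenRead(Path.Combine(root, key));
    }

    // Deletes every stored file whose key is not in the referenced set.
    // Note: this sketch would also remove temp files of uploads in progress.
    public int Collect(ICollection<string> referencedKeys)
    {
        int removed = 0;
        foreach (string path in Directory.GetFiles(root))
        {
            if (!referencedKeys.Contains(Path.GetFileName(path)))
            {
                File.Delete(path);
                removed++;
            }
        }
        return removed;
    }
}
```

The set passed to Collect would come from scanning the mapped tables for references, as described above. Because the key is derived from the content, two rows that store the same file end up sharing one physical copy.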

Text and XML Types

Clob and Xlob are structured data since they have a specific format (Text and XML). These can be stored in various ways depending on your vendor’s specific data columns. Text can be stored in various different character sets. XML can be serialized as binary XML in storage or saved using various character sets. If your vendor does provide a specific Text or XML data type that is suitable for large objects I would recommend that you use it. This will allow the RDBMS to handle the format and serialization constraints. Any compliant software can handle and display the data without further user interaction.

However, if you use external storage or want to utilize the various compression options mentioned in Part 5, you can store your Clob and Xlob data in any binary column as well, thereby letting the client determine the serialization format.

Getting Started

The full source code to Calyptus.Lob and appropriate NHibernate mappings are available at our Calyptus.Lob project at GitHub.

More in This Series

In the next part of this series I’m going to describe how you can use the NHibernate.Lob project to map up these storage options to your Blobs, Clobs and Xlobs in your NHibernate Entities.

Part 1 – BLOBs, CLOBs and XLOBs

Part 2 – Storage Options

Part 3 – NHibernate Mappings

Part 4 – External Storage

Part 5 – Compression Options

Large Object Storage for NHibernate – Part 1 – BLOBs, CLOBs and XLOBs

Thursday, March 12th, 2009 by Sebastian Markbåge

This is the first in a series of posts describing the design considerations involved with storing Binary Large OBject (BLOB) data with NHibernate and how it led me to start a project I’m currently calling NHibernate.Lob.

Note that the samples here are focused mainly on NHibernate but the pattern can be applied to many different persistence models. I’m considering support for DB4o for example.

Lazy Streaming of Data

The typical way to store binary data in NHibernate entities would be as a byte[] array. After all, the basic premise of NHibernate entities is that the data is stored in-memory in the first level cache. For smaller binary data this is just fine. We don’t even really need our columns to be lazy.

If we start adding larger data the first problem one might notice is that the data is loaded every time the entity is loaded. This is a common question around NHibernate user groups. This can quite easily be solved by separating it out to a lazy loaded entity or using lazy columns.

If we add even larger files we start wasting precious memory. This is especially problematic in high concurrency applications and web applications. A (very) common scenario would be to store image data together with an entity. At this point we shouldn’t ever keep the entire file in memory. Instead we should stream the data piece by piece from the persistent storage to whatever we want to use it for.

At this point, the actual data is never stored in the in-memory entity. Only pointer data is stored about where to find the information. It goes beyond the concept of lazy loading since only a piece of the data is available in memory at any point.

Note that this is NOT really related to the concept of a document database. We’re talking about large serialized objects (500 kb+ if I had to give a number) such as images, videos or large document files.

I also mention the term: pointers. In the context of this article series I don’t mean memory pointers but rather a reference to where one can find the real complete data. This may be in-memory, on disk, remote or distributed etc.

Streaming Data Types – The Current State of ADO.NET

So what data type will we use as the pointer to this data? Our domain model is supposed to be persistence ignorant so one of the common .NET types would be nice. There are typically three common structured types of large data stored in modern databases: raw binary data, text and XML. Binary Large OBjects are typically called BLOBs. Text or Character Large OBjects are sometimes called CLOBs. How you store the data and in which column types is very RDBMS provider specific. From now on I will refer to these three types simply as LOBs.

Now, the in-memory types for these would typically be byte[], string and XmlDocument. The streamed versions would be Stream, TextReader and XmlReader. However, this gives us some problems. The contracts of these three abstract classes are more than just pointers to where to get the data. They also contain the current reading position of the stream. This means that we can only read from that entity ONCE during its lifetime. They also implement IDisposable and keep a data reading connection open and expect to be closed and disposed of.

There’s really no common way of working with streamed data in ADO.NET. In fact, the typical example for dealing with LOBs in ADO.NET involves reading the full data into memory using IDataReader.GetBytes(…). Some providers have supplied their own solutions to this issue (such as OracleLob and SqlBytes). The most common solution seems to be to inherit Stream in their custom solutions. You can still read it several times by first cloning the Lob object but it isn’t really a nice solution for a domain model. They also imply that the connection is already open. What we really need is a type from which we can create readers.

Thankfully our friends on the Java end of things have already thought about this. In Java there are Blob and Clob interfaces which fit just this purpose. They can create both reader and writer streams. It is also nicely implemented in both JDBC and Hibernate.
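For reference, this is roughly what reading a binary column in chunks through the plain ADO.NET interfaces looks like. The helper class and method names are made up for this sketch; only IDataRecord.GetBytes is the real API. Whether a provider actually streams or buffers the whole value behind this call is provider specific (some only stream when the command is executed with CommandBehavior.SequentialAccess), which I’m stating here as a general observation rather than a guarantee.

```csharp
using System.Data;
using System.IO;

// Hypothetical helper, not part of ADO.NET or NHibernate.Lob.
public static class BlobColumnReader
{
    // Copies one binary column to 'output' in 8 kB chunks.
    // The reader must stay open and positioned on the row the whole time,
    // which is exactly what makes this awkward for disconnected entities.
    public static long CopyTo(IDataRecord record, int ordinal, Stream output)
    {
        byte[] buffer = new byte[8192];
        long offset = 0;
        long read;
        while ((read = record.GetBytes(ordinal, offset, buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, (int)read);
            offset += read;
        }
        return offset; // total number of bytes copied
    }
}
```

The position lives in the loop and the connection lives in the reader, so nothing here can be handed to a domain entity as a reusable pointer to the data.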

Another issue with TextReader and XmlReader is that we have no way to write to them but this is not really an issue as I will describe at the end of this article.

Introducing New Data Types – Blob, Clob and Xlob

So to remedy this situation I’m suggesting that three new base classes be added to our .NET domain models. The contracts of these are pretty simple.

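// Contract only: the bodies of the virtual WriteTo methods are omitted here.
using System.IO;
using System.Text;
using System.Xml;

namespace Calyptus.Lob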
{
	public abstract class Blob
	{
		public abstract Stream OpenReader();
		public virtual void WriteTo(Stream output);
	}
 
	public abstract class Clob
	{
		public abstract TextReader OpenReader();
		public virtual void WriteTo(TextWriter writer);
		public virtual void WriteTo(Stream output, Encoding encoding);
	}
 
	public abstract class Xlob
	{
		public abstract XmlReader OpenReader();
		public virtual void WriteTo(TextWriter writer);
		public virtual void WriteTo(XmlWriter writer);
		public virtual void WriteTo(Stream output, Encoding encoding);
	}
}

Basically there’s a PULL and a PUSH method to get the data from a LOB. The WriteTo methods are NOT a way to write data to the LOBs. They’re a way to PUSH the data from the LOB into a writer.

Why an abstract base class instead of an interface? This is a common debate in .NET. But since abstract base classes are overwhelmingly the more common choice in the .NET Framework (Stream, TextReader and XmlReader are a few examples) I figured it’d be best to keep that trend. It also allows for virtual methods to be added later (such as Java’s getBytes, position and length) without recompilation of inheritors.

I’m sure that these contracts are going to be very much debated since they involve the core of the domain model which, in the NHibernate world, should be persistence ignorant. You can still easily switch out the ORM and let that ORM handle these new types. The best would be if Microsoft’s Patterns and Practices team introduced these new types as a common practice and perhaps even into a System.Data.Lobs namespace.

Some of you may be thinking that this pattern makes the domain model aware of its repository. But it really doesn’t. No more than lazy loaded entities and collections do. You can even save it to another repository. More on that later.
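As a small illustration of the PULL/PUSH split, here is a stripped-down stand-in for the Blob contract with one possible default for WriteTo, built on top of OpenReader. The ByteArrayBlob class is hypothetical and only exists for this example; the real classes in Calyptus.Lob may well be implemented differently.

```csharp
using System;
using System.IO;

// Stand-in for the Blob contract, not the actual Calyptus.Lob source.
public abstract class Blob
{
    // PULL: every call hands out a fresh reader positioned at the start.
    public abstract Stream OpenReader();

    // PUSH: the Blob copies its own data into a stream owned by the caller.
    // Nothing is written *to* the Blob here.
    public virtual void WriteTo(Stream output)
    {
        using (Stream input = OpenReader())
        {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
                output.Write(buffer, 0, read);
        }
    }
}

// Hypothetical in-memory implementation used only for illustration.
public sealed class ByteArrayBlob : Blob
{
    private readonly byte[] data;

    public ByteArrayBlob(byte[] data)
    {
        if (data == null) throw new ArgumentNullException("data");
        this.data = data;
    }

    public override Stream OpenReader()
    {
        return new MemoryStream(data, false); // read-only view of the data
    }
}
```

Because the reading position belongs to the reader and not to the Blob, the same instance can be read any number of times. An implementation that can push data more efficiently than the generic copy loop is free to override WriteTo.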

Writing to Blobs – Don’t

You may have noticed that unlike the Java interface I didn’t put any way to write to the LOBs in the base contract. This is because you shouldn’t persist anything until a Flush (or SaveChanges) style event. If it’s a new entity, the row doesn’t exist and there may not be anything to write the data to. It could also not even be part of the row. It may be stored as its own “entity” and shared by multiple other entities. In this case you would overwrite their data. Data should be written all together in an atomic manner.

So how do I change the data? You replace the LOB pointer (the Blob, Clob or Xlob objects) with something that points to some other data source with the new data. This can be from a file, a stream, memory, or maybe a custom implementation which combines or converts data on-the-fly. This will allow you to build pipelining patterns. It can even be another LOB in your database. NHibernate will tell your LOB object when and where to write itself to.

This is also the same way Hibernate handles Blob and Clob in Java. It actually throws exceptions if you try to write to its Blobs or Clobs.

Finally Some Code

Let’s start by defining a domain model. Let’s just stick to one single entity called Product, with a binary image file, a long description text and an XML file which contains further specifications.

public class Product
{
	public int ID { get; set; }
	public string Title { get; set; }
	public Blob Image { get; set; }
	public Clob Description { get; set; }
	public Xlob Specifications { get; set; }
}

To read from these three LOBs you would use either the PULL or PUSH patterns (OpenReader or WriteTo). The following sample fetches a Product from the database. It then writes the image data to an HttpResponse. Then it writes the specifications to disk using a custom XmlWriter. Finally it reads the first line of the description.

using (ISession session = sessionFactory.OpenSession())
{
	Product product = session.Get<Product>(100);
 
	Response.Clear();
	Response.ContentType = "image/jpeg";
	Response.BufferOutput = false;
	product.Image.WriteTo(Response.OutputStream);
 
	using (XmlWriter writer = XmlWriter.Create(@"C:\MyFiles\SomeData.xml"))
	{
		product.Specifications.WriteTo(writer);
	}
 
	using (TextReader reader = product.Description.OpenReader())
	{
		string firstLine = reader.ReadLine();
	}
}

Changing the LOB data involves replacing the instance with another one. You can do this by using one of the built-in implementations using the static overloaded Blob.Create(), Clob.Create() and Xlob.Create() methods. The data can come from files, streams, memory, the web or your own implementations. You could for example create your own implementation which combines two files into one on the fly as it is written to the database.

The following sample loads a product from the database, replaces the image with a file from disk, replaces the description with an in-memory string, replaces the specifications with one from the web and then saves it all to the database.

using (ISession session = sessionFactory.OpenSession())
using (ITransaction transaction = session.BeginTransaction())
{
	Product product = session.Get<Product>(100);
	product.Image = Blob.Create(@"C:\MyFolder\MyImage.jpg");
	product.Description = Clob.Create("My short description.");
	product.Specifications = Xlob.Create(new Uri("http://domain/document.xml"));
	transaction.Commit();
}

Note that in the above sample it’s not a reference to the file and the web that is stored in the database. The actual data is read and stored. Depending on your application, the use of WebRequests could be prohibited, and loading unknown remote XML documents is a potential security issue.

Note that you can also rely on implicit conversions from some known types to the LOB types. There are also Blob.Empty, Clob.Empty and Xlob.Empty singletons that you can use to insert empty data. This is not null. The following sample implicitly casts a Stream to a Blob, a String to a Clob and removes the product’s specification by replacing it with an empty one.

using (ISession session = sessionFactory.OpenSession())
using (ITransaction transaction = session.BeginTransaction())
{
	Product product = session.Get<Product>(100);
	product.Image = Request.Files["uploadedImage"].InputStream;
	product.Description = Request.Form["description"];
	product.Specifications = Xlob.Empty;
	transaction.Commit();
}
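If you’re wondering how such an assignment can compile, the following sketch shows one way an implicit conversion and an Empty singleton could be put together for Clob. It is a simplified stand-in and not the Calyptus.Lob source: the nested StringClob class and the choice to map a null string to a null Clob are assumptions of mine.

```csharp
using System;
using System.IO;

// Simplified stand-in for Clob, for illustration only.
public abstract class Clob
{
    // Empty data is a real value and is distinct from null.
    public static readonly Clob Empty = new StringClob(string.Empty);

    public abstract TextReader OpenReader();

    // Lets a plain string be assigned wherever a Clob is expected.
    // Assumption: a null string converts to a null Clob.
    public static implicit operator Clob(string text)
    {
        return text == null ? null : new StringClob(text);
    }

    // Hypothetical in-memory implementation backing the conversion.
    private sealed class StringClob : Clob
    {
        private readonly string text;

        public StringClob(string text) { this.text = text; }

        public override TextReader OpenReader()
        {
            return new StringReader(text); // a fresh reader on every call
        }
    }
}
```

With this in place `Clob description = "My short description.";` compiles without an explicit cast, and the entity only ever sees the abstract Clob type.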

If you prefer a less anemic domain model you could keep the LOB internal to the class and do reads and writes with custom logic.

public class Product
{
	public int ID { get; set; }
	private Blob image;
 
	public void ChangeImage(Stream input)
	{
		this.image = input;
	}
 
	public void CopyImageFrom(Product product)
	{
		this.image = product.image;
	}
 
	public void WriteImageTo(Stream output)
	{
		this.image.WriteTo(output);
	}
}

Note that the stream used in ChangeImage() will not be read and disposed of until your Session is Flushed. Depending on your application design this pattern may not be useful.

In the current version, the StreamBlob class which wraps a Stream as a Blob can only be read once if the Stream is not seekable. Therefore each instance can only be saved to one entity and not reused. In future versions it may replace the internal stream pointer to the one in the repository once the first one is saved. The same goes for the TextReader and XmlReader wrappers.
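The read-once limitation is easy to picture with a small sketch. This is only my illustration of the behaviour described above and not the actual StreamBlob source; the minimal Blob base class is repeated here so the example stands on its own.

```csharp
using System;
using System.IO;

// Minimal stand-in for the Blob contract.
public abstract class Blob
{
    public abstract Stream OpenReader();
}

// Illustrative wrapper: re-readable only if the wrapped stream can seek.
public sealed class StreamBlob : Blob
{
    private readonly Stream stream;
    private readonly long start;
    private bool consumed;

    public StreamBlob(Stream stream)
    {
        if (stream == null) throw new ArgumentNullException("stream");
        this.stream = stream;
        this.start = stream.CanSeek ? stream.Position : 0;
    }

    // Returns the wrapped stream itself, so callers must not dispose it
    // if they intend to read the Blob again.
    public override Stream OpenReader()
    {
        if (stream.CanSeek)
        {
            stream.Seek(start, SeekOrigin.Begin); // rewind for each reader
            return stream;
        }
        if (consumed)
            throw new InvalidOperationException(
                "This stream cannot seek, so the Blob can only be read once.");
        consumed = true;
        return stream;
    }
}
```

An uploaded file coming straight off the network typically can’t seek, which is why such an instance can only be saved to a single entity.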

Getting Started

The full source code to Calyptus.Lob and appropriate NHibernate mappings are available at our Calyptus.Lob project on GitHub. I’ll make some official builds once the code stabilizes.

Coming up

Part 2 – Storage Options

Part 3 – NHibernate Mappings

Part 4 – External Storage

Part 5 – Compression Options