Large Object Storage for NHibernate – Part 1 – BLOBs, CLOBs and XLOBs
Thursday, March 12th, 2009 by Sebastian Markbåge
This is the first in a series of posts describing the design considerations involved in storing Binary Large OBject (BLOB) data with NHibernate and how they led me to start a project I’m currently calling NHibernate.Lob.
Note that the samples here are focused mainly on NHibernate but the pattern can be applied to many different persistence models. I’m considering support for DB4o for example.
Lazy Streaming of Data
The typical way to store binary data in NHibernate entities would be as a byte[] array. After all, the basic premise of NHibernate entities is that the data is stored in memory in the first-level cache. For smaller binary data this is just fine. We don’t even really need our columns to be lazy.
If we start adding larger data the first problem one might notice is that the data is loaded every time the entity is loaded. This is a common question around NHibernate user groups. This can quite easily be solved by separating it out to a lazy loaded entity or using lazy columns.
If we add even larger files we start wasting precious memory. This is especially problematic in high concurrency applications and web applications. A (very) common scenario would be to store image data together with an entity. At this point we shouldn’t ever keep the entire file in memory. Instead we should stream the data piece by piece from the persistent storage to whatever we want to use it for.
At this point, the actual data is never stored in the in-memory entity. Only pointer data about where to find the information is stored. This goes beyond the concept of lazy loading, since only a piece of the data is available in memory at any point.
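As a rough illustration of what streaming "piece by piece" means, the following sketch copies one Stream into another through a small fixed-size buffer. The names StreamingSketch and CopyInChunks are made up for this example and are not part of NHibernate or Calyptus.Lob.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only.
public static class StreamingSketch
{
    // Copies input to output through a fixed-size buffer, so memory use stays
    // at bufferSize bytes no matter how large the source is.
    // Returns the total number of bytes copied.
    public static long CopyInChunks(Stream input, Stream output, int bufferSize)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        if (output == null) throw new ArgumentNullException(nameof(output));
        if (bufferSize <= 0) throw new ArgumentOutOfRangeException(nameof(bufferSize));

        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // Read returns 0 once the end of the stream has been reached.
        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

On .NET 4 and later, Stream.CopyTo performs the same kind of buffered copy. The loop is spelled out here only to show that a single buffer's worth of data is held in memory at a time.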
Note that this is NOT really related to the concept of a document database. We’re talking about large serialized objects (500 kb+ if I had to give a number) such as images, videos or large document files.
I also mention the term: pointers. In the context of this article series I don’t mean memory pointers but rather a reference to where one can find the real complete data. This may be in-memory, on disk, remote or distributed etc.
Streaming Data Types – The Current State of ADO.NET
So what data type will we use as the pointer to this data? Our domain model is supposed to be persistence ignorant, so one of the common .NET types would be nice. There are typically three common structured types of large data stored in modern databases: raw binary data, text and XML. Binary Large OBjects are typically called BLOBs. Text or Character Large OBjects are sometimes called CLOBs. How you store the data, and in which column types, is very RDBMS provider specific. From now on I will refer to all three types simply as LOBs.
Now, the in-memory types for these would typically be byte[], string and XmlDocument. The streamed versions would be Stream, TextReader and XmlReader. However, this gives us some problems. The contracts of these three abstract classes are more than just pointers to where to get the data. They also contain the current reading position of the stream. This means that we can only read from that entity ONCE during its lifetime. They also implement IDisposable, keep a data reading connection open, and expect to be closed and disposed of.
There’s really no common way of working with streamed data in ADO.NET. In fact, the typical example for dealing with LOBs in ADO.NET involves reading the full data into memory using IDataReader.GetBytes(…). Some providers have supplied their own solutions to this issue (such as OracleLob and SqlBytes). The most common approach seems to be to inherit from Stream. You can still read the data several times by first cloning the Lob object, but that isn’t really a nice solution for a domain model. These types also imply that the connection is already open. What we really need is a type from which we can create readers.
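The read-once problem can be shown with a small sketch that uses only in-memory streams. The class and method names below are hypothetical. The point is the difference between holding a Stream, which carries a reading position, and holding something that can open a fresh Stream on demand (here a Func<Stream>).

```csharp
using System;
using System.IO;

// Hypothetical demonstration code, not part of any library.
public static class ReaderFactorySketch
{
    // Drains the stream from its current position and returns the byte count.
    public static int CountBytes(Stream s)
    {
        byte[] buffer = new byte[1024];
        int total = 0;
        int read;
        while ((read = s.Read(buffer, 0, buffer.Length)) > 0) total += read;
        return total;
    }

    // Reads the SAME stream instance twice. The second pass starts where the
    // first one ended, so it finds nothing left to read.
    public static int[] ReadSameStreamTwice(byte[] data)
    {
        using (Stream shared = new MemoryStream(data, false))
        {
            int first = CountBytes(shared);
            int second = CountBytes(shared);
            return new[] { first, second };
        }
    }

    // Opens a fresh reader for each pass, so both passes see all the data.
    public static int[] ReadViaFactoryTwice(Func<Stream> open)
    {
        int first, second;
        using (Stream s = open()) first = CountBytes(s);
        using (Stream s = open()) second = CountBytes(s);
        return new[] { first, second };
    }
}
```

With five bytes of input, ReadSameStreamTwice yields 5 and then 0, while ReadViaFactoryTwice yields 5 both times. A type that creates readers behaves like the factory.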
Thankfully our friends on the Java end of things have already thought about this. In Java there are Blob and Clob interfaces which fit just this purpose. They can create both reader and writer streams. They are also nicely implemented in both JDBC and Hibernate.
Another issue with TextReader and XmlReader is that we have no way to write to them but this is not really an issue as I will describe at the end of this article.
Introducing New Data Types – Blob, Clob and Xlob
So to remedy this situation I’ve suggested that three new base classes be added to our .NET domain models. The contracts of these are pretty simple.
    using System.IO;
    using System.Text;
    using System.Xml;

    // Contract outline only: the bodies of the virtual WriteTo members are omitted.
    namespace Calyptus.Lob
    {
        public abstract class Blob
        {
            public abstract Stream OpenReader();
            public virtual void WriteTo(Stream output);
        }

        public abstract class Clob
        {
            public abstract TextReader OpenReader();
            public virtual void WriteTo(TextWriter writer);
            public virtual void WriteTo(Stream output, Encoding encoding);
        }

        public abstract class Xlob
        {
            public abstract XmlReader OpenReader();
            public virtual void WriteTo(TextWriter writer);
            public virtual void WriteTo(XmlWriter writer);
            public virtual void WriteTo(Stream output, Encoding encoding);
        }
    }
Basically there’s a PULL method and a PUSH method for getting the data from a LOB. The WriteTo methods are NOT a way to write data to the LOBs. They are a way to PUSH the data from the LOB into a writer.
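To make the PULL/PUSH split concrete, here is a minimal sketch of how such a contract could be implemented for in-memory data. SketchBlob and InMemoryBlob are hypothetical stand-ins written for this example. They are an assumption about one reasonable implementation, not the actual Calyptus.Lob source.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the Blob contract (not the Calyptus.Lob class).
public abstract class SketchBlob
{
    // PULL: every call returns a fresh reader positioned at the start.
    public abstract Stream OpenReader();

    // PUSH: the default implementation drains a fresh reader into the output.
    // Subclasses may override it when they know a faster way.
    public virtual void WriteTo(Stream output)
    {
        if (output == null) throw new ArgumentNullException(nameof(output));
        using (Stream input = OpenReader())
        {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, read);
            }
        }
    }
}

// Hypothetical implementation backed by a byte array.
public sealed class InMemoryBlob : SketchBlob
{
    private readonly byte[] data;

    public InMemoryBlob(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        this.data = data;
    }

    public override Stream OpenReader()
    {
        // A read-only view over the array; each reader has its own position.
        return new MemoryStream(data, false);
    }
}
```

Because OpenReader hands out a new Stream each time, the same instance can be read or written out any number of times, which is exactly what a bare Stream property cannot offer.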
Why an abstract base class instead of an interface? This is a common debate in .NET. But since abstract base classes are by far the more common pattern in the .NET Framework (Stream, TextReader and XmlReader are a few examples), I figured it’d be best to follow that trend. It also allows virtual methods to be added later (such as Java’s getBytes, position and length) without recompilation of inheritors.
I’m sure that these contracts are going to be much debated, since they involve the core of the domain model which, in the NHibernate world, should be persistence ignorant. You can still easily switch out the ORM and let the new ORM handle these types. The best outcome would be if Microsoft’s patterns & practices team introduced these new types as a common practice, perhaps even in a System.Data.Lobs namespace.
Some of you may be thinking that this pattern makes the domain model aware of its repository. But it really doesn’t, no more than lazy loaded entities and collections do. You can even save it to another repository. More on that later.
Writing to Blobs – Don’t
You may have noticed that, unlike the Java interfaces, I didn’t put any way to write to the LOBs in the base contract. This is because you shouldn’t persist anything until a Flush (or SaveChanges) style event. If it’s a new entity, the row doesn’t exist yet and there may not be anything to write the data to. The data might not even be part of the row. It may be stored as its own “entity” and shared by multiple other entities, in which case you would overwrite their data. Data should be written all together in an atomic manner.
So how do I change the data? You replace the LOB pointer (the Blob, Clob or Xlob object) with one that points to some other data source holding the new data. This can be a file, a stream, memory, or maybe a custom implementation which combines or converts data on the fly. This allows you to build pipelining patterns. It can even be another LOB in your database. NHibernate will tell your LOB object when and where to write itself.
This is also how Hibernate handles Blob and Clob in Java. It actually throws exceptions if you try to write to its Blobs or Clobs.
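The idea of replacing the pointer with a custom source that combines data on the fly can be sketched as follows. All three types here (PushBlob, BytesBlob, ConcatBlob) are hypothetical and reduced to the PUSH side only. They illustrate the pipelining pattern and make no claim about how Calyptus.Lob implements it.

```csharp
using System;
using System.IO;

// Hypothetical, PUSH-only version of the contract, for illustration.
public abstract class PushBlob
{
    public abstract void WriteTo(Stream output);
}

// A source backed by bytes that are already in memory.
public sealed class BytesBlob : PushBlob
{
    private readonly byte[] data;

    public BytesBlob(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        this.data = data;
    }

    public override void WriteTo(Stream output)
    {
        output.Write(data, 0, data.Length);
    }
}

// Combines two sources. Nothing is read or copied until WriteTo is called,
// which is when the persistence layer asks the object to write itself.
public sealed class ConcatBlob : PushBlob
{
    private readonly PushBlob first;
    private readonly PushBlob second;

    public ConcatBlob(PushBlob first, PushBlob second)
    {
        if (first == null) throw new ArgumentNullException(nameof(first));
        if (second == null) throw new ArgumentNullException(nameof(second));
        this.first = first;
        this.second = second;
    }

    public override void WriteTo(Stream output)
    {
        first.WriteTo(output);
        second.WriteTo(output);
    }
}
```

Assigning a ConcatBlob to an entity property only swaps the pointer. The combined data is produced when the destination stream is supplied, so neither source has to be materialized in full beforehand.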
Finally Some Code
Let’s start by defining a domain model. Let’s stick to a single entity called Product, with a binary image file, a long description text and an XML file which contains further specifications.
    public class Product
    {
        public int ID { get; set; }
        public string Title { get; set; }
        public Blob Image { get; set; }
        public Clob Description { get; set; }
        public Xlob Specifications { get; set; }
    }
To read from these three LOBs you would use either the PULL or the PUSH pattern (OpenReader or WriteTo). The following sample fetches a Product from the database. It then writes the image data to an HttpResponse. Then it writes the specifications to disk using a custom XmlWriter. Finally it reads the first line of the description.
    using (ISession session = sessionFactory.OpenSession())
    {
        Product product = session.Get<Product>(100);

        // PUSH the image straight into the HTTP response without buffering it.
        Response.Clear();
        Response.ContentType = "image/jpeg";
        Response.BufferOutput = false;
        product.Image.WriteTo(Response.OutputStream);

        // PUSH the specifications into an XmlWriter that writes to disk.
        using (XmlWriter writer = XmlWriter.Create(@"C:\MyFiles\SomeData.xml"))
        {
            product.Specifications.WriteTo(writer);
        }

        // PULL the first line of the description through a reader.
        using (TextReader reader = product.Description.OpenReader())
        {
            string firstLine = reader.ReadLine();
        }
    }
Changing the LOB data involves replacing the instance with another one. You can do this by using one of the built-in implementations using the static overloaded Blob.Create(), Clob.Create() and Xlob.Create() methods. The data can come from files, streams, memory, the web or your own implementations. You could for example create your own implementation which combines two files into one on the fly as it is written to the database.
The following sample loads a product from the database, replaces the image with a file from disk, replaces the description with an in-memory string, replaces the specifications with one from the web and then saves it all to the database.
    using (ISession session = sessionFactory.OpenSession())
    using (ITransaction transaction = session.BeginTransaction())
    {
        Product product = session.Get<Product>(100);

        // Replace each LOB with a new one pointing at a different data source.
        product.Image = Blob.Create(@"C:\MyFolder\MyImage.jpg");
        product.Description = Clob.Create("My short description.");
        product.Specifications = Xlob.Create(new Uri("http://domain/document.xml"));

        // The new data is read from its sources and written when the session flushes.
        transaction.Commit();
    }
Note that in the above sample it’s not a reference to the file or the web address that is stored in the database. The actual data is read and stored. Depending on your application, the use of WebRequests could be prohibited, and loading unknown remote XML documents is a potential security issue.
Note that the LOB types also define implicit conversions from some known types, so you can assign those types directly. There are also Blob.Empty, Clob.Empty and Xlob.Empty singletons that you can use to insert empty data. Empty data is not the same as null. The following sample implicitly casts a Stream to a Blob and a String to a Clob, and removes the product’s specification by replacing it with an empty one.
    using (ISession session = sessionFactory.OpenSession())
    using (ITransaction transaction = session.BeginTransaction())
    {
        Product product = session.Get<Product>(100);

        // Stream -> Blob and string -> Clob via implicit conversions.
        product.Image = Request.Files["uploadedImage"].InputStream;
        product.Description = Request.Form["description"];

        // Empty (not null) specifications.
        product.Specifications = Xlob.Empty;

        transaction.Commit();
    }
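For readers unfamiliar with user-defined conversions, this is roughly how an implicit string conversion and an Empty singleton could be declared in C#. SketchClob and StringClob are hypothetical names used for this example, and the sketch is an assumption about the general technique rather than a copy of the Calyptus.Lob code.

```csharp
using System.IO;

// Hypothetical stand-in for the Clob contract, reduced to the PULL side.
public abstract class SketchClob
{
    public abstract TextReader OpenReader();

    // An empty value that is distinct from null.
    public static readonly SketchClob Empty = new StringClob(string.Empty);

    // Lets callers assign a string wherever a SketchClob is expected.
    // A null string stays null instead of becoming an empty LOB.
    public static implicit operator SketchClob(string text)
    {
        return text == null ? null : new StringClob(text);
    }
}

// Hypothetical implementation backed by an in-memory string.
public sealed class StringClob : SketchClob
{
    private readonly string text;

    public StringClob(string text)
    {
        this.text = text;
    }

    public override TextReader OpenReader()
    {
        // Each call returns a fresh reader over the same string.
        return new StringReader(text);
    }
}
```

With these declarations, an assignment such as `SketchClob description = "Some text";` compiles without a cast. C# does not allow user-defined conversions from interface types, which is one practical reason such conversions are declared on concrete or abstract classes.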
If you prefer a less anemic domain model you could keep the LOB internal to the class and do reads and writes with custom logic.
    public class Product
    {
        public int ID { get; set; }

        // The LOB is kept private; callers go through the methods below.
        private Blob image;

        public void ChangeImage(Stream input)
        {
            // Implicit Stream -> Blob conversion.
            this.image = input;
        }

        public void CopyImageFrom(Product product)
        {
            this.image = product.image;
        }

        public void WriteImageTo(Stream output)
        {
            this.image.WriteTo(output);
        }
    }
Note that the stream used in ChangeImage() will not be read and disposed of until your Session is flushed. Depending on your application design this pattern may not be useful.
In the current version, the StreamBlob class, which wraps a Stream as a Blob, can only be read once if the Stream is not seekable. Therefore each instance can only be saved to one entity and not reused. In future versions it may replace the internal stream pointer with the one in the repository once the first one is saved. The same goes for the TextReader and XmlReader wrappers.
Getting Started
The full source code to Calyptus.Lob and appropriate NHibernate mappings are available at our Calyptus.Lob project on GitHub. I’ll make some official builds once the code stabilizes.
Coming up
Part 3 – NHibernate Mappings
Part 4 – External Storage
Part 5 – Compression Options
August 20th, 2009 at 03:06
Thanks for the article. It shows exactly what we want to do. But how to get the source code of Calyptus.Lob? The SVN-Link doesn’t work. :-(
August 20th, 2009 at 15:53
Ah, I forgot to update the link on this page. The project has now moved to GitHub. You can find it at http://github.com/calyptus/calyptus.lob/ or clone it using git://github.com/calyptus/calyptus.lob.git
There are some instructions on how to write your mapping files coming.
November 20th, 2010 at 14:41
Your code is incomplete. Geez, take this off the web, as it messes up my Google searches.
November 24th, 2010 at 17:46
Yeah, I have this requirement as well and this is of no help since the articles were never finished, nor is the code complete. Sort of useless.