The Reactive Extensions for JavaScript – MooTools Integration

March 7th, 2010 by Sebastian Markbåge

This is a follow-up to my earlier post about the Reactive Extensions (Rx) for JavaScript by Microsoft’s DevLabs. It is also a response to Matthew Podwysocki’s post on jQuery integration (which deserves some credit for putting it out there).

I will assume some familiarity with Rx.

Just like any other DOM library, MooTools has a way of working with native and custom passive DOM events. We can easily give the Element object and the Elements collection a method to provide these events as “Observables”. In the jQuery example the method name “ToObservable” was added to the jQuery object, accepting an event type parameter, which was my initial reaction as well. But I’m going to call mine getEvent as in “getting a stream of events given the event type”.

var observableFromEvent = function(type){
  var self = this;
  return Rx.Observable.Create(function(observer){
      var fn = function(event){
          observer.OnNext(event);
      };
      self.addEvent(type, fn);
      return function(){
        self.removeEvent(type, fn);
      };
  });
};
 
Window.implement('getEvent', observableFromEvent);
Document.implement('getEvent', observableFromEvent);
Element.implement('getEvent', observableFromEvent);
Elements.implement('getEvent', observableFromEvent);

These are infinite Observables, but we could also make .destroy() trigger OnCompleted to make them finite as well.
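A minimal sketch of that idea, written without MooTools or Rx so it runs on its own. `createObservable` and `FakeElement` are hypothetical stand-ins for `Rx.Observable.Create` and a MooTools element, and the 'destroy' event fired by `destroy()` is an assumption made for the sketch, not something MooTools provides.

```javascript
// Hypothetical stand-in for Rx.Observable.Create: wraps a subscribe function.
function createObservable(subscribe){
  return { Subscribe: function(observer){ return subscribe(observer); } };
}

// Hypothetical stand-in for an element with addEvent/removeEvent/destroy.
function FakeElement(){ this.listeners = {}; }
FakeElement.prototype.addEvent = function(type, fn){
  (this.listeners[type] = this.listeners[type] || []).push(fn);
};
FakeElement.prototype.removeEvent = function(type, fn){
  this.listeners[type] = (this.listeners[type] || []).filter(function(f){ return f !== fn; });
};
FakeElement.prototype.fireEvent = function(type, event){
  (this.listeners[type] || []).slice().forEach(function(fn){ fn(event); });
};
// Assumption: destroy() announces itself through a 'destroy' event.
FakeElement.prototype.destroy = function(){ this.fireEvent('destroy'); };

// Like observableFromEvent, but the stream completes when the element is destroyed.
function finiteObservableFromEvent(element, type){
  return createObservable(function(observer){
    var onEvent = function(event){ observer.OnNext(event); };
    var onDestroy = function(){ dispose(); observer.OnCompleted(); };
    var dispose = function(){
      element.removeEvent(type, onEvent);
      element.removeEvent('destroy', onDestroy);
    };
    element.addEvent(type, onEvent);
    element.addEvent('destroy', onDestroy);
    return dispose;
  });
}

var demoElement = new FakeElement();
var demoLog = [];
finiteObservableFromEvent(demoElement, 'click').Subscribe({
  OnNext: function(e){ demoLog.push('next:' + e); },
  OnCompleted: function(){ demoLog.push('completed'); }
});
demoElement.fireEvent('click', 'a');
demoElement.destroy();
demoElement.fireEvent('click', 'b'); // ignored: the listener was removed on destroy
```

After the last line, demoLog holds one value followed by the completion marker.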

Flickables Example

Instead of the canonical Drag and Drop example I thought I’d show a twist. Let’s say we want to listen for a mouse flick. The mouse position has to move more than 100px within 200ms. Then we want the angle of the flick.

var angleFromPosition = function(position, center){
    var diffX = position.x - center.x, diffY = position.y - center.y;
    var distance = Math.sqrt( Math.pow(diffX, 2) + Math.pow(diffY, 2) );
    var angle = (2 * Math.atan2(diffY + distance, diffX)) * 180 / Math.PI;
    return { distance: distance, angle: angle };
};
 
var distanceReached = function(angle){ return angle.distance > 100; };
 
var timeLimit = Rx.Observable.Timer(200);
 
var mousePositions = document.getEvent('mousemove')
                     .Select(function(event){ return event.page; });
 
var flicks = document.getElements('.flickable')
             .getEvent('mousedown')
             .SelectMany(function(event){
                 return mousePositions
                     .Select(angleFromPosition.bindWithEvent(null, event.page))
                     .TakeUntil(document.getEvent('mouseup'))
                     .TakeUntil(timeLimit)
                     .Where(distanceReached)
                     .Take(1);
             });
 
// ...
 
flicks.Subscribe(function(current){
    console.log('Flicked in direction: ' + current.angle + '°');
});
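The angle convention in angleFromPosition is easy to misread, so here is a small worked check. It repeats the function from the listing so that it runs on its own; the sample points and the `bearings` name are made up for illustration. The half-angle form yields a compass-style bearing: 0° is straight up on the page and the angle grows clockwise.

```javascript
// Same formula as angleFromPosition in the listing above.
var angleFromPosition = function(position, center){
  var diffX = position.x - center.x, diffY = position.y - center.y;
  var distance = Math.sqrt(Math.pow(diffX, 2) + Math.pow(diffY, 2));
  var angle = (2 * Math.atan2(diffY + distance, diffX)) * 180 / Math.PI;
  return { distance: distance, angle: angle };
};

var center = { x: 0, y: 0 };
// Page coordinates grow downwards, so "up" is a negative y offset.
var bearings = {
  up:    Math.round(angleFromPosition({ x: 0,    y: -100 }, center).angle),
  right: Math.round(angleFromPosition({ x: 100,  y: 0    }, center).angle),
  down:  Math.round(angleFromPosition({ x: 0,    y: 100  }, center).angle),
  left:  Math.round(angleFromPosition({ x: -100, y: 0    }, center).angle)
};
// bearings is { up: 0, right: 90, down: 180, left: 270 }
```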

Events Mixin

MooTools has a very strong benefit compared to many other libraries. The publish/subscribe pattern is made explicit even for custom classes, using the Events mixin. By implementing our “getEvent” method on this class we can use Rx on all custom MooTools classes that provide passive events.

Events.implement('getEvent', observableFromEvent);

Side-effects

Rx allows the act of subscribing to an event to trigger an action/side-effect. Think of the Request object for example. You can use the act of subscribing to it to issue an HTTP request. Then we can turn the subsequent events like success and failure into the Observable interface. This means that Request is a complete Observable in itself. This is what I was saving the conversion name toObservable for.

Request.implement({
 
  toObservable: function(){
    var self = this;
    return Rx.Observable.Create(function(observer){
 
      var listeners = {
 
        success: function(result){
          self.removeEvents(listeners);
          observer.OnNext(result);
          observer.OnCompleted();
        },
 
        cancel: function(){
          self.removeEvents(listeners);
          observer.OnCompleted();
        },
 
        failure: function(xhr){
          self.removeEvents(listeners);
          observer.OnError(xhr);
        }
 
      };
 
      if (!self.running || self.options.link == 'cancel'){
        self.addEvents(listeners).send();
        return function(){
          self.removeEvents(listeners).cancel();
        };
      }
 
      if (self.options.link == 'chain'){
        var disposed, running;
        self.chain(function(){
          running = true;
          if (!disposed) self.addEvents(listeners).send();
        });
        return function(){
          if (running) self.removeEvents(listeners).cancel();
          disposed = true;
        };
      }
 
      observer.OnCompleted();
      return function(){};
 
    });
  }
 
});

This creates a finite stream of events – only one response to be exact. However, since the act of subscribing to it causes it to occur we can have it trigger repeatedly as part of a composite stream of events.
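To make the "subscribing causes it to occur" point concrete, here is a minimal sketch that does not depend on Rx or MooTools. `createObservable` is a hypothetical stand-in for `Rx.Observable.Create`, and `fakeRequest` stands in for `request.toObservable()`; no real HTTP request is involved.

```javascript
// Hypothetical stand-in for Rx.Observable.Create.
function createObservable(subscribe){
  return { Subscribe: function(observer){ return subscribe(observer); } };
}

var requestsSent = 0;

// Stand-in for request.toObservable(): subscribing is what "sends" the request.
var fakeRequest = createObservable(function(observer){
  requestsSent++;                       // the side effect happens per subscription
  observer.OnNext('response #' + requestsSent);
  observer.OnCompleted();
  return function(){};                  // nothing left to cancel in this sketch
});

var responses = [];
var collector = {
  OnNext: function(value){ responses.push(value); },
  OnCompleted: function(){}
};

// Nothing is sent when fakeRequest is created; each Subscribe issues a fresh "request".
fakeRequest.Subscribe(collector);
fakeRequest.Subscribe(collector);
```

Two subscriptions produce two separate responses, which is what lets such a stream be re-triggered inside a larger composition.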

MooTools’ Fx provides a similar but slightly different concept. Even though we don’t get an event for each tick, we still get an asynchronous complete event. This means we can insert Fx as part of a composite stream of events.

Fx also requires from/to arguments to be passed at the start. So we add the option “defaultArgs” to allow us to pass those at initialization.

Fx.implement({
 
  toObservable: function(){
    var self = this;
    return Rx.Observable.Create(function(observer){
 
      var listeners = {
 
        complete: function(){
          self.removeEvents(listeners);
          observer.OnCompleted();
        },
 
        cancel: function(){
          self.removeEvents(listeners);
          observer.OnCompleted();
        }
 
      };
 
      if (!self.running || self.options.link == 'cancel'){
        self.addEvents(listeners).start.run(self.options.defaultArgs, self);
        return function(){
          self.removeEvents(listeners).cancel();
        };
      }
 
      if (self.options.link == 'chain'){
        var disposed, running;
        self.chain(function(){
          running = true;
          if (!disposed)
            self.addEvents(listeners).start.run(self.options.defaultArgs, self);
        });
        return function(){
          if (running) self.removeEvents(listeners).cancel();
          disposed = true;
        };
      }
 
      observer.OnCompleted();
      return function(){};
 
    });
  }
 
});

Of course since there are a lot of other classes extending the Request and Fx classes, you get the same benefits on them. This is one of the true benefits of MooTools’ modular extensibility.

That is one of the benefits of using the class(ical) pattern in JavaScript. More on that next time…

Side-effects Example

var popup = document.id('popup');
var showPopup = new Fx.Tween(popup, { property: 'opacity', defaultArgs: 1 }); // Fx.Tween takes the property option
var feed = new Request.JSON({ url: 'mydata.json', method: 'get' });
 
var showFeed = feed.toObservable()
               .Do(function(data){ popup.set('text', data); })
               .Concat(showPopup.toObservable());
 
// ...
 
showFeed.Subscribe(); // loads mydata.json into #popup and displays it

Using Arrays in Unit Tests

Since natives are allowed to be extended within the MooTools philosophy, we can add a convenience method to turn an Array into an observable stream of content.

Array.implement('toObservable', function(){return Rx.Observable.FromArray(this);});

We can use this to fake the “flicks” event stream in our earlier example. We avoid having to include complex asynchronous tests or user action tests.

var flicks = [
               { angle: 0, distance: 100 },
               { angle: 45, distance: 100 },
               { angle: 90, distance: 100 }
             ]
             .toObservable();
 
// Unit tests
// Synchronously testing code that's depending on a flick event stream
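As a rough sketch of what such a test could look like without Rx loaded: `fromArray` below is a hypothetical, eager stand-in for `Rx.Observable.FromArray` with just enough operators for the example, and `describeFlicks` is an invented piece of code under test.

```javascript
// Hypothetical stand-in for Rx.Observable.FromArray with Where and Select.
// It evaluates eagerly, which is enough for a synchronous test fixture.
function fromArray(items){
  return {
    Where: function(predicate){
      return fromArray(items.filter(function(item){ return predicate(item); }));
    },
    Select: function(selector){
      return fromArray(items.map(function(item){ return selector(item); }));
    },
    Subscribe: function(onNext){ items.forEach(function(item){ onNext(item); }); }
  };
}

// Invented code under test: turns a stream of flicks into direction names.
function describeFlicks(flicks){
  return flicks
    .Where(function(flick){ return flick.distance >= 100; })
    .Select(function(flick){ return flick.angle < 45 ? 'up' : 'right'; });
}

var fakeFlicks = fromArray([
  { angle: 0,  distance: 100 },
  { angle: 90, distance: 100 },
  { angle: 90, distance: 20 }   // too short, should be filtered out
]);

var seen = [];
describeFlicks(fakeFlicks).Subscribe(function(name){ seen.push(name); });
// seen is ['up', 'right'] as soon as Subscribe returns; no timers involved
```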

Web Sockets and Web Workers

Now imagine this on a stream of events coming in from Web Sockets or Web Workers.

You could set up a web socket to asynchronously feed you JSON objects, and hook that up to the rest of your UI just as easily as the Request example above.

The Reactive Extensions for JavaScript – Event Composition

March 6th, 2010 by Sebastian Markbåge

I’ve been following the work on the Reactive Extensions for .NET (Rx) by Erik Meijer and others over at Microsoft. At first look I was intrigued but didn’t really understand the purpose of it. However, at a second look, I realized that it had the potential to solve every major problem I’ve had with advanced UI development in JavaScript.

Asynchronous Programming – Composable Events

Modern UI development forces us to use asynchronous patterns for user actions, animations and data load. But to make UI development easy you can think of each operation as sequential. You can even artificially lock down the user interface – by disabling or hiding UI elements – while an operation is occurring.

If you want to optimize your user experience, you will need to start dabbling in the complicated art of event composition. The problem occurs when you have complex interactions that depend upon other interactions or state.

You can solve this using various state machine patterns. However, I think you will find it quite difficult at times. Even if you do solve it, it’s probably going to be for a specific purpose which is not easily generalizable nor extensible.

Various tools have tried using patterns like Futures and Promises. I think those patterns need to be applied at the language level to be really useful though.

Reactive Programming in an Object Oriented World

JavaScript has introduced the map/filter/reduce methods on arrays to allow collection operations using a sequenced composition of functions.

There’s one minor thing that JavaScript developers should note. In LINQ these operations are lazy iterables. The map/filter operations aren’t actually executed until an .each() starts iterating over them. This avoids having to create duplicates of the result in memory. It also means that the underlying array can change after we call filter and map. This is similar to “live” collections in the DOM. But they can also be infinite in length just like Mozilla’s Iterators. The .each() call is still essentially a synchronous operation though.
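A small sketch of that lazy behaviour using JavaScript generator functions, which is an assumption about the runtime since not every browser has them. `lazyMap` and `lazyFilter` are names invented here; they are not LINQ or MooTools APIs.

```javascript
var evaluated = [];

// Lazy map and filter built on generator functions.
function* lazyMap(iterable, fn){
  for (var item of iterable) yield fn(item);
}
function* lazyFilter(iterable, predicate){
  for (var item of iterable) if (predicate(item)) yield item;
}

var source = [1, 2, 3, 4];
var pipeline = lazyMap(
  lazyFilter(source, function(n){ evaluated.push(n); return n % 2 === 0; }),
  function(n){ return n * 10; }
);

// Nothing has run yet: the predicate has not been called a single time.
var evaluatedBeforeIteration = evaluated.length;

// The underlying array can still change before iteration starts.
source.push(6);

var results = [];
for (var value of pipeline) results.push(value);
// results is [20, 40, 60]
```

The late push of 6 shows up in the results because the filter only reads the array once iteration begins, and no intermediate array is ever built.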

Erik Meijer and team simply decided to make that iteration execution asynchronous.

This means that the source data can be asynchronous. So instead of thinking of Events as independent, think of them as a stream of data with an unknown length… (or an asynchronous list/array).

This means that you can now apply the same type of function composition to streams of events. Enter the Reactive Extensions for .NET.
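The same composition applied to a push-based stream might be sketched like this. `createStream` is a tiny hypothetical stand-in for an Rx Observable (only `Select`, `Where` and `Subscribe`), and `push` plays the role of an event source delivering values over time.

```javascript
// A tiny push-based stream with Select and Where; not the Rx implementation.
function createStream(subscribe){
  return {
    Subscribe: subscribe,
    Select: function(selector){
      return createStream(function(onNext){
        return subscribe(function(value){ onNext(selector(value)); });
      });
    },
    Where: function(predicate){
      return createStream(function(onNext){
        return subscribe(function(value){ if (predicate(value)) onNext(value); });
      });
    }
  };
}

// A hypothetical event source: push(value) delivers a value to every subscriber.
var handlers = [];
var keyPresses = createStream(function(onNext){ handlers.push(onNext); });
function push(value){ handlers.forEach(function(handler){ handler(value); }); }

var digits = [];
keyPresses
  .Where(function(key){ return key >= '0' && key <= '9'; })
  .Select(function(key){ return Number(key); })
  .Subscribe(function(digit){ digits.push(digit); });

// Values arrive later; the composition above was declared before any existed.
push('a'); push('4'); push('x'); push('2');
// digits is [4, 2]
```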

Supposedly this could solve the problem of Event composition in the UI space.

The Reactive Extensions for JavaScript

To my surprise, Matthew Podwysocki recently started a blog series about the Reactive Extensions for JavaScript (also from Microsoft, to be clear). Apparently the benefits of this tool in the JS world have not gone unnoticed.

There are no bits officially released yet. However, considering Matthew Podwysocki’s recent posts and Erik Meijer’s upcoming talk at the Mix conference… I wouldn’t be surprised if something was released at Mix on March 17th.

Learning More

I have since been interested in learning more about various alternative models. There’s a research project called Arrows which provides a different model that’s more purely functional. There’s also a framework called Flapjax which is more of a DOM library aiming to provide reactive concepts to JavaScript.

To learn more about the Reactive Extensions, take a look at the videos about the .NET version posted by the team over at Channel 9.

Concerns

I’m not sure the first implementation of the Reactive Extensions is going to be the one to solve all these problems.

I think that many developers will have a difficult time thinking about these concepts in the terms of event streams. That could make it difficult to use the current method naming. In this sense, I think Arrows might be easier to get started with. It will allow you to think about events as sequential operations. However, I also see benefit in the model employed by the Reactive Extensions, IF we can all wrap our heads around it.

Another issue is the “Let” method. Many developers may find it difficult to know when to use it. That’s true even for LINQ. However, in the Reactive Extensions I have a feeling those issues will become even more prevalent. Hopefully there will be better syntactical sugar to solve this issue.

Rx has yet to prove itself in real-world complex applications that go well beyond single-subscription examples. I may try to extend the canonical drag and drop examples to my own HTML5 based Drag and Drop model and plugins to stress test it.

Naming Conventions

There’s also the issue of upper camel case in method names. LINQ for JavaScript is also using this convention. I’m guessing they’re trying to be compatible. However, the JavaScript convention is to use lower camel case method names, which the new ASP.NET AJAX library also follows. So I don’t understand why.

In dynamic languages with limited auto-completion (IntelliSense) support, naming conventions are very important to follow.

Although, I do like the names Select/Where/OrderBy better than map/filter/sort since given the arguments, that tends to read better as a grammatical sentence.

UPDATE: Event DSLs

I should mention that the MooTools 2.0 team has been working on a DSL based on the CSS selector syntax. This is an extension of the Element.Delegation plugin.

The idea is to use event names and pseudos in combination to create custom composite event listeners. This example would listen to the first click event:

element.addEvent('click:flash', firstClick);

This could enable a lot more powerful combinations of custom events. However, it doesn’t enable passing of parameters and the composability of Rx and Arrows.

The Performance of .nodeName

November 24th, 2009 by Sebastian Markbåge

I was researching various options of traversing nodes for Slick and the DOM Range for MooTools. I realized that the nodeName property is incredibly slow to access in WebKit browsers. This is because it is working with qualified names (with namespaces and stuff) internally.

if (node.nodeName == 'A') // do something with anchor tag

If you add case insensitive matching to that it will be even slower.

Instead I decided to try to check the constructor of the node to determine what type it is. For example for the anchor (A) tag, modern browsers will use the prototype of HTMLAnchorElement. This can potentially speed up these checks if you’re looking for a known node type.

if (node.constructor === HTMLAnchorElement) // do something
// OR...
if (node instanceof HTMLAnchorElement) // do something

I ran this performance test in various browsers. It traverses all nodes in a large HTML document and checks which ones are anchor nodes. It first does a blank run to eliminate any initialization quirks. Then it does a control run without the anchor check. Then it tests each of the above models.

IE6 and IE7 will obviously fail since they don’t support the HTMLAnchorElement constructor/prototype. For that case you would have to fall back to the nodeName property.
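One way to arrange that fallback is to feature-detect once and pick the check up front. This is only a sketch: `makeIsAnchor` and `isAnchor` are names invented here, and the constructor is passed in as a parameter so the function can also be exercised outside a browser.

```javascript
// Builds an anchor check. If the anchor constructor is available, compare
// constructors; otherwise fall back to the slower nodeName comparison.
function makeIsAnchor(AnchorConstructor){
  return AnchorConstructor
    ? function(node){ return node.constructor === AnchorConstructor; }
    : function(node){ return node.nodeName.toUpperCase() === 'A'; };
}

// Feature-detect once instead of on every node visit.
var isAnchor = makeIsAnchor(
  typeof HTMLAnchorElement !== 'undefined' ? HTMLAnchorElement : null
);
```

Inside a tree walk this would be used as `if (isAnchor(node)) { ... }`, so the per-node cost is a single function call either way.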

IE8 will be slightly slower with the constructor check than the nodeName check. But the difference is marginal in the overall scope of IE’s slowness.

WebKit will gain significant performance using the constructor check. The difference is small relative to the overhead of manually walking the tree. However, if you take the control value from the blank run into account, the difference of just the node type checks will be significant (several times faster). The slow part is the WebKit DOM API, so you will see this with both JavaScriptCore and V8 (Safari and Chrome respectively).

Firefox will be slower on the first run for some weird reason. But in subsequent runs the constructor check will be faster than the nodeName check.

As a side note, node.tagName is no different. That is just an alias for node.nodeName.

In John Resig’s post on case sensitivity he discusses the case inconsistencies of the nodeName property in various contexts and the impact on performance. For example, in IE, the nodeName of unknown elements (like the new HTML5 elements) keeps its original case as in the markup.

This means that any proper CSS selector search for such elements would have to run a case-insensitive match against the nodeName property. Unfortunately the little trick I’ve shown above doesn’t remedy this problem because unknown elements will be lacking a known constructor. However, known elements can still utilize this trick as a slight performance boost, while letting unknown elements fall back to a case-insensitive match.

UPDATE: I added a case-insensitive match to the performance tests using regular expressions – showing the added overhead compared to constructor checking.

Why you shouldn’t return false in MooTools event handlers

July 25th, 2009 by Sebastian Markbåge

Let’s say I have a link (anchor tag with href), and I wish to attach an event listener to it.

<ul>
  <li><a id="mylink" href="http://...">my link</a></li>
</ul>

document.id('mylink').addEvent('click', function(){
  console.log('hello world');
});

Now, if I click the link it will log the message but the browser window will also visit the location of the link. There are a bunch of such default behaviors to pretty much every event in the DOM. If we’re implementing custom behavior, we typically want to prevent this default behavior. A common practise is to have the method return false as such:

document.id('mylink').addEvent('click', function(){
  console.log('hello world');
  return false;
});

THIS IS BAD! Don’t. To understand the reason for this, you need to understand event bubbling and the difference between preventDefault and stopPropagation.

Event bubbling and stopPropagation

When an event is dispatched, it first fires the listeners of the ‘mylink’ element (not quite true, but we don’t use capture). But then it propagates (bubbles) up to the LI-element, UL-element, BODY-element etc.  So for every click on any element, the ‘click’ event is triggered on the BODY-element. After all of that, the default behavior of the browser is triggered.

In most browsers bubbling continues to the document and window objects, but that’s not always true for IE.

This is a powerful model. It allows us to do things like Event delegation. You can place a listener on the UL-element to catch any events triggered on the LI-elements without adding listeners to all the existing or any new LI-elements.
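A bare-bones sketch of how delegation can be built on bubbling. `delegate` is a helper invented for this example (it is not the MooTools Element.Delegation API), and plain objects stand in for DOM nodes so that it runs anywhere.

```javascript
// Wraps a handler so it only runs when the event started on (or inside) a
// node with the given tag. `root` is the element the listener is attached to.
function delegate(root, tagName, handler){
  return function(event){
    var node = event.target;
    while (node && node !== root){
      if (node.nodeName === tagName) return handler.call(node, event);
      node = node.parentNode;
    }
  };
}

// Plain objects standing in for DOM nodes (hypothetical fixture).
var ul = { nodeName: 'UL', parentNode: null };
var li = { nodeName: 'LI', parentNode: ul };
var a  = { nodeName: 'A',  parentNode: li };

var clickedItems = [];
var onUlClick = delegate(ul, 'LI', function(){ clickedItems.push(this); });

onUlClick({ target: a });  // a click on the link bubbles up: handled for its LI
onUlClick({ target: ul }); // a click on the UL itself: no LI involved, ignored
```

In a page the wrapped function would be registered once on the UL, so LI elements added later are covered without any new listeners.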

Sometimes we don’t want bubbling to occur. Let’s say for example that I wanted to have a ‘click’ event handler on the UL-element that handles clicks on the UL area outside of any A-element. Then I could accept the Event object as the first parameter, use stopPropagation during the click event on the A-element to stop the event before it reaches the UL.

document.getElements('ul').addEvent('click', function(){
  console.log('You clicked within the UL but outside of any link.');
});
 
document.getElements('a').addEvent('click', function(event){
  console.log('You clicked a link.');
  event.stopPropagation();
});

preventDefault

In my example above the browser would still visit the href of the link. Stopping propagation (bubbling) doesn’t actually prevent the default browser action. So we also need to call preventDefault during the click event to prevent the default operation of clicking a link.

document.getElements('a').addEvent('click', function(event){
  console.log('You clicked a link.');
  event.stopPropagation();
  event.preventDefault();
});

Now since this is fairly common MooTools has a shortcut for doing both stopPropagation AND preventDefault. Namely the stop() method:

document.getElements('a').addEvent('click', function(event){
  console.log('You clicked a link.');
  event.stop();
});

So, why is return false bad?

In the standard browser DOM model it’s equivalent to calling event.preventDefault(); but in MooTools it’s equivalent to calling event.stop(); i.e. it also calls stopPropagation.
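The difference can be spelled out with a stand-in event object. `FakeEvent`, `callDomStyle` and `callMooToolsStyle` are invented for this sketch; they only model the behaviour described in the paragraph above and are not taken from any library's source.

```javascript
// A stand-in event object that records which methods were called.
function FakeEvent(){ this.defaultPrevented = false; this.propagationStopped = false; }
FakeEvent.prototype.preventDefault = function(){ this.defaultPrevented = true; };
FakeEvent.prototype.stopPropagation = function(){ this.propagationStopped = true; };
FakeEvent.prototype.stop = function(){ this.preventDefault(); this.stopPropagation(); };

// DOM-style treatment of a false return value: only the default action is cancelled.
function callDomStyle(handler, event){
  if (handler(event) === false) event.preventDefault();
  return event;
}

// The MooTools behaviour described above: false is treated like event.stop().
function callMooToolsStyle(handler, event){
  if (handler(event) === false) event.stop();
  return event;
}

var returnsFalse = function(){ return false; };
var domResult = callDomStyle(returnsFalse, new FakeEvent());
var mooResult = callMooToolsStyle(returnsFalse, new FakeEvent());
// domResult.propagationStopped is false; mooResult.propagationStopped is true
```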

This is a problem. If you use this model routinely you may not notice that you actually prevent plugins attached to elements higher up in the bubbling chain.

Let’s say I want to use the ‘mouseleave’ event to hide the UL-element when the mouse leaves. If I also return false on the ‘mouseout’ event on the A-element, I may not get the ‘mouseleave’ event because the A-element stops it. OR maybe I have a plugin higher up that requires that my events bubble. It’ll be even more prevalent as more plugins make use of Event delegation.

Therefore you need to be very explicit about when you stop propagation and when you don’t.

Second of all, the “return false” API doesn’t make sense. The function isn’t failing. It isn’t canceled. In fact, it’s canceling a DIFFERENT function.

Therefore you should ALWAYS be explicit by calling either event.preventDefault(), event.stopPropagation() or event.stop(); instead of relying on an implicit convention that differs between frameworks.

Returning a false value is a relic from the old days when we only had a single listener per event.

Binding Parameters

Sometimes you need to bind parameters that you wish to pass to an event listener. A common practise is to use bind.

document.getElements('a').addEvent('click', function(paramA, paramB){
  // do something with this, paramA and paramB
  return false;
}.bind(someObj, [objA, objB]));

In this case you can’t accept an Event object since you’ve bound your parameters to other objects. In this case you can use bindWithEvent to let the first parameter (the event object) get through, while binding the remaining parameters.

document.getElements('a').addEvent('click', function(event, paramA, paramB){
  // do something with this, paramA and paramB
  event.stop();
}.bindWithEvent(someObj, [objA, objB]));
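The idea behind bindWithEvent can be sketched as a standalone helper. This is an illustration of the argument order only, not the MooTools implementation; the helper takes the function as its first parameter instead of extending Function.prototype.

```javascript
// The event stays first, the pre-bound arguments follow it.
function bindWithEvent(fn, thisObj, boundArgs){
  return function(event){
    return fn.apply(thisObj, [event].concat(boundArgs));
  };
}

var owner = { name: 'someObj' };
var listener = bindWithEvent(function(event, paramA, paramB){
  return [this.name, event.type, paramA, paramB].join(' ');
}, owner, ['objA', 'objB']);

var received = listener({ type: 'click' });
// received is 'someObj click objA objB'
```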

$lambda(false)

“But I don’t want to type out all of that just to stop an event. I like $lambda(false) to easily block events.”

People sometimes use the $lambda method to create a function that returns false to easily stop an event without doing anything else: el.addEvent('click', $lambda(false));

So you need a method that does nothing other than accepts an Event object and calls preventDefault, stopPropagation or stop? Thanks to MooTools generics you can easily do that like this:

element.addEvent('click', Event.preventDefault); // OR...
element.addEvent('click', Event.stopPropagation); // OR...
element.addEvent('click', Event.stop);

For those of you who think “return false;” saves bandwidth… “e,” and “e.stop();” are two bytes shorter.

Additional Event Listeners on the same Element

Neither preventDefault nor stopPropagation, nor even an error, prevents any additional handlers/listeners on the same element from running. So if you have two handlers listening to the same event, both will be triggered regardless of the result of either function.

That should be true for all Events, even Class events. More on that in MooTools 2.0…
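A quick way to see this for preventDefault and stopPropagation, using the standard EventTarget and Event interfaces rather than MooTools (the sketch assumes an environment where both are available as plain globals, and it does not cover the error case):

```javascript
// Two listeners for the same event on the same target.
var target = new EventTarget();
var calls = [];

target.addEventListener('click', function(event){
  calls.push('first');
  event.preventDefault();
  event.stopPropagation(); // affects other targets in the chain, not this one
});
target.addEventListener('click', function(){
  calls.push('second');
});

var notCancelled = target.dispatchEvent(new Event('click', { cancelable: true }));
// calls is ['first', 'second']; notCancelled is false because of preventDefault()
```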

Transitory Domain Objects

May 30th, 2009 by Sebastian Markbåge

A common problem with DDD is the injection of services into your domain model. Sometimes your domain relies on external services to do its job. You could do that by injecting your services directly into your entities using NHibernate Interceptors or ObjectStateManager for Entity Framework v4.

There are many design issues with the POCOness of Entities when you keep references to external services within the Entities themselves. The reference itself is (usually) infrastructure and not really a persistence concern.

Double Dispatch, Specifications and Services

The double dispatch pattern seems to be a popular approach. A better solution seems to be to move the logic in front of the Entities. Usually people seem to solve this by moving logic to services or even specifications.

Moving domain logic to services is a big no, no. That’s a gateway to anemic domain models and bloated service implementations. Services should be a last resort for external concerns and should probably have a solid anti-corruption layer.

The double dispatch pattern is a pain, ugly and introduces lots of references to services where the ubiquitous language doesn’t dictate it.

The specification pattern is particularly ugly because that’s (usually) not how a domain expert would refer to the issue. We are violating the ubiquitous language.

Transitory Domain Objects

Recently I’ve started introducing unpersisted classes to my domain models. If you think about it, many domain models have transitory terms and concerns that are not really persisted.

Imagine that your domain model consists of an archive of home photography. Let’s call them Photos. Now, you want to work with a couple of them. You pick out all the ones that have a red lavish hue and start organizing, labeling them or other operations. Now you have a set of Photos.

You could claim that it is a UI or Controller concern. Given the right bounded context, that set of photos IS A Domain Concern! Your domain could have domain specific restrictions and operations occurring on those sets of photos. You can think about them as a workspace or extended units of work.

Now this set isn’t persisted. It’s not an entity, it’s not a value object. Your entities can’t refer to it. This transitory logic lies in front of your entities. Since it’s transitory it also means that it can contain references to repositories and external services. It makes reference management much easier.

Now we can change out our specification and double dispatch patterns:

var redishPhotoList = photoRepository.Find(
  new HueSpecification(colorDetectorService, Color.Red)
);
foreach(var photo in redishPhotoList){
  //checks...
  photo.MarkWithMetaData("RED", metaDataService);
  //constraints...
}

To something more domain specific:

var photoSet = new PhotoSet(photoRepository, colorDetectorService, metaDataService);
photoSet.UsingOnly(Color.Red).MarkWithMetaData("RED");

We now have a domain object that we can easily pass around our application.

When you think about it you’re probably already using this pattern either as helpers or as “services”. But making the clear distinction that this is 1) a domain concern and 2) temporary makes it easier to place your logic and apply constraints.

Achieving pure POCO is a pain from an infrastructure perspective but it’s worth it once it’s in place. I should be able to pass it to and from Db4O without any infrastructure concerns. Then you have a clear and solid domain model.

Parsing Base64 Encoded Binary PNG Images in JavaScript

May 20th, 2009 by Sebastian Markbåge

The other day David Walsh was experimenting with rendering images in the browser using regular tags as pixels. Valerio picked up the idea and made some enhancements. A server-side script transformed PNG files into a JSON image format for easy parsing on the client. That raised the question… How difficult would it be to do that parsing on the client instead?

Why PNG? Well, other than becoming the new de facto standard for graphics it’s a very simple format. It’s also free of patents and uses only simple, well-known techniques. That makes it very easy to work with. This post is about parsing raw PNG image data in pure JavaScript. It has nothing to do with built-in browser support for the format.

Base64 Encoding

JavaScript doesn’t allow us to work with binary data directly. Even with XHR we can’t work with the raw binary data because JavaScript doesn’t currently have a concept of raw bytes. Instead we have to get the bytes from a character representation of the data.

Luckily there’s a standard transfer encoding already heavily in use in various places of the W3C standards… Base64! You can use the data: URI scheme to embed image data in your HTML or CSS documents. It’s also heavily used for binary data in e-mails.

We can get the data either from an XHR request, from a src attribute or just statically embedded in your JavaScript file. So, now we have our data as a Base64-encoded string.

To work with the raw data we need a way to represent bytes. If you’re working with ASCII data you can just stick to string representations. But since we’re going to be working with binary data the most useful way seems to be simple Numbers. That allows you to do bitwise operations and easily convert them to and from ASCII. It also provides better performance than representing the bytes as Objects.

Now we need a parser. I went with a sample parser by some guy named notmasteryet. There are others but this seems like a pretty solid implementation and allows us to work with bytes as Numbers. It also works as a reader that lets us read our data piece by piece instead of filling our memory.
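To show what "bytes as Numbers" means in practice, here is a minimal decoder sketch. It is not notmasteryet's parser: it decodes the whole string in one go instead of reading piece by piece, and `decodeBase64` and `BASE64_ALPHABET` are names chosen for this example.

```javascript
var BASE64_ALPHABET = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Decodes a Base64 string into an array of byte values (Numbers 0..255).
function decodeBase64(text){
  var bytes = [], buffer = 0, bits = 0;
  for (var i = 0; i < text.length; i++){
    var value = BASE64_ALPHABET.indexOf(text.charAt(i));
    if (value < 0) continue;            // skips '=' padding and whitespace
    buffer = (buffer << 6) | value;     // append 6 bits
    bits += 6;
    if (bits >= 8){                     // a whole byte is available
      bits -= 8;
      bytes.push((buffer >> bits) & 0xFF);
      buffer &= (1 << bits) - 1;        // keep only the leftover bits
    }
  }
  return bytes;
}

// The first eight bytes of every PNG file form a fixed signature.
var signature = decodeBase64('iVBORw0KGgo=');
// signature is [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]

// Bytes 1..3 are the ASCII codes for "PNG".
var tag = String.fromCharCode(signature[1], signature[2], signature[3]);
```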

DEFLATE

The current PNG standard only uses the DEFLATE algorithm for compression. It’s the same algorithm used in ZIP, GZIP, zlib, etc. So it’s a very common format.

Luckily for us, notmasteryet’s sample also includes a DEFLATE decompressor. It also works as a piece by piece reader which makes it more memory efficient to work with. The reader pattern is a great way to read data in nested formats.

PNG

The PNG format consists of a set of named chunks. A set of “IDAT” chunks makes up the main image data. The total data stream is compressed using DEFLATE. The uncompressed data is filtered using one of 5 simple delta compression filters for each line of pixels.
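The chunk layout is simple enough to walk by hand: after the 8-byte file signature, each chunk is a 4-byte big-endian length, a 4-byte type, the data, and a 4-byte CRC. A sketch follows, where `readUint32` and `readChunks` are names invented for this example and the fixture is hand-made rather than a real image.

```javascript
// Reads a big-endian 32-bit unsigned integer from an array of byte values.
function readUint32(bytes, offset){
  return ((bytes[offset] << 24) | (bytes[offset + 1] << 16) |
          (bytes[offset + 2] << 8) | bytes[offset + 3]) >>> 0;
}

// Lists the chunks in a PNG byte array, skipping the signature and ignoring CRCs.
function readChunks(bytes){
  var chunks = [], offset = 8;                 // 8-byte file signature
  while (offset + 8 <= bytes.length){
    var length = readUint32(bytes, offset);
    var type = String.fromCharCode(bytes[offset + 4], bytes[offset + 5],
                                   bytes[offset + 6], bytes[offset + 7]);
    var dataStart = offset + 8;
    chunks.push({ type: type, data: bytes.slice(dataStart, dataStart + length) });
    offset = dataStart + length + 4;           // skip the data and the 4-byte CRC
    if (type === 'IEND') break;
  }
  return chunks;
}

// A hand-made fixture, not a valid image: signature, a 2-byte IDAT, then IEND.
// The CRC fields are zeroed because this sketch never checks them.
var fixture = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A,
               0, 0, 0, 2, 0x49, 0x44, 0x41, 0x54, 0xAB, 0xCD, 0, 0, 0, 0,
               0, 0, 0, 0, 0x49, 0x45, 0x4E, 0x44, 0, 0, 0, 0];
var chunkTypes = readChunks(fixture).map(function(chunk){ return chunk.type; });
// chunkTypes is ['IDAT', 'IEND']
```

In a real file the data of all IDAT chunks would be concatenated and handed to the DEFLATE decompressor.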

Notice that we haven’t yet touched any image-processing specific logic. DEFLATE and delta compression are used for text and other data as much as anything else.
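To give a feel for the delta filters, here is a sketch that reverses three of the five filter types for one line of pixels. `unfilterLine` is a name invented here; the Average and Paeth filters are deliberately left out, and the demo skipped some filters as well.

```javascript
// Reverses the per-line PNG filter for types 0 (None), 1 (Sub) and 2 (Up).
// `line` holds the filtered bytes of one scanline, `previous` the already
// reconstructed scanline above it (or null for the first line), and
// `bytesPerPixel` is 3 for 8-bit RGB.
function unfilterLine(filterType, line, previous, bytesPerPixel){
  if (filterType > 2) throw new Error('filter type not covered by this sketch');
  var out = [];
  for (var i = 0; i < line.length; i++){
    var left = i >= bytesPerPixel ? out[i - bytesPerPixel] : 0;
    var up = previous ? previous[i] : 0;
    var delta = filterType === 1 ? left : filterType === 2 ? up : 0;
    out.push((line[i] + delta) & 0xFF);        // arithmetic is modulo 256
  }
  return out;
}

// Two grey RGB pixels, (10,10,10) then (12,12,12), stored with the Sub filter:
// the second pixel is stored as its difference from the first.
var restored = unfilterLine(1, [10, 10, 10, 2, 2, 2], null, 3);
// restored is [10, 10, 10, 12, 12, 12]
```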

The raw data consists of a color for each pixel. This can be either grayscale, RGB or a reference to a palette color. This is what we really want.

The PNG format is open and well documented. So I’m not going to cover it in any more detail.

Proof of Concept

Since we’re doing a lightweight JavaScript parser and probably have some control over the image data, we can skip some of the more outlandish features of the specification. We can also skip the verification parts. We’ll just skip the file headers and CRC checks.

I decided on a simple API that reads each line of pixels as an array of RGB colors, each represented as a number.

var image = new PNG(base64data);
image.width; // Image width in pixels
image.height; // Image height in pixels
var line;
while(line = image.readLine()){
  for(var x = 0;x < line.length;x++){
    var px = line[x]; // Pixel RGB color as a single numeric value
    // white pixel == 0xFFFFFF
  }
}

I then took that RGB data and inserted the pixels into my document as DIV tags with a background-color.
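Turning such a numeric pixel into something usable as a background-color takes only a few lines. `packRgb` and `toCssColor` are helper names made up for this sketch; they are not part of the PNG class shown above.

```javascript
// Packs three 8-bit channels into one Number, e.g. white becomes 0xFFFFFF.
function packRgb(r, g, b){
  return (r << 16) | (g << 8) | b;
}

// Formats a packed colour as a CSS hex string suitable for background-color.
function toCssColor(px){
  var hex = px.toString(16);
  while (hex.length < 6) hex = '0' + hex;     // keep leading zeros
  return '#' + hex;
}

var white = toCssColor(packRgb(255, 255, 255)); // '#ffffff'
var navy = toCssColor(packRgb(0, 0, 128));      // '#000080'
```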

Proof of Concept

In less than 3 hours I had a working Proof of Concept of a format I had never worked with before.

I skipped interlacing, alpha and some of the filters for the demo. It’s not meant to be a fully working prototype nor a reference library in any way.

Now What?

You could…

  • Display the image using a regular rendering method but use the PNG parser to extract colors using a Color Picker.
  • Add obfuscation or cryptographic layers to render images that can’t be easily ripped by bots or downloaded by users.
  • Render embedded PNG images using VML in Internet Explorer (which lacks data: URI support) with full alpha support.

Don’t expect this method to become the new hack for PNG or embedded images in Internet Explorer. The rendering methods here are probably too slow for that. You could do some nice stuff with CANVAS though.

However, I have demonstrated that it is possible to work with binary formats in JavaScript. We shouldn’t be afraid of utilizing existing binary standards (PNG, GZIP, SVGZ, SWF, TTF…). We shouldn’t always fall back to our comfortable old JSON format and reinvent the wheel for every client-side need.

Relevant Projects

The MooTools team is working on a tool set for vector graphics in the web browser, A.R.T. You could use binary formats to embed your vector based graphics in formats like… TrueType!

The APE (Ajax Push Engine) project brings socket programming to the JavaScript platform.

Digg’s MXHR stream parses multipart encoded data and extracts the parts for various uses. This could provide a packaging model for various widgets or data packets.

Client Side Dependency Strategy

May 17th, 2009 by Sebastian Markbåge

This post is in response to an off-site discussion about modular dependency strategies. But I figured I’d post it here for future reference.

The Calyptus Web Resource Manager is a project that can handle your JavaScript, CSS, and other client-side dependencies at compile time or at runtime. You can keep source code as separate files on the server or pre-compile packages (such as .ZIP, .DLL or .JAR). Currently source is available only for the .NET/Mono platforms but the concept is valid for all platforms.

Syntax

The syntax is largely inspired by, and meant to be compatible with, ScriptDoc and the ECMAScript 4 draft import statement. At the top of your file you add the dependencies that your file relies on:

/*
@import [package, ]filename
@include [package, ]filename
@build [package, ]filename
@compress [always|release|never]
*/

Don’t worry, we’re not going to ruin your precious open-source project with inline docs. Read on.

@import – Indicates that this file has a dependency on the referenced file and that it needs to be included in the final document (implicitly before this one). The other file may be a JavaScript file, CSS, image, Flash or something else. The project is fully extensible.

@include – Same as @import but also indicates that the referenced file should be merged into this one on compile or runtime.

@build – Same as @include but also merges any nested @import statements, allowing you to create a single packaged file.

@compress – Indicates whether the document should use a compression tool (such as YUI Compressor) or not. Defaults to “release”, which means that it won’t compress during the debug stage.
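To make the directive syntax concrete, here is a minimal sketch of how such directives could be read out of a source file. It is not the Calyptus parser; the function name parseDirectives and the shape of the returned objects are assumptions made for illustration, and for simplicity it scans the whole source rather than only the leading comment.

```javascript
// Illustrative only: this is not the Calyptus parser, just one way to read
// the directives described above out of a source string.
function parseDirectives(source) {
  var directives = [];
  var pattern = /@(import|include|build|compress)[ \t]+([^\r\n]+)/g;
  var match;
  while ((match = pattern.exec(source)) !== null) {
    // Drop a trailing "*/" in case the directive sits in a one-line comment.
    var argument = match[2].replace(/\s*\*\/\s*$/, '').trim();
    // "package, filename" or just "filename"; the package part is optional.
    var parts = argument.split(/\s*,\s*/);
    var entry = { command: match[1] };
    if (match[1] === 'compress') {
      entry.mode = parts[0]; // always | release | never
    } else if (parts.length > 1) {
      entry.packageName = parts[0];
      entry.filename = parts[1];
    } else {
      entry.filename = parts[0];
    }
    directives.push(entry);
  }
  return directives;
}
```

Wild cards such as MooTools.Core.* are kept as plain strings here; expanding them against a package is left out of the sketch.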

You can reference a file by either filename/namespace or package + filename/namespace. You may include wild cards to reference an entire path or namespace. If you’re referencing another file in the same package, you can exclude the package name.

If you are running ASP.NET you can exclude the package if you’re referencing an assembly that is already referenced in your Web.config.

If you’re in a .js file, the filename will automatically look for files ending in .js.

This allows you to use a namespace-like syntax on prepackaged files:

/*
@import MooTools.Core.*
@import MooTools.More.URI
*/

If you want to use the runtime view generating tools the syntax depends on what View Engine you’re running. For ASP.NET WebForms you can use the following controls:

<c:Import src="filename" runat="server" />
<c:Import assembly="package" name="namespace/filename" runat="server" />
<c:Include ... />
<c:Build ... />

In the future this will be integrated into the ASP.NET ScriptManager as well. For other view engines the syntax would be much prettier.

Example

MyBaseStyle.css

div.BaseClassItem {
  background-image: url(MyBaseImage.png);
}

MyBaseClass.js

// @import MyBaseStyle.css
var MyBaseClass = new Class({
  initialize: function(){
    this.element = new Element('div', { className: 'BaseClassItem' });
  }
});

MyChildClass.js

// @import MyBaseClass.js
var MyChildClass = new Class({
  Extends: MyBaseClass,
  ...
});

MyView.aspx

<c:Include src="MyChildClass.js" />

OUTPUT:

<link href="MyBaseStyle.css" rel="stylesheet" type="text/css" />
<script src="MyBaseClass.js" type="text/javascript"></script>
<script type="text/javascript">
var MyChildClass=new Class({Extends:MyBaseClass,...});
</script>

Since I used the Include command, the file itself is merged into the output document. All its dependencies are automatically added to the document through links.

Any referenced file is only added once to the output. So it’s no problem adding multiple references to the same resource in partial views or by indirect dependencies.

MyOtherView.aspx

<c:Import src="MyChildClass.js" />
<c:Import src="MyBaseClass.js" />

OUTPUT:

<link href="MyBaseStyle.css" rel="stylesheet" type="text/css" />
<script src="MyBaseClass.js" type="text/javascript"></script>
<script src="MyChildClass.js" type="text/javascript"></script>

In the sample above, I import the base class after the child class. Since the child depends on the base, the base is emitted first, ahead of the child. The second, explicit reference to MyBaseClass.js is therefore excluded.
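The ordering and de-duplication rules used in these two examples can be sketched as a depth-first walk over the import graph. This is not the Calyptus implementation; the function name resolveOrder and the imports lookup table are assumptions made for illustration.

```javascript
// A sketch of the ordering rule, not the Calyptus implementation.
// `imports` maps each file to the files it @imports (hypothetical data shape).
function resolveOrder(requested, imports) {
  var output = [];
  var seen = {};
  function visit(file) {
    // Each resource is emitted only once, however often it is referenced.
    if (Object.prototype.hasOwnProperty.call(seen, file)) return;
    seen[file] = true;
    var deps = imports[file] || [];
    // Dependencies are emitted before the file that relies on them.
    for (var i = 0; i < deps.length; i++) visit(deps[i]);
    output.push(file);
  }
  for (var i = 0; i < requested.length; i++) visit(requested[i]);
  return output;
}

var imports = {
  'MyChildClass.js': ['MyBaseClass.js'],
  'MyBaseClass.js': ['MyBaseStyle.css']
};
// Same references as MyOtherView.aspx: child first, then base.
var order = resolveOrder(['MyChildClass.js', 'MyBaseClass.js'], imports);
// order is: MyBaseStyle.css, MyBaseClass.js, MyChildClass.js
```

Marking a file as seen before visiting its dependencies also keeps the walk from looping forever if two files import each other.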

Typical Work Flow – Late Optimization

Typically you would only use the @import statement in all your resources. You should only reference the direct resources that your code or style sheet uses. Indirect files are referenced by the referenced resources themselves, so that if a dependency changes, you don’t have to update everything that relies on it. Your views will likewise only reference the resources they use directly, also through import statements.

This will generate a lot of <script> and <link> tags in your documents. This is not good for production where you want to minimize the overhead of multiple requests. That’s when you start building clusters.

Common.css

/*
@build Headers.css
@build Footers.css
@build MyBaseStyle.css
*/

Common.js

/*
@build MooTools.Core.Fx.Tween
@build MyChildClass.js
*/

Now I can import the clusters Common.css and Common.js in my view:

<c:Import src="Common.css" />
<c:Import src="Common.js" />
...
<c:Include src="MyChildClass.js" />

OUTPUT:

<link href="Common.css" rel="stylesheet" type="text/css" />
<script src="Common.js" type="text/javascript"></script>

The MyChildClass.js reference and all its dependencies are ignored since those files have already been included in the document by Common.css and Common.js. You can for example add these clusters to your Master view to automatically optimize all your partial views. If you remove a reference from your cluster it won’t break any of your code, since those files are still added individually to your document by your partial views.

This pattern will allow you to do late optimization of your load time by grouping only the commonly used files into clusters, leaving edge-case files in the outer branches of your site. To accomplish this I recommend that you use a modular framework such as MooTools.

Your clusters should be named and composed in relevant packages for your site, not in packages of JavaScript frameworks. For example, DON’T create a MooTools.js cluster that includes all MooTools files.

By default, @include and @build commands are evaluated as @import during the debug stage. That makes it easy to find the references to your source code with debugging tools such as Firebug.

Messing Up Your Beautiful Source? Use Place Holders

If you’re working on a consulting project you can just put all your references in the source file. That makes it very easy to work with. But if you have an open-source project you may not want to mess up the source with dependency references. Instead, use place holder files that @include the original source and reference the dependency place holders using @import.

Fx.js

/*
@import Class.Extras.js
@include Real/Source/Fx/Fx.js
*/

Fx.CSS.js

/*
@import Fx.js
@import Element.Style.js
@include Real/Source/Fx/Fx.CSS.js
*/

Now you can reference your place holders to get dependencies instead of the original source files.

What about my CDN?

You can use a CDN to store your clusters. Just reference the full URIs in your import statements. There is a pre-built class that does this with MooTools on Google. Just @import GoogleAPIs.MooTools.

I will add an @embedded syntax to reference other files that have already been included. That way you could write your own like this:

MooTools-Cluster-Google.js

/*
@import http://ajax.googleapis.com/ajax/libs/mootools/1.2.2/mootools-yui-compressed.js
@embedded MooTools.Core.*
*/

If you reference this cluster in your view, all references to your local MooTools files will be ignored since they are already included in the Google cluster.

@include on Images

If @include filename.png is used in a style sheet, every instance of url(filename.png) will automatically be replaced with base64 embedded data at runtime. This is only done in the runtime version, since the embedded content can’t be sent to IE browsers. IE browsers will get the url(filename.png) reference intact.

This also works with view/document Include commands. In that case an <img> tag is rendered with a link or embedded content depending on the browser capabilities.

This pattern allows you to do late load time optimization of image dependencies.
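The url() replacement described in this section could be sketched roughly as follows. The function name embedImages, the images lookup table and the supportsDataUri flag are all assumptions made for illustration, not the Calyptus API. The image data is assumed to be base64 encoded already.

```javascript
// Illustrative sketch; names and data shapes are assumptions.
// `images` maps a file name to its MIME type and base64-encoded content.
function embedImages(css, images, supportsDataUri) {
  // Browsers without data: URI support keep the url(file) reference intact.
  if (!supportsDataUri) return css;
  var pattern = /url\(\s*(['"]?)([^'")]+)\1\s*\)/g;
  return css.replace(pattern, function (whole, quote, file) {
    var name = file.trim();
    // Only images that were @include'd (present in the table) are embedded.
    if (!Object.prototype.hasOwnProperty.call(images, name)) return whole;
    var image = images[name];
    return 'url(data:' + image.mimeType + ';base64,' + image.base64 + ')';
  });
}
```

Passing the browser capability in as a flag keeps the sketch independent of how the capability is actually detected.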

Getting Started

As always, begin by checking out the source.

Large Object Storage for NHibernate – Part 2 – Storage Options

March 29th, 2009 by Sebastian Markbåge

This is part 2 of a series describing Large Object Storage (BLOB) in a Domain Driven fashion. Be sure to read Part 1 about the new base classes introduced by this project.

Physical Storage Considerations

So, what are your options for storing large data objects in your relational database? This is actually not an easy problem to solve, because a relational database is designed for small pieces of well structured data. Making a table, row or column too large will cause various problems with fragmenting, indexing and table scans.

Because of this, vendors have implemented data columns that store large data separately from the rest of the row. This typically means they can’t be used for indices or searches. They’re still internal to the RDBMS and are fully covered by ACID transactions and backup procedures. You would typically keep the large object data on the same discs as the actual database itself. This can limit your overall performance and scalability. Additionally, the vendor API might support streaming for reading, while not supporting streaming for writing.

To remedy this situation vendors have come up with various ways of storing your data externally to your database (typically in a file system) while storing references to your data in the database and allowing you to access it and manage access control through your RDBMS. This typically means that operations on these files are not covered by ACID transactions [1] and backup procedures. External storage allows you to save disc space, since multiple rows and tables can share a reference to the same data file in a true denormalized fashion. Content-addressable storage (CAS) solutions are especially suitable for this kind of storage. You can use external storage with or without RDBMS integration.

So to sum up your physical storage options:

  • In-table RDBMS storage
  • Out-of-table RDBMS storage
  • External storage using RDBMS integration
  • External storage using NHibernate client

Because of these issues, various vendors have implemented more than one solution and there isn’t a consistent best practice for working with large data. You have to choose the storage solution that is most appropriate for your particular requirements.

[1] In this series I will only cover complete replacements of data rather than changes of data. This is done by exchanging one blob object for another as described in Part 1. Therefore ACID transactions on individual data changes aren’t going to be important for external storage. The entire blob will be written to storage. If the entire transaction succeeds, the reference will be changed. Otherwise the reference will remain at the old data.

Data Transfer Considerations

In-row data is usually sent together with the rest of the query result. This means that it is not typically viable for streaming, because the entire row is always read into memory.

For out-of-table storage some vendors don’t send large object data with the rest of the row result. That means that it can be requested and streamed in pieces. However, because ADO.NET doesn’t define a common requirement or API for this, it is usually done in vendor specific implementations. Some vendors require the data reader to remain open while reading the stream. This makes it unsuitable for NHibernate since we would like to work with our entities in a disconnected fashion. So in this case, we would have to query that row and column again to open the data connection when needed (lazy loading).

When external storage is used, only a reference to the data is sent with the query result. The actual data transfer is usually done over a protocol completely separate from the RDBMS connection. Sometimes it isn’t even communicating with the same machine as the database. This makes it a very scalable solution. It will also allow us to open that connection and stream the data without querying the row and column of the database again.

Because of the inconsistent ways of accessing the data, we will be addressing this at the client level in a vendor specific fashion. More on this in Part 3 – NHibernate Mappings.

Small In-Table Data Types

Every RDBMS has small in-row data types for binary and text data, such as VARBINARY(size) or VARCHAR(size). These are typically limited to around 4000-8000 bytes of data and are therefore not suitable for large objects. You would typically just map these to memory using byte[] and string. If you currently only have small amounts of data but expect it to grow, you can start off using one of these small data types, map it to Blob and Clob objects, and then scale as you need to.

Large object storage options are highly vendor specific. I’ll cover a few common vendors.

Microsoft SQL Server

VARBINARY(MAX), VARCHAR(MAX) – These types are used to store in-table binary and text data of up to 2 GB. The practical performance and scalability limitations involved in storing data in-table usually mean that you want to keep data in these columns to a few MB. Using the .WRITE clause of the UPDATE statement in SQL Server you can write changes to the database in chunks.

XML(DOCUMENT), XML(CONTENT) – You can use the XML data type to store up to 2 GB of XML data per column. The same practical limitations as for VARBINARY and VARCHAR apply. You can specify either DOCUMENT or CONTENT to indicate whether the data has to comply with a full XML document or an XML fragment. The Xlob base class allows for both complete documents and fragments.

IMAGE, TEXT and NTEXT – These data types are now deprecated and will be removed in future versions of SQL Server. Use VARCHAR(MAX) or VARBINARY(MAX) instead.

FILESTREAM – If your data is more than 1 MB on average, you should consider the new FILESTREAM data type introduced in SQL Server 2008. It stores data out-of-table in the NTFS file system. The data size is limited only by the local NTFS file system. Each table that uses a FILESTREAM column must also have a UNIQUEIDENTIFIER ROWGUIDCOL column. The FILESTREAM column is completely integrated with SQL Server, its backup facilities and its client software.

Microsoft SQL Remote Blob Storage (RBS) – Microsoft has introduced a new plug-in API for external storage used together with SQL Server. This will allow any storage solution provider to hook into Microsoft’s common API. It’s installed both on the server and the client. The server handles garbage collecting and manages the references to various BLOBs in the storage solution. This is a flexible and highly scalable solution and it integrates nicely into the SQL Server product. If you want to leave the external storage API to the client, read on to External Storage.

Oracle

BLOB, CLOB, NCLOB and XMLType – These are out-of-table data types for storing binary, text and XML data up to 4 GB. Oracle 10g and above support up to 8 terabytes of storage depending on your CHUNK setting for the table. NCLOB stores text data in a Unicode national character set.

Oracle’s LOB types are all stored out-of-table and referenced using LOB locators. This makes them suitable for the disconnected environment used by NHibernate.

Oracle allows for XML operations to take place on the server which, in the future, could be used to speed up operations of an XmlReader generated by an Xlob.

LONG and LONG RAW – These data types are now deprecated. They can store 2 GB of data. Use CLOB or BLOB instead.

BFILE – Oracle has a reference type that points to files on the local file system. You can read these files (up to 4 GB) via the Oracle API. You can’t write to them though. You can change the reference to another file in the file system. So if you create a reference to an existing file using Blob.Create("filepath"), the NHibernate mappings will be able to change out the reference to the new file. You can also open up a directory where NHibernate can store new files. In both cases, both the Oracle server and client will need access to this directory. BFILEs are an external storage solution. Oracle doesn’t handle write transactions, garbage collection of files or backup procedures for them.

PostgreSQL

BYTEA, TEXT and XML – Used for in-table binary, text and XML data respectively. Current APIs don’t support streaming of these types. They will have to be read into memory all at once.

TOAST – PostgreSQL normally stores its data in tuples of 8 kB, which doesn’t allow the above data types to be very large. Using TOAST, large columns are automatically stored out-of-table. It also has mechanisms for compressing data and trying to fit it into rows if possible. TOAST isn’t its own data type but can be used to expand BYTEA, TEXT and XML columns to a maximum of 1 GB.

Large Objects – PostgreSQL supports the notion of Large Objects. These are stored out of table but within the management of the RDBMS itself. Each new object is given its own ID and it is this ID that is referenced in the data tables. These objects are read and manipulated using a special API. Each object can be referenced several times and across tables. As far as I know they are neither garbage-collected nor handled by backup solutions. So this solution can be compared to other external solutions even though it is managed by PostgreSQL itself. Since this solution shares objects for the entire database you will have to incorporate your own custom garbage collecting solution.

Large Objects are useful when you need to store data larger than 1 GB. The documented limit is 2 GB but in practice you can store files of several GB depending on the file system. The Large Object API will also allow you to stream the data instead of reading it all into memory. Therefore this is the preferred solution for storing large data on PostgreSQL.

MySQL

BLOB and TEXT – These columns are used to store binary and text data up to 4 GB. MySQL doesn’t have a column for XML data. TEXT is the recommended column type for XML. These columns are stored out-of-table but MySQL doesn’t support streaming of data. This means that each object will have to be read into memory in its entirety.

PrimeBase Technologies are currently working on a Blob streaming infrastructure over HTTP to be integrated into MySQL. It uses their XT storage engine.

For other storage engines, you will need to look to external storage.

External Storage

If you prefer to decouple your large object storage solution from the database you can use a completely external storage solution. In this case, you would store a reference to the data blob in your relational table, usually as a fixed length binary or GUID/UUID. The data is stored in a completely external solution with no communication with the database. This makes this solution completely vendor independent and highly scalable.

The NHibernate.Lob client handles the communication with both the external storage solution and the database. Your client should at regular intervals (nightly?) let NHibernate.Lob scan all mapped tables for external references. It will then garbage collect the data blobs in the external storage that are no longer referenced.
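The sweep step of such a garbage collection pass can be sketched as below. This is not the NHibernate.Lob implementation; the in-memory Map standing in for the external store and the function name collectGarbage are assumptions made for illustration.

```javascript
// Sketch of the sweep step only; not the NHibernate.Lob implementation.
// `store` is a hypothetical Map from blob id to blob data, and
// `referencedIds` holds the ids found while scanning the mapped tables.
function collectGarbage(store, referencedIds) {
  var live = new Set(referencedIds);
  var removed = [];
  store.forEach(function (data, id) {
    if (!live.has(id)) removed.push(id);
  });
  // Delete after iterating so the Map is not modified while it is walked.
  removed.forEach(function (id) { store.delete(id); });
  return removed;
}
```

A real collector would presumably also have to spare blobs that were written recently but whose referencing transaction has not yet committed; that is left out here.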

The NHibernate.Lob project includes a common API for external storage solutions for use with NHibernate. Included is also a file-system based CAS storage option to get you started. High-end CAS solutions such as EMC’s Centera or Caringo’s CAStor are very suitable for this kind of storage if you have extreme scalability or accessibility needs. They’re also useful if you need to comply with local regulations that require you to never delete data.

Text and XML Types

Clob and Xlob are structured data since they have a specific format (Text and XML). These can be stored in various ways depending on your vendor’s specific data columns. Text can be stored in various different character sets. XML can be serialized as binary XML in storage or saved using various character sets. If your vendor does provide a specific Text or XML data type that is suitable for large objects I would recommend that you use it. This will allow the RDBMS to handle the format and serialization constraints. Any compliant software can handle and display the data without further user interaction.

However, if you use external storage or want to utilize the various compression options mentioned in Part 5, you can store your Clob and Xlob data in any binary column as well, thereby letting the client determine the serialization format.

Getting Started

The full source code to Calyptus.Lob and appropriate NHibernate mappings are available at our Calyptus.Lob project at GitHub.

More in This Series

In the next part of this series I’m going to describe how you can use the NHibernate.Lob project to map these storage options to the Blobs, Clobs and Xlobs in your NHibernate entities.

Part 1 – BLOBs, CLOBs and XLOBs

Part 2 – Storage Options

Part 3 – NHibernate Mappings

Part 4 – External Storage

Part 5 – Compression Options

HTML 5 Current Browser Support – Part 1 – Introduction

March 24th, 2009 by Sebastian Markbåge

Work on the HTML 5 working draft is continuing. It includes new tags, attributes and a strong specification of how clients should interact with old and new elements. What I find even more intriguing is the standardization of many advanced JavaScript DOM features (such as editable content, drag and drop), most of which have been available to IE users for more than a decade. This is one area that the standards have been particularly slow to adopt. With the current beta versions of Safari, Chrome and Firefox these new browsers are finally ready to leave IE behind (yes, even IE 8).

Many people are still frightened of implementing code according to a working draft, especially since it’s not scheduled to be complete until 2012. In my opinion, those fears are largely unfounded at this point. The primary reason for this is that many of the features have been available in IE for many years and the HTML 5 specification centers around keeping some historical compliance. So the primary threat of lagging cross browser functionality has already been eliminated. It is also the WHATWG’s estimate that browsers will have full compliance and people will have started utilizing this new standard long before it is finalized. For these reasons, by the time you read this, you may already be a late adopter.

However, there are still some quirks that you need to be aware of. I’ve been working on cross browser layers of the HTML 5 specifications since 2007 including backwards compatible code for older browsers. This code has been used in production and little of it has changed since mid-2008. Therefore I’ve started work on introducing these features to my JavaScript framework of choice, MooTools. While I refactor my code for this purpose I thought I might introduce some of the quirks that you might come across in your own endeavors.

Coming up

Part 2 – Drag and Drop, Copy and Paste

Part 3 – Range and Selection

Part 4 – ContentEditable and ExecCommand

Large Object Storage for NHibernate – Part 1 – BLOBs, CLOBs and XLOBs

March 12th, 2009 by Sebastian Markbåge

This is the first in a series of posts describing the design considerations involved with storing Binary Large OBject (BLOB) data with NHibernate and how it led me to start a project I’m currently calling NHibernate.Lob.

Note that the samples here are focused mainly on NHibernate but the pattern can be applied to many different persistence models. I’m considering support for DB4o for example.

Lazy Streaming of Data

The typical way to store binary data in NHibernate entities would be as a byte[] array. After all, the basic premise of NHibernate entities is that the data is stored in-memory in the first level cache. For smaller binary data this is just fine. We don’t even really need our columns to be lazy.

If we start adding larger data the first problem one might notice is that the data is loaded every time the entity is loaded. This is a common question around NHibernate user groups. This can quite easily be solved by separating it out to a lazy loaded entity or using lazy columns.

If we add even larger files we start wasting precious memory. This is especially problematic in high concurrency applications and web applications. A (very) common scenario would be to store image data together with an entity. At this point we shouldn’t ever keep the entire file in memory. Instead we should stream the data piece by piece from the persistent storage to whatever we want to use it for.

At this point, the actual data is never stored in the in-memory entity. Only pointer data is stored about where to find the information. It goes beyond the concept of lazy loading since only a piece of the data is available in memory at any point.

Note that this is NOT really related to the concept of a document database. We’re talking about large serialized objects (500 kb+ if I had to give a number) such as images, videos or large document files.

I also mention the term: pointers. In the context of this article series I don’t mean memory pointers but rather a reference to where one can find the real complete data. This may be in-memory, on disk, remote or distributed etc.

Streaming Data Types – The Current State of ADO.NET

So what data type will we use as the pointer to this data? Our domain model is supposed to be persistence ignorant so one of the common .NET types would be nice. There are typically three common structured types of large data stored in modern databases: raw binary data, text and XML. Binary Large OBjects are typically called BLOBs. Text or Character Large OBjects are sometimes called CLOBs. How you store the data and in which column types is very RDBMS provider specific. From now on I will refer to these three types simply as LOBs.

Now, the in-memory types for these would typically be byte[], string and XmlDocument. The streamed versions would be Stream, TextReader and XmlReader. However, this gives us some problems. The contracts of these three abstract classes are more than just pointers to where to get the data. They also contain the current reading position of the stream. This means that we can only read from that entity ONCE during its lifetime. They also implement IDisposable, keep a data reading connection open and expect to be closed and disposed of.

There’s really no common way of working with streamed data in ADO.NET. In fact, the typical example for dealing with LOBs in ADO.NET involves reading the full data into memory using IDataReader.GetBytes(…). Some providers have supplied their own solutions to this issue (such as OracleLob and SqlBytes). The most common solution seems to be to inherit Stream in their custom solutions. You can still read it several times by first cloning the Lob object but it isn’t really a nice solution for a domain model. They also imply that the connection is already open. What we really need is a type from which we can create readers.

Thankfully our friends on the Java end of things have already thought about this. In Java there are Blob and Clob interfaces which fit just this purpose. They can create both reader and writer streams. They are also nicely implemented in both JDBC and Hibernate.

Another issue with TextReader and XmlReader is that we have no way to write to them but this is not really an issue as I will describe at the end of this article.

Introducing New Data Types – Blob, Clob and Xlob

So to remedy this situation I’ve suggested that three new base classes are added to our .NET domain models. The contracts of these are pretty simple.

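// Contract only: the bodies of the virtual WriteTo methods are omitted here.
using System.IO;
using System.Text;
using System.Xml;

namespace Calyptus.Lob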
{
	public abstract class Blob
	{
		public abstract Stream OpenReader();
		public virtual void WriteTo(Stream output);
	}
 
	public abstract class Clob
	{
		public abstract TextReader OpenReader();
		public virtual void WriteTo(TextWriter writer);
		public virtual void WriteTo(Stream output, Encoding encoding);
	}
 
	public abstract class Xlob
	{
		public abstract XmlReader OpenReader();
		public virtual void WriteTo(TextWriter writer);
		public virtual void WriteTo(XmlWriter writer);
		public virtual void WriteTo(Stream output, Encoding encoding);
	}
}

Basically there’s a PULL and a PUSH method to get the data from a LOB. The WriteTo methods are NOT a way to write data to the LOBs. They are a way to PUSH the data from the LOB into a writer.
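The difference between the two directions can be illustrated with a small JavaScript analogue. This is not the Calyptus.Lob API (which is C#); ArrayBlob, openReader, writeTo and the chunk-based reader are hypothetical stand-ins chosen to keep the sketch short.

```javascript
// A JavaScript analogue of the idea, not the Calyptus.Lob API.
// A "reader" here is just an object with a read() method that returns the
// next chunk, or null once the data is exhausted.
function ArrayBlob(chunks) {
  this.chunks = chunks;
}

// PULL: every call returns a fresh reader with its own position, so the
// blob itself can be read any number of times.
ArrayBlob.prototype.openReader = function () {
  var chunks = this.chunks, position = 0;
  return {
    read: function () {
      return position < chunks.length ? chunks[position++] : null;
    }
  };
};

// PUSH: the blob drives the transfer into any object with a write() method.
// Nothing is written *to* the blob.
ArrayBlob.prototype.writeTo = function (output) {
  var reader = this.openReader(), chunk;
  while ((chunk = reader.read()) !== null) output.write(chunk);
};
```

Because the reading position lives in the reader and not in the blob, the blob stays a plain pointer to the data, which is the property the streamed .NET types lack.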

Why an abstract base class instead of an interface? This is a common debate in .NET. But since the abstract base class is the pattern most often used in the .NET Framework (Stream, TextReader and XmlReader are a few examples) I figured it’d be best to keep that trend. It also allows for virtual methods to be added later (such as Java’s getBytes, position and length) without recompilation of inheritors.

I’m sure that these contracts are going to be very much debated since they involve the core of the domain model which, in the NHibernate world, should be persistence ignorant. You can still easily switch out the ORM and let that ORM handle these new types. The best would be if Microsoft’s Patterns and Practices team introduced these new types as a common practice and perhaps even into a System.Data.Lobs namespace.

Some of you may be thinking that this pattern makes the domain model aware of its repository. But it really doesn’t. No more than lazy loaded entities and collections do. You can even save it to another repository. More on that later.

Writing to Blobs – Don’t

You may have noticed that unlike the Java interface I didn’t put any way to write to the LOBs in the base contract. This is because you shouldn’t persist anything until a Flush (or SaveChanges) style event. If it’s a new entity, the row doesn’t exist yet and there may not be anything to write the data to. The data may not even be part of the row. It may be stored as its own “entity” and shared by multiple other entities. In this case you would overwrite their data. Data should be written all together in an atomic manner.

So how do I change the data? You replace the LOB pointer (the Blob, Clob or Xlob objects) with something that points to some other data source with the new data. This can be from a file, a stream, memory, or maybe a custom implementation which combines or converts data on-the-fly. This will allow you to build pipelining patterns. It can even be another LOB in your database. NHibernate will tell your LOB object when and where to write itself.

This is also the same way Hibernate handles Blob and Clob in Java. It actually throws exceptions if you try to write to its Blobs or Clobs.

Finally Some Code

Let’s start by defining a domain model. Let’s just stick to one single entity called Product, with a binary image file, a long description text and an XML file which contains further specifications.

public class Product
{
	public int ID { get; set; }
	public string Title { get; set; }
	public Blob Image { get; set; }
	public Clob Description { get; set; }
	public Xlob Specifications { get; set; }
}

To read from these three LOBs you would use either the PULL or PUSH patterns (OpenReader or WriteTo). The following sample fetches a Product from the database. It then writes the image data to an HttpResponse. Then it writes the specifications to disk using a custom XmlWriter. Finally it reads the first line of the description.

using (ISession session = sessionFactory.OpenSession())
{
	Product product = session.Get<Product>(100);
 
	Response.Clear();
	Response.ContentType = "image/jpeg";
	Response.BufferOutput = false;
	product.Image.WriteTo(Response.OutputStream);
 
	using (XmlWriter writer = XmlWriter.Create(@"C:\MyFiles\SomeData.xml"))
	{
		product.Specifications.WriteTo(writer);
	}
 
	using (TextReader reader = product.Description.OpenReader())
	{
		string firstLine = reader.ReadLine();
	}
}

Changing the LOB data involves replacing the instance with another one. You can do this by using one of the built-in implementations using the static overloaded Blob.Create(), Clob.Create() and Xlob.Create() methods. The data can come from files, streams, memory, the web or your own implementations. You could for example create your own implementation which combines two files into one on the fly as it is written to the database.
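As a rough sketch of the "combine two sources on the fly" idea, the code below joins two sources lazily, so nothing is read until the combined object is asked to write itself. It deliberately does not use the real Calyptus.Lob base classes: BlobSketch, ArrayBlobSketch, ConcatBlobSketch and ConcatDemo are hypothetical stand-ins, and only the OpenReader and WriteTo method names are borrowed from the article.

```csharp
using System.IO;
using System.Text;

// Hypothetical stand-in for a Blob contract. This is NOT the Calyptus.Lob API;
// only the method names OpenReader (pull) and WriteTo (push) mirror the article.
public abstract class BlobSketch
{
    public abstract Stream OpenReader();

    // Default push implementation built on top of the pull one.
    public virtual void WriteTo(Stream output)
    {
        using (Stream input = OpenReader())
        {
            input.CopyTo(output);
        }
    }
}

// Wraps an in-memory byte array.
public sealed class ArrayBlobSketch : BlobSketch
{
    private readonly byte[] data;

    public ArrayBlobSketch(byte[] data) { this.data = data; }

    public override Stream OpenReader()
    {
        return new MemoryStream(data, false);
    }
}

// Concatenates two sources. No data is touched until WriteTo or OpenReader is called.
public sealed class ConcatBlobSketch : BlobSketch
{
    private readonly BlobSketch first;
    private readonly BlobSketch second;

    public ConcatBlobSketch(BlobSketch first, BlobSketch second)
    {
        this.first = first;
        this.second = second;
    }

    public override void WriteTo(Stream output)
    {
        // Stream the first source, then the second, straight into the target.
        first.WriteTo(output);
        second.WriteTo(output);
    }

    public override Stream OpenReader()
    {
        // Simplest possible pull implementation: buffer the combined data in memory.
        var buffer = new MemoryStream();
        WriteTo(buffer);
        buffer.Position = 0;
        return buffer;
    }
}

public static class ConcatDemo
{
    public static string Run()
    {
        BlobSketch head = new ArrayBlobSketch(Encoding.UTF8.GetBytes("Hello, "));
        BlobSketch tail = new ArrayBlobSketch(Encoding.UTF8.GetBytes("world"));
        BlobSketch combined = new ConcatBlobSketch(head, tail);

        using (var target = new MemoryStream())
        {
            combined.WriteTo(target);
            return Encoding.UTF8.GetString(target.ToArray());
        }
    }
}
```

Calling ConcatDemo.Run() pushes both parts into a single MemoryStream and returns the joined text. The buffering OpenReader is the lazy shortcut here; a real implementation would more likely expose a stream that reads the two sources back to back.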

The following sample loads a product from the database, replaces the image with a file from disk, replaces the description with an in-memory string, replaces the specifications with one from the web and then saves it all to the database.

using (ISession session = sessionFactory.OpenSession())
using (ITransaction transaction = session.BeginTransaction())
{
	Product product = session.Get<Product>(100);
	product.Image = Blob.Create(@"C:\MyFolder\MyImage.jpg");
	product.Description = Clob.Create("My short description.");
	product.Specifications = Xlob.Create(new Uri("http://domain/document.xml"));
	transaction.Commit();
}

Note that in the above sample it’s not a reference to the file or the web resource that is stored in the database. The actual data is read and stored. Depending on your application, the use of WebRequests may be prohibited, and loading unknown remote XML documents can be a security issue.

Note that the LOB types also define implicit conversions from some known types, so you can assign those directly. There are also Blob.Empty, Clob.Empty and Xlob.Empty singletons that you can use to insert empty data. An empty LOB is not the same as null. The following sample implicitly casts a Stream to a Blob, a String to a Clob and removes the product’s specification by replacing it with an empty one.

using (ISession session = sessionFactory.OpenSession())
using (ITransaction transaction = session.BeginTransaction())
{
	Product product = session.Get<Product>(100);
	product.Image = Request.Files["uploadedImage"].InputStream;
	product.Description = Request.Form["description"];
	product.Specifications = Xlob.Empty;
	transaction.Commit();
}
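For readers unfamiliar with how such assignments compile, here is a minimal sketch of a C# implicit conversion operator together with an Empty singleton. ClobSketch, ProductSketch and ImplicitDemo are hypothetical names made up for illustration; this shows the language mechanism only and is not how Calyptus.Lob is necessarily implemented.

```csharp
using System.IO;

// Hypothetical stand-in for a Clob type; not the Calyptus.Lob class.
public sealed class ClobSketch
{
    // An empty value that is distinct from null.
    public static readonly ClobSketch Empty = new ClobSketch(string.Empty);

    private readonly string text;

    private ClobSketch(string text) { this.text = text; }

    // Lets a plain string be assigned wherever a ClobSketch is expected.
    // A null string stays null instead of becoming an empty value.
    public static implicit operator ClobSketch(string text)
    {
        return text == null ? null : new ClobSketch(text);
    }

    public TextReader OpenReader()
    {
        return new StringReader(text);
    }
}

public class ProductSketch
{
    public ClobSketch Description { get; set; }
}

public static class ImplicitDemo
{
    public static string FirstLine(string input)
    {
        var product = new ProductSketch();
        product.Description = input; // the implicit operator runs here

        using (TextReader reader = product.Description.OpenReader())
        {
            return reader.ReadLine();
        }
    }
}
```

The assignment inside FirstLine needs no explicit cast because the compiler finds the user-defined operator. Mapping a null string to null rather than to Empty is a design choice of this sketch, made to keep "no value" and "empty value" apart.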

If you prefer a less anemic domain model you could keep the LOB internal to the class and do reads and writes with custom logic.

public class Product
{
	public int ID { get; set; }
	private Blob image;
 
	public void ChangeImage(Stream input)
	{
		this.image = input;
	}
 
	public void CopyImageFrom(Product product)
	{
		this.image = product.image;
	}
 
	public void WriteImageTo(Stream output)
	{
		this.image.WriteTo(output);
	}
}

Note that the stream used in ChangeImage() will not be read and disposed of until your Session is Flushed. Depending on your application design this pattern may not be useful.

In the current version, the StreamBlob class, which wraps a Stream as a Blob, can only be read once if the Stream is not seekable. Therefore each instance can only be saved to one entity and not reused. In future versions it may replace the internal stream pointer with the one in the repository once the first one is saved. The same goes for the TextReader and XmlReader wrappers.
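The read-once limitation comes from the underlying Stream rather than from anything LOB-specific, and can be sketched with plain .NET streams. StreamBlobSketch, ForwardOnlyStream and OneShotDemo below are hypothetical names for illustration, and the rewind-or-throw behaviour is an assumption made for this sketch, not the actual StreamBlob code.

```csharp
using System;
using System.IO;

// Hypothetical wrapper that hides seeking, similar to a network or upload stream.
public sealed class ForwardOnlyStream : Stream
{
    private readonly Stream inner;

    public ForwardOnlyStream(Stream inner) { this.inner = inner; }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    public override void Flush() { }
    public override int Read(byte[] buffer, int offset, int count)
    {
        return inner.Read(buffer, offset, count);
    }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

// Hypothetical stand-in for a stream-backed blob; not the Calyptus.Lob StreamBlob.
public sealed class StreamBlobSketch
{
    private readonly Stream stream;
    private readonly long start;
    private bool consumed;

    public StreamBlobSketch(Stream stream)
    {
        this.stream = stream;
        // Remember where the data begins so a seekable stream can be rewound.
        this.start = stream.CanSeek ? stream.Position : 0;
    }

    public void WriteTo(Stream output)
    {
        if (consumed)
        {
            if (!stream.CanSeek)
                throw new InvalidOperationException("A non-seekable stream can only be written once.");
            stream.Position = start; // rewind for the second write
        }
        stream.CopyTo(output);
        consumed = true;
    }
}

public static class OneShotDemo
{
    // Returns true if the same blob can be written out twice.
    public static bool CanWriteTwice(Stream source)
    {
        var blob = new StreamBlobSketch(source);
        blob.WriteTo(new MemoryStream());
        try
        {
            blob.WriteTo(new MemoryStream());
            return true;
        }
        catch (InvalidOperationException)
        {
            return false;
        }
    }
}
```

With a MemoryStream as the source the second write succeeds because the wrapper can rewind; wrapped in ForwardOnlyStream, the second write is rejected. Failing loudly on the second write is one possible choice; silently writing zero bytes would be much harder to notice.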

Getting Started

The full source code to Calyptus.Lob and appropriate NHibernate mappings are available at our Calyptus.Lob project on GitHub. I’ll make some official builds once the code stabilizes.

Coming up

Part 2 – Storage Options

Part 3 – NHibernate Mappings

Part 4 – External Storage

Part 5 – Compression Options