jmnf 9: Practical Garbage Collection

javascript magic-ninja-foo episode 9: Practical Garbage Collection part 1

In this episode I'm going to briefly touch on the theory of garbage collection and then continue in the next episode with some strategies for overcoming common problems that arise due to garbage collection in javascript.

Theory

For those not familiar with the idea of garbage collection, it's an automatic memory management system for use in writing and running programs. Many newer (and some older) programming languages use some idea of garbage collection, a common example being Java. There are pluses and minuses to using a garbage collector, but that discussion is beyond the scope of this article. Suffice to say that javascript interpreters, by definition, implement one and therefore merit some brief reflection.

In essence, garbage collection alleviates the programmer from having to be disciplined in the management of system memory. In languages not using garbage collection (or some other automatic memory management system), the onus of allocating and freeing memory is on the programmer. Frequently in 'programmer controlled' memory management systems, functional bugs, syntax errors, inexperience, or even just sloppiness can lead to a condition known as a memory leak.

Memory Leaks

A memory leak is a condition where a program has allocated some amount of memory for program use, but fails to release it when no longer needed. For small, short-lived programs it might not be a problem. Exiting the 'runtime' state of the program might automatically free the memory or the operating system may 'scrub' the memory depending on the language/OS/implementation.

The worst case scenario, however, is either when the program has a long run-time and/or there are no 'safety nets' to release allocated memory. Over time the continuous allocation of memory without freeing gradually consumes more and more of the system until finally, the machine has no more available memory for program use, usually resulting in a program crash or worse, a system crash.

Implementation Theory

The basic theory of object oriented garbage collection is that there is a chunk of memory controlled by the garbage collector (the heap) where objects are stored when instantiated. When references exist pointing to an object in memory from the application context, some way even if indirect of accessing that object, its memory persists. If the garbage collector realizes that there is no way for the application to 'reach' objects in memory, no references exist, then the garbage collector reclaims the memory used by the object.

There are a couple of common ways of implementing a garbage collector. The first and most common is called tracing garbage collection. Tracing garbage collectors usually use explicit data 'sets' of objects in various states of reach-ability (usually three, sometimes called 'three color'). During garbage collection cycles, these states are updated with one state being reclaimable. An example of this type is the 'mark and sweep' algorithm. These garbage collectors are usually slower and more complicated than the the other common implementation.

The other common way to build a garbage collecting system involves an algorithm called reference counting. Reference counting uses a small data structure in every object instantiated that increments for every reference pointing to it, and decrements whenever those references are destroyed. When the reference count reaches zero, the object is free to be garbage collected (with there being no way to reach the object). Reference counting garbage collectors are much easier to implement but can be limited.

Practical implications of GC using javascript

Garbage collectors implemented using reference counting algorithms have some direct impact for javascript programmers. Part of the appeal of garbage collection is the abstraction of memory management away from the programmer so they can remain ignorant of the inner workings of its operation. Ironically certain implementations of javascript interpreters, namely the js interpreter included with Internet Explorer (as of version 6) by Microsoft, require us to go into the inner workings of its garbage collector so that we can write effective code. IE's javascript interpreter uses a 'reference counting' garbage collector that, unfortunately, has several shortcomings.

The biggest issue with reference counting garbage collectors, if not architected

carefully, is that they may miss some circular references. Imagine an application creates two objects, each with a reference to the other. Each object has a count of 2 (one from the application, and one from the other object). When the application discards its reference to the two objects, each of their counts are decremented by 1, but they're still not zero (each still has one reference to the other) even though they are no longer reachable from the application. This means that if the reference counting garbage collector isn't careful, it will not reclaim the memory used resulting in a memory leak.

Most reference counting garbage collectors implement strategies to avoid this scenario. In the above simple example, IE (version 6) does reclaim the offending memory. There are, however, other more complex cases where it doesn't. This seemingly arbitrary nature can make debugging IE memory leaks very frustrating. Ironically even Microsoft's own description of the leak pattern doesn't actually leak in IE 6.

While the scenario described in the Microsoft article doesn't leak, it's not difficult to come across the memory leak problem. It took a little time but I came up with the following example javascript that actually does leak in IE 6.

function TestMessage(message,target) {
   /* Function with state, outer closure arguments*/
   function MyInnerFunction() {
      function display() {
         var msg = document.createTextNode(message+" ");
         target.appendChild(msg);
      }
      return display;
   }
   /*assign inner function to object property*/
   this.display = MyInnerFunction();
}
/*create DOM element*/
var testdiv = document.createElement('div');
/*create new object*/
myMessage = new TestMessage("Event Handled!",testdiv);
/*make the DOM element visible and 
  assign event handler */
testdiv.style.position = "absolute";
testdiv.style.height = "100%";
testdiv.style.width = "100%";
testdiv.onmousedown = myMessage.display;

I have some DOM methods in the above code. Don't worry if you're not familiar with these functions, I'll discuss them in more detail in a later article. Essentially the snippet is creating an element that covers the 'body' content area that when clicked shows the message "Event Handled!".

While this code is lexically simple, the actual closures, execution objects, and what's referencing what is actually fairly complicated. I leave it as an exercise for the reader to diagram out the closures, scopes, and persistent execution objects represented by the above code. You can see why the reference counting garbage collector in IE6 could become tripped up by complicated circular reference patterns.

Putting this code into a web page you can witness the memory leak for yourself. Try hitting 'reload' a few times on the page with the 'Task Manager' open showing processes. Each reload should show 'iexplore' memory usage increasing from 2k - 20k per reload. Even if you navigate away from the page, that memory is now locked up by the IE garbage collector. The only way to reclaim the memory is to shut down IE and restart it. A 20k leak per reload may seem trivial but I've seen example leaks that consume 100's of K per reload depending on the complexity of the circular references. If multiple pages contained memory leaks of this magnitude, problems could arise very quickly.

Generally you want to look for the following two types of javascript that can cause memory leaks in the internet explorer javascript interpreter.

The second case becomes very common when using javascript object methods as event handlers for DOM objects.

In the next article I will discuss some strategies for avoiding these scenarios.