Tuesday, January 20, 2009

The Subtleties of PhantomReference and finalization

A common misconception regarding PhantomReference is that it is designed to "fix" the dead object resurrection problem that finalizers have. For example, this old java.net blog says about phantom references:

PhantomReferences avoid a fundamental problem with finalization: finalize() methods can "resurrect" objects by creating new strong references to them. So what, you say? Well, the problem is that an object which overrides finalize() must now be determined to be garbage in at least two separate garbage collection cycles in order to be collected.

Well, yeah. Except that even a PhantomReference can let its referent be resurrected. Here is how it can be done:

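import java.lang.ref.Reference;
import java.lang.reflect.Field;

// Blocks until a reference is enqueued (may throw InterruptedException)
Reference<?> ref = referenceQueue.remove();
// ref is our PhantomReference instance
Field f = Reference.class.getDeclaredField("referent");
f.setAccessible(true); // bypass the private access check
System.out.println("I see dead objects! --> " + f.get(ref));
// This is obviously a very bad practice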

Yes, the unreachable object really is still referenced from the Reference#referent field, seemingly strongly, but the garbage collector makes an exception for that particular field. This fact also contradicts what the blog quoted above declares:

PhantomReferences are enqueued only when the object is physically removed from memory.

That is not true, as we just saw. The Javadoc also says:

Phantom references are most often used for scheduling pre-mortem cleanup

So if PhantomReference was not meant to fix the resurrection problem (which indeed is serious, as pointedly proven by Lazarus, Jesus and many others), what is it useful for?

The main advantage of using a PhantomReference over finalize() is that finalize() is called by a thread the JVM controls (in HotSpot, a dedicated finalizer thread), meaning it introduces concurrency even in a single-threaded program, with all the potential issues (like correctly synchronizing shared state). With a PhantomReference, you choose the thread that dequeues references from your queue (in a single-threaded program, that same thread could periodically do this job).
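
As a rough sketch of that idea, the program's own thread can poll the queue whenever it is convenient, so clean-up actions never run concurrently with the rest of the code. The class SameThreadCleaner and its methods are made up for illustration and are not taken from any library:

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: all names here are illustrative assumptions.
class SameThreadCleaner {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // Keeps each PhantomReference strongly reachable and maps it to its clean-up action.
    private final Map<Reference<?>, Runnable> actions = new HashMap<>();

    // Registers an action to run some time after 'target' becomes unreachable.
    // The action must not capture 'target', or the target stays reachable forever.
    Reference<?> register(Object target, Runnable action) {
        Reference<?> ref = new PhantomReference<>(target, queue);
        actions.put(ref, action);
        return ref;
    }

    // Non-blocking: meant to be called periodically by the thread that owns the state.
    int processPending() {
        int processed = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Runnable action = actions.remove(ref);
            if (action != null) {
                action.run();
                processed++;
            }
        }
        return processed;
    }
}
```

Calling processPending() at convenient points, such as once per iteration of a main loop, keeps the clean-up on the thread that owns the shared state. poll() returns immediately when the queue is empty, unlike remove(), which blocks.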

What about using WeakReference? It also seems to fit the bill for pre-mortem clean-up. The difference lies in when exactly the reference is enqueued: a PhantomReference is enqueued after the object has been finalized, a WeakReference before. This difference matters only for objects with a non-trivial finalize() method.

And how exactly are you supposed to clean up a dead object you cannot even reach? (PhantomReference's get() method always returns null.) Well, you should store as much state as needed to perform the clean-up. If cleaning up an object means nulling an element of a global array, for example, then you have to keep track of the element index. This can easily be done by extending PhantomReference, adding the fields you want, and then creating instances of that subclass.
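
The global-array case could be sketched roughly like this. SlotRegistry, SlotReference, claim and releaseDeadSlots are hypothetical names chosen for this example:

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.HashSet;
import java.util.Set;

// Hypothetical example: a global table whose slots must be cleared when their owner dies.
class SlotRegistry {
    static final String[] SLOTS = new String[16];
    private static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();
    // The references themselves must stay strongly reachable, or they are never enqueued.
    private static final Set<SlotReference> LIVE = new HashSet<>();

    // A PhantomReference that remembers which slot to clear; it never touches the referent.
    static final class SlotReference extends PhantomReference<Object> {
        final int index;

        SlotReference(Object owner, int index) {
            super(owner, QUEUE);
            this.index = index;
        }
    }

    // Stores a value in a slot and ties the slot's lifetime to 'owner'.
    static SlotReference claim(Object owner, int index, String value) {
        SLOTS[index] = value;
        SlotReference ref = new SlotReference(owner, index);
        LIVE.add(ref);
        return ref;
    }

    // Drains the queue in the calling thread and nulls the slots of dead owners.
    static int releaseDeadSlots() {
        int released = 0;
        Reference<?> ref;
        while ((ref = QUEUE.poll()) != null) {
            SlotReference slot = (SlotReference) ref;
            SLOTS[slot.index] = null;
            LIVE.remove(slot);
            released++;
        }
        return released;
    }
}
```

The subclass carries only the index, not the owner, so the clean-up code never needs (and never gets) access to the dead object.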

Now let's talk about an even darker corner of finalization.

Suppose you have written a clean-up hook for an object (via finalize() or a [Weak|Phantom]Reference), and you invoke a method on that object while it is strongly referenced only from the thread stack (i.e. from a local variable). Bad things can happen.

This is very unfortunate. For performance reasons, the VM is permitted to reuse the register that holds the object reference once the reference is no longer needed, thus making the object unreachable while one of its methods is still running. So, during a method invocation on an object, finalization might be executed concurrently, leading to unpredictable results (finalize() could modify state that the other method still needs). This should be extremely rare, though. Currently, this can be worked around as follows:



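Object method() {
    // do work here
    synchronized (this) { } // keeps 'this' reachable until the work above is done
    return result;
}

@Override
public void finalize() {
    synchronized (this) { } // orders this finalizer after method(), so its writes are visible
    // do work here
}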
This will only affect you if you have an object referenced only from the thread stack and either of the following holds:
  • it has a non-trivial finalize() method
  • there is a [Weak|Soft|Phantom]Reference to it, registered with a ReferenceQueue, and a different thread dequeues references from that ReferenceQueue
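
Java 9 and later also provide Reference.reachabilityFence(Object), which is aimed at this situation. A minimal sketch, assuming Java 9 or newer; NativeBuffer and its handle field are hypothetical stand-ins for an object that owns some external resource:

```java
import java.lang.ref.Reference;

// Hypothetical resource wrapper; 'handle' stands in for some native or external state.
class NativeBuffer {
    private long handle = 42L;

    long read() {
        try {
            // After this field read, nothing else in the method uses 'this',
            // so without the fence the object could be treated as unreachable here.
            long h = handle;
            return h * 2; // placeholder for work that uses the handle
        } finally {
            // Keeps 'this' strongly reachable at least until this point (Java 9+).
            Reference.reachabilityFence(this);
        }
    }
}
```

The try/finally shape keeps the fence on every exit path of the method, including exceptions.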
To conclude, the safest way to clean up after objects is to use a ReferenceQueue and a PhantomReference, and to do the clean-up on the same thread that uses the object. (If the clean-up is performed from another thread, synchronization will probably be needed, and the above issue might become relevant.)

You may check these JavaOne slides too.



2 comments:

  1. Thanks for sharing.. very enlightening!

    In your example, do you mean that the 'hook' is strongly-referenced only on the thread stack? Isn't the only way to reference anything is from within a thread (perhaps except for primitive types)? So you meant a thread created within the program's main() method? So if I understand correctly, the finalize() invokes the hook's method concurrently with another, ( e.g. strongly reachable) third object's invocation on the hook?

    Thanks,
    JF

  2. By "thread stack" I meant the call stack: the collective storage of stack frames (local variables, method parameters). And I mean the case where the last strong reference to the object is a local variable. That is fragile: the slot holding that local variable can be reused for another variable (if the compiler sees you don't plan to read the previous value again), which allows a (possibly concurrent) garbage collection of the object that has just become unreachable, even though you are still accessing its fields.
