Friday, March 1, 2013

Hodge Podge

This is just a hodge podge of short items; the 2nd part of the multithreading post will probably be up in a few days.

What's In a Name?

The methods in the Java standard libraries are generally well-named, but there are a few astonishingly irritating exceptions, and java.lang.Character seems to collect them.

Q: Which of the following returns true for a space, tab, newline and non-breaking whitespace: isWhitespace, isSpace, isSpaceChar.

A: NONE OF THEM!

isSpace and isWhitespace return false for a non-breaking space; isSpaceChar returns false for newlines and tabs. If you want to know if a character is a whitespace, you need isSpaceChar || isWhitespace. As an added kick in the shins, the \s regular expression class also doesn't match non-breaking space.

Keep on Moving (Conditionally)

Branch mispredictions on modern processors hurt. Badly. Because of that, anything that you can do to make branches (e.g. if statements & the conditionals on loops) easier to predict will generally make your code faster. Sometimes, though, you just reach the point where you can't make the branches easier to predict. In those cases, you can sometimes play some tricks to replace them with conditional moves/sets and pick up some performance. For example, here's the body of the main loop for merging two arrays in merge sort:

if(arr1[index1] < arr2[index2]) {  
   mergeInto[index3++] = arr1[index1++];
} else {
   mergeInto[index3++] = arr2[index2++];
}

and here's a version that replaces the branches with conditional moves/sets

int read = arr1[index1] < arr2[index2] ? arr1[index1] : arr2[index2];
mergeInto[index3++] = read;
int addTo1 = arr1[index1] == read ? 1 : 0;
index1 += addTo1;
index2 += addTo1 == 0 ? 1 : 0;

On an Ivy Bridge laptop running Java 1.7, the latter code is about 1.50 nanoseconds (0.0000015 ms) / element faster for the low, low price of making your code's intentions a little less obvious and the code a little harder to read. It should go without saying, but you shouldn't drop to this level of optimization unless your code is too slow, you've profiled it and know that your bottleneck is there and at least strongly suspect that the problem is branch mispredictions.

Singletonish

java.util.concurrent has a lot of neat classes, including the ever-useful ConcurrentHashMap. ConcurrentHashMap has a nifty little method called putIfAbsent that we can use to create singleton-ish objects. For example, if you need to run some expensive initialization once for each user who logs on, but you only want to do it once / user. The downside to putIfAbsent is that it takes an object, which would appear to meant that we have to instantiate the thing anyway. We can, however, take a page from Future's playbook and put an object that will instantiate the object. 

One thing you'll have to keep in mind is that you must check the return value from putIfAbsent, otherwise you'll be defeating the whole purpose and reinstantiating every time. Basically, here's the general pattern you'll want to use:

Instantiator instantiator = new Instantiator();
Instantiator fromMap = theMap.putIfAbsent(theKey, instantiator);
theValue = fromMap != null ? fromMap.getTheValue() : instantiator.getTheValue();

Instantiator's just a run-of-the-mill object that builds the object once and returns it. You'll need to do some synchronization to make sure you only instantiate its object once, but that's fairly cookie-cutter code.