Saturday, October 23, 2010

Infinite sequences in Java

One of the interesting features in a lot of functional languages is lazily evaluated infinite sequences. Java has no explicit syntax for creating sequences, but the Iterator interface combined with generics lets us implement just such a feature, though it's obviously not as clean or succinct as in some of those languages.

As an example, here's an implementation of an "infinite" sequence of 64-bit integers.

package blog;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class Sequence implements Iterable<Long> {

    private long value;

    public Sequence(long start) {
        value = start;
    }

    public Sequence() {
        this(0L);
    }

    public Iterator<Long> iterator() {
        return new LazyIterator();
    }

    public List<Long> take(int toTake) {
        ArrayList<Long> retVal = new ArrayList<Long>(toTake);
        Iterator<Long> it = iterator();
        for (int i = 0; i < toTake; ++i) {
            retVal.add(it.next());
        }
        return retVal;
    }

    private class LazyIterator implements Iterator<Long> {

        public boolean hasNext() {
            /* Always return true, because this
               sequence is "infinite" */
            return true;
        }

        public Long next() {
            // Store where we are
            long retVal = value;
            // Move on to the next value
            ++value;
            // Auto-boxing will turn it into a Long
            return retVal;
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }
}
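
Using the class is then straightforward, for example:

// Take the first five values of a sequence starting at 1
List<Long> firstFive = new Sequence(1L).take(5);
System.out.println(firstFive); // prints [1, 2, 3, 4, 5]

// Or iterate lazily, stopping whenever we've seen enough
for (long v : new Sequence()) {
    if (v >= 3) break;
    System.out.println(v); // prints 0, 1 and 2
}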
This technique is also useful for parsing documents, or in other situations where generating the entire list up front would either take too long or use too much memory.

Friday, October 22, 2010

The Fizz-Buzz Challenge!

One of my friends asks all interviewees to implement Fizz Buzz as part of a programming interview and, sadly, many people trying to pass themselves off as programmers can't implement it.

The basic idea is to print out the integers from 1 to 100; if a number is evenly divisible by 3, also print "Fizz"; if it's evenly divisible by 5, also print "Buzz"; and if it's evenly divisible by 15, print "FizzBuzz".
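
A perfectly ordinary version looks something like this:

public class FizzBuzz {
    public static void main(String[] args) {
        for (int i = 1; i <= 100; ++i) {
            if (i % 15 == 0) {        // divisible by both 3 and 5
                System.out.println("FizzBuzz");
            } else if (i % 3 == 0) {
                System.out.println("Fizz");
            } else if (i % 5 == 0) {
                System.out.println("Buzz");
            } else {
                System.out.println(i);
            }
        }
    }
}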

There are a ton of possible implementations, most of them (like the one above) painfully obvious and straightforward, so here's the challenge: 1) what's the shortest program you can write to solve the problem, and 2) what's the "fastest", in terms of the fewest operations? (No sane solution should take more than a few milliseconds to execute, and very nearly all of that is I/O.)

My shortest came in at 5 lines (not counting curly-brace-only lines), with 300 conditionals, 100 increments and 200 modulus operations. My fewest-operations version needed just 6 conditionals (it could have been none, but that would have cost an extra 75 lines) and 100 increments, with no other operations, but at a cost of 27 lines of code.

Tuesday, October 19, 2010

More algorithms and surprises

Testing the intersection problem using a HashMap turned up some more interesting numbers. Even though it's an O(m+n) algorithm (m to build the hash map, n to check for intersection), just like the linear traversal, it was still significantly faster than the linear traversal, even with the time to sort the two arrays excluded. The HashMap-based intersection did have the disadvantage of using about 3% more memory.
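
For reference, the hash-based approach looks roughly like this (I've sketched it with a HashSet, which is itself backed by a HashMap, so the work involved is essentially the same):

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashIntersection {

    // O(m) to build the set, then O(n) to probe it
    public static List<Integer> intersection(int[] a, int[] b) {
        Set<Integer> inA = new HashSet<Integer>();
        for (int x : a) {
            inA.add(x);
        }
        List<Integer> result = new ArrayList<Integer>();
        for (int x : b) {
            if (inA.contains(x)) {
                result.add(x);
            }
        }
        return result;
    }
}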

Another surprise was that sorting the two arrays was only a very minor part of the runtime for the linear intersection, even though the sorting is O(m*log(m) + n*log(n)) while the intersection pass itself is only O(m+n). I suspect, though I'm not certain, that this is due to locality and caching: the sort operates on one array at a time, so the processor is better able to pre-fetch the needed data. Either that or Josh Bloch, Neal Gafter & John Rose have magical programming powers, which I'm also not ready to rule out quite yet.

Monday, October 18, 2010

Algorithms and surprises

The other day, there was a question on StackOverflow that basically boiled down to, "Why is my O(n*n) algorithm running so slowly?" What the individual was trying to do was find all of the elements in one array that exist in another array. It didn't take long for someone to post that the person should sort the lists and iterate over them once, but that got me thinking: would it be faster to do that linear traversal, or to binary search back and forth between the two lists? It seems like the binary search should win, since it's O(log(n)) to find the next term instead of a potential O(n), but I wanted to make sure. The answer was surprising, to say the least. The O(n*n) algorithm was by far the slowest, but the fastest turned out to be the linear traversal.
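
The linear traversal looks roughly like this (a sketch of the approach, assuming two int arrays):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LinearIntersection {

    // Sort both arrays, then walk them together in a single merge-style pass
    public static List<Integer> intersection(int[] a, int[] b) {
        Arrays.sort(a);
        Arrays.sort(b);
        List<Integer> result = new ArrayList<Integer>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                ++i;                  // advance whichever side is smaller
            } else if (a[i] > b[j]) {
                ++j;
            } else {
                result.add(a[i]);     // present in both arrays
                ++i;
                ++j;
            }
        }
        return result;
    }
}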

Cumulative times (in ms) for 100 iterations of each of the approaches were (for input arrays of size 1,000, 2,000, 4,000, 8,000, 16,000, 32,000 & 64,000 elements respectively):
O(n*n): [52, 174, 727, 2698, 10648, 42516, 171441]
Linear: [43, 36, 74, 161, 332, 718, 1484]
Binary Search: [28, 41, 87, 182, 385, 825, 1727]

Actually looking at the data (all arrays populated by calls to Random.nextInt) revealed the answer. None of the binary searches allowed the index for either array to jump by more than 2 or 3 entries at a time, while the linear approach never examined more than 2 or 3 entries before finding the value (or a higher value, forcing a switch to the other list). Because the value being sought was always so near the low end of the remaining range, each binary search probed the middle of a mostly irrelevant range and ended up evaluating many more elements than the linear search did.

The take-home from this test is that it's important to test your assumptions when it comes to performance. Intuitively, an algorithm that can find each next element with an O(log(n)) binary search should be much faster than one that scans all 2n elements linearly, but the actual distribution of the data may make it not so.