Saturday, October 23, 2010

Infinite sequences in Java

One of the interesting features in a lot of functional languages is lazily evaluated infinite sequences. Java lacks any explicit syntax for creating sequences; however, the Iterator interface combined with generics lets us implement just such a feature, though it's obviously not as clean or succinct as in some languages.

As an example, here's an implementation of an "infinite" sequence of 64-bit integers.

package blog;

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class Sequence
        implements Iterable<Long> {

    private long value;

    public Sequence(long start) {
        value = start;
    }

    public Sequence() {
        this(0L);
    }

    public Iterator<Long> iterator() {
        return new LazyIterator();
    }

    public List<Long> take(int toTake) {
        // Note: generics can't use the primitive long, so it's ArrayList<Long>
        ArrayList<Long> retVal =
                new ArrayList<Long>(toTake);
        Iterator<Long> it = iterator();
        for (int i = 0; i < toTake; ++i) {
            retVal.add(it.next());
        }
        return retVal;
    }

    private class LazyIterator
            implements Iterator<Long> {

        public boolean hasNext() {
            /* Always return true, because this
               sequence is "infinite" */
            return true;
        }

        public Long next() {
            // store where we are
            long retVal = value;
            // Move on to the next value
            ++value;
            // auto-boxing will turn it into a Long
            return retVal;
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }
}
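
Here's a quick sketch of how it might be used (the SequenceDemo name and the starting values are just for illustration):

package blog;

import java.util.List;

public class SequenceDemo {
    public static void main(String[] args) {
        // take() pulls a finite prefix into a List without ever
        // materializing the whole (infinite) sequence
        Sequence naturals = new Sequence(1L);
        List<Long> firstTen = naturals.take(10);   // [1, 2, ..., 10]
        System.out.println(firstTen);

        // The for-each loop works too, but you have to break out of it
        // yourself, since hasNext() never returns false
        for (long n : new Sequence(100L)) {
            if (n > 105L) {
                break;
            }
            System.out.println(n);
        }
    }
}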
This technique is also useful for parsing documents or in other situations where generating the entire list would either take too long or use too much memory.

Friday, October 22, 2010

The Fizz-Buzz Challenge!

One of my friends asks all interviewees to implement Fizz Buzz as part of a programming interview and, sadly, many people trying to pass themselves off as programmers can't implement it.

The basic idea is to print out the integers from 1 to 100: if the number is evenly divisible by 3, also print "Fizz"; if it's evenly divisible by 5, also print "Buzz"; and if it's evenly divisible by 15, print "FizzBuzz".
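
For reference, here's roughly what the painfully obvious baseline looks like, reading the rules above literally (print the number and append the words). This is just a sketch, not one of my entries:

public class FizzBuzz {
    public static void main(String[] args) {
        for (int i = 1; i <= 100; ++i) {
            String words = "";
            if (i % 3 == 0) {
                words += "Fizz";
            }
            if (i % 5 == 0) {
                words += "Buzz";    // divisible by 15 gets both, i.e. "FizzBuzz"
            }
            System.out.println(i + " " + words);
        }
    }
}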

There are a ton of possible implementations, most of them painfully obvious and straightforward, so here's the challenge: 1) what's the shortest program you can write to solve the problem, and 2) what's the "fastest" (in terms of the fewest operations; no sane solution should take more than a few milliseconds to execute, and very nearly all of that is I/O)?

My shortest was 5 lines of code (not counting curly-brace-only lines), with 300 conditionals, 100 increments and 200 modulus operations. My fewest-operations version had 6 conditionals (it could have been none, but that would've cost an extra 75 lines of code) and 100 increments (no other operations), but at a cost of 27 lines of code.

Tuesday, October 19, 2010

More algorithms and surprises

Testing the intersection problem using a HashMap turned up some more interesting numbers. In spite of it being an O(m+n) algorithm (m to build the hash map, n to check for intersection) just like the linear traversal, it was still significantly faster than the linear version, even with the time to sort the two arrays removed from the comparison. The HashMap-based intersection did have the disadvantage of using about 3% more memory.
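
For clarity, here's a sketch of the idea. I've written it with a HashSet (which is just a HashMap of keys underneath); this is a reconstruction, not the exact benchmark code:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HashIntersection {
    public static List<Integer> intersect(int[] a, int[] b) {
        // Build phase: O(m)
        Set<Integer> seen = new HashSet<Integer>(a.length * 2);
        for (int x : a) {
            seen.add(x);
        }
        // Probe phase: O(n)
        List<Integer> result = new ArrayList<Integer>();
        for (int y : b) {
            if (seen.contains(y)) {
                result.add(y);
            }
        }
        return result;
    }
}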

Another surprise came from the fact that sorting the two arrays was only a very minor part of the runtime for the linear intersection, even though the sorting should be O(m*log(m) + n*log(n)), as opposed to the intersection's O(m+n) time. I suspect, though I'm not certain, that this is due to locality and caching: since the sort operates on one array at a time, the processor is better able to pre-fetch the needed data. Either that or Josh Bloch, Neal Gafter & John Rose have magical programming powers, which I'm also not ready to rule out quite yet.

Monday, October 18, 2010

Algorithms and surprises

The other day, there was a question on StackOverflow that basically boiled down to, "Why is my O(n*n) algorithm running so slow?" What the individual was trying to do was find all of the elements in one array that exist in another array. It didn't take long for someone to post that the person should sort the lists and iterate over them once, but that got me thinking: would a plain linear traversal be faster, or would doing a binary search back and forth between the two lists win? It seems like the binary search should, since it's O(log(n)) to find the next term instead of a potential O(n), but I wanted to make sure. The answer was surprising, to say the least. The O(n*n) algorithm was by far the slowest, but the fastest turned out to be the linear traversal.
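
Before the numbers, here's a sketch of the linear traversal approach (a reconstruction, not the exact benchmark code): sort both arrays, then walk them in a single merge-style pass.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortedIntersection {
    public static List<Integer> intersect(int[] a, int[] b) {
        Arrays.sort(a);    // O(m*log(m))
        Arrays.sort(b);    // O(n*log(n))
        List<Integer> result = new ArrayList<Integer>();
        int i = 0, j = 0;
        // Single O(m+n) pass over both sorted arrays
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                ++i;                  // advance whichever side has the smaller value
            } else if (a[i] > b[j]) {
                ++j;
            } else {
                result.add(a[i]);     // match found in both arrays
                ++i;
                ++j;
            }
        }
        return result;
    }
}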

Cumulative times (in ms) for 100 iterations of each of the approaches were (for input arrays of size 1,000, 2,000, 4,000, 8,000, 16,000, 32,000 & 64,000 elements respectively):
O(n*n): [52, 174, 727, 2698, 10648, 42516, 171441]
Linear: [43, 36, 74, 161, 332, 718, 1484]
Binary Search: [28, 41, 87, 182, 385, 825, 1727]

Actually looking at the data (all arrays populated by calls to Random.nextInt) revealed the answer. None of the binary searches allowed the index for either array to jump by more than 2 or 3 entries at a time, while the linear approach wasn't examining more than 2 or 3 entries before finding the value (or a higher value to force a switch to the other list). Because the element being sought was always so near the low end of the remaining array, the binary search ended up evaluating many more elements than the linear search.

The take-home from this test is that it's important to test your assumptions when it comes to performance. Intuitively, an algorithm with an expected runtime of log(n)*log(n) should be much faster than one with an expected runtime of 2n, but the actual data distribution may make it not so.

Saturday, August 28, 2010

More on The Myth of SLOC

Wow, this has been a LONG time coming, but between finishing up my MS in CS and work I've been a bit busy.

SLOC and other metrics, like coupling and complexity, are just proxies for the only "metric" that really matters. The best name that I've got for this metric is "clarity of intent". It's nothing earth-shattering or particularly easy to measure, but it's what really matters. It all boils down to this: how clearly does the code that you've written express what you want the code to do?

Clarity of intent is why we name our variables things like "numOrders" instead of "doggie". A lot of languages fall short in this regard, in different ways. Java, for example, makes you repeat yourself a lot, as in the declaration of an ArrayList: ArrayList al = new ArrayList();

Other languages (in particular, dynamically and "loosely" typed languages) tend to fail in the other direction, leaving you with declarations like
something = doSomething(someThingy);

In my opinion, this is one of the biggest failings of a lot of the dynamic languages like Javascript and (at least the last time I checked) Ruby. Functions and procedures take one or more "thingies" and there's no easy way to say, in code, "This thingy must have bark() and playDead() methods". Sure, you can say in comments, "Hey, if this thing can't bark and playDead, I don't want it", but there's no way to have the compiler check that for you.

A lot of languages, like Java, go too far in the other direction, though, and say "This thingy must extend/implement Doggie" when what you really want to say is "This thingy needs to have behavior similar to a Doggie, but if you happen to have a Rubber Duckie that behaves like a Doggie, that's cool too".
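
To make that concrete, here's a tiny sketch (the Doggie and RubberDuckie names are hypothetical, just to make the point):

interface Doggie {
    void bark();
    void playDead();
}

class RubberDuckie {    // has exactly the right behavior...
    public void bark()     { System.out.println("Squeak!"); }
    public void playDead() { /* floats motionless */ }
}

class Trainer {
    // Java only accepts things declared to implement Doggie, so
    // runTricks(new RubberDuckie()) won't compile, even though
    // RubberDuckie has both methods.
    static void runTricks(Doggie d) {
        d.bark();
        d.playDead();
    }
}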

A good language will let you say very clearly what the prerequisites are for a method and will make it clear what you intend to do (with features like list comprehensions or map/filter instead of dealing with iterators for everything).

Two approaches to products

There are, or at least seem to be, two distinct approaches to developing products. The first is to develop products that people need, whether they know it or not. These products may not be pretty or particularly fun, but they are the things you absolutely need. People almost never love the products produced by companies who follow this path, but they'd be lost without them. This is what Google does, and it's why some of their products change everything and others flop completely.

The other approach is to build beautiful products: the kind of things that people love and feel they absolutely HAVE to have. These products may be "game changers", but they rarely make any really major impact on the world (not counting the obvious commercial impact). Apple falls firmly into this category. They can produce a product that's laughably inferior, but it's so blasted pretty that people will rush to line up and spend massive amounts of money to buy it. It's rare for products in this category not to be huge commercial successes, and just as rare for them to meaningfully change the world or anyone's life.