Exceptional iterators using Guava

Some years ago, I came across a little gem of a blog post about using a class called AbstractIterator that I hadn’t heard of before. I immediately loved it, and started using it, and in general, the APIs defined in the JDK collections classes a lot more frequently. I’m still doing that today, but for the first time, I came across a situation where AbstractIterator (nowadays in Guava, not Google Collections, of course) let me down.

In the system I’m currently working on, we need to process tens of thousands of feeds that are generated ‘a little randomly’, often by people without deep technical skills. This means we cannot know for sure that entries in the feeds will be consistent; we may be able to parse the first four entries, but not the fifth, and so on. The quantity of feeds, combined with the fact that we need to be forgiving of less-than-perfect input means that we need to be able to continue ingesting data even if a few entries are bad.

Ideally, we wanted to write code similar to this:


Iterator iterator = feedParser.iterator();

while (iterator.hasNext()) {
  try {
    FeedEntry entry = iterator.next();

    // ... process the entry
  }
  catch (BadEntryException e) {
    // .. track the error and continue processing
  }
}

Now, we also wanted to use AbstractIterator as the base class for the parser to use, maybe something like:


class FeedEntryIterator extends AbstractIterator<FeedEntry> {
  protected FeedEntry computeNext() {
     if (noMoreInput()) {
        return endOfData();
     }

     try {
        return parseInput();
     }
     catch (ParseException e) {
        throw new BadEntryException(e);
     }
  }
}

This, however, doesn’t work for two reasons:

  1. The BadEntryException will be thrown during the execution of hasNext(), since the AbstractIterator class calls the computeNext() method to check if any more data is available, storing the result in an intermediate member field until the next() method is called.
  2. If the computeNext() method throws an exception, the AbstractIterator is left in a state FAILED, from which it cannot recover.

One option to work around this that I thought of was delegation. Something like this works for many scenarios:


public class ForwardingTransformingIterator<F, T> implements Iterator<T> {
    private final Iterator<F> source;
    private final Function<F, T> transformer;

    public ForwardingThrowingIterator(Iterator<F> source, Function<V, T> transformer) {
        this.delegate = delegate;
        this.transformer = transformer;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        return transformer.apply(delegate.next());
    }
}

The transformation function could then safely throw exceptions for entries it fails to parse. There’s even a shorthand for the above code in Guava: Iterators.transform(). The only issue is that the delegate iterator needs to return chunks of data that are the same size as what the parser/transform needs. So if the parser needs to be, for instance, something like a SAX parser, you’re in trouble. When I had gotten this far, I figured the only solution was to modify the AbstractIterator class so that it can deal with exceptions directly. The source is a little too long to include in an already source-heavy post, but there’s a gist here. The essence of the change is to make the base class store exceptions of a particular type that are thrown by computeNext(), and re-throw them instead of returning a value when next() is called. It kind of works (we’re using this implementation right now), but it would be nice to find a better solution.

I’ve created a Guava issue for this, so if you think this would be useful, go ahead and vote for it. And if you have suggestions for how to improve the code, feel free to make them here or at code.google.com!

, ,

  1. Leave a comment

Leave a comment