Comparator : how to implement it

Plan


This post is inspired by the Item 14 : Consider Implementing Comparable of Java Effective third edition. Feel free to jump to the item of this reference book for more details.
This post focuses on the Comparator interface but most of things presented here are also valid for the Comparable interface .

Model to illustrate

We will use a Cpu class that is defined by some (not exhaustive) known and relevant attributes.

public class Cpu {
 
    private float baseClockInGHz;
    private float boostClockInGHz;
    private short numberOfCore;
    private short numberOfThread;
 
    public Cpu(float baseClockInGHz, float boostClockInGHz, short numberOfCore, short numberOfThread) {
        this.baseClockInGHz = baseClockInGHz;
        this.boostClockInGHz = boostClockInGHz;
        this.numberOfCore = numberOfCore;
        this.numberOfThread = numberOfThread;
    }
 
    @Override
    public String toString() {
        return "Cpu{" +
                "baseClockInGHz=" + baseClockInGHz +
                ", boostClockInGHz=" + boostClockInGHz +
                ", numberOfCore=" + numberOfCore +
                ", numberOfThread=" + numberOfThread +
                '}';
    }
 
    public float getBaseClockInGHz() {
        return baseClockInGHz;
    }
 
    public float getBoostClockInGHz() {
        return boostClockInGHz;
    }
 
    public short getNumberOfCore() {
        return numberOfCore;
    }
 
    public short getNumberOfThread() {
        return numberOfThread;
    }
}

Way to never use

A Comparator implementation to really avoid :

import java.util.Comparator;
 
public class ComparatorCpuToNotUse implements Comparator<Cpu> {
 
    @Override
    public int compare(Cpu o1, Cpu o2) {
 
        int compareResult = (int) (o1.getBaseClockInGHz() - o2.getBaseClockInGHz());
 
        if (compareResult == 0) {
            compareResult = (int) (o1.getBoostClockInGHz() - o2.getBoostClockInGHz());
 
            if (compareResult == 0) {
                compareResult = o1.getNumberOfCore() - o2.getNumberOfCore();
 
                if (compareResult == 0) {
                    compareResult = o1.getNumberOfThread() - o2.getNumberOfThread();
                }
            }
 
        }
        return compareResult;
 
    }
}

We don’t have to rely on differences between numerical fields values as return result because : 
– integer and floating values may overflow (while it is much more commonly for integer values)
– exact comparisons should not be used to compare floating values. 
The other issue is readability : the pattern used produces an arrow code. Note that this arrow code is a pattern proposed by Joschua Bloch in Java effective (page 69). Personally, I try to avoid that to make the code clearer by reducing the number of nested statements such as :

    @Override
    public int compare(Cpu o1, Cpu o2) {
 
        int compareResult = (int) (o1.getBaseClockInGHz() - o2.getBaseClockInGHz());
 
        if (compareResult == 0) {
            compareResult = (int) (o1.getBoostClockInGHz() - o2.getBoostClockInGHz());
        }
        if (compareResult == 0) {
            compareResult = o1.getNumberOfCore() - o2.getNumberOfCore();
        }
        if (compareResult == 0) {
            compareResult = o1.getNumberOfThread() - o2.getNumberOfThread();
        }
 
        return compareResult;
    }

Flattening the conditional operator is extremely cheap in terms of overhead.  
Here is a benchmark code comparing the two ways :

import comparator.ComparatorCpuToNotUse;
import comparator.ComparatorCpuToNotUseSlightlyImproved;
import comparator.Cpu;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;
 
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.stream.IntStream;
 
import static java.util.stream.Collectors.toList;
 
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class ConditionalOperatorBenchmark {
 
  public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder().include(ConditionalOperatorBenchmark.class.getSimpleName())
                                      .warmupIterations(5)
                                      .measurementIterations(5)
                                      .forks(1)
                                      .build();
    new Runner(opt).run();
  }
 
  @State(Scope.Benchmark)
  public static class NestedConditionalOperatorState {
 
    public List<Cpu> cpus;
 
    @Setup(Level.Iteration)
    public void doSetup() {
      cpus = createCpus();
    }
 
  }
 
  @State(Scope.Benchmark)
  public static class FlattenedConditionalOperatorState {
 
    public List<Cpu> cpus;
 
    @Setup(Level.Iteration)
    public void doSetup() {
      cpus = createCpus();
    }
 
  }
 
  private static List<Cpu> createCpus() {
    List<Cpu> cpus =
        IntStream.range(0, 100).mapToObj(i -> new Cpu(2.4F, 2.8F, (short) 4, (short) 1)).collect(toList());
    return cpus;
  }
 
  @Benchmark
  public void _1_sort_with_nested_conditional_operator(NestedConditionalOperatorState state) {
    state.cpus.sort(new ComparatorCpuToNotUse());
  }
 
  @Benchmark
  public void _2_sort_with_flattened_conditional_operator(FlattenedConditionalOperatorState state) {
    state.cpus.sort(new ComparatorCpuToNotUseSlightlyImproved());
  }
 
}

Below the result of the benchmark execution for 8 iterations and you can see that we don’t have any sensitive difference :

Result "benchmark.ConditionalOperatorBenchmark._2_sort_with_flattened_conditional_operator":
  297,352 ±(99.9%) 4,272 ns/op [Average]
  (min, avg, max) = (295,675, 297,352, 302,384), stdev = 2,234
  CI (99.9%): [293,080, 301,624] (assumes normal distribution)
 
 
# Run complete. Total time: 00:03:20
 
REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.
 
Benchmark                                                                 Mode  Cnt    Score   Error  Units
ConditionalOperatorBenchmark._1_sort_with_nested_conditional_operator     avgt    8  297,982 ± 2,450  ns/op
ConditionalOperatorBenchmark._2_sort_with_flattened_conditional_operator  avgt    8  297,352 ± 4,272  ns/op

However, the flattening of conditional operators doesn’t solve all issues. Indeed, the numeric overflow potential problem is still there as well as the way to compare floating values that could produce inaccurate results. We handle these in the next parts of this post while for the second point, it is a broad and side subject, so we will handle it only in the final solution with Java 8.

Way acceptable only for Java 6 and below

The relational operators < and > are desirable before Java 7 since this is the only correct way to compare numeric values.  It prevents numerical overflow issues. 
Here is how the comparator should be implemented with such java version constraints :

import java.util.Comparator;
 
public class ComparatorForJava6AndBelow implements Comparator<Cpu> {
 
  @Override
  public int compare(Cpu o1, Cpu o2) {
 
    int compareResult = 0;
 
    if (o1.getBaseClockInGHz() < o2.getBaseClockInGHz()) {
      compareResult = -1;
    } else if (o1.getBaseClockInGHz() > o2.getBaseClockInGHz()) {
      compareResult = 1;
    }
 
    if (compareResult == 0) {
      if (o1.getBoostClockInGHz() < o2.getBoostClockInGHz()) {
        compareResult = -1;
      } else if (o1.getBoostClockInGHz() > o2.getBoostClockInGHz()) {
        compareResult = 1;
      }
    }
 
    if (compareResult == 0) {
      if (o1.getNumberOfCore() < o2.getNumberOfCore()) {
        compareResult = -1;
      } else if (o1.getNumberOfCore() > o2.getNumberOfCore()) {
        compareResult = 1;
      }
    }
 
    if (compareResult == 0) {
      if (o1.getNumberOfThread() < o2.getNumberOfThread()) {
        compareResult = -1;
      } else if (o1.getNumberOfThread() > o2.getNumberOfThread()) {
        compareResult = 1;
      }
    }
 
    return compareResult;
  }
}

It is terribly verbose and it has also much duplication. Finally, that is a good example of why we should not use legacy versions of Java or of any languages/frameworks. In our use case, compared to its Java 8 equivalent this comparator multiplies by at least 10 the code verbosity and it is also much more error-prone.   

Favor methods of built-in basic type Comparable implementations in Java 7

The Java 7 improves strongly comparator implementations for developers by providing an instance compareTo(Foo other) method and a static compare(Foo o1, Foo o2) for basic built in-types : String, int, double… These two methods have the same logic. Their difference is that the first one compares this to the passed parameter while the second one compares the passed parameters.  
We generally use compare(Foo o1, Foo o2) to compare primitives(and avoid the boxing operations) and compareTo(Foo other) to compare objects.

import java.util.Comparator;
 
public class ComparatorForJava7 implements Comparator<Cpu> {
 
  @Override
  public int compare(Cpu o1, Cpu o2) {
 
    int compareResult = Float.compare(o1.getBaseClockInGHz(), o2.getBaseClockInGHz());
 
    if (compareResult == 0) {
      compareResult = Float.compare(o1.getBoostClockInGHz(), o2.getBoostClockInGHz());
    }
    if (compareResult == 0) {
      compareResult = Short.compare(o1.getNumberOfCore(), o2.getNumberOfCore());
    }
    if (compareResult == 0) {
      compareResult = Short.compare(o1.getNumberOfThread(), o2.getNumberOfThread());
    }
 
    return compareResult;
  }
}

This approach is really nicer : less duplication, less verbose, less error prone.
The single remaining annoying thing is the getter invocation duplicate to compare each properties. Java 8 handles this point.

Favor Comparator construction methods in Java 8

Java 8 introduces construction methods in the Comparator interface which the new thing is the ability to create a Comparator instance from a Function functional interface where we extract from it the property to compare. These construction methods have specializations for primitive types to prevent boxing and unboxing operations that decrease the performance and these also have thenComparing() methods to produce a comparison on a property then another, and so for…
Here is how we could implement it :

import java.util.Comparator;
 
public class ComparatorForJava8 implements Comparator<Cpu> {
 
  private static final  Comparator<Cpu> cpuComparator =
      Comparator.comparingDouble(Cpu::getBaseClockInGHz)
                .thenComparingDouble(Cpu::getBoostClockInGHz)
                .thenComparingInt(Cpu::getNumberOfCore)
                .thenComparingInt(Cpu::getNumberOfThread);
 
  @Override
  public int compare(Cpu o1, Cpu o2) {
    return cpuComparator.compare(o1, o2);
  }
}

Create a comparator instance has a cost, so we don’t want to create a new instance in the compare() method itself because it would have a sensitive performance cost.

Ce contenu a été publié dans java, java 8. Vous pouvez le mettre en favoris avec ce permalien.

Laisser un commentaire

Votre adresse de messagerie ne sera pas publiée. Les champs obligatoires sont indiqués avec *