Java Map and thread-safety

Can we use no synchronized java.lang.Map instance fields in a multi-threaded context without explicit synchronization?

No synchronized implementations of the java.util.Map interface such as java.util.HashMap or java.util.TreeMap are fast implementations as these are not thread-safe.
In some use cases, the thread safety is clearly not a problem. For example when we create a map as a local variable of a method, we are assured that the map instance is accessed in a thread safe way as each thread has its own execution stack during a method invocation.
When the map is a field of the class, synchronization may be needed or at least we could wonder if it is.


Plan


You don’t need synchronization mechanisms if the Map is never modified once it is accessible to concurrent threads

How to do it ?
To perform it, you could have a class that contains a final Map field and initialize map data in the constructor. The final keyword is not mandatory to ensure the thread safety but it is a good practice to define as final the variables which should not be reassigned. In this way if a developer modified the class to reassign the map, the compilation error would warn him.
For example, the JDK factory method java.util.Collections.unmodifiableMap(Map map) follows this approach:

    public static <K,V> Map<K,V> unmodifiableMap(Map<? extends K, ? extends V> m) {
        return new UnmodifiableMap<>(m);
    }
 
    private static class UnmodifiableMap<K,V> implements Map<K,V>, Serializable {
        private static final long serialVersionUID = -1034234728574286014L;
 
        private final Map<? extends K, ? extends V> m;
 
        UnmodifiableMap(Map<? extends K, ? extends V> m) {
            if (m==null)
                throw new NullPointerException();
            this.m = m;
        }

Here, no synchronization is required when methods of the map are invoked as the map is immutable. It accepts reading operations but disallows writing operations:

 public V get(Object key) {
    return m.get(key);
 }
 
 public V put(K key, V value) {
   throw new UnsupportedOperationException();
  }

The UnmodifiableMap class is in fact a particular case since not only it contains a Map but it is also a Map instance as it implements the java.util.Map interface.
Generally, when you have a class containing a private Map field and that the class doesn’t need to expose the java.util.Map to the class clients, it is not desirable to make the class implement the java.util.Map interface.
Only one thing matters : no writing methods on the map have to be provided.
A basic class accessible to concurrent threads which declares a no modifiable map field could be :

import java.util.Map;
import java.util.Map;
 
public class MapConainer<K, V> {
 
    private final Map<? extends K, ? extends V> m;
 
    public MapConainer(Map<? extends K, ? extends V> m) {
	if (m == null)
	    throw new NullPointerException();
	this.m = m;
    }
 
    public V get(Object key) {
	return m.get(key);
    }
 
}

Hint

You should populate the map with all required data in the constructor of the class that holds the map to avoid synchronization if all these conditions are gathered :
– you know all  data of the map as soon as the instance of the class is created.
– your map is read-only after being populated.
– all or almost all data of the map are initialized enough fast in the application life. So you gain nothing to initialize them in a lazy way.
– loading all data in your map in an eager way will not cause serious latency problems.

You need synchronization mechanisms if the Map is modifiable once it is accessible to concurrent threads

Since JDK 5, there are multiple ways to do it.
When you use synchronization mechanisms, you have in general two problematic to handle: ensuring that the thread safety has no defect (sometimes the developer handles some cases but forgets others) and avoiding unnecessary contentions by locking a instance or a class more than required.  It is not always an easy task to perform.

Our example use case 

We will start from a simple use case to illustrate the synchronizing need .
Suppose you have OccurrenceCounter, a class which the main role is counting the occurrence of letters. The class has addLetterOccurence(), a public method that may be accessed concurrently by multiple threads.
The method updates the occurrence of encountered letters. It takes as parameter a String.
The parameter may contain zero or one alphabetical character. When it is found, we update the occurrence of the current letter in a Map instance field. 
Here is the initial version of the class that doesn’t take into consideration the thread safety concern:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
import java.util.HashMap;
import java.util.Map;
 
public class OccurrenceCounter {
 
    private Map<Character, Integer> mapCountByLetter = new HashMap<>();
 
    public void addLetterOccurence(String word) {
        Character letter = retrieveCharacter(word);
        if (letter == null) {
            return;
        }
        Integer occurrence = mapCountByLetter.get(letter);
        if (occurrence == null) {
            occurrence = 0;
        }
        mapCountByLetter.put(letter, ++occurrence);
    }
 
    private Character retrieveCharacter(String word) {
 
        Matcher matcher = Pattern.compile("([A-Za-z])")
                                 .matcher(word);
        if (matcher.find()) {
            String group = matcher.group(1);
            return group.charAt(0);
        }
        return null;
    }
 
}

Without synchronization, concurrent threads may execute in a undesirable way some statements. Which may break the logic of the method or corrupt the map functioning.
We will describe a scenario where a race condition happens and explain why it does.
Suppose thread A and thread B,  two threads that call in a concurrent way addLetterOccurence() with a word parameter containing the same letter : the ‘a‘ letter.

The thread A is at this line 

 Integer occurrence = mapCountByLetter.get(letter);

and the method returns null because the letter has never been encountered.
The occurrence variable is assigned to null. The thread A is then paused.
The thread B is resumed and arrives at the same statement. At this time, the call to mapCountByLetter.get(letter) returns still null . Then it goes to the next statement. As the condition occurrence == null is true, occurrence is valued to 0:

  if (occurrence == null){
       occurrence = 0;
   }

The thread B continues its execution and associates in the map the value to the a letter .

 mapCountByLetter.put(letter, ++occurrence);

When the thread A is resumed, it follows the same path since it was paused after the value of occurrence was retrieved and consequently the occurrence variable has the 0 value .
So, it overwrites in the map the value associated to the letter  with the same value : 1.
Hey ! We have not incremented it. As result, instead of having 2 associated to the a letter , we have only 1.
The logic of the method is broken.

We can very easily check that.
Here is a sample code where 5 concurrently threads invoke addLetterOccurence() with the same argument : « 123A456 » on a same instance of OccurrenceCounter. We request to perform this operation 100 times

public static void main(String[] args) throws InterruptedException {
    OccurrenceCounter occurrenceCounter = new OccurrenceCounter();
 
    ExecutorService executorService = Executors.newFixedThreadPool(5);
 
    List<Callable<Void>> callables = IntStream.range(0, 100)
                                              .mapToObj(i -> {
                                                  Callable<Void> callable = () -> {
                                                      occurrenceCounter.addLetterOccurence("123A456");
                                                      return null;
                                                  };
                                                  return callable;
                                              })
                                              .collect(Collectors.toList());
 
    executorService.invokeAll(callables);
    Integer occurence = occurrenceCounter.mapCountByLetter.get('A');
    System.out.println("A occurence=" + occurence);
    executorService.shutdown();
}

We can execute this code multiple times and we will see that the wished result « A occurence=100 » is not displayed. The result is indeed unpredictable because of the threads concurrency. For example, these are the 3 outputs that I get as I run 3 times the program :

A occurence=95 
A occurence=93 
A occurence=73

The race condition being established and concretely demonstrated, we will now see how to handle it.
We will review 5 common ways to perform synchronization for a class that holds a private Map field and provides public methods that modify and read it :

Some of these usages are complementary, others are competing.

java.util.Collections.synchronizedMap(Map)

This factory method takes a Map as input and returns a synchronized map backed by the specified map. In order to guarantee serial access, it is critical that all access to the backing map is accomplished through the returned map.
That’s why, it is imperative that the user manually synchronizes on the returned map when iterating over any of its collection views.

Here is a code wrapping a HashMap in a SynchronizedMap by calling  Collections.synchronizedMap(Map) :

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
 
public class OccurrenceCounter {
 
    private Map<Character, Integer> mapCountByLetter = Collections.synchronizedMap(new HashMap<>());
 
    public void addLetterOccurence(String word) {
        Character letter = retrieveCharacter(word);
        if (letter == null) {
            return;
        }
        Integer occurrence = mapCountByLetter.get(letter);
        if (occurrence == null) {
            occurrence = 0;
        }
        mapCountByLetter.put(letter, ++occurrence);
    }
 
    private Character retrieveCharacter(String word) {
 
        Matcher matcher = Pattern.compile("([A-Za-z])")
                                 .matcher(word);
        if (matcher.find()) {
            String group = matcher.group(1);
            return group.charAt(0);
        }
        return null;
    }
 
    public static void main(String[] args) throws InterruptedException {
        OccurrenceCounter occurrenceCounter = new OccurrenceCounter();
 
        ExecutorService executorService = Executors.newFixedThreadPool(5);
 
        List<Callable<Void>> callables = IntStream.range(0, 100)
                                                  .mapToObj(i -> {
                                                      Callable<Void> callable = () -> {
                                                          occurrenceCounter.addLetterOccurence("123A456");
                                                          return null;
                                                      };
                                                      return callable;
                                                  })
                                                  .collect(Collectors.toList());
 
        executorService.invokeAll(callables);
        Integer occurence = occurrenceCounter.mapCountByLetter.get('A');
        System.out.println("A occurence=" + occurence);
        executorService.shutdown();
    }
}

Is the code thread safe ?

No it is not. You can test it to check that.
The program execution is not able to output in a reproducible way : « A occurence=100 ». Instead, it still provides unpredictable results.
By using Collections.synchronizedMap(), the map is thread-safe for every method invoked on the mapCountByLetter map. Therefore, the get() and put() methods are thread-safe. But it is not enough.
If you refer to the explanation why addLetterOccurence() method was not thread-safe, it indicates that a race condition occurs (note that it is not the single case) when for a Thread A the occurrence variable is assigned to null and the thread A is paused. With the actual code, the race condition is always not handled. The  get() and put() methods are thread-safe but only them individually. We want all this logic to be thread safe :

Integer occurrence = mapCountByLetter.get(letter);
if (occurrence == null) {
	occurrence = 0;
}
mapCountByLetter.put(letter, ++occurrence);

Nevertheless as the map is thread safe in all its accesses it prevents from breaking the map functioning and rising a runtime exception.

There are of course use cases where Collections.synchronizedMap() is enough to ensure thread safety. It is the case when the map is used in methods where inside them we don’t have a race condition on a specific mapping of the map as the addLetterOccurence() method has. 
For example here is a code where threads concurrently put entries in the map with  distinct keys.

import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
 
public class OccurrenceCounter{
 
    private Map<Character, Integer> mapCountByLetter = Collections.synchronizedMap(new HashMap<>());
 
    public void addLetterOccurence(char c, int value) {
        mapCountByLetter.put(c, value);
    }
 
    public static void main(String[] args) throws InterruptedException {
        OccurrenceCounter occurrenceCounter = new OccurrenceCounter();
 
        ExecutorService executorService = Executors.newFixedThreadPool(5);
 
        List<Callable<Void>> callables = IntStream.range(0, 100)
                                                  .mapToObj(i -> {
                                                      Callable<Void> callable = () -> {
                                                          occurrenceCounter.addLetterOccurence((char) i, 1);
                                                          return null;
                                                      };
                                                      return callable;
                                                  })
                                                  .collect(Collectors.toList());
 
        executorService.invokeAll(callables);
        int mapSize = occurrenceCounter.mapCountByLetter.size();
        System.out.println("map size=" + mapSize);
        Set<Integer> values = occurrenceCounter.mapCountByLetter.values()
                                                                .stream()
                                                                .distinct()
                                                                .collect(Collectors.toSet());
        System.out.println("distinct values in the map=" + values);
        executorService.shutdown();
    }
}

And it is fine : we get  a reproducible output :

map size=100
distinct values in the map=[1]

Don’t think that the actual code will produce the same result without using Collections.synchronizedMap(). HashMap.put() being invoked concurrently and Map being not designed to be thread safe, concurrent invocations of HashMap on a same object may have side effects. It is particularly right for methods that changes the HashMap state.

Here is the code without without a HashMap wrapped in Collections.synchronizedMap() :

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
 
public class OccurrenceCounter {
 
    private Map<Character, Integer> mapCountByLetter = new HashMap<>();
 
    public void addLetterOccurence(char c, int value) {
        mapCountByLetter.put(c, value);
    }
 
    public static void main(String[] args) throws InterruptedException {
        OccurrenceCounter occurrenceCounter = new OccurrenceCounter();
 
        ExecutorService executorService = Executors.newFixedThreadPool(5);
 
        List<Callable<Void>> callables = IntStream.range(0, 100)
                                                  .mapToObj(i -> {
                                                      Callable<Void> callable = () -> {
                                                          occurrenceCounter.addLetterOccurence((char) i, 1);
                                                          return null;
                                                      };
                                                      return callable;
                                                  })
                                                  .collect(Collectors.toList());
 
        executorService.invokeAll(callables);
        int mapSize = occurrenceCounter.mapCountByLetter.size();
        System.out.println("map size=" + mapSize);
        Set<Integer> values = occurrenceCounter.mapCountByLetter.values()
                                                                .stream()
                                                                .distinct()
                                                                .collect(Collectors.toSet());
        System.out.println("distinct values in the map=" + values);
        executorService.shutdown();
    }
}

Note by invoking HashMap.put() with a different key at each time, the race condition happens less frequently than as in the original case where we used the same key. So, we should run the following program multiple times to notice distinct results.
It could produce at least two distinct results :

map size=100 
distinct values in the map=[1]
map size=99
distinct values in the map=[1]

Come back to our original OccurrenceCounter class presenting a race condition and we will see how to handle it.

The synchronized method

The Java synchronized keyword may apply on a method or a set of statements. We will discuss about the first way now. The next point handles the second way.
By convention, the synchronized word should be declared just after the access level modifier of the method.
For example :

public synchronized void addLetterOccurence(String word){
...
}

In our case, this guarantees that for the underlying instance, if a thread A is running in this method, (and even if the thread A is paused) no other threads can enter in the method while the thread A has not exited the method. This also guarantees another important thing but we don’t need to reference it for the moment.

By synchronizing the addLetterOccurence() method we ensure the thread safety of it since the condition race is handled.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
 
public class OccurrenceCounter {
 
    private Map<Character, Integer> mapCountByLetter = new HashMap<>();
 
    public synchronized void addLetterOccurence(String word) {
        Character letter = retrieveCharacter(word);
        if (letter == null) {
            return;
        }
        Integer occurrence = mapCountByLetter.get(letter);
        if (occurrence == null) {
            occurrence = 0;
        }
        mapCountByLetter.put(letter, ++occurrence);
    }
 
    private Character retrieveCharacter(String word) {
 
        Matcher matcher = Pattern.compile("([A-Za-z])")
                                 .matcher(word);
        if (matcher.find()) {
            String group = matcher.group(1);
            return group.charAt(0);
        }
        return null;
    }
 
   // IDENTICAL CODE FOR main(String[]) AS PREVIOUSLY
    public static void main(String[] args) throws InterruptedException {
         ...
    }
}

This implementation adresses the concurrency problem : the program will always produces the same output :

A occurence=100

But it also presents a drawback : it performs more synchronization than required. Indeed, the  retrieveCharacter()  call  doesn’t need to be synchronized as it doesn’t manipulate the map.

The synchronized statements

Unlike synchronized methods, synchronized statements must specify the object or the class on which the lock should be applied.
Synchronized statements allow to define in a finer way the statements to synchronize than a synchronized method.
It is more effective because contrary to the synchronized method, if some statements don’t need to be synchronized because these are already naturally thread-safe, you can ignore it and apply synchronization only on  statements which require it.
If we take back our use case, we could do that : 

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
 
public class OccurrenceCounter {
 
    private Map<Character, Integer> mapCountByLetter = new HashMap<>();
 
    public void addLetterOccurence(String word) {
        Character letter = retrieveCharacter(word);
 
        if (letter == null) {
            return;
        }
 
        synchronized (this) {
            Integer occurrence = mapCountByLetter.get(letter);
            if (occurrence == null) {
                occurrence = 0;
            }
            mapCountByLetter.put(letter, ++occurrence);
        }
    }
 
    private Character retrieveCharacter(String word) {
 
        Matcher matcher = Pattern.compile("([A-Za-z])")
                                 .matcher(word);
        if (matcher.find()) {
            String group = matcher.group(1);
            return group.charAt(0);
        }
        return null;
    }
 
    // IDENTICAL CODE FOR main(String[]) AS PREVIOUSLY
    public static void main(String[] args) throws InterruptedException {
         ...
    }
}

You can notice we have used a synchronized statement by locking the current instance : this. It is not the single option.
In this example, locking on this or on the mapCountByLetter field doesn’t change many things.

This solution is more efficient than the previous one with synchronization on the whole method as it limits the time of the synchronization.

Adding a new method that performs writing on the map

In the actual use case we have defined a single entry point that accesses to the map : the addLetterOccurrence() method. Things are simple :  as we have a single entry point, we know that filtering its access with a synchronization on it is enough.
In the real life, classes provides very often multiple methods using a field Map.

Suppose now we want to provide a new method to the OccurrenceCounter class. The method has as parameter a list of alphabetical characters and it has to increment the occurrence of them.
We will overload the existing addLetterOccurence() method.
Here is the OccurrenceCounter class with the new method :

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
 
public class OccurrenceCounter {
 
    private Map<Character, Integer> mapCountByLetter = new HashMap<>();
 
    public void addLetterOccurence(String word) {
        Character letter = retrieveCharacter(word);
        if (letter == null) {
            return;
        }
        synchronized (this) {
            Integer occurrence = mapCountByLetter.get(letter);
            if (occurrence == null) {
                occurrence = 0;
            }
            mapCountByLetter.put(letter, ++occurrence);
        }
    }
 
    public void addLetterOccurence(List<Character> characters) {
        for (Character letter : characters) {
            Integer occurrence = mapCountByLetter.get(letter);
            if (occurrence == null) {
                occurrence = 0;
            }
            mapCountByLetter.put(letter, ++occurrence);
        }
 
    }
 
    private Character retrieveCharacter(String word) {
 
        Matcher matcher = Pattern.compile("([A-Za-z])")
                                 .matcher(word);
        if (matcher.find()) {
            String group = matcher.group(1);
            return group.charAt(0);
        }
        return null;
    }
 
}

For now the class doesn’t ensure any longer the consistency of the map as two methods modify the map but only one method uses synchronization.
As previously, concurrent invocations of the new addLetterOccurence() method may create the same issue as we noticed with the original method.
Here is a updated version of the class that handles the synchronization in both methods :
Invoking concurrently the first one or the second one method will be performed in a thread-safety way.

With synchronization in both addLetterOccurence()methods, multiple threads may indeed call in a concurrent way the same method without breaking the code logic or corrupt the map.

But are we thread safe when a thread calls the first thread-safe method and that another thread calls the second one thread safe method in a concurrent way?

Suppose that thread A and thread B are concurrent threads and they share the same instance of OccurrenceCounter.

The thread A  performs this code :

1
counter.addLetterOccurence("a");

and the thread B performs this code :

1
counter.addLetterOccurence(Arrays.asList("a"));

Is it thread-safe ?

Yes because it is not possible for two invocations of synchronized methods on the same object to interleave. When a thread is executing a synchronized method or synchronized statements on a object, all other threads that invoke a synchronized method or synchronized statements on the same object block until the first thread is done with the object.

We will modify our program to check that.
Half the time we will invoke the first one method and the other half the time we will invoke the second one method.
But the ‘A’ character is passed in any case. So we expect to have still the 100 value for the ‘A’ key.

public static void main(String[] args) throws InterruptedException {
    OccurrenceCounter occurrenceCounter = new OccurrenceCounter();
 
    ExecutorService executorService = Executors.newFixedThreadPool(5);
 
    List<Callable<Void>> callables = IntStream.range(0, 100)
                                              .mapToObj(i -> {
                                                  Callable<Void> callable = () -> {
                                                      if (i % 2 == 0) {
                                                          occurrenceCounter.addLetterOccurence("123A456");
                                                      }
                                                      else {
                                                          occurrenceCounter.addLetterOccurence(Arrays.asList('A', 'B'));
                                                      }
                                                      return null;
                                                  };
                                                  return callable;
                                              })
                                              .collect(Collectors.toList());
 
    executorService.invokeAll(callables);
    Integer occurence = occurrenceCounter.mapCountByLetter.get('A');
    System.out.println("A occurence=" + occurence);
    executorService.shutdown();
}

We can see that the race condition is correctly handled.
We get repeatedly :

A occurence=100

Thread safety for methods performing reading 

Now, suppose we want to add methods that perform reading only operations on the map :

– getOccurence() to retrieve the occurrence for a letter. The method takes as parameter a character and returns the occurrence of the letter.
writeOccurences() to render in a writer the occurrence of each encountered letter so far. The method takes as parameter a PrintWriter and returns nothing.

Is the synchronization required ?

It is and you should always synchronize even the reading accesses of the map if this map can be modified in a concurrent way.
You must not forget that a HashMap is not thread safe. So, getting an object from the map by a thread  while another one modifies it may create different issues such as corrupting the state of the HashMap, throwing a runtime exception or at best returning an inaccurate result. The infinite loop is also a bug known when get() and put() are used with HashMap without synchronization.
Besides  writeOccurences() should probably iterate over the map. If during iteration by a thread, another thread modifies the backed map, it is likely to corrupt the map and to rise an exception.

Here is the code with all methods ensuring thread safety. We can see that getOccurence() synchronizes just enough to avoid map dysfunctioning. The returned occurrence could have a slight delay in some cases, but in our use case it doesn’t matter. 

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
import java.io.PrintWriter;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
 
public class OccurrenceCounter {
 
    private Map<Character, Integer> mapCountByLetter = new HashMap<>();
 
    public void addLetterOccurence(String word) {
        Character letter = retrieveCharacter(word);
        if (letter == null) {
            return;
        }
        synchronized (this) {
            Integer occurrence = mapCountByLetter.get(letter);
            if (occurrence == null) {
                occurrence = 0;
            }
            mapCountByLetter.put(letter, ++occurrence);
        }
    }
 
    public void addLetterOccurence(List<Character> characters) {
        for (Character letter : characters) {
            synchronized (this) {
                Integer occurrence = mapCountByLetter.get(letter);
                if (occurrence == null) {
                    occurrence = 0;
                }
                mapCountByLetter.put(letter, ++occurrence);
            }
        }
    }
 
    public int getOccurence(Character letter) {
        Integer occurrence = null;
        synchronized (this) {
            occurrence = mapCountByLetter.get(letter);
        }
        if (occurrence == null) {
            occurrence = 0;
        }
        return occurrence;
    }
 
    public void writeOccurences(PrintWriter writer) {
        synchronized (this) {
            for (Entry<Character, Integer> entry : mapCountByLetter.entrySet()) {
                writer.println(entry.getKey() + ":" + entry.getValue());
            }
        }
    }
 
    private Character retrieveCharacter(String word) {
 
        Matcher matcher = Pattern.compile("([A-Za-z])")
                                 .matcher(word);
        if (matcher.find()) {
            String group = matcher.group(1);
            return group.charAt(0);
        }
        return null;
    }
}

So is it an acceptable solution ?

We start by getting a lot of synchronization. So far we have four methods manipulating the map and all of them use synchronization on the current instance. 
If we have new methods which manipulate (reading or writing) the map, we are bound to add new synchronizations.
But synchronized methods and synchronized statements  have a cost, especially if these are called by many threads within a very short time.

The java.util.HashTable is an example of class representing a map structure (key-value) where all methods are synchronized. If an instance of it has highly-concurrent accesses, it may be very penalizing in terms of performance.
From the java.util.HashTable javadoc :

Unlike the new collection implementations, Hashtable is synchronized. If a thread-safe implementation is not needed, it is recommended to use HashMap in place of Hashtable. If a thread-safe highly-concurrent implementation is desired, then it is recommended to use java.util.concurrent.ConcurrentHashMap in place of Hashtable.

The root cause of the locking time problem comes from that accessing a key-value pair for a thread locks all key-value pairs for any other threads.

In the synchronized statements we have exactly the same problem.
We lock the whole map by locking the current instance while we need to read or put a single key-value.
The presented solutions sound as not efficient as the lock put during synchronization operates on much more data than needed.
java.util.concurrent.ConcurrentHashMap was introduced in JDK 5 to address this limitation. 

java.util.concurrent.ConcurrentHashMap

In performance terms, it is  very near from HashMap which is not thread-safe and in functional terms it acts as a HashTable since it is thread-safe and doesn’t accept null value.
Reading operations are cheap and not blocking thanks to a lock performed only on related data.
Writing operations are much less expensive than those of HashTable but are a little more expensive than those of HashMap because of the partial lock.
The advantage of this : the speed. Benchmarks confirm that.
The drawback of this : some limitations. The main drawback is that data may be stale.

From Javadoc, you can read :
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove).
Retrievals reflect the results of the most recently completed update operations holding upon their onset. For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries. Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time.

The ConcurrentHashmap cannot corrupt the map and will not throw unexpected exceptions related to this when we call get(), put() or iterate over it.
So, we are not compelled to synchronize the map when we do a get() from the map if we deem that a not updated reading is acceptable.
As explained early, in our use case, the reading only methods may return map data with a slight delay in the accuracy. A only thing is really important : the occurrences contained in the map must always reflect the correct values.
To achieve that,  we will use synchronization only when we need to modify the map in order to keep it consistent but we will not use synchronization when we do reading-only operations.

Here is the new version of OccurrenceCounter by using ConcurrentHashmap.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
import java.io.PrintWriter;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.concurrent.ConcurrentHashMap;
 
public class OccurrenceCounter {
 
	private Map<Character, Integer> mapCountByLetter = new ConcurrentHashMap<>();
 
	public void addLetterOccurence(String word) {
		Character letter = retrieveCharacter(word);
		if (letter == null) {
			return;
		}
		synchronized (this) {
			Integer occurrence = mapCountByLetter.get(letter);
			if (occurrence == null) {
				occurrence = 0;
			}
			mapCountByLetter.put(letter, ++occurrence);
		}
	}
 
	public void addLetterOccurence(List<Character> characters) {
		for (Character letter : characters) {
			synchronized (this) {
				Integer occurrence = mapCountByLetter.get(letter);
				if (occurrence == null) {
					occurrence = 0;
				}
				mapCountByLetter.put(letter, ++occurrence);
			}
		}
	}
 
	public int getOccurence(Character letter) {
		Integer occurrence = mapCountByLetter.get(letter);
		if (occurrence == null) {
			occurrence = 0;
		}
		return occurrence;
	}
 
	public void writeOccurences(PrintWriter writer) {
		for (Entry<Character, Integer> entry : mapCountByLetter.entrySet()) {
			writer.println(entry.getKey() + ":" + entry.getValue());
		}
	}
 
	private Character retrieveCharacter(String word) {
		Character c = null;
		// do some logic
		return c;
	}
}


The interesting part is that all map operations are much faster.
mapCountByLetter.get() is not blocking 
mapCountByLetter.put() is very few blocking
– iterate over mapCountByLetter.entrySet() is not blocking and doesn’t throw any exception

Here is a sample program that invokes all thread safe methods concurrently :

private static char CHAR = 'b';
 
// IDENTICAL CODE FOR main(String[]) AS PREVIOUSLY
public static void main(String[] args) throws InterruptedException {
    OccurrenceCounter occurrenceCounter = new OccurrenceCounter();
 
    ExecutorService executorService = Executors.newFixedThreadPool(5);
 
    List<Callable<Void>> callablesAdding = IntStream.range(0, 100)
                                                    .mapToObj(i -> {
                                                        Callable<Void> callable = () -> {
                                                            if (i % 2 == 0) {
                                                                occurrenceCounter.addLetterOccurence("123A456");
                                                            }
                                                            else {
                                                                occurrenceCounter.addLetterOccurence(Arrays.asList('A', getNextLetter()));
                                                            }
                                                            return null;
                                                        };
                                                        return callable;
                                                    })
                                                    .collect(Collectors.toList());
 
    List<Callable<Void>> callablesReading = IntStream.range(0, 100)
                                                     .mapToObj(i -> {
                                                         Callable<Void> callable = () -> {
 
                                                             if (i % 2 == 0) {
                                                                 System.out.println("getOccurence()=" + occurrenceCounter.getOccurence('A'));
                                                             }
                                                             else {
                                                                 ByteArrayOutputStream bos = new ByteArrayOutputStream();
                                                                 PrintWriter printWriter = new PrintWriter(bos);
                                                                 occurrenceCounter.writeOccurences(printWriter);
                                                             }
 
                                                             return null;
                                                         };
                                                         return callable;
                                                     })
                                                     .collect(Collectors.toList());
 
    final List<Callable<Void>> allCallables = new ArrayList<>(callablesReading);
    allCallables.addAll(callablesAdding);
    Collections.shuffle(allCallables);
 
    executorService.invokeAll(allCallables);
    Integer occurence = occurrenceCounter.mapCountByLetter.get('A');
    System.out.println("A occurence=" + occurence);
    executorService.shutdown();
}
 
private static char getNextLetter() {
    return (char) (CHAR ++);
}

We could see that constantly, no one exception is thrown and we get as final output :

A occurence=100

The thread safety is ensured. But what about performance compared to the previous solution using a HashMap with many more synchronized statements ?
It is really faster !
If you are interested to HashMap versus ConcurrentHashMap microbenchmark, you can read this article. You must be aware of ConcurrentHashMap limitations before using it.
But ConcurrentHashMap limitations are rarely a real problem because as in our last example, nothing prevents you from using synchronized statements or synchronized methods with ConcurrentHashMap to handle missing synchronizations.

Besides, ConcurrentHashMap provides specific methods to handle still more efficiently some concurrency situations. In our case, these may be applied.

To do…

Synchronized statements on an instance field rather than on this

Suppose in a class we have some methods accessing to the map with race conditions and we have other methods that don’t access to the map and have also race conditions but for other reasons.
Using synchronized methods or synchronized statements on this don’t allow to make this distinction since in both ways, the thread executing in the synchronized space owns the intrinsic lock of the object referenced by this and so all threads are blocked by waiting for the lock. 
A way of bypassing the limitation is using synchronized statements on an instance field of the class instead of synchronizing on it (this).
Of course,  you have to use as many instance fields to handle the synchronization as groups of synchronized methods that work together.
The instance field may be the map itself or a marker field. It doesn’t matter.

Ce contenu a été publié dans java, performance en JAVA, thread. Vous pouvez le mettre en favoris avec ce permalien.

Laisser un commentaire

Votre adresse de messagerie ne sera pas publiée. Les champs obligatoires sont indiqués avec *