Java 8: collecting inside a flatMap(), some limitations

Stream.map() and Stream.flatMap() are closely related methods, as discussed here.
This post looks at some limitations of Stream.flatMap().

We can broadly identify two usages of Stream.flatMap():
– flattening a nested stream so that the main collect operation is performed on the initial stream. That is the ideal case.
– flattening a nested stream that itself performs the main collect, while the initial stream performs only a minor collect on what the nested stream produced. That is a more delicate case.

The model used for illustration:

Suppose we have a List of Employee and a List of Country.
Here is how we represent them :

public class Employee {
 
    private long id;
    private final String name;
    private final String lastName;
    private final LocalDate birthDate;
    private final String codeCountry;
    // constructor + getters
}
public class Country {
 
    private final String code;
    private final String label;
    // constructor + getters
}

The country associated with an employee can be retrieved by matching Employee.getCodeCountry() against Country.getCode().

Now the question:
How can we write a method that returns a Map with the countries as keys and the list of associated employees as values?

With an imperative style, we could achieve that with two nested loops: the outer one iterates over the countries and the inner one iterates over the employees.
In the inner loop, for each employee whose codeCountry matches the code of the current country, we add the employee to the List stored in the map under that country.

It could be done this way:

public Map<Country, List<Employee>> findEmployeesByCountryWithLoop() {
        Map<Country, List<Employee>> map = new HashMap<>();
        for (Country country : countries) {
            for (Employee emp : employees) {
                if (emp.getCodeCountry()
                       .equals(country.getCode())) {
                    map.computeIfAbsent(country, c -> new ArrayList<>())
                       .add(emp);
                }
            }
        }
        return map;
}

The code is still quite readable, but the level of nesting is rather annoying:

for (...) {
    for (...) {
        if (...) {
            ....
        }
     }
}

Could Java 8's functional style provide cleaner code?

1) groupingBy() in the nested stream

As a first option, let's try to perform the main collect in the nested stream.
Since we need both the country currently streamed (outer level) and the nested stream of employees to perform the groupingBy() collect, this way of doing things may seem rather natural.
Here is this version:

public Map<Country, List<Employee>> findEmployeesByCountryWithGroupingByInFlatMap() {
    // assumes static imports of Collectors.groupingBy and Collectors.toMap
    Map<Country, List<Employee>> map =
            countries.stream()
                     .flatMap(c -> employees.stream()
                                            // the codes are Strings: compare them with equals(), not ==
                                            .filter(e -> e.getCodeCountry().equals(c.getCode()))
                                            .collect(groupingBy(e -> c))
                                            .entrySet()
                                            .stream()
                     )
                     .collect(toMap(Map.Entry::getKey, Map.Entry::getValue));
    return map;
}

The general idea is good: groupingBy() does its job very well. But the complete implementation is very verbose.
Indeed, in order to flatten the Map collected in the nested stream, we need to stream it again, because Stream.flatMap() expects a Stream and nothing else.
That is redundant in terms of both code and processing.
Besides, streaming a map is not direct: it requires Map.entrySet().stream().
The final collect performed by the initial stream is no better.
It undoes what we did in the nested processing (streaming the collected map), and it is verbose as well:

.collect(toMap(Map.Entry::getKey, Map.Entry::getValue));

2) Tuples in the nested stream

The idea is to map the nested stream to two-element tuples, one for each matching Country-Employee pair. In this way, the initial stream can perform the groupingBy() collect on the stream of tuples.

public Map<Country, List<Employee>> findEmployeesByCountryWithTuple() {
    Map<Country, List<Employee>> map =
            countries.stream()
                     .flatMap(c -> employees.stream()
                                            // the codes are Strings: compare them with equals(), not ==
                                            .filter(e -> e.getCodeCountry().equals(c.getCode()))
                                            .map(e -> new AbstractMap.SimpleImmutableEntry<>(c, e))
                     )
                     .collect(groupingBy(AbstractMap.SimpleImmutableEntry::getKey,
                                         mapping(AbstractMap.SimpleImmutableEntry::getValue, toList())));

    return map;
}

It looks better than the previous option: there is less boilerplate code.
But two things are rather annoying. Contrary to other languages such as Scala or Python, Java doesn't have a simple, direct API to create tuples.
As a workaround we can use AbstractMap.SimpleImmutableEntry, but it is verbose: the type name is long, and so is the code needed to instantiate it or to read its key and value.
Consequently, mapping each country-employee pair to a SimpleImmutableEntry is verbose:

.map(e -> new AbstractMap.SimpleImmutableEntry<>(c, e))

The collect is also verbose and additionally requires an intermediate mapping() collector to unwrap the employees:

.collect(groupingBy(AbstractMap.SimpleImmutableEntry::getKey,
          mapping(AbstractMap.SimpleImmutableEntry::getValue, toList())));

We could at least write the latter with lambdas to make it shorter:

.collect(groupingBy(e -> e.getKey(),
         mapping(e -> e.getValue(), toList())));
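
To make the tuple variant concrete, here is a minimal, self-contained sketch that uses the lambda form of the collect. The simplified Country and Employee classes and the TupleGroupingSketch name are assumptions made for illustration only; they are not the full model shown earlier.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.List;
import java.util.Map;

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.mapping;
import static java.util.stream.Collectors.toList;

// Simplified stand-ins for the model classes (assumption, illustration only).
class Country {
    private final String code;
    Country(String code) { this.code = code; }
    String getCode() { return code; }
}

class Employee {
    private final String name;
    private final String codeCountry;
    Employee(String name, String codeCountry) {
        this.name = name;
        this.codeCountry = codeCountry;
    }
    String getName() { return name; }
    String getCodeCountry() { return codeCountry; }
}

class TupleGroupingSketch {

    // Pairs each country with its matching employees, then groups the pairs by country.
    static Map<Country, List<Employee>> group(List<Country> countries, List<Employee> employees) {
        return countries.stream()
                .flatMap(c -> employees.stream()
                        .filter(e -> e.getCodeCountry().equals(c.getCode()))
                        .map(e -> new SimpleImmutableEntry<>(c, e)))
                .collect(groupingBy(pair -> pair.getKey(),
                        mapping(pair -> pair.getValue(), toList())));
    }
}
```

Since a pair is only created when an employee matches, a country without any employee never reaches the final collect and is therefore absent from the resulting map, as in the loop version.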

3) Doing things in two streams: maybe the best solution for Java 8

The idea is rather simple: avoid the nested stream.
To do that, we first collect the countries into a Map<String, Country> that associates each country code with its Country object. In this way, we no longer need to nest one stream inside the other to match Employee.getCodeCountry() with Country.getCode().

Here is the code :

public Map<Country, List<Employee>> findEmployeesByCountryInTwoStreams() {
    // lookup map: country code -> Country
    Map<String, Country> countryByCode =
            countries.stream()
                     .collect(toMap(Country::getCode, c -> c));

    Map<Country, List<Employee>> map = employees.stream()
                                                .collect(groupingBy(
                                                        e -> countryByCode.get(e.getCodeCountry())));
    return map;
}

Although we are forced to create two Maps to perform the task, the resulting code is both very readable and reasonably concise.
But is it really a problem to do things in two steps?
No: Java 8 has some constraints regarding stream collectors, so we should adapt the structure of our code to them.
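
One caveat is worth keeping in mind with the two-step approach: groupingBy() rejects null keys, so an employee whose country code is missing from the code-to-country lookup map makes the collect fail with a NullPointerException. Below is a minimal sketch of a guard, assuming simplified model classes and assuming that unmatched employees should simply be skipped (which is what the loop version does). The TwoStepGroupingSketch name is hypothetical.

```java
import java.util.List;
import java.util.Map;

import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toMap;

// Simplified stand-ins for the model classes (assumption, illustration only).
class Country {
    private final String code;
    Country(String code) { this.code = code; }
    String getCode() { return code; }
}

class Employee {
    private final String name;
    private final String codeCountry;
    Employee(String name, String codeCountry) {
        this.name = name;
        this.codeCountry = codeCountry;
    }
    String getName() { return name; }
    String getCodeCountry() { return codeCountry; }
}

class TwoStepGroupingSketch {

    static Map<Country, List<Employee>> group(List<Country> countries, List<Employee> employees) {
        // Step 1: index the countries by code.
        // Note: toMap() throws IllegalStateException if two countries share the same code.
        Map<String, Country> countryByCode = countries.stream()
                .collect(toMap(Country::getCode, c -> c));

        // Step 2: group the employees, skipping those without a known country,
        // because groupingBy() does not accept a null key.
        return employees.stream()
                .filter(e -> countryByCode.containsKey(e.getCodeCountry()))
                .collect(groupingBy(e -> countryByCode.get(e.getCodeCountry())));
    }
}
```

Whether skipping unmatched employees is the right choice depends on the requirement; failing fast or grouping them under a placeholder country are equally valid options.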

Ideally we would like to write something more direct. Why can't we perform groupingBy() on the Country stream, specifying as downstream Collector a mapping() Collector that retrieves the employees matching the country code and stores them in a List?
For example, something like this (beware: this doesn't work as expected in Java 8):

public Map<Country, List<Employee>> findEmployeesByCountryInTwoStreamsGroupBy_BUT_NOT_COMPILE() {

    Map<Country, List<Stream<Employee>>> map =
            countries.stream()
                     .collect(groupingBy(c -> c,
                                         mapping(c -> employees.stream()
                                                               .filter(e -> e.getCodeCountry().equals(c.getCode()))
                                                 , toList()
                                         ))
                     );
    // We are stuck: the values are lists of streams, not lists of employees
}

It doesn't work as expected because countries.stream().collect(...) returns a Map<Country, List<Stream<Employee>>>, while we need Map<Country, List<Employee>> as the return type.
We get streams because, in the downstream collector of groupingBy(), we use mapping() with a function that returns a stream of employees, and the downstream collector of mapping() (here toList()) simply collects those streams into a List. Now, if we didn't return a stream from the mapping() function, would it work?
It could get close to the expected result in terms of collected object, but it would still be ugly and verbose. We could for example collect the stream into a List and then perform an artificial reducing operation as the downstream collector of mapping():

public Map<Country, List<Employee>> findEmployeesByCountryInTwoStreamsGroupBy_COMPILE_BUT_UGH() {

    Map<Country, List<Employee>> map =
            countries.stream()
                     .collect(groupingBy(c -> c,
                                         mapping(c -> employees.stream()
                                                               .filter(e -> e.getCodeCountry().equals(c.getCode()))
                                                               .collect(toList()),
                                                 // artificial reduction: keep the first list of the group
                                                 // and unwrap the Optional returned by reducing()
                                                 collectingAndThen(
                                                         reducing((l1, l2) -> l1),
                                                         Optional::get)
                                         ))
                     );
    return map;
}

This gives almost the expected result. The only difference is that countries with no matching Employee also appear as keys (with an empty list as value).

We could easily remove them, but that still requires re-streaming the Map. What a mess…
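
For reference, the clean-up step mentioned above could be sketched as follows. This is only an illustration: the EmptyGroupCleanupSketch class and its withoutEmptyGroups() helper are hypothetical names, and the helper is written generically so that it is independent of the model classes.

```java
import java.util.List;
import java.util.Map;

import static java.util.stream.Collectors.toMap;

class EmptyGroupCleanupSketch {

    // Re-streams the grouped map and keeps only the entries whose list is not empty.
    static <K, V> Map<K, List<V>> withoutEmptyGroups(Map<K, List<V>> groups) {
        return groups.entrySet().stream()
                .filter(entry -> !entry.getValue().isEmpty())
                .collect(toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```

It works, but it adds yet another entrySet().stream() and toMap() pair on top of an already convoluted collect.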
So definitely, stick with the two-stream approach if you face this kind of requirement in Java 8.

4) Java 9 way: flatMapping() collector

But if you can use Java 9 or above, an elegant stream solution is now possible.
Instead of the mapping() collector, which is not suited to cases where the mapping function returns a stream, you now have the flatMapping() collector. This method expects a mapping function that returns a stream, and it feeds the elements of that stream to the downstream collector passed as second argument.
By contrast, mapping() hands each mapped value to its downstream collector as is, so a returned stream ends up stored as a single element.
This would give:

public Map<Country, List<Employee>> findEmployeesByCountryWithFlatMappingJava9() {

    Map<Country, List<Employee>> map =
            countries.stream()
                     .collect(groupingBy(c -> c, flatMapping(c -> employees.stream()
                                                                           .filter(e -> e.getCodeCountry().equals(c.getCode())),
                                                             toList())));
    return map;
}

That is really nicer.
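
One behavior to be aware of: because the grouping is driven by the stream of countries, every country becomes a key, including those without any employee (they are mapped to an empty list), whereas the loop version only adds a country once an employee matches. Here is a small runnable sketch that shows it, assuming Java 9 or above, simplified model classes and a hypothetical FlatMappingSketch class.

```java
import java.util.List;
import java.util.Map;

import static java.util.stream.Collectors.flatMapping;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

// Simplified stand-ins for the model classes (assumption, illustration only).
class Country {
    private final String code;
    Country(String code) { this.code = code; }
    String getCode() { return code; }
}

class Employee {
    private final String name;
    private final String codeCountry;
    Employee(String name, String codeCountry) {
        this.name = name;
        this.codeCountry = codeCountry;
    }
    String getName() { return name; }
    String getCodeCountry() { return codeCountry; }
}

class FlatMappingSketch {

    // Groups by country; the matching employees of each country are flattened into one list.
    static Map<Country, List<Employee>> group(List<Country> countries, List<Employee> employees) {
        return countries.stream()
                .collect(groupingBy(c -> c,
                        flatMapping(c -> employees.stream()
                                        .filter(e -> e.getCodeCountry().equals(c.getCode())),
                                toList())));
    }

    public static void main(String[] args) {
        List<Country> countries = List.of(new Country("FR"), new Country("JP"));
        List<Employee> employees = List.of(new Employee("Alice", "FR"), new Employee("Chloe", "FR"));
        // Prints one line per country, including the one without employees.
        group(countries, employees).forEach((country, staff) ->
                System.out.println(country.getCode() + " -> " + staff.size() + " employee(s)"));
    }
}
```

If the empty groups are not wanted, they still have to be filtered out afterwards, or the countries stream has to be filtered before the collect.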
