The Java Stream API
The Stream API was introduced in Java 8 and lives in the java.util.stream package. It brings a functional style of processing to collections of objects and makes the operations performed on them easier to read. It is especially useful when you need to filter, transform, or sort the elements of a collection.
The Java Stream API is unrelated to Java's InputStream, OutputStream, and the rest of java.io. InputStream and OutputStream deal with streams of bytes, whereas the Stream API deals with sequences of objects.
The classes Stream, IntStream, LongStream, and DoubleStream are streams over objects and the primitive int, long and double types. Streams differ from collections in several ways:
- No storage: A stream is not a data structure that stores elements; instead, it conveys elements from a source such as a data structure, an array, a generator function, or an I/O channel, through a pipeline of computational operations.
- Functional in nature: An operation on a stream produces a result but does not modify its source. For example, filtering a Stream obtained from a collection produces a new Stream without the filtered elements, rather than removing elements from the source collection.
- Laziness-seeking: Many stream operations, such as filtering, mapping, or duplicate removal, can be implemented lazily, exposing opportunities for optimization. For example, “find the first String with three consecutive vowels” need not examine all the input strings. Stream operations are divided into intermediate (Stream-producing) operations and terminal (value- or side-effect-producing) operations. Intermediate operations are always lazy.
- Possibly unbounded: While collections have a finite size, streams need not. Short-circuiting operations such as limit(n) or findFirst() can allow computations on infinite streams to complete in finite time.
- Consumable: The elements of a stream are only visited once during the life of a stream. Like an Iterator, a new stream must be generated to revisit the same elements of the source.
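The laziness, unbounded-source, and short-circuiting properties above can be seen in a small sketch. The class and method names here are made up for illustration, and peek() is used only as a debugging hook to record which elements the pipeline actually visits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamPropertiesDemo {

    // Records which elements were pulled through the pipeline, to make laziness visible.
    static final List<Integer> examined = new ArrayList<>();

    // Stream.iterate produces the unbounded sequence 1, 2, 3, ...; the
    // short-circuiting limit(5) lets the pipeline finish after five matches.
    static List<Integer> firstFiveMultiplesOfSeven() {
        return Stream.iterate(1, n -> n + 1)
                .filter(n -> n % 7 == 0)
                .limit(5)
                .collect(Collectors.toList());
    }

    // findFirst() stops pulling elements as soon as one passes the filter,
    // so elements after the first even number are never examined.
    static int firstEvenSquare(List<Integer> input) {
        examined.clear();
        return input.stream()
                .peek(examined::add)          // record each element that is visited
                .filter(n -> n % 2 == 0)
                .map(n -> n * n)
                .findFirst()
                .orElse(-1);
    }

    public static void main(String[] args) {
        System.out.println(firstFiveMultiplesOfSeven());                   // [7, 14, 21, 28, 35]
        System.out.println(firstEvenSquare(Arrays.asList(3, 5, 4, 9, 6))); // 16
        System.out.println(examined);                                       // [3, 5, 4]
    }
}
```

Only 3, 5 and 4 are ever visited: the remaining elements are never pulled from the source because the terminal operation already has its answer.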
Important terms in Stream API
- Source: A stream takes a collection, an array, or an I/O resource as its input source.
- Non-Terminal/Intermediate Operations: Operations such as map, filter, distinct, sorted, limit, and skip, each of which returns a new stream.
- Terminal Operations: Operations such as forEach, toArray, reduce, collect, min, max, count, anyMatch, allMatch, noneMatch, findFirst, and findAny, which end the pipeline and produce a primitive value, an object, or a side effect.
- Pipeline: A source, followed by zero or more intermediate operations and a single terminal operation.
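Putting the terms together, a minimal pipeline might look like the sketch below (class and method names are illustrative only). The source is a List, the intermediate operations filter, transform, de-duplicate and sort, and collect() is the terminal operation that actually triggers the work.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class PipelineDemo {

    // Keeps names of at most four characters, upper-cases them,
    // removes duplicates and returns them in natural order.
    static List<String> shortNamesUpperSorted(List<String> names) {
        return names.stream()                      // source
                .filter(s -> s.length() <= 4)      // intermediate: keep short names
                .map(String::toUpperCase)          // intermediate: transform
                .distinct()                        // intermediate: drop duplicates
                .sorted()                          // intermediate: natural order
                .collect(Collectors.toList());     // terminal: build a new List
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("bob", "alice", "eve", "Bob", "dave");
        System.out.println(shortNamesUpperSorted(names)); // [BOB, DAVE, EVE]
        System.out.println(names);                        // source unchanged: [bob, alice, eve, Bob, dave]
    }
}
```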
Without any further ado let’s jump into the practical examples.
To see how streams make things easier than the usual iterative approach, let's start with a simple example: filtering out all the odd numbers from a list of N integers.
package com.rk.stream;

import java.util.ArrayList;
import java.util.Date;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StreamExample {

    // Removes the odd numbers from the given list in place, using an Iterator.
    public static void findEvenNumberByItr(List<Integer> integerList) {
        Iterator<Integer> itr = integerList.iterator();
        while (itr.hasNext()) {
            Integer tmp = itr.next();
            if (tmp % 2 == 1)
                itr.remove();
        }
    }

    // Returns a new list with only the even numbers; the input list is not modified.
    public static List<Integer> findEvenNumberByStream(List<Integer> integerList) {
        List<Integer> evenNumberList = integerList.stream()
                .filter(x -> x % 2 == 0)   // keep even numbers
                .collect(Collectors.toList());
        return evenNumberList;
    }

    // Returns a new list with only the even numbers, built with an indexed for loop.
    public static List<Integer> findEvenNumberByForLoop(List<Integer> integerList) {
        int size = integerList.size();
        List<Integer> evenNumberList = new ArrayList<>();
        for (int index = 0; index < size; index++) {
            if (integerList.get(index) % 2 != 1)
                evenNumberList.add(integerList.get(index));
        }
        return evenNumberList;
    }

    public static void main(String[] args) {
        int N = 1000000;
        // toCollection(ArrayList::new) guarantees a mutable list, which the iterator test needs.
        List<Integer> numberList = IntStream.range(0, N).boxed()
                .collect(Collectors.toCollection(ArrayList::new));

        long startTimeStream = new Date().getTime();
        List<Integer> evenNumberList = findEvenNumberByStream(numberList);
        long endTimeStream = new Date().getTime();
        System.out.println("Using Stream time to complete the operations in seconds : " + (endTimeStream - startTimeStream) / 1000);

        long startTimeForForLoop = new Date().getTime();
        evenNumberList = findEvenNumberByForLoop(numberList);
        long endTimeForLoop = new Date().getTime();
        System.out.println("Using For Loop that returns new filtered list, time to complete the operations in seconds : " + (endTimeForLoop - startTimeForForLoop) / 1000);

        long startTimeForItr = new Date().getTime();
        findEvenNumberByItr(numberList);
        long endTimeForItr = new Date().getTime();
        System.out.println("Using iterator to modify the same list, time to complete the operations in seconds : " + (endTimeForItr - startTimeForItr) / 1000);
    }
}
Using Stream time to complete the operations in seconds : 0
Using For Loop that returns new filtered list, time to complete the operations in seconds : 0
Using iterator to modify the same list, time to complete the operations in seconds : 115
Did you see that? findEvenNumberByItr(), which modifies the List in place by removing the odd integers, took almost 2 minutes to complete, while findEvenNumberByStream() and findEvenNumberByForLoop(), which return a new filtered list, finished in under a second. findEvenNumberByItr() is slow because an ArrayList stores its elements in a contiguous array: every remove() shifts all the elements after the removed one a position to the left to close the gap. With about N/2 removals, each shifting up to N elements, the loop does work on the order of N².
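If the list really must be modified in place, one option not covered above is Collection.removeIf (available since Java 8). ArrayList overrides it so that the surviving elements are compacted in a linear sweep instead of shifting the tail on every removal. A minimal sketch, with an illustrative class name; no timing figures are claimed here:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class InPlaceRemovalDemo {

    // Removes odd numbers in place. ArrayList.removeIf avoids the repeated
    // element shifting that Iterator.remove() causes on every call.
    static void removeOdds(List<Integer> list) {
        list.removeIf(n -> n % 2 != 0);
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.range(0, 1_000_000)
                .boxed()
                .collect(Collectors.toCollection(ArrayList::new));
        long start = System.nanoTime();
        removeOdds(numbers);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("Remaining: " + numbers.size() + ", took " + elapsedMs + " ms");
    }
}
```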
So the next question: if space is not a constraint (and it rarely is 😉), and I can build a new Collection from the original one with a while, do-while, or for loop, why do I need the Stream API? Did I mention that streams have a parallelStream() method? It splits the work across a pool of threads and processes the elements independently. Would you rather write a loop that wraps each element in a Runnable task and runs those tasks in parallel yourself? Looks complicated, right?
To demonstrate this, let's take an example: there are N students whose information needs to be written to files, one file per student, named studentName.txt.
package com.rk.stream;

import lombok.AllArgsConstructor;
import lombok.Getter;
import lombok.NoArgsConstructor;
import lombok.Setter;

import java.io.*;
import java.util.ArrayList;
import java.util.Date;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Lombok generates the constructors and getters.
@NoArgsConstructor
@Getter
@AllArgsConstructor
class Student implements Serializable {
    String name;
    int Id;
    String email;
    Long contact;
}

// Runnable wrapper used by the ExecutorService-based approach.
@AllArgsConstructor
@Getter
@Setter
public class StudentInfoWrite implements Runnable {
    Student student;
    String dir;

    @Override
    public void run() {
        save(student, dir);
    }

    public static void main(String[] args) throws InterruptedException {
        int numOfEmp = 50000;
        ArrayList<Student> studentArrayList = new ArrayList<>();
        Long contactNum = 9000000000L;
        for (int i = 1; i <= numOfEmp; i++) {
            Student s = new Student("Name" + i, i, "dummymail" + i + "@xyz.com", contactNum + i);
            studentArrayList.add(s);
        }

        // The output directories must exist before any file can be written.
        new File("./streamBased/").mkdirs();
        new File("./threadBased/").mkdirs();

        // Stream based Approach
        long startTimeForStreamBased = new Date().getTime();
        studentArrayList.parallelStream().forEach(x -> save(x, "./streamBased/"));
        long endTimeForStreamBased = new Date().getTime();
        System.out.println("Stream based time took in seconds :" + (endTimeForStreamBased - startTimeForStreamBased) / 1000);

        // Thread based Approach
        long startTimeThreadBased = new Date().getTime();
        ExecutorService executorService = Executors.newWorkStealingPool();
        for (int i = 0; i < numOfEmp; i++) {
            StudentInfoWrite studentRun = new StudentInfoWrite(studentArrayList.get(i), "./threadBased/");
            executorService.execute(studentRun);
        }
        executorService.shutdown();
        // Block until all tasks finish instead of busy-waiting on isTerminated(),
        // which would keep one CPU core spinning during the measurement.
        executorService.awaitTermination(1, TimeUnit.HOURS);
        long endTimeForThreadBased = new Date().getTime();
        System.out.println("Thread based time took in seconds :" + (endTimeForThreadBased - startTimeThreadBased) / 1000);
    }

    // Serializes one student to <dir><name>.txt.
    private static void save(Student input, String dir) {
        try (FileOutputStream fos = new FileOutputStream(new File(dir + input.getName() + ".txt"));
             ObjectOutputStream obs = new ObjectOutputStream(fos)) {
            obs.writeObject(input);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Stream based time took in seconds :61
Thread based time took in seconds :107
Even compared with an ExecutorService created by newWorkStealingPool(), the stream-based approach finished faster in this run. The coding effort is also minimal, because the Stream API takes care of the parallelism: there is no need to implement Runnable or manage an executor.
The Stream API is very flexible, and the sky is the limit: the same goal can be reached in as many ways as you can think of by changing the intermediate operations in the pipeline. Before we wrap up the Stream API, let's remember one thing:
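The switch from sequential to parallel can be as small as replacing stream() with parallelStream(). The sketch below (names are illustrative) runs the same reduction both ways and also shows forEachOrdered, which keeps encounter order on a parallel stream, whereas forEach makes no ordering guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelStreamDemo {

    // The same pipeline, run sequentially or in parallel. By default a parallel
    // stream splits the source and processes the pieces on the common ForkJoinPool.
    static long sumOfSquares(List<Integer> numbers, boolean parallel) {
        return (parallel ? numbers.parallelStream() : numbers.stream())
                .mapToLong(Integer::longValue)
                .map(n -> n * n)
                .sum();
    }

    // forEachOrdered performs the action for one element at a time, and each
    // action happens-before the next, so a plain ArrayList is safe here.
    static List<Integer> visitInOrder(List<Integer> numbers) {
        List<Integer> visited = new ArrayList<>();
        numbers.parallelStream().forEachOrdered(visited::add);
        return visited;
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 1_000).boxed().collect(Collectors.toList());
        System.out.println(sumOfSquares(numbers, false));           // 333833500
        System.out.println(sumOfSquares(numbers, true));            // 333833500
        System.out.println(visitInOrder(numbers).equals(numbers));  // true
    }
}
```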
“With great power comes great responsibility”
Uncle Ben
Important Points to Remember
- Stream doesn’t modify the original collection.
- Intermediate operations return a stream, so you can chain as many of them as you need.
- A terminal operation produces a primitive value or an object reference (or only a side effect, as with forEach) and consumes the stream, so it can be called only once; running another operation on the same stream throws IllegalStateException.
- Stream extends BaseStream, which in turn extends the AutoCloseable interface. Most streams (those backed by collections, arrays, or generator functions) do not need to be closed, but when the source is an I/O channel, such as the stream returned by Files.lines(), declare the stream in a try-with-resources statement so that close() is guaranteed to be invoked.
- Where possible, put filter operations at the beginning of the pipeline so that subsequent operations have fewer elements to process.
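Two of the points above, try-with-resources for I/O-backed streams and single use, fit in one small sketch. The class name and the temporary file are only for illustration; Files.lines() and Files.write() are standard java.nio.file methods.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

class StreamRulesDemo {

    // Files.lines keeps the file open until the stream is closed, so the stream
    // is declared in try-with-resources; close() runs even if an exception occurs.
    static long countNonBlankLines(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.trim().isEmpty()).count();
        }
    }

    // A stream can be consumed only once: a second terminal operation
    // on the same stream throws IllegalStateException.
    static boolean secondTerminalOperationFails() {
        Stream<String> s = Stream.of("a", "b", "c");
        s.count(); // first terminal operation consumes the stream
        try {
            s.count();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("stream-demo", ".txt");
        Files.write(tmp, Arrays.asList("first", "", "second", "   "), StandardCharsets.UTF_8);
        System.out.println(countNonBlankLines(tmp));        // 2
        System.out.println(secondTerminalOperationFails()); // true
        Files.delete(tmp);
    }
}
```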
Don’ts when working with Streams
- When using parallelStream(), do not put blocking operations in the intermediate steps. Parallel streams share a thread pool (by default, the common ForkJoinPool), so blocked threads can hold up other parallel stream operations that use the same pool.
- Do not create Stream objects you are not going to use. In particular, an I/O-backed stream that is created but never closed can leak resources such as open file handles.
This article is just a glimpse of the Stream API's capabilities and features. If you have any suggestions or good Stream API examples, drop them in the comment box.
Find more Java Stream examples at howtodoinjava and stackify.