Java Collections Framework (JCF)
The Java Collections Framework (JCF) is a set of interfaces and classes in Java that provide a comprehensive, standardized architecture for representing and manipulating collections of objects.
It is composed of core interfaces
(such as Collection
, List
, Set
, Map
, and Queue
) that you can implement
to create your own data structures, and it also contains concrete implementations of those interfaces (such as ArrayList
, LinkedList
, HashSet
, HashMap
).
JCF helps in simplifying the task of designing complex programs by helping developers apply OO concepts such as Polymorphism, Inheritance and Interfaces to create their own designs by reusing classes from this framework.
Let’s see an overview of the core interfaces
-
Collection (interface) The root interface in the Java Collections Framework hierarchy. Represents a group of objects known as elements. Core methods include
add
,remove
,contains
,size
,isEmpty
, anditerator
. -
List (interface) Represents an ordered sequence of elements where duplicates are allowed. Core implementations include
ArrayList
,LinkedList
, andVector
. -
Set (interface) Represents a collection of *unique elements with no duplicate elements (sets can be ordered or unordered). Core implementations include
HashSet
,LinkedHashSet
, andTreeSet
. -
Map (interface) Represents a mapping between keys and values, where each key is associated with exactly one value. Core implementations include
HashMap
,LinkedHashMap
,TreeMap
, andHashtable
. -
Queue (interface) Represents a collection designed for holding elements prior to processing. Core implementations include
LinkedList
(it is also a Queue), andPriorityQueue
-
Deque (interface) Represents a double-ended queue, supporting element insertion and removal at both ends. Core implementations include
ArrayDeque
andLinkedList
(it is also a Deque).
LinkedList
is the most versatile data structure because is implementing both List, Queue, and Dequeue interfaces.
The List Interface
You should be familiar with the List
interface as we used ArrayList
many times in SOFTENG281. ArrayList
is a concrete implementation of the interface. Another important concrete implementation of List
is LinkedList
.
ArrayList
and LinkedList
have different underlying implementations and performance characteristics.
Java’s ArrayList
is implemented using an array internally. The array dynamically resizes itself as needed. When the array reaches its maximum capacity, a new, larger array is allocated, and the elements from the old array are copied to the new array.
Java’s LinkedList
is implemented using a doubly linked list internally. Each element is stored in a node that contains references to the previous and next nodes in the list. The first and last nodes are known as the head
and tail
, respectively.
Here are some pros and cons of each, along with recommendations on when to use them:
- Random Access
list.get(0); // get the element at the first position
list.set(4, "hello"); // insert hello at index 4 (i.e, position 5)
ArrayList
is much faster than LinkedList
because ArrayList
uses an array internally, while LinkedList
requires traversing the list from the beginning or end to reach the desired position.
- Memory Efficiency
Generally ArrayList
is more memory-efficient than LinkedList
because it only needs to store the elements inside an internal array. Conversely, each element in LinkedList
requires additional memory for storing the node references.
- Better Performance for Iteration
for(String s : list){
System.out.println(s);
}
Iterating over elements using an enhanced for loop (for-each) or using iterators is typically faster with ArrayList
than LinkedList
- Insertion and Deletion
list.remove(4); // remove element at index 4
list.add("hello"); // append element at the end of the list
remove
and add
operations are often slower in ArrayList
, because elements may need to be shifted in the internal array. Also resizing the internal array can be expensive. Conversely, LinkedList
allows faster remove
and add
operations because only the relevant references need to be updated.
When to use ArrayList
?
Use ArrayList
when you need fast access to elements by index (e.g., list.get(5)
), and when you frequently perform iteration over the elements (e.g., for (String s : list)
), and most of the time insertion and deletion operations are infrequent or performed mainly at the end of the list.
When to use LinkedList
?
Use LinkedList
when you frequently perform insertion and deletion operations at arbitrary positions within the list.
The Set Interface
A Set
is a collection of unique items. Thus, sets have no duplicate elements. Popular concrete implementations are HashSet
and LinkedHashSet
, the main difference of the two is that LinkedHashSet
maintains the order of the elements as they are added inside the set, HashSet
does not (i.e., it is an unordered set).
This is an example of usage of HashSet
Set<String> hashSet = new HashSet<>();
hashSet.add("a");
hashSet.add("b");
hashSet.add("b"); // Duplicate, will not be added
hashSet.add("c");
hashSet.add("d");
hashSet.add("e");
hashSet.add("f");
System.out.println("HashSet: " + hashSet); // Output order may vary
System.out.println("is `e` contained? " + hashSet.contains("e")); // check if the element is contained in the set
To understand how HashSet
works, we need to first understand the methods hashcode
and equals
.
-
int hashcode()
on an Object returns an integer, which should represents this Object, as uniquely as possible within the confines of an int. -
boolean equals(Object other)
determines whether this Object is semantically equal to another Object.
These two methods are intrinsically linked together as equals
must be consistent with hashcode
: That is, a.equals(b)
must imply a.hashCode() == b.hashCode()
. The reverse is not true; two objects that are not equal may have the same hashcode. (this is crucial to understand hashcode)
In the following we report the Oracle Java 17 documentation, that
defines the hashcode
contract as follows:
-
Whenever it is invoked on the same object more than once during an execution of a Java application, the
hashCode
method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application. -
If two objects are equal according to the
equals
method, then calling thehashCode
method on each of the two objects must produce the same integer result. -
It is not required that if two objects are unequal according to the
equals
method, then calling thehashCode
method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
This is how HashSet
works:
When you add an element to a HashSet
, the hashCode()
method of the element is called internally to generate the hashcode of the element. This hashcode is used to determine the index (bucket) in the underlying array where the element will be stored. HashSet
will use equals
to determine if the element already exists in the bucket
Why is called bucket? Because there could be semantically different objects (equals
return false) that have the same hashcode (as discussed above this is possible! the opposite should not!). This is situation is called hashcode collisions. Hashcode collisions are inevitable because the number of possible integers is a finite number much smaller than all possible states of a Java Object. In Java, the int data type is a 32-bit signed integer, which means it can represent integer values in the range from -2,147,483,648
to 2,147,483,647
(inclusive), which means 4,294,967,296
possible hashcodes. However, let’s consider a simple example in which a class Point
has two int fields x
and y
(2D positions):
public class Point{
private int x;
private int y;
}
There are 18,446,744,073,709,551,616
possible states for objects of type Points
, with two int
fields (x = 0, y = 0), (x = 1, y = 0) ...
. So it is mathematically impossible that all possible objects of type Point
have a distinct hashcode. Try to consider more complex objects, they will have a much higher number of possible states, but there are only be 4,294,967,296
possible unique hashcodes. This is ok. This situation is called hashcode collision. The HashSet
will still work fine, will be just slower to operate. There will be many hashcode collisions. Indeed, the hashcode is only computed and used for performance reasons, is the equals
method that has the last say if an object is unique or not.
For example, let’s assume that we add in the Set two different objects, but they have the same hashcode, this is how it works:
System.out.println(a.equals(b)); // false
System.out.println(a.haschode() == b.hashcode()); // true
// lets' assume the a.haschode() = b.hashcode() = c.hashcode() = 23
hashSet.add(a); // the bucket with ID 23 is empty, HashSet puts element a in the bucket is 100% sure that there isn't another object in the list that is equal to a, because the bucket is empty
hashSet.add(b); // in the bucket with ID 23 there is already element a, HashSet needs to check if a.equals(b) is false. If so it will add b, this is the case so now the bucket contains a and b, which are different objects, they just have the same hashcode.
Clearly, the more hashcode collisions there are, the more equals
checks HashSet
has to do to be sure to add an object that is unique (there is no semantically equivalent object already inside the set).
Buckets, HashSet
implements buckets internally with a LinkedList
.
When we want to retrieve an element from the HashSet
, it computes the hashcode of the element, and it searches for the corresponding bucket. If HashSet
finds the element in the bucket (by comparing using equals
) it returns the element.
You might wonder, do I need to implement hashcode
and equals
methods? The answer is yes, if you need to, for example because you want to insert those elements in a HashSet
(hashcode
and equals
are also needed in other situations).
Let’s do a step back. As you know, all classes in Java extends java.lang.Object
by default. java.lang.Object
already contains the following default implementations of hashcode
and equals
.
public boolean equals(Object obj) {
return this == obj;
}
public int hashCode() {
return System.identityHashCode(this);
}
If you don’t implement the hashcode
and equals
by overriding these default implementations, the default implementation will be invoked when you use the HashSet
. However, the resulting behaviour of HashSet
will be unlikely the behaviour that you want.
For example:
import java.util.Set;
import java.util.HashSet;
class Point{
int x;
int y;
public Point(int x, int y){
this.x = x;
this.y = y;
}
@Override
public String toString() {
return "Point [x=" + x + ", y=" + y + "]";
}
}
public class Main {
public static void main(String args[]) {
Point p1 = new Point(3,4);
Point p2 = new Point(3,4);
Point p3 = new Point(8,7);
System.out.println(p1.equals(p2)); // prints false
System.out.println(p1.equals(p3)); // prints false
System.out.println(p2.equals(p3)); // prints false
System.out.println(p1.equals(p1)); // prints true
System.out.println(p1.hashCode()); // 112810359
System.out.println(p2.hashCode()); // 2124308362
System.out.println(p3.hashCode()); // 146589023
System.out.println(p3.hashCode()); // 146589023
Set<Point> hashSet = new HashSet<>();
hashSet.add(p1);
hashSet.add(p2);
hashSet.add(p3);
System.out.println(hashSet); // prints [Point [x=8, y=7], Point [x=3, y=4], Point [x=3, y=4]]
}
}
In the example above, I am not overriding the hashcode
and equals
methods so the JVM will use the default implementation of java.lang.Object
I would expect that p1
and p2
are equals, but they are not, because the default implementation of equals
checks if they are exactly the same object in memory, similarly is the hascode
, which is based on what the reference of the object is, not the object content.
So we need to override the hashcode
and equals
methods:
import java.util.Set;
import java.util.HashSet;
class Point{
int x;
int y;
public Point(int x, int y){
this.x = x;
this.y = y;
}
@Override
public String toString() {
return "Point [x=" + x + ", y=" + y + "]";
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + x;
result = prime * result + y;
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Point other = (Point) obj;
if (x != other.x)
return false;
if (y != other.y)
return false;
return true;
}
}
public class Main {
public static void main(String args[]) {
Point p1 = new Point(3,4);
Point p2 = new Point(3,4);
Point p3 = new Point(8,7);
System.out.println(p1.equals(p2)); // prints true
System.out.println(p1.equals(p3)); // prints false
System.out.println(p2.equals(p3)); // prints false
System.out.println(p1.equals(p1)); // prints true
System.out.println(p1.hashCode()); // 1058
System.out.println(p2.hashCode()); // 1058
System.out.println(p3.hashCode()); // 1216
System.out.println(p3.hashCode()); // 1216
Set<Point> hashSet = new HashSet<>();
hashSet.add(p1);
hashSet.add(p2);
hashSet.add(p3);
System.out.println(hashSet); // prints [Point [x=8, y=7], Point [x=3, y=4]]
}
}
Now, p1
and p2
are equals, and they give the same hashcode. Also, the HashSet
has only two elements because p1
and p2
are semantically equivalent although are different objects (they have a different memory address)
Ok, but how to implement them ? Why the hashcode is multiplied by 31 ? Using 31, which is a prime number helps to reduce the chance of collisions.
Don’t worry your IDE (VS Code) can generate them automatically :D
Right click inside the Java file
-> Source Action...
-> hashCode and equals
-> you choose the fields that you want to use for equals and hashcode (you might not want to consider all of them)
Get your hands dirty!
you have this class
class Car{
private String plate;
private String country;
private String colour;
public Car(String plate, String country, String colour) {
this.plate = plate;
this.country = country;
this.colour = colour;
}
@Override
public String toString() {
return "Car [plate=" + plate + ", country=" + country + ", colour=" + colour + "]";
}
}
Two cars are equals if they have the same plate and country. Indeed, different countries can have cars that share the same plate. The Colour of the car is irrelevant. Implement hashcode
and equals
methods accordingly.
Operations on Set
When dealing with sets you might need to perform these operations
Consider two sets setA {1, 2, 3, 4}
and setB = {2, 4, 6, 8}
Intersection
The intersection of these two sets is a set, that for each element e
in the set, it satisfies setA.contains(e)
is true AND setB.contains(e)
is true
In this case, the intersection of setA and setB is {2, 4}
Union
The union of these two sets is a set, that for each element e
in the set, it satisfies setA.contains(e)
is true OR setB.contains(e)
is true
In this case, the union of setA and setB is {1, 2, 3, 4, 6, 8}
Difference
The difference of setA
and setB
is a set, that for each element e
in the set, it satisfies setA.contains(e)
is true AND setB.contains(e)
is false
In this case, the difference between setA and setB is {1, 3}
and the difference between setB and setA is {6, 8}
Subsets
Consider two sets setA
and setB
, we say setA is a subset of setB if and only if for every element e
in setA, setB.contains(e)
is true.
When setA is a subset of setB and setB is a setA, then it must be the case that setA.equals(setB)
is true.
- A set that contains no element is a subset of any set
- A set is a subset of itself.
Power set
Consider setA = {2, 8, 1}
The power set of setA is a set, that contains all the subsets of setA.
In this case, the power set of setA is {empty set, {1}, {2}, {8}, {1,2}, {1,8}, {2,8}, {1,2,8}}
Cross product
Consider setA = {s, e}
and setB = {2, 8, 1}
The cross product of setA and setB is a set of all pairs (a, b), where a is an element in setA and b is an element in setB.
In this case, the cross product of setA and setB is {(s,2), (s,8), (s,1), (e,2), (e,8), (e,1)}
Get your hands dirty!
Have a look to the official JavaDoc of Set
to familiarize with the methods of the interface.
Implement the set operations using HashSet
. Important methods to implement the operations are addAll
, containsAll
, removeAll
, and retainAll
.
The Map Interface
The Map
data structure is together with lists and sets one of the most popular (and extremely useful) data structures. HashMap
is a concrete implementation provided by the JDK that also uses hashcode
to increase the performance.
HashMap
stores data in <key,value> pairs, where each key is unique. It allows you to store and retrieve elements based on a key. The key is an unique identifier for accessing the value.
HashMap vs HashSet HashMap
is used for storing key-value pairs, where each key is unique, while HashSet
is used for storing unique elements without associated values.
You typically use HashMap
when you need to associate keys with values and perform lookups based on keys. HashSet
is used when you need to store a collection of unique elements.
Have a look at the JavaDoc of Map
to understand the available methods
The method to add elements in the map is put
, where you specify the key
and the value
associated to that key. Note that both key
and value
are objects, you cannot use primitive values: Map<int,String> map2 = new HashMap<>(); // NOT ALLOWED
but you can use the Integer
class.
Let’s consider our example below.
import java.util.Map;
import java.util.HashMap;
class Point{
int x;
int y;
public Point(int x, int y){
this.x = x;
this.y = y;
}
@Override
public String toString() {
return "Point [x=" + x + ", y=" + y + "]";
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + x;
result = prime * result + y;
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Point other = (Point) obj;
if (x != other.x)
return false;
if (y != other.y)
return false;
return true;
}
}
public class Main {
public static void main(String args[]) {
Point p1 = new Point(3,4);
Point p2 = new Point(3,4);
Point p3 = new Point(8,7);
Map<Point,String> map = new HashMap<>();
map.put(p1,"hello");
System.out.println(map); // prints {Point [x=3, y=4]=hello}
map.put(p2,"world");
System.out.println(map); // prints {Point [x=3, y=4]=world}
map.put(p3,"!");
System.out.println(map.get(p1)); // prints world
System.out.println(map); // prints {Point [x=8, y=7]=!, Point [x=3, y=4]=world}
}
}
map.put(p1,"hello");
adds the key value p1
and hello
, it does that by computing the hashcode
of p1
, the hashcode
of p1 is the key of the map. The map contains one key-pair {Point [x=3, y=4]=hello}
map.put(p2,"world");
adds the key value p2
and hello
, but p1
has the same hashcode
of p2
(i.e., p1
and p2
are the same key) AND equals
on these two objects returns true, so the map replaces the value hello
with world
. The map still has one key pair but with an updated value {Point [x=3, y=4]=hello}
map.put(p3,"!");
because p3
has a different hashcode and p3
is not equals to p1
(or p2
), then it adds a new pair: {Point [x=8, y=7]=!, Point [x=3, y=4]=world}
System.out.println(map.get(p1));
returns world
because we replaced the value hello
You can use putIfAbsent
to put the value only if there is no key with the same hashcode
public class Main {
public static void main(String args[]) {
Point p1 = new Point(3,4);
Point p2 = new Point(3,4);
Point p3 = new Point(8,7);
Map<Point,String> map = new HashMap<>();
map.putIfAbsent(p1,"hello");
System.out.println(map); // prints {Point [x=3, y=4]=hello}
map.putIfAbsent(p2,"world");
System.out.println(map); // prints {Point [x=3, y=4]=hello}
map.putIfAbsent(p3,"!");
System.out.println(map.get(p1)); // prints hello
System.out.println(map); // prints {Point [x=8, y=7]=!, Point [x=3, y=4]=hello}
}
}
Similarly to HashSet
you should not worry about hash-collision because the HashMap
will still use equals in case of collisions. REMEMBER hash-collisions are inevitable. It just a matter of performance. For example, let’s create a very bad hashcode function that always returns 5
, so all objects will have the same hashcode! of course never do that !
import java.util.Map;
import java.util.HashMap;
class Point{
int x;
int y;
public Point(int x, int y){
this.x = x;
this.y = y;
}
@Override
public String toString() {
return "Point [x=" + x + ", y=" + y + "]";
}
@Override
public int hashCode() {
return 5;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Point other = (Point) obj;
if (x != other.x)
return false;
if (y != other.y)
return false;
return true;
}
}
public class Main {
public static void main(String args[]) {
Point p1 = new Point(3,4);
Point p2 = new Point(3,4);
Point p3 = new Point(8,7);
Map<Point,String> map = new HashMap<>();
map.putIfAbsent(p1,"hello");
System.out.println(map); // prints {Point [x=3, y=4]=hello}
map.putIfAbsent(p2,"world");
System.out.println(map); // prints {Point [x=3, y=4]=hello}
map.putIfAbsent(p3,"!");
System.out.println(map.get(p1)); // prints world
System.out.println(map); // prints {Point [x=8, y=7]=!, Point [x=3, y=4]=hello}
}
}
The result is the same, the HashMap
works fine will just be very slow because all the elements will be in the same bucket
(they all have the same hashcode)
Get your hands dirty!
I have a list of words, I want to print the occurrence of each word. For example:
public static void main(String[] args) {
// Sample list of words
List<String> words = new ArrayList<>();
words.add("apple");
words.add("banana");
words.add("banana");
words.add("banana");
words.add("apple");
words.add("orange");
words.add("banana");
words.add("apple");
words.add("banana");
}
banana appears 5 times apple appears 3 times orange appears 1 times
Can you implement this with `HashMap` ?
What if I ask you to implement the same without and `HashMap` using only `ArrayList` can you do that? Note that the content of the list might change overtime, you don't know the content of the list when you write the code, it has to work for any arbitrary list of words.
</div>
<br>
<br>
<div class="purple">
# ![Edit](../../../_images/hand.png) Get your hands dirty!
# REWIND!
Download the ACP execise `281_REWIND_hashset_hashmap`
Read an inventory of items and their quantities from the file `inventory.csv` and store this data in a HashMap. Implement methods to add items, remove items, update quantities, and display the inventory.
Sure! Here’s a Java exercise that involves reading an inventory of items and their quantities from a file, and then storing this information in a `HashMap`. The exercise will guide students to read the file, process its contents, and then allow them to perform some basic operations on the inventory.
Implement the following methods:
- `void addItem(String item, int quantity)`: Adds a new item with the specified quantity to the inventory. If the item already exists, update its quantity.
- `void removeItem(String item)`: Removes an item from the inventory, if item is not there throw an exception.
- `void getItem(String item, int quantity)` get the specific quantity, if quantity is not enough throw an exception, if the quantity goes to zero remove the item.
- `void updateQuantity(String item, int quantity)`: Updates the quantity of an existing item.
- `void displayInventory()`: Displays all items and their quantities.
</div>