Introduction to Collections
A motivating example
An OO view of Twitter
- Twitter may be viewed as a collection of Twitter accounts
- Each account may be viewed as an Object with the following
attributes
- The Twitter handle
- A collection of followers
- A collection of following
- A collection of tweets etc.
Collections Framework
- A Framework provides a sophisticated hierarchy of interfaces and classes for for managing groups of objects. These are thus large bodies of pre-written code e.g. the Google Web Toolkit (GWT).
- The Collections Framework is one such popular frameworks, which provides a range of pre-developed data-structures and associated algorithms to operate on them.
- This helps in simplifying the task of designing complex programs by helping developers apply OO concepts such as Polymorphism, Inheritance and Interfaces to create their own designs by reusing classes from this framework.
- We will focus on the
Collection
interface and theMap
interface. - In the
Collection
interface, we will learn aboutList
andSet
. - In the
Map
interface, we will learn aboutHashMap
. - Items used in a
Collection
or aMap
need to be an object type (e.g. Integer) rather than a primitive type (e.g. int).
An intro to the different data structures
[A video by Alex Xu, who has worked at Twitter and Apple]:
The Set Interface
A Set
is a Collection
of unique items. Thus, sets have no
duplicate elements. Also, by definition, they are unordered.
It has many useful APIs:
- Create a set of type
Integer
Set<Integer> numbers = new HashSet<Integer>();
- Add a member into a set
numbers.add(520);
- Check if a set contains an element
numbers.contains(1314);
- Check if two sets contain exactly the same elements
setA.equals(setB);
Get your hands dirty!
The Repl below has created two hash set instances for you, now try yourself:
- Add ‘o’, ‘o’, ‘p’ into the first set
- Use
System.out.println()
to display the first set, what do you see? - Add ‘o’, ‘p’ into the second set
- Use
System.out.println()
to display the second set, what do you see? - Use
equals
method to compare two sets, what can you find?
Operations on set
Consider two sets setA {1, 2, 3, 4}
and setB = {2, 4, 6, 8}
Intersection
The intersection of these two sets is a set, that for each element e
in the set, it satisfies setA.contains(e)
is true AND setB.contains(e)
is true
In this case, the intersection of setA and setB is {2, 4}
Union
The union of these two sets is a set, that for each element e
in the set, it satisfies setA.contains(e)
is true OR setB.contains(e)
is true
In this case, the union of setA and setB is {1, 2, 3, 4, 6, 8}
Difference
The difference of setA
and setB
is a set, that for each element e
in the set, it satisfies setA.contains(e)
is true AND setB.contains(e)
is false
In this case, the difference between setA and setB is {1, 3}
and the difference between setB and setA is {6, 8}
Subsets
Consider two sets setA
and setB
, we say setA is a subset of setB if and only if for every element e
in setA, setB.contains(e)
is true.
When setA is a subset of setB and setB is a setA, then it must be the case that setA.equals(setB)
is true.
- A set that contains no element is a subset of any set
- A set is a subset of itself.
Power set
Consider setA = {2, 8, 1}
The power set of setA is a set, that contains all the subsets of setA.
In this case, the power set of setA is {empty set, {1}, {2}, {8}, {1,2}, {1,8}, {2,8}, {1,2,8}}
Cross product
Consider setA = {s, e}
and setB = {2, 8, 1}
The cross product of setA and setB is a set of all pairs (a, b), where a is an element in setA and b is an element in setB.
In this case, the cross product of setA and setB is {(s,2), (s,8), (s,1), (e,2), (e,8), (e,1)}
Implement your own Set
Get your hands dirty!
public class SetOfStrings {
private List<String> setv;
/**
* create an empty SetOfStrings -- constructor
*/
public SetOfStrings() {
setv = new ArrayList<String>();
}
/**
* returns true if the SetOfStrings is an empty set false otherwise
*/
public boolean isEmpty() {
return setv.isEmpty();
}
/**
* returns the size of the set
*/
public int size() {
return setv.size();
}
/**
* Checks if a given String argument is a member of the set. return true if
* 'element' is a member of the set; false otherwise.
*
* @param element a String
*/
public boolean isMember(String element) {
return true;
}
/**
* return a List<String> of the elements of this SetOfStrings
*/
public List<String> elements(){
return setv;
}
/**
* insets a String 'newElement' to the set. care should be taken to insert so
* that the property of set is not violated after insertion.
*
* @param newElement a String
*/
public void insertElement(String newElement) {
}
/**
* deletes a string 'element' from the set if 'element' is not a member of the
* set then an exception called NoSuchElementException is thrown.
*
* @param element a String
* @throws NoSuchElementException if the String 'element' is not a member of the
* set
*/
public void deleteElement(String element) throws NoSuchElementException {
}
//helper method
private void addAllElements(SetOfStrings set1) {
}
/**
* determines the union of the current set with 'set2' and returns the new set
* which is the union of these two sets. This operation SHOULDN'T MODIFY EITHER
* OF THE INPUT SETS.
*
* @param set2 a SetOfStrings
*/
public SetOfStrings union(SetOfStrings set2) {
SetOfStrings out = new SetOfStrings();
return out;
}
/**
* determines the intersection of the current set with 'set2' and returns the
* new set which is the intersection of these two sets. This operation SHOULDN'T
* MODIFY EITHER OF THE INPUT SETS.
*
* @param set2 a SetOfStrings
*/
public SetOfStrings intersection(SetOfStrings set2) {
SetOfStrings out = new SetOfStrings();
return out;
}
/**
* determines the difference of the current set with 'set2' and returns the new
* set which contains all the elements that are in the current set but not in
* "set2" This operation SHOULDN'T MODIFY EITHER OF THE INPUT SETS.
*
* @param set2 a SetOfStrings
*/
public SetOfStrings difference(SetOfStrings set2) {
SetOfStrings out = new SetOfStrings();
return out;
}
/**
* determines the product of the current set with set2 and returns a new
* SetOfStrings representing the product. In the product, the pairs are denoted
* by a new string "a,b" where "a" is a member of the first set and "b" is a
* member of the second set.
*
* @param set2 a SetOfStrings
**/
public SetOfStrings product(SetOfStrings set2) {
SetOfStrings out = new SetOfStrings();
return out;
}
/**
* determines if set2 is a subset of the current set returns true when every
* member of set2 to is present in the current set
*
* @param set2 a SetOfStrings
**/
public boolean subset(SetOfStrings set2) {
}
/**
* Displays the contents of the SetOfStrings using set notation
*/
public void print() {
}
}
You are provided this skeleton code in code runner. Complete the following actions through code runner exercises as a Quiz called Collections Code Runner Exercise
- Implement a test case for testing the
isMember
method, which has an implementation. - Implement the
insertElement
method along with a test case. Ensure that the test case passes. - Implement the
deleteElement
method. - Implement the
intersection
method along with two different test cases: one that tests the actual size and the other that tests the values. - Implement the
union
method and likewise write two different test cases similar to the previous step. - Implement the
subset
method with two different test cases like above. - Implement the
product
method with two different test cases again.
The Map Interface
Suppose we want to search for a given Twitter user using an unique ID,
say the twitter handle. The search functionality can be
realised using the concept of a Map
: a data-structure that
associates <key,value> pairs, where the key is an unique identifier
for accessing the value. In this case, the key is the twitterHandle
and
value
is all the data associated in the account for that twitterHandle
.
- Each key is distinct i.e. there are no duplicate keys
- Each key maps to a distinct value i.e. a map is a mathematical function.
-
Some of the important methods of this interface are:
size()
that returns the number of entries in the map.isEmpty()
, which returns true when the map is emptyV get(Object key)
, returns the value to which the specified key is mapped, or null if this map contains no mapping for the key.V put(K key, V value)
, associates the specified value with the specified key in this map. If the map previously contained a mapping for the key, the old value is replaced by the specified value.Set<K> keySet()
, which returns a Set view of the keys contained in this map.Collection<V> values()
, which returns a Collection view of the values contained in this map.
Hashing and Hashing Functions
In Java every Object has an associated hashing function. The objective of this function is to map objects to buckets, similar to what is shown in the figure below. Note that in this example, objects C, E map to this same bucket. This is known as a collision.
Hash Table
A Hash Table is an efficient way of mapping <key,value> tuples. Let’s
initially assume that the keys are integers in the range
\([0, N-1]\). Then we may conceive a bucket array shown below, where
the entry for the key k
is inserted in the bucket array A[k]
.
The main limitations of this approach is that:
- Keys are arbitrary Objects and not values in the range \([0, N-1]\).
- There are issues with space utilisation, if the number of entries are much smaller than the allocated space.
Hash Functions
To overcome this, we can use a hash function \(h(k)\), which takes any
key k
as a specific Object
and returns an index into the bucket
array i.e. input to the function is a key k
and the output is an
index in the range \([0, N-1]\).
Ideally, \(h\) has to be an injective function i.e. no two distinct elements can be mapped to the same value. However, this requirement may not work always and in such a case a collision has said to have happened.
Indeed this function is neither injective nor surjective and consequently is also not bijective.
So, in Java the following two steps are carried out:
- The function \(h(k)\) maps the key to an integer value called the hash code. Typical hashing function is a polynomial based on the different components of the object \(k\).
-
Then a compression function is used to map the hash code to an unique value in the range \([0, N-1]\). A very simple compression function is \(|h(k)| mod N\). The mod operator represents the operator
%
in Java, wherea % b
returns the remainder ofa/b
. - If N is not a prime number, then the likelihood of collisions increases. For example, if \(N=100\) then hashcodes say 205, 305, 405 etc. will all collide. One simple fix would be to select a prime \(N\). However, this may still cause collisions.
Java’s contract for hashCode
-
Returns a hash code value for the object. This method is supported for the benefit of hash tables like the HashMap. The general contract of hashCode is:
-
In the following we reproduce the Oracle Java 17 documentation, that defines this contract as follows:
-
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
-
If two objects are equal according to the equals method, then calling the hashCode method on each of the two objects must produce the same integer result.
-
It is not required that if two objects are unequal according to the equals method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
This blog by Ralf Sternberg is a good source for understanding this contract. Here, this contract is nicely summarised as ``Objects that are equal must have the same hash code within a running process’’.
A HashSet Example: hashCode and equals method
-
We want to create sets of
Students
objects belonging to different specialisations and then test if some of these sets are equal or not. -
The Class
Student
is shown below. -
You are required to create three different sets of students in main and test if two sets are equal or not.
-
To do this you need to override the
hashCode
and theequals
methods.
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
public class Student {
protected Integer id;
protected String name;
protected List<String> courses;
public Student(Integer id, String name) {
this.id = id;
this.name = name;
}
public void enollCourses(List<String> courses) {
this.courses = courses;
}
}
An important NOTE
While generating the hashCode
and equals
, you are required to
use the identical values for the key, where a key is a combination of
the objects attributes.
Using a HashMap
Get your hands dirty!
We will now work on the following ACP exercise for using a HashMap
:
You are given the following class called Student
import java.util.Objects;
public class Student {
protected Integer id;
protected String name;
public Student(Integer id, String name) {
this.id = id;
this.name = name;
}
public String getName() {
return name;
}
public Integer getID() {
return id;
}
@Override
public int hashCode() {
return Objects.hash(id);
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
Student other = (Student) obj;
return Objects.equals(id, other.id);
}
}
Consider the class Course
specified below, where the value of the
studentsInCourse
is the marks they have obtained in the course:
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
public class Course {
Map<Student,Double> studentsInCourse;
public Course() {
this.studentsInCourse = new HashMap<Student,Double>();
}
public void enrol(List<Student> students) {
for(Student student: students) {
studentsInCourse.put(student, Math.random()*100);
System.out.println
("The next enrolled student " + student.getID() + "has Mark = " + getMark(student));
}
}
public boolean searchStudent(Student student) {
//You have to implement this method
}
public Double getMark(Student student) throws RuntimeException {
//You have to implement this method
}
public Set<Student> passList(){
//You have to implement this method
}
public static void main(String args[]) {
Student s1 = new Student(1, "s1");
Student s2 = new Student(2, "s2");
Student s3 = new Student(3, "s1");
Student s4 = new Student(4, "s2");
List<Student> enrolled = new ArrayList<Student>();
enrolled.add(s1);
enrolled.add(s2);
enrolled.add(s3);
Course se281 = new Course();
se281.enrol(enrolled);
if(se281.searchStudent(s1)) {
System.out.println("Student " + s1.getID() + " is enrolled");
}
else {
System.out.println("Student " + s1.getID() + " is not enrolled");
}
if(se281.searchStudent(s4)) {
System.out.println("Student " + s4.getID() + " is enrolled");
}
else {
System.out.println("Student " + s4.getID() + " is not enrolled");
}
Set<Student> passList = se281.passList();
for(Student nextStudent: passList) {
System.out.println("Next student who passed 281 is " + nextStudent.getID()
+ " with Mark = "+ se281.getMark(nextStudent));
}
}
}
You are provided this skeleton code in code runner. Complete the following actions through code runner exercises as a Quiz called Collections Code Runner Exercise
-
Implement the method
searchStudent(student)
that returnstrue
whenstudent
is enrolled in the course orfalse
otherwise. - Implement the method
getMark(student)
which returns the mark obtained by the student, if they are enrolled. Otherwise, aRuntimeException
is thrown. - Implement the method
passList()
, which returns a set of students who have secured 50 or more marks in the course.
TLDR
- The Java Collections Framework provides a hierarchy of interfaces and classes for the systematic representation of data using data-structures.
- Three different interfaces, the
List
, theSet
and theMap
have been covered. - The
List
interface provides the implementation of classes such asArrayList
andLinkedList
. - The
Set
interface provides concrete classes such as theHashSet
and theLinkedHashSet
. - The
Map
interface provides the mapping of <key,value> tuples and is implemented by classes such asHashMap
,LinkedHashMap
andTreeMap
etc. We learnt about using theHashMap
class.