A motivating example

An OO view of Twitter

  • Twitter may be viewed as a collection of Twitter accounts
  • Each account may be viewed as an Object with the following attributes
    1. The Twitter handle
    2. A collection of followers
    3. A collection of following
    4. A collection of tweets etc.




Collections Framework

  • A Framework provides a sophisticated hierarchy of interfaces and classes for for managing groups of objects. These are thus large bodies of pre-written code e.g. the Google Web Toolkit (GWT).
  • The Collections Framework is one such popular frameworks, which provides a range of pre-developed data-structures and associated algorithms to operate on them.
  • This helps in simplifying the task of designing complex programs by helping developers apply OO concepts such as Polymorphism, Inheritance and Interfaces to create their own designs by reusing classes from this framework.
  • We will focus on the Collection interface and the Map interface.
  • In the Collection interface, we will learn about List and Set.
  • In the Map interface, we will learn about HashMap.
  • Items used in a Collection or a Map need to be an object type (e.g. Integer) rather than a primitive type (e.g. int).

An intro to the different data structures

[A video by Alex Xu, who has worked at Twitter and Apple]:

Find Alex on LinkedIn here

The Set Interface

A Set is a Collection of unique items. Thus, sets have no duplicate elements. Also, by definition, they are unordered.

It has many useful APIs:

  • Create a set of type Integer
Set<Integer> numbers = new HashSet<Integer>();
  • Add a member into a set
numbers.add(520);
  • Check if a set contains an element
numbers.contains(1314);
  • Check if two sets contain exactly the same elements
setA.equals(setB);

Edit Get your hands dirty!

The Repl below has created two hash set instances for you, now try yourself:

  1. Add ‘o’, ‘o’, ‘p’ into the first set
  2. Use System.out.println() to display the first set, what do you see?
  3. Add ‘o’, ‘p’ into the second set
  4. Use System.out.println() to display the second set, what do you see?
  5. Use equals method to compare two sets, what can you find?





Operations on set

Consider two sets setA {1, 2, 3, 4} and setB = {2, 4, 6, 8}

Intersection

The intersection of these two sets is a set, that for each element e in the set, it satisfies setA.contains(e) is true AND setB.contains(e) is true

In this case, the intersection of setA and setB is {2, 4}

Union

The union of these two sets is a set, that for each element e in the set, it satisfies setA.contains(e) is true OR setB.contains(e) is true

In this case, the union of setA and setB is {1, 2, 3, 4, 6, 8}

Difference

The difference of setA and setB is a set, that for each element e in the set, it satisfies setA.contains(e) is true AND setB.contains(e) is false

In this case, the difference between setA and setB is {1, 3} and the difference between setB and setA is {6, 8}





Subsets

Consider two sets setA and setB, we say setA is a subset of setB if and only if for every element e in setA, setB.contains(e) is true.

When setA is a subset of setB and setB is a setA, then it must be the case that setA.equals(setB) is true.

  • A set that contains no element is a subset of any set
  • A set is a subset of itself.




Power set

Consider setA = {2, 8, 1}

The power set of setA is a set, that contains all the subsets of setA.

In this case, the power set of setA is {empty set, {1}, {2}, {8}, {1,2}, {1,8}, {2,8}, {1,2,8}}




Cross product

Consider setA = {s, e} and setB = {2, 8, 1}

The cross product of setA and setB is a set of all pairs (a, b), where a is an element in setA and b is an element in setB.

In this case, the cross product of setA and setB is {(s,2), (s,8), (s,1), (e,2), (e,8), (e,1)}


Implement your own Set

Edit Get your hands dirty!

public class SetOfStrings {
	private List<String> setv;

	/**
	 * create an empty SetOfStrings -- constructor
	 */
	public SetOfStrings() {
		setv = new ArrayList<String>();
	}

	/**
	 * returns true if the SetOfStrings is an empty set false otherwise
	 */
	public boolean isEmpty() {
		return setv.isEmpty();
	}

	/**
	 * returns the size of the set
	 */
	public int size() {
		return setv.size();
	}

	/**
	 * Checks if a given String argument is a member of the set. return true if
	 * 'element' is a member of the set; false otherwise.
	 *
	 * @param element a String
	 */
	public boolean isMember(String element) {
		return true;
	}

	/**
	 * return a List<String> of the elements of this SetOfStrings
	 */
	
	
	public List<String> elements(){
		return setv;
	}

	/**
	 * insets a String 'newElement' to the set. care should be taken to insert so
	 * that the property of set is not violated after insertion.
	 *
	 * @param newElement a String
	 */
	public void insertElement(String newElement) {
		

	}

	/**
	 * deletes a string 'element' from the set if 'element' is not a member of the
	 * set then an exception called NoSuchElementException is thrown.
	 *
	 * @param element a String
	 * @throws NoSuchElementException if the String 'element' is not a member of the
	 *                                set
	 */
	public void deleteElement(String element) throws NoSuchElementException {
		
	}

	//helper method
	private void addAllElements(SetOfStrings set1) {
		
	}

	/**
	 * determines the union of the current set with 'set2' and returns the new set
	 * which is the union of these two sets. This operation SHOULDN'T MODIFY EITHER
	 * OF THE INPUT SETS.
	 *
	 * @param set2 a SetOfStrings
	 */
	public SetOfStrings union(SetOfStrings set2) {
		SetOfStrings out = new SetOfStrings();
	
		return out;
	}

	/**
	 * determines the intersection of the current set with 'set2' and returns the
	 * new set which is the intersection of these two sets. This operation SHOULDN'T
	 * MODIFY EITHER OF THE INPUT SETS.
	 *
	 * @param set2 a SetOfStrings
	 */
	public SetOfStrings intersection(SetOfStrings set2) {
		SetOfStrings out = new SetOfStrings();
	
		return out;
	}

	/**
	 * determines the difference of the current set with 'set2' and returns the new
	 * set which contains all the elements that are in the current set but not in
	 * "set2" This operation SHOULDN'T MODIFY EITHER OF THE INPUT SETS.
	 *
	 * @param set2 a SetOfStrings
	 */
	public SetOfStrings difference(SetOfStrings set2) {
		SetOfStrings out = new SetOfStrings();
		return out;
	}

	/**
	 * determines the product of the current set with set2 and returns a new
	 * SetOfStrings representing the product. In the product, the pairs are denoted
	 * by a new string "a,b" where "a" is a member of the first set and "b" is a
	 * member of the second set.
	 *
	 * @param set2 a SetOfStrings
	 **/

	public SetOfStrings product(SetOfStrings set2) {

		SetOfStrings out = new SetOfStrings();

		
		return out;
	}

	/**
	 * determines if set2 is a subset of the current set returns true when every
	 * member of set2 to is present in the current set
	 *
	 * @param set2 a SetOfStrings
	 **/
	public boolean subset(SetOfStrings set2) {

	}

	/**
	 * Displays the contents of the SetOfStrings using set notation
	 */
	public void print() {
	
	}

}

You are provided this skeleton code in code runner. Complete the following actions through code runner exercises as a Quiz called Collections Code Runner Exercise

  1. Implement a test case for testing the isMember method, which has an implementation.
  2. Implement the insertElement method along with a test case. Ensure that the test case passes.
  3. Implement the deleteElement method.
  4. Implement the intersection method along with two different test cases: one that tests the actual size and the other that tests the values.
  5. Implement the union method and likewise write two different test cases similar to the previous step.
  6. Implement the subset method with two different test cases like above.
  7. Implement the product method with two different test cases again.


The Map Interface

Suppose we want to search for a given Twitter user using an unique ID, say the twitter handle. The search functionality can be realised using the concept of a Map: a data-structure that associates <key,value> pairs, where the key is an unique identifier for accessing the value. In this case, the key is the twitterHandle and value is all the data associated in the account for that twitterHandle.

  • Each key is distinct i.e. there are no duplicate keys
  • Each key maps to a distinct value i.e. a map is a mathematical function.
  • Some of the important methods of this interface are:

  • size() that returns the number of entries in the map.
  • isEmpty(), which returns true when the map is empty
  • V get(Object key), returns the value to which the specified key is mapped, or null if this map contains no mapping for the key.
  • V put(K key, V value), associates the specified value with the specified key in this map. If the map previously contained a mapping for the key, the old value is replaced by the specified value.
  • Set<K> keySet(), which returns a Set view of the keys contained in this map.
  • Collection<V> values(), which returns a Collection view of the values contained in this map.

Hashing and Hashing Functions

In Java every Object has an associated hashing function. The objective of this function is to map objects to buckets, similar to what is shown in the figure below. Note that in this example, objects C, E map to this same bucket. This is known as a collision.

Hash Table

A Hash Table is an efficient way of mapping <key,value> tuples. Let’s initially assume that the keys are integers in the range \([0, N-1]\). Then we may conceive a bucket array shown below, where the entry for the key k is inserted in the bucket array A[k].

The main limitations of this approach is that:

  1. Keys are arbitrary Objects and not values in the range \([0, N-1]\).
  2. There are issues with space utilisation, if the number of entries are much smaller than the allocated space.

Hash Functions

To overcome this, we can use a hash function \(h(k)\), which takes any key k as a specific Object and returns an index into the bucket array i.e. input to the function is a key k and the output is an index in the range \([0, N-1]\).

Ideally, \(h\) has to be an injective function i.e. no two distinct elements can be mapped to the same value. However, this requirement may not work always and in such a case a collision has said to have happened.

Indeed this function is neither injective nor surjective and consequently is also not bijective.

So, in Java the following two steps are carried out:

  1. The function \(h(k)\) maps the key to an integer value called the hash code. Typical hashing function is a polynomial based on the different components of the object \(k\).
  2. Then a compression function is used to map the hash code to an unique value in the range \([0, N-1]\). A very simple compression function is \(|h(k)| mod N\). The mod operator represents the operator % in Java, where a % b returns the remainder of a/b.

  3. If N is not a prime number, then the likelihood of collisions increases. For example, if \(N=100\) then hashcodes say 205, 305, 405 etc. will all collide. One simple fix would be to select a prime \(N\). However, this may still cause collisions.

Java’s contract for hashCode

  • Returns a hash code value for the object. This method is supported for the benefit of hash tables like the HashMap. The general contract of hashCode is:

  • In the following we reproduce the Oracle Java 17 documentation, that defines this contract as follows:

  1. Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.

  2. If two objects are equal according to the equals method, then calling the hashCode method on each of the two objects must produce the same integer result.

  3. It is not required that if two objects are unequal according to the equals method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

This blog by Ralf Sternberg is a good source for understanding this contract. Here, this contract is nicely summarised as ``Objects that are equal must have the same hash code within a running process’’.

A HashSet Example: hashCode and equals method

  • We want to create sets of Students objects belonging to different specialisations and then test if some of these sets are equal or not.

  • The Class Student is shown below.

  • You are required to create three different sets of students in main and test if two sets are equal or not.

  • To do this you need to override the hashCode and the equals methods.

import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

public class Student {

    protected Integer id;
    protected String name;
    protected List<String> courses;

    public Student(Integer id, String name) {
	this.id = id;
	this.name = name;
    }

    public void enollCourses(List<String> courses) {
	this.courses = courses;
    }

    }

An important NOTE

While generating the hashCode and equals, you are required to use the identical values for the key, where a key is a combination of the objects attributes.

Using a HashMap

Edit Get your hands dirty!

We will now work on the following ACP exercise for using a HashMap:

You are given the following class called Student

import java.util.Objects;

public class Student {

	protected Integer id;
	protected String name;

	public Student(Integer id, String name) {
		this.id = id;
		this.name = name;
	}

	public String getName() {
		return name;
	}
	
	public Integer getID() {
		return id;
	}

	@Override
	public int hashCode() {
		return Objects.hash(id);
	}

	@Override
	public boolean equals(Object obj) {
		if (this == obj)
			return true;
		if (obj == null)
			return false;
		if (getClass() != obj.getClass())
			return false;
		Student other = (Student) obj;
		return Objects.equals(id, other.id);
	}
}

Consider the class Course specified below, where the value of the studentsInCourse is the marks they have obtained in the course:

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Course {
	Map<Student,Double> studentsInCourse;
	
	public Course() {
		this.studentsInCourse = new HashMap<Student,Double>();
		
	}
	
	public void enrol(List<Student> students) {
		for(Student student: students) {
			studentsInCourse.put(student, Math.random()*100);
			System.out.println
			("The next enrolled student " + student.getID() + "has Mark = " + getMark(student));
		}
		
	}

	
	public boolean searchStudent(Student student) {
		//You have to implement this method
	}
	
	public Double getMark(Student student) throws RuntimeException {
		//You have to implement this method
	}
	
	public Set<Student> passList(){
		//You have to implement this method
	
	}
	
	public static void main(String args[]) {
		Student s1 = new Student(1, "s1");
		Student s2 = new Student(2, "s2");
		Student s3 = new Student(3, "s1");
		Student s4 = new Student(4, "s2");
		
		List<Student> enrolled = new ArrayList<Student>();
		enrolled.add(s1);
		enrolled.add(s2);
		enrolled.add(s3);
		
		Course se281 = new Course();
		se281.enrol(enrolled);
		
		if(se281.searchStudent(s1)) {
			System.out.println("Student " + s1.getID() + " is enrolled");
		}
		else {
			System.out.println("Student " + s1.getID() + " is not enrolled");
		}
		
		if(se281.searchStudent(s4)) {
			System.out.println("Student " + s4.getID() + " is enrolled");
		}
		else {
			System.out.println("Student " + s4.getID() + " is not enrolled");
		}
		
		Set<Student> passList = se281.passList();
		
		for(Student nextStudent: passList) {
			System.out.println("Next student who passed 281 is " + nextStudent.getID() 
			+ " with Mark = "+ se281.getMark(nextStudent));
		}
	}
 
}

You are provided this skeleton code in code runner. Complete the following actions through code runner exercises as a Quiz called Collections Code Runner Exercise

  1. Implement the method searchStudent(student) that returns true when student is enrolled in the course or false otherwise.

  2. Implement the method getMark(student) which returns the mark obtained by the student, if they are enrolled. Otherwise, a RuntimeException is thrown.
  3. Implement the method passList(), which returns a set of students who have secured 50 or more marks in the course.


TLDR

  • The Java Collections Framework provides a hierarchy of interfaces and classes for the systematic representation of data using data-structures.
  • Three different interfaces, the List, the Set and the Map have been covered.
  • The List interface provides the implementation of classes such as ArrayList and LinkedList.
  • The Set interface provides concrete classes such as the HashSet and the LinkedHashSet.
  • The Map interface provides the mapping of <key,value> tuples and is implemented by classes such as HashMap, LinkedHashMap and TreeMap etc. We learnt about using the HashMap class.