Lab 6: Disjoint Sets

## Reminder about queueing up for labs

Recall from this weeks weekly announcements that you will no longer see lab locations on oh.datastructur.es. Instead, as the post outlines, we will be conducting lab help queues on Discord. This way, you’ll have a lot less tabs open and we’ll be able to help students out a lot quicker. Make sure to always include your room number in your message!

## Introduction

Here are the slides for this lab.

In this lab, you’ll create UnionFind which will be used to solve the dynamic connectivity problem. As mentioned in discussion “Union Find” is a synonym for “Disjoint Sets”.

Your implementation of the Union Find data structure will be the most advanced version from lecture, namely Weighted Quick Union. For more information about this please reference the slides from lecture. The version from lecture has one major difference from our implementation: Ours will not keep a seperate size array. Instead, we will be using an approach that uses a negative value to indicate both that the element is a root and the size of its set. To better visualize and help you understand the operations on disjoint sets check out this online datastructure visualizer. Be sure to turn on Union by Rank where Rank = # of nodes (Union by Rank is just another way of saying “Weighted Quick Union”).

Optionally, you may also implement path compression. If you do so, make sure to enable Path Compression in the visualizer linked above.

## Weighted Quick Union

A skeleton will be provided in which you will have to fill in the following methods to complete the data structure:

• public UnionFind(int n): Creates a UnionFind data structure holding n vertices, in the set {0, …, n-1}. Initially, all vertices are in disjoint sets.
• public void validate(int v1): Throws an IllegalArgumentException if v1 is not a valid vertex.
• public int sizeOf(int v1): Returns the size of the set v1 belongs to.
• public int parent(int v1): Returns the parent of v1. If v1 is the root of a tree, returns the negative size of the tree for which v1 is the root.
• public boolean isConnected(int v1, int v2): Returns true if nodes v1 and v2 are connected.
• public void connect(int v1, int v2): Connects two elements v1 and v2 together. v1 and v2 can be any valid elements, and a union-by-size heuristic is used. If the sizes of the sets are equal, tie break by making v2’s root the root of the combined set. Connecting a vertex with itself or vertices that are already connected should not change the sets, but it may alter the internal structure of the data structure.
• public int find(int v1): Returns the root of the set v1 belongs to. Optionally, path-compression can be employed allowing for fast search-time.

For the above functions you should also correctly handle faulty inputs, e.g if invalid vertices are passed into the above functions, throw an IllegalArgumentException.

When implementing UnionFind feel free to either follow the above order or any other order which makes sense to you. Please note that that these methods are not independent from one another, and some methods might be useful as helper methods for others.

You are free to write your own tests for this assignment if you want to ensure that your implementation is right without running it through the autograder, but we will not be grading any tests.

## Optional Questions / Exercises

Below are a number of exercises that deal with Quick Find, Weighted Quick Union, and some variants of them. These questions are not required, but will serve as good practice in thinking about the differences between these differing implementations. The answers are displayed in white text, so if you want to check what you think highlight the line to reveal the text.

### Quick Find

1. Using the Quick Find algorithm with $N$ items, what is the worst-case runtime for the find operation?

Answer: The worst case runtime is $\Theta(1)$

2. Using the Quick Find algorithm with $N$ items, what is the worst-case runtime for the union operation?

Answer: The worst case runtime is $\Theta(N)$.

### Weighted Quick Union

Define a fully connected DisjointSets object as one in which connected returns true for any arguments, due to prior calls to union. Suppose we have a fully connected DisjointSets object with 6 items. Give the best and worst case height for the two implementations below. We define height as the number of links from the root to the deepest leaf, so a tree with 1 element has a height of 0. Give your answer as an exact value. Assume HeightedQuickUnion is like WeightedQuickUnion, except uses height instead of size/weight to determine which subtree is the new root. Hint: Try drawing out a few disjoint set trees and think about the different possible sequences of union operations that will result in the maximum height vs. the minimum height tree.

1. What is the best-case height for a WeightedQuickUnion containing 6 items?

Answer: The best case height is 1.

2. What is the worst-case height for a WeightedQuickUnion containing 6 items?

Answer: The worst case height is 2.

3. What is the best-case height for a HeightedQuickUnion containing 6 items?

Answer: The best case height is 1.

4. What is the worst-case height for a HeightedQuickUnion containing 6 items?

Answer: The worse case height is 2.

## TA Overview

At the end of lab, your TA will go over the reference solution to UnionFind. This will be helpful if you haven’t finished the lab, since we don’t want you to be stuck working on lab too much outside of lab. (This is also an incentive for you to go to lab!)

## Submission and Recap

In this lab we covered the implementation of Weighted Quick Union, optionally with path compression. You already know the drill, but just to remind you, make sure to push your code to GitHub and submit to Gradescope to get credit for this lab. We will only be grading the code you have written in UnionFind.