Skip to main content
CodeFlow
data structures~10 minRoute 02 6 of 9

Sets

A collection where duplicates do not exist — just membership.

Prerequisites:Hash Tables

Story — the bouncer at the door

A list keeps a guest in for every time they arrive. A set keeps each guest exactly once — try to add them again, and the bouncer just nods them past. The set doesn't track when they came in or where in line they are. It tracks one thing only: are they in?

That sharp focus is what makes sets fast. Membership lookup is O(1) on average — typically faster than the same operation on a sorted array, even with binary search.

Sety says

Three operations earn most of the headlines: union (everything in either), intersection (in both), and difference (in this one but not the other). They replace whole loops of code.

See it — union, intersect, difference

Add items to set A or B. Then click an operator and watch the result pop out. Add a duplicate and notice — nothing changes.

Set A
catdogfish
Set B
dogfishbird
|

Sety says

I only answer one question: is this thing in me? I don't track order, I don't store duplicates. That makes me fast.

Try it — when do you reach for a set?

Exercise. Which of these problems is set-shaped?

  • (a) Track which student IDs have submitted homework.
  • (b) Store a leaderboard of the top 100 scores in order.
  • (c) Find which words appear in both document A and document B.
  • (d) Keep a queue of pending background jobs.
Reveal answer

(a) Set — only membership matters; duplicates are meaningless. (c) Set — you literally want set intersection. (b) is ordered + indexed → use a sorted list or a heap. (d) is FIFO → use a queue.

Code it — five languages

tags = {"python", "web", "tutorial"}
tags.add("python")           # already there — no error, no duplicate
tags.add("video")            # added
print("python" in tags)      # True (O(1))

a = {"cat", "dog", "fish"}
b = {"dog", "fish", "bird"}
print(a | b)                 # union
print(a & b)                 # intersection
print(a - b)                 # difference

Quiz it — make it stick

  1. Question 1

    Why use a set instead of a list when you only care about membership?

  2. True or false — Question 2

    Adding a value that's already in a set raises an error.

  3. Predict — Question 3

    In Python:

    a = {1, 2, 3, 4}
    b = {3, 4, 5}
    print(sorted(a & b))
    print(sorted(a - b))

No Dumb Questions

Real questions other learners asked on this page.

  • Why use a set instead of a list?
    Constant-time membership: `x in my_set` is O(1) on average; `x in my_list` is O(n). Use a set whenever you keep asking 'have I seen this before?' or 'is this on the allowed list?'
  • What's set difference for?
    `a - b` keeps everything in a that's not in b. Useful for "what's new since last sync" or "what permissions are missing." Pair with intersection (`&`) and union (`|`) and you have the basis of most set logic.
  • Is a set ordered?
    Standard hash-based sets are unordered. Some languages ship a separate "ordered set" (TreeSet in Java, std::set in C++) that keeps elements sorted at the cost of O(log n) operations.