Lazy Removal - How to make heap operations (almost) O(1)

4 minute read

Published:

Introduction

This problem was inspired by this last week’s Leetcode contest Question 3:3092. Most Frequent IDs. Although this problem is the third and should be on the harder end, I was able to solve this problem ~4 minutes because I had practiced this technique.

Problem Statement

The problem basically boils down to this:

Given a list of ids and frequency updates, give the most frequent element’s frequency after each update.

For example:

Input: nums = [2,3,2,1], freq = [3,2,-3,1]

Output: [3,3,2,2]

i = 0 -> { 2: 3 } -> res = [3]

i = 1 -> { 2: 3, 3: 2 } -> res = [3, 3]

i = 2 -> { 2: 0, 3: 2 } -> res = [3, 3, 2]

i = 3 -> { 2: 0, 3: 2, 1: 2 } -> res = [3, 3, 2, 2]

Solution

This is a classic scenario for using a max heap (or in Python a negative min heap since Python doesn’t do max heaps) since the heap will maintain the element with the maximum frequency at the very top, allowing us to perform each query in O(1).

But, we need to be able to also update elements in the heap without a) traversing the entire heap b) re-heapifying at each step!

This would lead to a O(n^2) runtime complexity which won’t do with the constraints provided that nums and freq can be as large as 10^5.

Lazy Removal

So what can we do instead? We can still maintain the max heap, but we don’t necesarrily need to update anything in the max heap. We can maintain a ‘logbook’ or dictionary which maintains the most recent frequency instead and keep stale elements inside the max heap.

What if we encounter a stale element when looking to append to the result you ask? Well, we can clearly identify it is stale since the element’s frequency won’t be equal to what is stored in the logbook/dictionary.

Let’s go back to the example to make things more clear:

Input: nums = [2,3,2,1], freq = [3,2,-3,1]

Output: [3,3,2,2]

i = 0 -> logbook = { 2: 3 }, maxHeap = [(3, 2)] (here were storing the count, id as a tuple) -> res = [3]

i = 1 -> logbook = { 2: 3, 3: 2 }, maxHeap = [(3, 2), (2, 3)] -> res = [3, 3]

i = 2 -> logbook = { 2: 0, 3: 2 }, maxHeap = [(3, 2), (2, 3)]

At this point we see that our max heap says the most frequent element is 2 with a frequency of 3. But this contradicts our logbook. So let’s trash it.

-> maxHeap = (2, 3)]

This concurs with our logbook, so we add it to the result.

-> res = [3, 3, 2]

i = 3 -> logbook = { 2: 0, 3: 2, 1: 2 }, maxHeap = [(2, 3), (2, 1)] -> res = [3, 3, 2, 2]

Actual Code

What does this look like in practice? It’s relatively simple and regurgitable for when you need to write it quickly. Here is my methodology:

def mostFrequentIDs(self, nums: List[int], freq: List[int]) -> List[int]:
    collection = collections.defaultdict(int)
    res = []
    maxHeap = []
    for n, f in zip(nums, freq):
        # Update logbook
        collection[n] += f
        # Add to heap
        heapq.heappush(maxHeap, (-collection[n], n))
        # Check if value at top of max heap matches what logbook says
        # Remove while they are inconsistent
        while maxHeap and -maxHeap[0][0] != collection[maxHeap[0][1]]:
            heapq.heappop(maxHeap)
        # Add the top value's frequency to result
        res.append(-maxHeap[0][0])
        
    return res

Now this may look like O(n^2) with the loop within a loop, but it actually runs in linear time (most of the time). In the worst case, you will have to remove multiple elements, but this isn’t often the case.

I hope you find this useful and find heap problems easier in the future!