 墨滴 2022/01/03  阅读：30  主题：红绯

一种通过编程面试的算法

Over the past few years, I’ve interviewed with a dozen or so companies and have completed ~50 or so individual algorithm problems. I’m frequently given feedback that I did a great job at the algorithms problem. In this post, I’m going to share how exactly I approach algorithm problems.

Thought Process 思维过程

A guiding principle I use is that every interview problem is designed to be solved. You aren’t going to be asked to prove Fermat’s Last Theorem in an interview. If you are given some impossible problem, it’s not like you’re going to have much of a chance anyways.

In my experience, around 80% of the time an algorithm problem will boil down to a few core data structures and algorithms. The data structures I see the most are are:

• Hash Tables 哈希表
• Binary Search Trees 二叉搜索树

As for the algorithms:

• Depth-first search 深度优先搜索
• Binary Search 二进制搜索
• Sorting Algorithms 排序算法

(You probably won’t be expected to implement a binary search or sorting algorithm, but you should be aware that they exist.)

(你可能不会被要求实现二进制搜索或排序算法搜索，但是你应该意识到它们的存在。)

There’s also two additional programming techniques you should be aware of:

• Dynamic Programming 动态编程
• Recursion 递归

Assuming the solution to your problem can be solved using the items in this list, we can come up with a simple algorithm to solve the problem.

The Algorithm 算法

The algorithm goes like this:

1. After being given the algorithm problem, ask for the specific runtime your solution will need to have. Almost certainly, the interviewer will tell you.
2. 在给出算法问题之后，请求您的解决方案将需要具有的特定运行时。几乎可以肯定，面试官会告诉你。
3. Filter out the data structures and algorithms that obviously aren’t relevant to the problem at hand. This will eliminate most of the list and you will usually be left with 2-3 data structures and algorithms.过滤掉那些明显与手头问题无关的数据结构和算法。这将消除大部分的列表，您通常会留下2-3个数据结构和算法。
1. You can filter out data structures that are too slow. If you need to solve a problem in O(1) time, it’s impossible to use a binary tree in the solution, because a binary tree will always take at least O(log n) time.
2. 您可以过滤掉速度太慢的数据结构。如果你需要在 o (1)时间内解决一个问题，在解决方案中不可能使用二叉树，因为一棵二叉树总是至少需要 o (log n)时间。
3. You can filter out algorithms if they are impossible to use for the given problem. If there isn’t a graph in the problem, you know that depth-first search can’t be relevant.
4. 如果算法不能用于给定的问题，可以过滤掉它们。如果问题中没有一个图表，你就知道深度优先搜索和这个问题没有关系。
4. Go through the use cases for the remaining data structures and see which ones are relevant to the problem at hand. The solution for the problem is going to be a combination of these use cases. All you need to do is piece together these different use cases and you’ll come up with the solution to the problem.
5. 检查剩余数据结构的用例，看看哪些用例与手头的问题相关。问题的解决方案将是这些用例的组合。您所需要做的就是将这些不同的用例拼凑在一起，然后您就会得到问题的解决方案。

Let’s go through the runtimes and main use cases for each data structure and algorithm. Then walk through a few examples so you can see just how easy it is to use the algorithm.

Runtimes and Use Cases 运行时和用例

 Algorithm or Data Structure 算法或数据结构 Runtime 运行时间 Use Cases 用例 Hash Tables 哈希表 O(1) insertion, lookup, and deletion.O (1)插入、查找和删除。 When you only need to store and lookup objects. 当只需要存储和查找对象时。 When you need to partition a list of objects distinct groups by some property (basically what a group by in SQL does) 当您需要使用某个属性对一个对象列表进行分区时(基本上就是 SQL 中的 group by 所做的) You need to count the number of distinct items in a list. 您需要计算列表中不同项的数量。 Linked Lists 链接列表 O(1) insertion, lookup, and deletion at the ends or next to a node you already have a pointer to.O (1)插入、查找和删除位于末端或已有指针的节点旁边。 The main use cases of a linked list revolve around the fact that a linked list maintains relative ordering of the elements. In programming interviews, a linked list is primarily used as either a stack or queue. 链表的主要用例围绕着这样一个事实: 链表维护元素的相对顺序。在编程面试中，链表主要用作堆栈或队列。 Binary Trees 二叉树 O(log n) insertion, lookup, and deletion.O (logn)插入、查找和删除。 Used when you need to keep your data in sorted order. This lets you quickly answer questions like how many elements fall into a given range or what is the Nth highest element in the tree. 当您需要将数据按排序顺序保存时使用。这可以让您快速回答一些问题，比如有多少个元素属于给定的范围，或者树中第 n 个最高的元素是什么。 Binary Search 二进制搜索 O(log n) O (log n) You need to find the number in a sorted array closest to another number. 您需要在排序后最接近另一个数字的数组中找到该数字。 You need to find the smallest number in a sorted array that is larger than another number. 您需要查找排序数组中比另一个数大的最小数字。 You need to find the largest number in a sorted array that is smaller than another number. 您需要找到排序数组中比另一个数字小的最大数字。 If you aren’t able to use a hash table for whatever reason, you can use a binary search to check if a given element is in a sorted array. 如果由于某种原因无法使用哈希表，则可以使用二进制搜索来检查给定元素是否在排序数组中。 Depth-first Search 深度优先搜索 O(n) O (n) Traverse an entire graph. 遍历整个图表 Find a specific element in a graph. 在图中找到一个特定的元素。 Find the connected components of a graph. 求图的连通分量。 Sorting 分类 O(n log n) O (n log n) Can be used if you need to process elements in a specific order. First sort by that order, then iterate through the elements. 如果您需要以特定顺序处理元素，则可以使用。首先按顺序排序，然后遍历元素。 Can be used to sort an array that you will later perform binary search on. 可用于对数组进行排序，稍后将对该数组执行二进制搜索。

Dynamic programming and recursion are a bit different in that they are general techniques for solving algorithm problems and not specific algorithms themselves. This means they don’t have concrete runtimes or use cases. The good news is after a bit of practice, it becomes fairly easy to recognize programs that can be solved with dynamic programming or recursion. My recommendation is practice a few problems that require dynamic programming or recursion so you can get a feel for them. Fully explaining dynamic programming and recursion is outside the scope of this post.

Examples 例子

Now let’s take a look at a few different interview problems and how we can use the algorithm to solve them.

问题 # 1: 建立速率限制器

This is a problem that I’ve seen multiple times across several different companies.

You need to write a function that can only be called at most N times within any 1 minute window. For example, it can be called at most 10 times in a minute. If the function is called more than N times it should throw an exception. The function should have expected O(1) performance.

Take a look at the data structures and algorithms we can use and see which ones can be used to solve the problem. Then try to see how you can use those data structures to solve the problem. Give it a shot before moving onto the solution.

• Hash Tables 哈希表
• Binary Trees 二叉树
• Depth-first search 深度优先搜索
• Binary Search 二进制搜索
• Sorting 分类
• Dynamic Programming 动态编程
• Recursion 递归

Solution 解决方案

Let’s start by eliminating all the data structures and algorithms that obviously can’t be used:

• Hash Tables 哈希表
• Binary Trees 二叉树 – Too slow. 太慢了
• Depth-first search 深度优先搜索 – Too slow. There’s also no graph to run DFS on.- 太慢了也没有可以运行 DFS 的图表。
• Binary Search 二进制搜索 – Too slow. Also no sorted array.- 太慢。也没有排序数组。
• Sorting 分类 – Too slow. There’s also no elements to sort.- 太慢了，也没有要分类的元素。
• Dynamic Programming 动态编程 – No way to apply dynamic programming to the problem.- 没有办法应用动态规划的问题。
• Recursion 递归 – No way to apply recursion to the problem.- 无法对问题应用递归。

That leaves just hash tables and linked lists. If we go through the common use cases for hash tables, there doesn’t appear to be any way to apply them to the problem.  We don’t need to quickly lookup different objects and we don’t need to partition a list of objects into different groups. That means we can cross off hash tables from the list.

The only data structure remaining are linked lists. Looking at the use cases the only two are for a stack and a queue. Can we use either of those to keep track of how many times a function has been called in the last minute? Yes! What we can do is keep a queue that has one entry for each time the function was called within the last minute. Whenever the function is called, we remove all entries from the queue that were inserted more than a minute ago. If the queue still has a length greater than N, we throw an exception. Otherwise we add a new entry to the queue with the current time. By keeping track of the length of the queue with a counter, we can determine with O(1) expected time, this function will have O(1) expected performance.

Problem #2: Anagrams 问题 # 2: 字谜

Given a list of words, produce a list of words in the input that are anagrams of at least one other word in the list. Two words are anagrams of each other if you can rearrange the letters in one to get the other. The runtime should be O(n) assuming the words have constant length.

Again attempt the problem yourself before reading the solution. Here’s the complete list of data structures and algorithms for reference:

• Hash Tables 哈希表
• Binary Trees 二叉树
• Depth-first search 深度优先搜索
• Binary Search 二进制搜索
• Sorting 分类
• Dynamic Programming 动态编程
• Recursion 递归

Solution 解决方案

Let’s start by eliminating the items from the list that can’t possibly be used for this problem:

• Hash Tables 哈希表
• Binary Trees 二叉树 – Too slow. 太慢了
• Depth-first search 深度优先搜索 – There’s no graph. 没有图表
• Binary Search 二进制搜索 – There’s no sorted array.- 没有排序数组。
• Sorting 分类 – Too slow. 太慢了
• Dynamic Programming 动态编程 – No way to apply DP to the problem.- 无法将 DP 应用于该问题。
• Recursion 递归 – No way to apply recursion to the problem.- 无法对问题应用递归。

That leaves us with just hash tables and linked lists. Linked lists don’t seem to be relevant to the problem since it doesn’t look like there’s any way a stack or queue would help us. That leaves only hash tables left.

The only hash table use case that seems relevant here is the ability to split apart elements in a list into separate groups based on a property. In this case, if had a way to split the list into separate groups based on which words are anagrams of each other, that would give us a way to solve the problem:

1. Split the list of words into separate groups based on which words are anagrams of each other.
2. 将单词列表分成单独的组，根据这些组，单词是彼此之间的字母组合。
3. Append all the groups together that have more than one word in it. This will produce a list of words that are anagrams of at least one other word in the input list.
4. 将包含一个以上单词的所有组追加到一起。这将产生一个单词列表，这些单词是输入列表中至少一个其他单词的变形组。

The only bit that’s left to solve is to find some property we can use to group anagrams together. We need to find some function f such that f(x) is the same as f(y) when x and y are anagrams of each other. There’s two different functions we can use to do this:

• Sort the characters of the words alphabetically. Since anagrams are all made up of the same letters. This will give us the same string for any pair of words that are anagrams of each.
• 按字母顺序排列单词的字符。因为字谜都是由相同的字母组成的。这将给我们相同的字符串对任何一对单词，是每个字母的颠倒组合。
• Produce a dictionary of the number of times each letter occurs in each word. This solution is a bit trickier since you will need to somehow use the dictionary as a key in a hash table. Some languages have a way to do this, while others don’t.
• 编写一本词典，说明每个单词中每个字母出现的次数。这个解决方案比较棘手，因为您需要以某种方式使用字典作为散列表中的键。一些语言有办法做到这一点，而其他语言没有。

Now that we know how we can group words that are anagrams of each other together, we can put everything together to produce the solution!

Now let’s try one more problem.

Problem #3: K-sorted 问题 # 3: k 排序

Given an array of objects that is partially sorted, sort the array. Every element in the array is at most distance k away from its actual spot. You aren’t given the target runtime for this problem.

As usual, here’s the list of algorithms and data structures:

• Hash Tables 哈希表
• Binary Trees 二叉树
• Depth-first search 深度优先搜索
• Binary Search 二进制搜索
• Sorting 分类
• Dynamic Programming 动态编程
• Recursion 递归

Solution 解决方案

Let’s first see if we can deduce anything about the runtime. The fastest runtime we could possibly achieve is O(n) since that’s how long it would take to iterate through the list. We could also always perform a normal sort on the list which gives us a runtime of O(n log n). Let’s see if it’s possible to do better than O(n log n).

Is it possible to get as fast as O(n)? Well, if k=n, this problem becomes the same as sorting the list, so it’s not possible to hit O(n) all the time. Maybe it’s still possible to achieve something better than O(n log n). Let’s now see which data structures and algorithms are relevant:

• Hash Tables 哈希表
• Binary Trees 二叉树
• Depth-first search 深度优先搜索 – No graph. - 没有图表
• Binary Search 二进制搜索 – No sorted array. - 没有排序数组
• Sorting 分类 – Too slow. 太慢了
• Dynamic Programming 动态编程 – Not relevant. - 无关紧要
• Recursion 递归 – Not relevant. - 无关紧要

Of the data structures left, the only data structure that is relevant to the problem is a binary tree. Binary trees are the only data structure in the list that deals with sorting elements. If you think a little about how a binary tree can be used to solve the problem, the answer becomes clear. We can keep a binary tree of the last k elements. We repeatedly remove the smallest element from the binary tree and add the next element from the input array. The full algorithm looks like this:

• Insert the first k elements of the input array into the binary tree.
• 将输入数组的前 k 个元素插入到二叉树中。
• Iterate through the rest of the input array. On each iteration remove the smallest element from the binary tree and add it to the result array. Then add the current element in the input list to the binary tree.
• 循环遍历输入数组的其余部分。在每次迭代中，从二叉树中删除最小的元素并将其添加到结果数组中。然后将输入列表中的当前元素添加到二叉树中。
• Once we get to the end of the input array, repeatedly remove the smallest element from the binary tree until the tree is empty.
• 一旦我们到达输入数组的末尾，重复从二叉树中删除最小的元素，直到树为空。

Analyzing the runtime of this solution, this gives us a runtime of O(n log k). Is it possible to do better than that? It seems intuitive that there won’t be a faster algorithm. What possible algorithm could have a runtime between O(n) and O(n log k), especially one that you would have to come up with in an interview? Here’s an informal proof that you can’t solve the problem faster than O(n log k). Given that it’s non-trivial to come up with, you wouldn’t be expected to come up with it in an interview. If you aren’t interested in the proof, you can skip it.

Let’s assume you have an algorithm that is faster than O(n log k). We can use this algorithm to come up with a sorting algorithm faster than O(n log n) which is impossible. Let’s say we have n/k separate lists, each of size k, with the elements of each list strictly greater than the elements of the previous. If you concatenate all the lists together, run k-sorted on them, and then split up every k elements into separate lists, you would have sorted n/k lists in less than O(n log k) time. This means on average, you sorted each list in less than O(n/(n/k) log k) = O(k log k) time which is impossible. Therefore no algorithm for k-sorting is faster than O(n log k).

That means we have found the optimal solution to the problem.

Hopefully by this point I’ve convinced you that the algorithm is an effective way for solving algorithm problems. Note that not only is it effective at solving problems in interviews, but if you’ve encountered an algorithm problem in the real world, you can use it to check if the problem has a solution composed of the basic data structures in our list.

If you want to learn about other ways to solve problems, I highly recommend the book How to Solve It. The book covers tons of different approaches you can use to solve any problem. How to Solve It has been a huge influence on how I approach any kind of problem today. 2022/01/03  阅读：30  主题：红绯 