Loading...
墨滴

逸之

2022/01/03  阅读:30  主题:红绯

一种通过编程面试的算法

An Algorithm for Passing Programming Interviews

一种通过编程面试的算法

Over the past few years, I’ve interviewed with a dozen or so companies and have completed ~50 or so individual algorithm problems. I’m frequently given feedback that I did a great job at the algorithms problem. In this post, I’m going to share how exactly I approach algorithm problems.

在过去的几年里,我采访了十几家公司,完成了大约50个算法问题。我经常收到反馈,说我在算法问题上做得很好。在这篇文章中,我将分享我是如何处理算法问题的。

Thought Process 思维过程

A guiding principle I use is that every interview problem is designed to be solved. You aren’t going to be asked to prove Fermat’s Last Theorem in an interview. If you are given some impossible problem, it’s not like you’re going to have much of a chance anyways.

我使用的一个指导原则是,每个面试问题都是为了解决而设计的。你不会被要求在访谈中证明费马最后定理。如果你遇到了一些不可能的问题,反正你也不会有多少机会。

In my experience, around 80% of the time an algorithm problem will boil down to a few core data structures and algorithms. The data structures I see the most are are:

根据我的经验,大约80% 的情况下,算法问题可以归结为几个核心数据结构和算法。我看到的最多的数据结构是:

  • Hash Tables 哈希表
  • Linked Lists 链接列表
  • Binary Search Trees 二叉搜索树

As for the algorithms:

至于算法:

  • Depth-first search 深度优先搜索
  • Binary Search 二进制搜索
  • Sorting Algorithms 排序算法

(You probably won’t be expected to implement a binary search or sorting algorithm, but you should be aware that they exist.)

(你可能不会被要求实现二进制搜索或排序算法搜索,但是你应该意识到它们的存在。)

There’s also two additional programming techniques you should be aware of:

还有两种额外的编程技巧你需要注意:

  • Dynamic Programming 动态编程
  • Recursion 递归

Assuming the solution to your problem can be solved using the items in this list, we can come up with a simple algorithm to solve the problem.

假设问题的解决方案可以用列表中的项目来解决,我们可以想出一个简单的算法来解决这个问题。

The Algorithm 算法

The algorithm goes like this:

算法是这样的:

  1. After being given the algorithm problem, ask for the specific runtime your solution will need to have. Almost certainly, the interviewer will tell you.
  2. 在给出算法问题之后,请求您的解决方案将需要具有的特定运行时。几乎可以肯定,面试官会告诉你。
  3. Filter out the data structures and algorithms that obviously aren’t relevant to the problem at hand. This will eliminate most of the list and you will usually be left with 2-3 data structures and algorithms.过滤掉那些明显与手头问题无关的数据结构和算法。这将消除大部分的列表,您通常会留下2-3个数据结构和算法。
    1. You can filter out data structures that are too slow. If you need to solve a problem in O(1) time, it’s impossible to use a binary tree in the solution, because a binary tree will always take at least O(log n) time.
    2. 您可以过滤掉速度太慢的数据结构。如果你需要在 o (1)时间内解决一个问题,在解决方案中不可能使用二叉树,因为一棵二叉树总是至少需要 o (log n)时间。
    3. You can filter out algorithms if they are impossible to use for the given problem. If there isn’t a graph in the problem, you know that depth-first search can’t be relevant.
    4. 如果算法不能用于给定的问题,可以过滤掉它们。如果问题中没有一个图表,你就知道深度优先搜索和这个问题没有关系。
  4. Go through the use cases for the remaining data structures and see which ones are relevant to the problem at hand. The solution for the problem is going to be a combination of these use cases. All you need to do is piece together these different use cases and you’ll come up with the solution to the problem.
  5. 检查剩余数据结构的用例,看看哪些用例与手头的问题相关。问题的解决方案将是这些用例的组合。您所需要做的就是将这些不同的用例拼凑在一起,然后您就会得到问题的解决方案。

Let’s go through the runtimes and main use cases for each data structure and algorithm. Then walk through a few examples so you can see just how easy it is to use the algorithm.

让我们来看一下每个数据结构和算法的运行时和主要用例。然后浏览一些例子,这样您就可以看到使用该算法是多么简单。

Runtimes and Use Cases 运行时和用例

Algorithm or Data Structure

算法或数据结构

Runtime 运行时间

Use Cases

用例

Hash Tables

哈希表

O(1) insertion, lookup, and deletion.O (1)插入、查找和删除。
  • When you only need to store and lookup objects.
  • 当只需要存储和查找对象时。
  • When you need to partition a list of objects distinct groups by some property (basically what a group by in SQL does)
  • 当您需要使用某个属性对一个对象列表进行分区时(基本上就是 SQL 中的 group by 所做的)
  • You need to count the number of distinct items in a list.
  • 您需要计算列表中不同项的数量。

Linked Lists

链接列表

O(1) insertion, lookup, and deletion at the ends or next to a node you already have a pointer to.O (1)插入、查找和删除位于末端或已有指针的节点旁边。

The main use cases of a linked list revolve around the fact that a linked list maintains relative ordering of the elements. In programming interviews, a linked list is primarily used as either a stack or queue.

链表的主要用例围绕着这样一个事实: 链表维护元素的相对顺序。在编程面试中,链表主要用作堆栈或队列。

Binary Trees

二叉树

O(log n) insertion, lookup, and deletion.O (logn)插入、查找和删除。

Used when you need to keep your data in sorted order. This lets you quickly answer questions like how many elements fall into a given range or what is the Nth highest element in the tree.

当您需要将数据按排序顺序保存时使用。这可以让您快速回答一些问题,比如有多少个元素属于给定的范围,或者树中第 n 个最高的元素是什么。

Binary Search

二进制搜索

O(log n) O (log n)
  • You need to find the number in a sorted array closest to another number.
  • 您需要在排序后最接近另一个数字的数组中找到该数字。
  • You need to find the smallest number in a sorted array that is larger than another number.
  • 您需要查找排序数组中比另一个数大的最小数字。
  • You need to find the largest number in a sorted array that is smaller than another number.
  • 您需要找到排序数组中比另一个数字小的最大数字。
  • If you aren’t able to use a hash table for whatever reason, you can use a binary search to check if a given element is in a sorted array.
  • 如果由于某种原因无法使用哈希表,则可以使用二进制搜索来检查给定元素是否在排序数组中。

Depth-first Search

深度优先搜索

O(n) O (n)
  • Traverse an entire graph. 遍历整个图表
  • Find a specific element in a graph.
  • 在图中找到一个特定的元素。
  • Find the connected components of a graph.
  • 求图的连通分量。
Sorting 分类 O(n log n) O (n log n)
  • Can be used if you need to process elements in a specific order. First sort by that order, then iterate through the elements.
  • 如果您需要以特定顺序处理元素,则可以使用。首先按顺序排序,然后遍历元素。
  • Can be used to sort an array that you will later perform binary search on.
  • 可用于对数组进行排序,稍后将对该数组执行二进制搜索。

Dynamic programming and recursion are a bit different in that they are general techniques for solving algorithm problems and not specific algorithms themselves. This means they don’t have concrete runtimes or use cases. The good news is after a bit of practice, it becomes fairly easy to recognize programs that can be solved with dynamic programming or recursion. My recommendation is practice a few problems that require dynamic programming or recursion so you can get a feel for them. Fully explaining dynamic programming and recursion is outside the scope of this post.

动态编程和递归有一点不同,它们是解决算法问题的通用技术,而不是具体算法本身。这意味着它们没有具体的运行时或用例。好消息是,经过一段时间的实践,识别可以用动态编程或递归解决的程序变得相当容易。我的建议是练习一些需要动态编程或递归的问题,这样您就可以对它们有所了解。充分解释动态编程和递归不在本文的讨论范围之内。

Examples 例子

Now let’s take a look at a few different interview problems and how we can use the algorithm to solve them.

现在让我们看看一些不同的面试问题,以及我们如何使用算法来解决它们。

Problem #1: Build a rate limiter

问题 # 1: 建立速率限制器

This is a problem that I’ve seen multiple times across several different companies.

这个问题我在好几个不同的公司都见过很多次。

You need to write a function that can only be called at most N times within any 1 minute window. For example, it can be called at most 10 times in a minute. If the function is called more than N times it should throw an exception. The function should have expected O(1) performance.

你需要写一个函数,在任何一个1分钟的窗口中最多只能被调用 n 次。例如,它可以在一分钟内被调用最多10次。如果函数被调用超过 n 次,它应该抛出一个异常。函数应该具有 o (1)性能。

Take a look at the data structures and algorithms we can use and see which ones can be used to solve the problem. Then try to see how you can use those data structures to solve the problem. Give it a shot before moving onto the solution.

看看我们可以使用的数据结构和算法,看看哪些可以用来解决问题。然后尝试看看如何使用这些数据结构来解决问题。在移动到溶液之前试一试。

  • Hash Tables 哈希表
  • Linked Lists 链接列表
  • Binary Trees 二叉树
  • Depth-first search 深度优先搜索
  • Binary Search 二进制搜索
  • Sorting 分类
  • Dynamic Programming 动态编程
  • Recursion 递归

Solution 解决方案

Let’s start by eliminating all the data structures and algorithms that obviously can’t be used:

让我们从消除所有明显不能使用的数据结构和算法开始:

  • Hash Tables 哈希表
  • Linked Lists 链接列表
  • Binary Trees 二叉树 – Too slow. 太慢了
  • Depth-first search 深度优先搜索 – Too slow. There’s also no graph to run DFS on.- 太慢了也没有可以运行 DFS 的图表。
  • Binary Search 二进制搜索 – Too slow. Also no sorted array.- 太慢。也没有排序数组。
  • Sorting 分类 – Too slow. There’s also no elements to sort.- 太慢了,也没有要分类的元素。
  • Dynamic Programming 动态编程 – No way to apply dynamic programming to the problem.- 没有办法应用动态规划的问题。
  • Recursion 递归 – No way to apply recursion to the problem.- 无法对问题应用递归。

That leaves just hash tables and linked lists. If we go through the common use cases for hash tables, there doesn’t appear to be any way to apply them to the problem.  We don’t need to quickly lookup different objects and we don’t need to partition a list of objects into different groups. That means we can cross off hash tables from the list.

这样就只剩下散列表和链表了。如果我们研究哈希表的常见用例,似乎没有任何方法可以将它们应用到问题中。我们不需要快速查找不同的对象,也不需要将对象列表划分到不同的组中。这意味着我们可以从列表中删除散列表。

The only data structure remaining are linked lists. Looking at the use cases the only two are for a stack and a queue. Can we use either of those to keep track of how many times a function has been called in the last minute? Yes! What we can do is keep a queue that has one entry for each time the function was called within the last minute. Whenever the function is called, we remove all entries from the queue that were inserted more than a minute ago. If the queue still has a length greater than N, we throw an exception. Otherwise we add a new entry to the queue with the current time. By keeping track of the length of the queue with a counter, we can determine with O(1) expected time, this function will have O(1) expected performance.

剩下的唯一数据结构是链表。查看用例时,只有两个用例用于堆栈和队列。我们可以使用这两种方法中的任何一种来跟踪一个函数在最后一分钟被调用了多少次吗?太好了!我们可以做的是保持一个队列,每次在最后一分钟内调用该函数时,它都有一个条目。每当调用该函数时,我们都会从队列中删除一分钟前插入的所有条目。如果队列的长度仍然大于 n,则抛出异常。否则,我们将使用当前时间向队列添加一个新条目。通过使用计数器跟踪队列的长度,我们可以用 o (1)期望时间确定该函数将具有 o (1)期望性能。

Problem #2: Anagrams 问题 # 2: 字谜

Given a list of words, produce a list of words in the input that are anagrams of at least one other word in the list. Two words are anagrams of each other if you can rearrange the letters in one to get the other. The runtime should be O(n) assuming the words have constant length.

给定一个单词列表,在输入中生成一个单词列表,这些单词是列表中至少一个其他单词的变形词。如果你能把字母重新排列成一个字母,那么这两个单词就是对方的字母组合。运行时应该是 o (n)假设字符具有常量长度。

Again attempt the problem yourself before reading the solution. Here’s the complete list of data structures and algorithms for reference:

在阅读解决方案之前,再次尝试解决这个问题。以下是数据结构和算法的完整列表,供参考:

  • Hash Tables 哈希表
  • Linked Lists 链接列表
  • Binary Trees 二叉树
  • Depth-first search 深度优先搜索
  • Binary Search 二进制搜索
  • Sorting 分类
  • Dynamic Programming 动态编程
  • Recursion 递归

Solution 解决方案

Let’s start by eliminating the items from the list that can’t possibly be used for this problem:

让我们从清单中剔除那些不可能用于解决这个问题的项目开始:

  • Hash Tables 哈希表
  • Linked Lists 链接列表
  • Binary Trees 二叉树 – Too slow. 太慢了
  • Depth-first search 深度优先搜索 – There’s no graph. 没有图表
  • Binary Search 二进制搜索 – There’s no sorted array.- 没有排序数组。
  • Sorting 分类 – Too slow. 太慢了
  • Dynamic Programming 动态编程 – No way to apply DP to the problem.- 无法将 DP 应用于该问题。
  • Recursion 递归 – No way to apply recursion to the problem.- 无法对问题应用递归。

That leaves us with just hash tables and linked lists. Linked lists don’t seem to be relevant to the problem since it doesn’t look like there’s any way a stack or queue would help us. That leaves only hash tables left.

这样就只剩下散列表和链表了。链接列表似乎与问题无关,因为看起来堆栈或队列没有任何帮助。这样就只剩下散列表了。

The only hash table use case that seems relevant here is the ability to split apart elements in a list into separate groups based on a property. In this case, if had a way to split the list into separate groups based on which words are anagrams of each other, that would give us a way to solve the problem:

这里看起来唯一相关的散列表用例是能够根据属性将列表中的元素拆分为单独的组。在这种情况下,如果有一种方法可以将列表分成单独的组,根据这些组,每个单词都是相互重组的,那么这将给我们一种解决问题的方法:

  1. Split the list of words into separate groups based on which words are anagrams of each other.
  2. 将单词列表分成单独的组,根据这些组,单词是彼此之间的字母组合。
  3. Append all the groups together that have more than one word in it. This will produce a list of words that are anagrams of at least one other word in the input list.
  4. 将包含一个以上单词的所有组追加到一起。这将产生一个单词列表,这些单词是输入列表中至少一个其他单词的变形组。

The only bit that’s left to solve is to find some property we can use to group anagrams together. We need to find some function f such that f(x) is the same as f(y) when x and y are anagrams of each other. There’s two different functions we can use to do this:

剩下唯一需要解决的问题就是找到一些属性,我们可以用它来把字谜组合在一起。我们需要找到一个函数 f,使得当 x 和 y 互相变换时,f (x)和 f (y)是一样的。我们可以使用两种不同的函数来实现这一点:

  • Sort the characters of the words alphabetically. Since anagrams are all made up of the same letters. This will give us the same string for any pair of words that are anagrams of each.
  • 按字母顺序排列单词的字符。因为字谜都是由相同的字母组成的。这将给我们相同的字符串对任何一对单词,是每个字母的颠倒组合。
  • Produce a dictionary of the number of times each letter occurs in each word. This solution is a bit trickier since you will need to somehow use the dictionary as a key in a hash table. Some languages have a way to do this, while others don’t.
  • 编写一本词典,说明每个单词中每个字母出现的次数。这个解决方案比较棘手,因为您需要以某种方式使用字典作为散列表中的键。一些语言有办法做到这一点,而其他语言没有。

Now that we know how we can group words that are anagrams of each other together, we can put everything together to produce the solution!

现在我们知道了如何将相互组合在一起的字词组合在一起,我们可以将所有的字词组合在一起来产生解决方案!

Now let’s try one more problem.

现在让我们再试一个问题。

Problem #3: K-sorted 问题 # 3: k 排序

Given an array of objects that is partially sorted, sort the array. Every element in the array is at most distance k away from its actual spot. You aren’t given the target runtime for this problem.

给定一个对象数组,对其进行部分排序,对数组进行排序。数组中的每个元素离实际点的距离最远为 k。对于这个问题,没有提供目标运行时。

As usual, here’s the list of algorithms and data structures:

像往常一样,下面是算法和数据结构的列表:

  • Hash Tables 哈希表
  • Linked Lists 链接列表
  • Binary Trees 二叉树
  • Depth-first search 深度优先搜索
  • Binary Search 二进制搜索
  • Sorting 分类
  • Dynamic Programming 动态编程
  • Recursion 递归

Solution 解决方案

Let’s first see if we can deduce anything about the runtime. The fastest runtime we could possibly achieve is O(n) since that’s how long it would take to iterate through the list. We could also always perform a normal sort on the list which gives us a runtime of O(n log n). Let’s see if it’s possible to do better than O(n log n).

Is it possible to get as fast as O(n)? Well, if k=n, this problem becomes the same as sorting the list, so it’s not possible to hit O(n) all the time. Maybe it’s still possible to achieve something better than O(n log n). Let’s now see which data structures and algorithms are relevant:

让我们先看看能否推断出运行时的任何信息。我们可能实现的最快的运行时是 o (n) ,因为这是遍历列表所需的时间。我们还可以对列表执行一个常规排序,这样就得到了一个 o (nlogn)的运行时。让我们看看是否有可能做得比 o (nlogn)更好。有可能达到 o (n)那么快吗?如果 k = n,这个问题就变成了对列表进行排序,所以不可能一直命中 o (n)。也许仍然有可能实现比 o (n log n)更好的目标。现在让我们看看哪些数据结构和算法是相关的:

  • Hash Tables 哈希表
  • Linked Lists 链接列表
  • Binary Trees 二叉树
  • Depth-first search 深度优先搜索 – No graph. - 没有图表
  • Binary Search 二进制搜索 – No sorted array. - 没有排序数组
  • Sorting 分类 – Too slow. 太慢了
  • Dynamic Programming 动态编程 – Not relevant. - 无关紧要
  • Recursion 递归 – Not relevant. - 无关紧要

Of the data structures left, the only data structure that is relevant to the problem is a binary tree. Binary trees are the only data structure in the list that deals with sorting elements. If you think a little about how a binary tree can be used to solve the problem, the answer becomes clear. We can keep a binary tree of the last k elements. We repeatedly remove the smallest element from the binary tree and add the next element from the input array. The full algorithm looks like this:

在剩下的数据结构中,唯一与问题相关的数据结构是二叉树。二叉树是列表中唯一处理排序元素的数据结构。如果你稍微思考一下如何用二叉树来解决这个问题,答案就会变得清晰起来。我们可以保留最后一个 k 元素的二叉树。我们重复地从二叉树中删除最小的元素,并从输入数组中添加下一个元素。完整的算法如下:

  • Insert the first k elements of the input array into the binary tree.
  • 将输入数组的前 k 个元素插入到二叉树中。
  • Iterate through the rest of the input array. On each iteration remove the smallest element from the binary tree and add it to the result array. Then add the current element in the input list to the binary tree.
  • 循环遍历输入数组的其余部分。在每次迭代中,从二叉树中删除最小的元素并将其添加到结果数组中。然后将输入列表中的当前元素添加到二叉树中。
  • Once we get to the end of the input array, repeatedly remove the smallest element from the binary tree until the tree is empty.
  • 一旦我们到达输入数组的末尾,重复从二叉树中删除最小的元素,直到树为空。

Analyzing the runtime of this solution, this gives us a runtime of O(n log k). Is it possible to do better than that? It seems intuitive that there won’t be a faster algorithm. What possible algorithm could have a runtime between O(n) and O(n log k), especially one that you would have to come up with in an interview? Here’s an informal proof that you can’t solve the problem faster than O(n log k). Given that it’s non-trivial to come up with, you wouldn’t be expected to come up with it in an interview. If you aren’t interested in the proof, you can skip it.

Let’s assume you have an algorithm that is faster than O(n log k). We can use this algorithm to come up with a sorting algorithm faster than O(n log n) which is impossible. Let’s say we have n/k separate lists, each of size k, with the elements of each list strictly greater than the elements of the previous. If you concatenate all the lists together, run k-sorted on them, and then split up every k elements into separate lists, you would have sorted n/k lists in less than O(n log k) time. This means on average, you sorted each list in less than O(n/(n/k) log k) = O(k log k) time which is impossible. Therefore no algorithm for k-sorting is faster than O(n log k).

通过分析这个解决方案的运行时间,我们得到了一个 o (nlogk)的运行时间。有可能做得更好吗?似乎直觉告诉我们不会有更快的算法。什么算法可以在 o (n)和 o (n log k)之间有一个运行时间,特别是你必须在面试中想出来的运行时间?这里有一个非正式的证明,证明你不能比 o (nlogk)更快地解决问题。考虑到这不是一件小事,你不会期望在面试中提出这个问题。如果你对证明不感兴趣,你可以跳过它。假设您有一个比 o (nlogk)更快的算法。我们可以使用这个算法来得到一个比 o (n log n)更快的排序算法,这是不可能的。假设我们有 n/k 个单独的列表,每个大小为 k,每个列表的元素都严格大于前一个列表的元素。如果您将所有列表连接在一起,对它们运行 k 排序,然后将每个 k 元素分割成单独的列表,那么您将在不到 o (nlogk)的时间内对 n/k 列表进行排序。这意味着,平均而言,您在小于 o (n/(n/k) log k) = o (k log k)的时间内对每个列表进行排序,这是不可能的。因此,没有任何 k 排序算法比 o (nlogk)快。

That means we have found the optimal solution to the problem.

这意味着我们已经找到了问题的最优解。

Hopefully by this point I’ve convinced you that the algorithm is an effective way for solving algorithm problems. Note that not only is it effective at solving problems in interviews, but if you’ve encountered an algorithm problem in the real world, you can use it to check if the problem has a solution composed of the basic data structures in our list.

希望通过这一点,我已经说服您,该算法是一个解决算法问题的有效途径。请注意,它不仅可以有效地解决面试中的问题,而且如果你在现实世界中遇到了算法问题,你可以使用它来检查问题是否有一个由我们列表中的基本数据结构组成的解决方案。

If you want to learn about other ways to solve problems, I highly recommend the book How to Solve It. The book covers tons of different approaches you can use to solve any problem. How to Solve It has been a huge influence on how I approach any kind of problem today.

如果你想学习其他解决问题的方法,我强烈推荐《如何解决问题》这本书。这本书涵盖了你可以用来解决任何问题的大量不同的方法。如何解决它对我今天处理任何问题的方式都产生了巨大的影响。

逸之

2022/01/03  阅读:30  主题:红绯

作者介绍

逸之