Design Search Autocomplete System
Design a search autocomplete system for a given list of sentences. The system should return the top 3 most frequent sentences that start with a given prefix.
Constraints:
- The input sentences will be a list of strings.
- The input prefix will be a string.
- The output should be a list of the top 3 most frequent sentences that start with the given prefix.
Examples:
Input: ["The quick brown fox jumps over the lazy dog", "The dog runs quickly", "The quick dog runs"], "The"
Output: ["The quick brown fox jumps over the lazy dog", "The dog runs quickly", "The quick dog runs"]
Explanation: The top 3 most frequent sentences that start with the prefix "The" are returned.
Solutions
Trie with Hash Table
The solution uses a Trie data structure to store the sentences and a hash table to store the frequency of each sentence. The search function uses a depth-first search to find all sentences that start with the given prefix and then sorts them based on their frequency.
class AutocompleteSystem:
def __init__(self, sentences):
self.trie = {}
self.frequency = {}
for sentence in sentences:
node = self.trie
for char in sentence:
if char not in node:
node[char] = {}
node = node[char]
node['end'] = True
self.frequency[sentence] = self.frequency.get(sentence, 0) + 1
def search(self, prefix):
node = self.trie
for char in prefix:
if char not in node:
return []
node = node[char]
sentences = []
self.dfs(node, prefix, sentences)
sentences.sort(key=lambda x:
self.frequency[x], reverse=True)
return sentences[:3]
def dfs(self, node, sentence, sentences):
if 'end' in node:
sentences.append(sentence)
for char in node:
if char != 'end':
self.dfs(node[char], sentence + char, sentences)
Follow-up:
How would you optimize the solution to handle a large number of sentences?