Design Search Autocomplete System
Design a search autocomplete system for a given list of sentences. The system should return the top 3 most frequent sentences that start with a given prefix.
Constraints:
- The input sentences will be a list of strings.
- The input prefix will be a string.
- The output should be a list of the top 3 most frequent sentences that start with the given prefix.
Examples:
Input: ["The quick brown fox jumps over the lazy dog", "The dog runs quickly", "The quick dog runs"], "The"
Output: ["The quick brown fox jumps over the lazy dog", "The dog runs quickly", "The quick dog runs"]
Explanation: All three sentences start with the prefix "The", and each appears once, so all three are returned (ties may come back in any order).
Solutions
Trie with Hash Table
The solution uses a trie to store the sentences and a hash table to store the frequency of each sentence. The search function walks the trie down to the node matching the prefix, runs a depth-first search to collect every sentence below that node, then sorts the results by frequency and returns at most three of them.
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>
using namespace std;

class AutocompleteSystem {
private:
    struct TrieNode {
        unordered_map<char, TrieNode*> children;
        bool end = false;  // marks the last character of a stored sentence
    };
    TrieNode* root;
    unordered_map<string, int> frequency;
public:
    AutocompleteSystem(const vector<string>& sentences) {
        root = new TrieNode();
        for (const string& sentence : sentences) {
            TrieNode* node = root;
            for (char c : sentence) {
                if (node->children.find(c) == node->children.end())
                    node->children[c] = new TrieNode();
                node = node->children[c];
            }
            node->end = true;
            frequency[sentence]++;  // duplicate inputs raise the count
        }
    }
    vector<string> search(const string& prefix) {
        // Walk down the trie along the prefix; an absent edge means no matches.
        TrieNode* node = root;
        for (char c : prefix) {
            if (node->children.find(c) == node->children.end()) return {};
            node = node->children[c];
        }
        // Collect every sentence under this node, then rank by frequency.
        vector<string> sentences;
        dfs(node, prefix, sentences);
        sort(sentences.begin(), sentences.end(),
             [this](const string& a, const string& b) {
                 return frequency[a] > frequency[b];
             });
        sentences.resize(min<size_t>(3, sentences.size()));
        return sentences;
    }
    void dfs(TrieNode* node, string sentence, vector<string>& sentences) {
        if (node->end) sentences.push_back(sentence);
        for (auto& child : node->children) {
            dfs(child.second, sentence + child.first, sentences);
        }
    }
};
Follow-up:
How would you optimize the solution to handle a large number of sentences?