LeetCode 3093. 最长公共后缀查询

历史记录

清除记录

猜你想搜

AcWing热点
App
登录/注册

LeetCode 3093. 最长公共后缀查询原题链接困难

作者：

wzc1995 , 2024-03-24 12:03:59 , 所有人可见 , 阅读 18

题目描述

给你两个字符串数组 wordsContainer 和 wordsQuery。

对于每个 wordsQuery[i]，你需要从 wordsContainer 中找到一个与 wordsQuery[i] 有 最长公共后缀 的字符串。如果 wordsContainer 中有两个或者更多字符串有最长公共后缀，那么答案为长度最短的。如果有超过两个字符串有相同最短长度，那么答案为它们在 wordsContainer 中出现更早的一个。

请你返回一个整数数组 ans，其中 ans[i] 是 wordsContainer 中与 wordsQuery[i] 有 最长公共后缀 字符串的下标。

样例

输入：wordsContainer = ["abcd","bcd","xbcd"], wordsQuery = ["cd","bcd","xyz"]

输出：[1,1,1]

解释：

我们分别来看每一个 wordsQuery[i]：

对于 wordsQuery[0] = "cd"，wordsContainer 中有最长公共后缀 "cd" 的字符串下标分别为 0，1 和 2。
    这些字符串中，答案是下标为 1 的字符串，因为它的长度为 3，是最短的字符串。
对于 wordsQuery[1] = "bcd"，wordsContainer 中有最长公共后缀 "bcd" 的字符串下标分别为 0，1 和 2。
    这些字符串中，答案是下标为 1 的字符串，因为它的长度为 3，是最短的字符串。
对于 wordsQuery[2] = "xyz"，wordsContainer 中没有字符串跟它有公共后缀，所以最长公共后缀为 ""，
    下标为 0，1 和 2 的字符串都得到这一公共后缀。这些字符串中，答案是下标为 1 的字符串，
    因为它的长度为 3，是最短的字符串。

输入：wordsContainer = ["abcdefgh","poiuygh","ghghgh"], wordsQuery = ["gh","acbfgh","acbfegh"]

输出：[2,0,2]

解释：

我们分别来看每一个 wordsQuery[i]：

对于 wordsQuery[0] = "gh"，wordsContainer 中有最长公共后缀 "gh" 的字符串下标分别为 0 ，1 和 2。
    这些字符串中，答案是下标为 2 的字符串，因为它的长度为 6，是最短的字符串。
对于 wordsQuery[1] = "acbfgh"，只有下标为 0 的字符串有最长公共后缀 "fgh"。
    所以尽管下标为 2 的字符串是最短的字符串，但答案是 0。
对于 wordsQuery[2] = "acbfegh"，wordsContainer 中有最长公共后缀 "gh" 的字符串下标分别为 0，1 和 2。
    这些字符串中，答案是下标为 2 的字符串，因为它的长度为 6，是最短的字符串。

限制

1 <= wordsContainer.length, wordsQuery.length <= 10^4
1 <= wordsContainer[i].length <= 5 * 10^3
1 <= wordsQuery[i].length <= 5 * 10^3
wordsContainer[i] 只包含小写英文字母。
wordsQuery[i] 只包含小写英文字母。
wordsContainer[i].length 的和至多为 5 * 10^5。
wordsQuery[i].length 的和至多为 5 * 10^5。

算法

(字典树，深度优先遍历) $O(m + |\Sigma| \cdot \Sigma_i \text{wordsContainer}_i.length + \Sigma_i \text{wordsQuery}_i.length)$

将 wordsContainer 的所有字符串逆序插入到一棵字典树中。字典树的每个节点维护 $idx$ 和 $len$ 两个状态，分别表示匹配到当前位置前缀时，最短字符串的下标和长度。
插入时，直接更新最后一个节点的状态为当前字符串的下标和长度（倒序遍历字符串，当前字符串的长度一定是最小的）。
然后通过深度优先遍历，自底向上的更新所有节点的 $idx$ 和 $len$。
查询时，逆序找到最长匹配的位置，将这个位置的 $idx$ 记录到答案中。

时间复杂度

构造和深度优先遍历字典树的时间复杂度为 $(|\Sigma| \cdot \Sigma_i \text{wordsContainer}_i.length)$。
匹配答案的时间复杂度为 $O(m + \Sigma_i \text{wordsQuery}_i.length)$。
故总时间复杂度为 $O(m + |\Sigma| \cdot \Sigma_i \text{wordsContainer}_i.length + \Sigma_i \text{wordsQuery}_i.length)$。

空间复杂度

需要 $(m + |\Sigma| \cdot \Sigma_i \text{wordsContainer}_i.length)$ 的额外空间存储字典树、递归的系统栈和答案。

C++ 代码

struct Node {
    Node *nxt[26];
    int idx, len;

    Node() {
        memset(nxt, 0, sizeof(nxt));
        idx = -1;
        len = INT_MAX;
    }
};

class Solution {
private:
    Node *root;

    void dfs(Node *u) {
        for (int i = 0; i < 26; i++) {
            if (u->nxt[i] == NULL)
                continue;

            dfs(u->nxt[i]);

            if (u->nxt[i]->len < u->len) {
                u->len = u->nxt[i]->len;
                u->idx = u->nxt[i]->idx;
            } else if (u->nxt[i]->len == u->len && u->nxt[i]->idx < u->idx) {
                u->idx = u->nxt[i]->idx;
            }
        }
    }

public:
    vector<int> stringIndices(vector<string>& wordsContainer, vector<string>& wordsQuery) {
        const int n = wordsContainer.size(), m = wordsQuery.size();

        root = new Node();

        for (int i = n - 1; i >= 0; i--) {
            Node *p = root;
            for (int j = wordsContainer[i].size() - 1; j >= 0; j--) {
                int c = wordsContainer[i][j] - 'a';
                if (p->nxt[c] == NULL)
                    p->nxt[c] = new Node();
                p = p->nxt[c];
            }

            p->idx = i;
            p->len = wordsContainer[i].size();
        }

        dfs(root);

        vector<int> ans(m);
        for (int i = 0; i < m; i++) {
            Node *p = root;
            for (int j = wordsQuery[i].size() - 1; j >= 0; j--) {
                int c = wordsQuery[i][j] - 'a';
                if (p->nxt[c] == NULL)
                    break;

                p = p->nxt[c];
            }


            ans[i] = p->idx;
        }

        return ans;
    }
};

LeetCode 3093. 最长公共后缀查询 原题链接 困难