2020-07-09 leetcode 恢复空格

作者: XaviSong | 来源:发表于2020-07-09 16:42 被阅读0次

2020-07-09 leetcode 恢复空格
LeetCode 面试题 17.13. 恢复空格 | Pytho
Generate Parentheses
2018.7.22（一）
这些电脑快捷键，你知道几个？
[数组]73. Set Matrix Zeroes
String to Integer (atoi)
Day12 替换空格+两个链表的第一个公共节点+第一个只出现一次
面试题5
LeetCode 每日一题 [40] 替换空格

哦，不！你不小心把一个长篇文章中的空格、标点都删掉了，并且大写也弄成了小写。像句子"I reset the computer. It still didn’t boot!"已经变成了"iresetthecomputeritstilldidntboot"。在处理标点符号和大小写之前，你得先把它断成词语。当然了，你有一本厚厚的词典dictionary，不过，有些词没在词典里。假设文章用sentence表示，设计一个算法，把文章断开，要求未识别的字符最少，返回未识别的字符数。

注意：本题相对原题稍作改动，只需返回未识别的字符数。

示例：

输入：
dictionary = ["looked","just","like","her","brother"]
sentence = "jesslookedjustliketimherbrother"
输出： 7
解释： 断句后为"jess looked just like tim her brother"，共7个未识别字符。

提示：

0 <= len(sentence) <= 1000
dictionary中总字符数不超过 150000。
你可以认为dictionary和sentence中只包含小写字母。

解题思路

方法一：直接动态规划

数组dp[i]的元素代表：第i个字符之前的未识别字符有几个。利用字符串切片来找是否与dictionary中的词匹配，匹配时状态转移方程为：
$dp[i] = min(dp[i],dp[i - len(word)])$
之所以会取min，是因为存在重叠的可能，譬如：

dictionary = ['leo','apple']
sentence = 'appleo'
'''
这种情况下，最少的未匹配方案就是'apple','o'
'''

最外层遍历sentence，在每个位置对dictionary中的单词再遍历，找切片字符串是否匹配，提前先给dictionary单词按照长度进行升序排序，不然会出错。

class Solution:
    def respace(self, dictionary: List[str], sentence: str) -> int:
        if not sentence:
            return 0
        if not dictionary:
            return len(sentence)
        dictionary.sort(key = lambda x: len(x))
        dp = [0] * (len(sentence)+1) # 多一个，参看dp的定义
        for i in range(1,len(sentence)+1):
            dp[i] = dp[i-1] + 1
            for word in dictionary:
                if len(word) > i:
                    break
                if sentence[i-len(word):i] == word:
                    dp[i] = min(dp[i], dp[i-len(word)])
        return dp[-1]

方法二：Trie树+DP

参照官方题解，对dictionary中的单词倒叙插入到树中，对于结束词打上word_end标记。

倒叙插入的Trie树会好好利用单词的相同后缀。如：

Trie树匹配过程.gif

代码：

# 根据给定的单词表构建嵌套的字典树
class Trie:
    def __init__(self):
        self.root = {}
        self.word_end = -1
    
    def insert(self, word):
        curNode = self.root
        # 将单词逆序构建
        for c in word[::-1]:
            if c not in curNode:
                curNode[c] = {}
            curNode = curNode[c]      
        curNode[self.word_end] = True

class Solution:
    def respace(self, dictionary: List[str], sentence: str) -> int:
        # 从给定的单词表中构建一颗逆序字典树
        tire = Trie()
        for word in dictionary:
            tire.insert(word)
        n = len(sentence)
        dp = [0] * (n + 1)
        for i in range(1, n + 1):
            dp[i] = dp[i-1] + 1

            # Trie树查找
            curNode = tire.root
            for j in range(i, 0, -1):
                c = sentence[j - 1]
                if c not in curNode:
                    break
                elif tire.word_end in curNode[c]:
                    dp[i] = min(dp[i], dp[j - 1])
                curNode = curNode[c]
        return dp[-1]