美文网首页
ElasticSearch 内置 Analyzer 一览表

ElasticSearch 内置 Analyzer 一览表

作者: 字母数字或汉字 | 来源:发表于2017-01-12 15:28 被阅读601次

Analyzer

analyzer logical name description
standard analyzer standard standard tokenizer, standard filter, lower case filter, stop filter
simple analyzer simple lower case tokenizer
stop analyzer stop lower case tokenizer, stop filter
keyword analyzer keyword 不分词,内容整体作为一个token(not_analyzed)
pattern analyzer whitespace 正则表达式分词,默认匹配\W+
language analyzers lang 各种语言
snowball analyzer snowball standard tokenizer, standard filter, lower case filter, stop filter, snowball filter
custom analyzer custom 至少需要指定一个 Tokenizer, 零个或多个Token Filter, 零个或多个Char Filter

Character Filter

character filter logical name description
mapping char filter mapping 根据配置的映射关系替换字符
html strip char filter html_strip 去掉HTML元素
pattern replace char filter pattern_replace 用正则表达式处理字符串

Tokenizer

tokenizer logical name description
standard tokenizer standard
edge ngram tokenizer edgeNGram
keyword tokenizer keyword 不分词
letter analyzer letter 按单词分
lowercase analyzer lowercase letter tokenizer, lower case filter
ngram analyzers nGram
whitespace analyzer whitespace 以空格为分隔符拆分
pattern analyzer pattern 定义分隔符的正则表达式
uax email url analyzer uax_url_email 不拆分 url 和 email
path hierarchy analyzer path_hierarchy 处理类似 /path/to/somthing样式的字符串

Token Filter

token filter logical name description
standard filter standard
ascii folding filter asciifolding
length filter length 去掉太长或者太短的
lowercase filter lowercase 转成小写
ngram filter nGram
edge ngram filter edgeNGram
porter stem filter porterStem 波特词干算法
shingle filter shingle 定义分隔符的正则表达式
stop filter stop 移除 stop words
word delimiter filter word_delimiter 将一个单词再拆成子分词
stemmer token filter stemmer
stemmer override filter stemmer_override
keyword marker filter keyword_marker
keyword repeat filter keyword_repeat
kstem filter kstem
snowball filter snowball
phonetic filter phonetic 插件
synonym filter synonyms 处理同义词
compound word filter dictionary_decompounder, hyphenation_decompounder 分解复合词
reverse filter reverse 反转字符串
elision filter elision 去掉缩略语
truncate filter truncate 截断字符串
unique filter unique
pattern capture filter pattern_capture
pattern replace filte pattern_replace 用正则表达式替换
trim filter trim 去掉空格
limit token count filter limit 限制 token 数量
hunspell filter hunspell 拼写检查
common grams filter common_grams
normalization filter arabic_normalization, persian_normalization

相关文章

网友评论

      本文标题:ElasticSearch 内置 Analyzer 一览表

      本文链接:https://www.haomeiwen.com/subject/elndbttx.html