# Elasticsearch 8+
Study notes for the Elastic Certified Engineer (ECE) exam.
# Installation
These notes use Ubuntu 16.

```
apt-get install elasticsearch
```

```
./elasticsearch-setup-passwords interactive
```
# Data Management
# Define an index that satisfies a given set of requirements
- analysis
- settings
- mappings
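
The three parts listed above all end up in the body of a create-index request. A minimal sketch in Python, assuming a hypothetical index with one custom analyzer and two fields; the analyzer name, field names, and shard counts are illustrative assumptions, not exam requirements:

```python
import json

# Hypothetical body for `PUT <index-name>`; every name below is illustrative.
index_body = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0,
        # Analysis configuration is nested inside settings.
        "analysis": {
            "analyzer": {
                "my_lowercase_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            }
        },
    },
    "mappings": {
        "properties": {
            # A text field analyzed with the custom analyzer defined above.
            "title": {"type": "text", "analyzer": "my_lowercase_analyzer"},
            # A keyword field is stored as-is, without analysis.
            "status": {"type": "keyword"},
        }
    },
}

print(json.dumps(index_body, indent=2))
```

The printed JSON can be pasted into Dev Tools under a `PUT <index-name>` line.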
# Define and use an index template for a given pattern that satisfies a given set of requirements
```
PUT index-test
```

```
PUT test_index
```
# Define and use a dynamic template that satisfies a given set of requirements
In version 8 and later, the legacy `_template` API is being phased out in favor of `_component_template` and `_index_template`.

```
PUT _index_template/my_template_name
```
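
As a rough sketch of how the two newer APIs fit together, the Python below builds one component template body and one composable index template body that references it. The component name `my_settings`, the pattern `test-*`, and the field `created_at` are assumptions made up for illustration:

```python
import json

# Hypothetical body for `PUT _component_template/my_settings`.
component_template = {
    "template": {
        "settings": {"number_of_shards": 1}
    }
}

# Hypothetical body for `PUT _index_template/my_template_name`.
index_template = {
    # Indices whose names match this pattern pick up the template.
    "index_patterns": ["test-*"],
    # Reusable building blocks, applied in the order listed.
    "composed_of": ["my_settings"],
    # When several templates match, the highest priority wins.
    "priority": 100,
    "template": {
        "mappings": {
            "properties": {"created_at": {"type": "date"}}
        }
    },
}

print(json.dumps(index_template, indent=2))
```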
# Define an Index Lifecycle Management policy for a time-series index
Kibana: Stack Management >>> Index Lifecycle Management
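
The same policy can also be sent through the API instead of the Kibana UI. A minimal sketch of a request body for `PUT _ilm/policy/<policy-name>`, assuming a hypothetical time-series policy that rolls over in the hot phase and deletes old indices; the thresholds are illustrative assumptions:

```python
import json

# Hypothetical ILM policy body; ages and sizes are example values only.
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # Roll over to a new index when either condition is met.
                    "rollover": {
                        "max_age": "7d",
                        "max_primary_shard_size": "50gb",
                    }
                }
            },
            "delete": {
                # Measured from rollover, not from index creation.
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    }
}

print(json.dumps(ilm_policy, indent=2))
```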
# Define an index template that creates a new data stream
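Similar to ILM.

A data stream needs:

- a matching index template
- a `@timestamp` field

How it works: read requests go to all backing indices (the current one and the older ones); write requests go only to the current (newest) backing index.

Backing index name format: `.ds-<data-stream>-<yyyy.MM.dd>-<generation>`, where the generation is a zero-padded six-digit counter starting at `000001`.

To add a new document with an explicit ID, do not use `PUT /<target>/_doc/<_id>`; use `PUT /<target>/_create/<_id>`.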
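
To make the naming rule and the template requirement concrete, here is a small Python sketch. The helper names `backing_index_name` and `create_request_line`, the stream name `my-logs`, and the priority value are hypothetical; only the name format and the `_create` endpoint come from the notes above:

```python
from datetime import date


def backing_index_name(data_stream: str, day: date, generation: int = 1) -> str:
    """Format a backing index name as .ds-<data-stream>-<yyyy.MM.dd>-<generation>."""
    return f".ds-{data_stream}-{day:%Y.%m.%d}-{generation:06d}"


def create_request_line(target: str, doc_id: str) -> str:
    """Request line for adding a document with an explicit ID to a data stream."""
    return f"PUT /{target}/_create/{doc_id}"


# Hypothetical body for `PUT _index_template/<name>`; the empty `data_stream`
# object is what turns matching names into data streams.
data_stream_template = {
    "index_patterns": ["my-logs*"],
    "data_stream": {},
    "priority": 200,
}

print(backing_index_name("my-logs", date(2024, 1, 2)))
print(create_request_line("my-logs", "1"))
```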
# Use the Data Visualizer to upload a text file into Elasticsearch (not on the exam)
Kibana >>> Machine Learning >>> Data Visualizer
# Searching Data
# Write and execute a search query for terms and/or phrases in one or more fields of an index
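Differences between `match`, `term`, and `match_phrase`:

- `match`: the query text is analyzed (tokenized); a document matches if any single token matches.
- `term`: the query text is not analyzed; the indexed value must match exactly (suited to `keyword` fields, comparable to SQL `=`).
- `match_phrase`: the query text is analyzed, and all tokens must appear in the same order, next to each other (suited to `text` fields, loosely comparable to SQL `LIKE`).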
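
The difference between the three queries can be imitated with a toy model in plain Python. This is only a sketch: the `analyze` function is a stand-in that lowercases and splits on whitespace, which is much simpler than a real Elasticsearch analyzer, and `term` is modeled against a `keyword`-style value stored unchanged.

```python
def analyze(text: str) -> list[str]:
    """Toy analyzer: lowercase and split on whitespace (real analyzers do more)."""
    return text.lower().split()


def match(field_text: str, query: str) -> bool:
    """Analyzed query; any single shared token is enough."""
    doc_tokens = set(analyze(field_text))
    return any(token in doc_tokens for token in analyze(query))


def term(field_value: str, query: str) -> bool:
    """No analysis; the stored keyword value must equal the query exactly."""
    return field_value == query


def match_phrase(field_text: str, query: str) -> bool:
    """Analyzed query; all tokens must appear adjacent and in the same order."""
    doc, phrase = analyze(field_text), analyze(query)
    return any(
        doc[i:i + len(phrase)] == phrase
        for i in range(len(doc) - len(phrase) + 1)
    )


text = "Quick brown fox"
print(match(text, "slow fox"))           # one shared token is enough
print(term(text, "quick brown fox"))     # case differs, so no exact match
print(match_phrase(text, "brown fox"))   # adjacent and in order
print(match_phrase(text, "fox brown"))   # wrong order
```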
# Boolean query
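```
POST _search
```

- `minimum_should_match`: defaults to 1 when the `bool` query has at least one `should` clause and no `must` or `filter` clause; otherwise it defaults to 0.
- `boost`: a multiplier applied when the score is computed.
- `bool.filter`: clauses here run in filter context and do not contribute to the score.
- `_name`: can be added to every top-level query clause; the `matched_queries` array on each hit in the response shows which named queries that hit satisfied.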
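
The default for `minimum_should_match` depends on which other clauses are present: it is 1 only when the `bool` query has at least one `should` clause and no `must` or `filter` clause, and 0 otherwise. The sketch below encodes that rule and builds a hypothetical `bool` query with named clauses; the helper name, field names, and `_name` labels are illustrative assumptions:

```python
def default_minimum_should_match(bool_query: dict) -> int:
    """Return the default used when minimum_should_match is not set explicitly."""
    has_should = bool(bool_query.get("should"))
    has_must_or_filter = bool(bool_query.get("must")) or bool(bool_query.get("filter"))
    return 1 if has_should and not has_must_or_filter else 0


# Hypothetical search body; field names and _name labels are made up.
search_body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": {"query": "elasticsearch", "_name": "title_match"}}}
            ],
            # Filter context: restricts the hits, adds nothing to the score.
            "filter": [
                {"term": {"status": {"value": "published", "_name": "is_published"}}}
            ],
            "should": [
                {"match": {"tags": {"query": "exam", "boost": 2.0, "_name": "tag_bonus"}}}
            ],
        }
    }
}

print(default_minimum_should_match(search_body["query"]["bool"]))
print(default_minimum_should_match({"should": [{"match_all": {}}]}))
```

With a `must` clause present, the `should` clause in this example only raises the score of documents that also match it.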
# Boosting query & Constant score query
```
GET /_search
```
# Disjunction max query
```
GET /_search
```
# Intervals query
Gives fine-grained control over the order of the matching terms, the distance between them, and how they contain one another.

```
GET /_search
```
# Match query
Matches a text, number, date, or boolean value.

```
GET _search
```
# Match boolean prefix
```
GET /_search
```
# Match phrase query & match phrase prefix query
```
GET my-index-00002/_search
```
# Combined fields
```
GET _search
```
# Nested query
# Write and execute a search query that is a Boolean combination of multiple queries and filters
```
GET my-index-000001/_msearch
```

The response contains a `responses` array; each element is the result of one search from the request.
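
The `_msearch` body is newline-delimited JSON: a header line followed by a search body line for each search, with a trailing newline at the end. A minimal Python sketch of building such a body, where the helper name `build_msearch_body`, the second index name, and the queries are assumptions for illustration:

```python
import json


def build_msearch_body(searches: list[tuple[dict, dict]]) -> str:
    """Serialize (header, body) pairs as NDJSON, ending with a newline."""
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))  # header line, may be {}
        lines.append(json.dumps(body))    # search body line
    return "\n".join(lines) + "\n"


ndjson = build_msearch_body([
    # Empty header: use the index from the request path.
    ({}, {"query": {"match": {"message": "this is a test"}}}),
    # Header naming a different (hypothetical) index.
    ({"index": "my-index-000002"}, {"query": {"match_all": {}}}),
])

print(ndjson, end="")
```

Two searches produce four lines, and the `responses` array in the reply has two elements in the same order.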
# search template
# Write an asynchronous search
```
POST my_index/_async_search
```
# Write and execute metric and bucket aggregations
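- Prerequisite: the field must have `doc_values: true`.
- Cannot be used on `text` fields by default.
- Caching: Elasticsearch caches the results of frequently run aggregations. Setting `"size": 0` avoids filling the cache with hits that are not needed, and cached results are reused only for searches that use the same preference string.

An example aggregation search:

```
// typed_keys: the response key changes from `name` to `type#name`
```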
# metric aggs
Metric aggregations compute values such as max/min, average, and sum.
They do not support sub-aggregations.

- Boxplot

  ```
  "boxplot": {
    "field": "load_time"
  }

  // response
  "aggregations": {
    "load_time_boxplot": {
      "min": 0.0,
      "max": 990.0,
      // first quartile (median of the lower half of the data)
      "q1": 165.0,
      // second quartile (the median)
      "q2": 445.0,
      // third quartile (median of the upper half of the data)
      "q3": 725.0,
      "lower": 0.0,
      "upper": 990.0
    }
  }
  ```
# bucket aggs
Bucket aggregations support sub-aggregations.

- Adjacency matrix

  Builds buckets for each named filter and for their intersections: filters A, B, C return the buckets A, B, C, A&B, A&C, B&C.

- Auto-interval date histogram

  Set the target number of `buckets`; Elasticsearch chooses the interval automatically.

  ```
  "auto_date_histogram": {
    "field": "date",
    "buckets": 10
  }
  ```

- Children

  Applies to `join` fields.

  ```
  "aggs": {
    "join-aggs": {
      "children": {
        "type": "answer"
      }
    }
  }
  ```

- Composite

  Combines several value sources into one set of buckets: if source 1 yields A1, A2 and source 2 yields B1, B2, the buckets are the combinations A1B1, A1B2, A2B1, A2B2.

  ```
  "aggs": {
    "my-aggs-name": {
      "composite": {
        "sources": [
          {
            "name01": {
              "type": {
                "field": "field-name"
              }
            }
          },
          {
            "name02": {
              "type": {
                "field": "field-name"
              }
            }
          }
        ]
      }
    }
  }
  ```

- Histogram

  Histogram (bar chart).

  `bucket_key = Math.floor((value - offset) / interval) * interval + offset`

  ```
  "histogram": {
    "field": "field-name",
    // bucket width
    "interval": 5,
    // drop buckets whose document count is below min_doc_count
    "min_doc_count": 1
  }
  ```

- Date histogram

  A histogram specialized for fields of type `date`.

  `bucket_key = Math.floor(value / interval) * interval`

- Range

  Range aggregation; each bucket includes `from` and excludes `to` (`from <= value < to`).

  ```
  "aggs": {
    "price_ranges": {
      "range": {
        "keyed": true,
        // a script field can be used here as well
        "field": "price",
        "ranges": [
          // either `to` or `from` may be omitted
          { "to": 100.0 },
          // ranges may overlap
          { "from": 80.0, "to": 200.0 },
          { "from": 200.0 }
        ]
      }
    }
  }
  ```

- Date range

- Global

  Defines a single bucket that contains all documents, so the aggregations inside it are not affected by the search query (they run over the whole data set).

  ```
  POST /sales/_search?size=0
  {
    "query": {
      "match": { "type": "t-shirt" }
    },
    "aggs": {
      // average price of all products (as with match_all)
      "all_products": {
        "global": {},
        "aggs": {
          "avg_price": { "avg": { "field": "price" } }
        }
      },
      // average price of t-shirts only (scoped by the query)
      "t_shirts": { "avg": { "field": "price" } }
    }
  }
  ```

- Missing

  Documents that lack the field, or whose value for it is null, fall into no regular bucket; the `missing` aggregation counts them.

  ```
  "aggs": {
    "products_without_a_price": {
      "missing": { "field": "price" }
    }
  }
  ```

- IP prefix and IP range

  Aggregations over IP addresses (xxx.xxx.xxx.xxx).

- Random sampler

  Random sampling aggregation; `probability` must satisfy `0 < probability < 0.5` or `probability = 1`.
  It aggregates over a random sample of roughly `probability` times the document set, so results may differ between runs.

  ```
  {
    "aggregations": {
      "sampling": {
        "random_sampler": {
          "probability": 0.1
        },
        "aggs": {
          "price_percentiles": {
            "percentiles": {
              "field": "taxful_total_price"
            }
          }
        }
      }
    }
  }
  ```
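
Two of the bucketing rules in this list are easy to check by hand: the histogram key formula `Math.floor((value - offset) / interval) * interval + offset` and the range rule `from <= value < to`. The Python below is a plain re-implementation of those two rules for experimenting with values, not Elasticsearch code; the function names are hypothetical.

```python
import math


def histogram_bucket_key(value: float, interval: float, offset: float = 0.0) -> float:
    """Bucket key for a histogram: floor((value - offset) / interval) * interval + offset."""
    return math.floor((value - offset) / interval) * interval + offset


def matching_ranges(value: float, ranges: list[dict]) -> list[dict]:
    """Return every range the value falls into: `from` inclusive, `to` exclusive."""
    hits = []
    for r in ranges:
        lower = r.get("from", -math.inf)  # a missing `from` means unbounded below
        upper = r.get("to", math.inf)     # a missing `to` means unbounded above
        if lower <= value < upper:
            hits.append(r)
    return hits


price_ranges = [{"to": 100.0}, {"from": 80.0, "to": 200.0}, {"from": 200.0}]

print(histogram_bucket_key(32, 5))         # falls in the bucket starting at 30
print(matching_ranges(90, price_ranges))   # overlapping ranges: two buckets match
print(matching_ranges(200, price_ranges))  # `to` is exclusive: only the last bucket
```

Because ranges may overlap, one document can be counted in more than one bucket, as the value 90 shows.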
# pipeline aggs
# Other
# CCR (Cross-cluster replication) & CCS (Cross-cluster search)
Configure `elasticsearch.yml`:

```
transport.host: xxx.xxx.xxx.xxx
```

Certificates:

1) Use the same CA certificate on both clusters: copy `ca.crt` and `ca.key`.
2) Use different certificates.
# Create the remote cluster connection
Stack Management >>> Remote Clusters, or do the same through Dev Tools.
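
For the Dev Tools route, a remote cluster is registered by adding seed nodes under `cluster.remote.<alias>` in the cluster settings. The sketch below builds a body for `PUT _cluster/settings`; the helper name, the alias `my_remote`, and the port 9300 (the default transport port) are assumptions, and the host is left as the same placeholder used in these notes:

```python
import json


def remote_cluster_settings(alias: str, seeds: list[str]) -> dict:
    """Body for `PUT _cluster/settings` that registers a remote cluster by seed nodes."""
    return {
        "persistent": {
            "cluster": {
                "remote": {
                    alias: {"seeds": seeds}
                }
            }
        }
    }


# Seeds point at the remote cluster's transport address, not its HTTP port.
body = remote_cluster_settings("my_remote", ["xxx.xxx.xxx.xxx:9300"])
print(json.dumps(body, indent=2))
```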
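# Configure cross-cluster replication
Stack Management >>> Cross Cluster Replication

| Parameter | Description |
|---|---|
| Remote cluster | Name of the remote cluster that was added. |
| Leader index | The index to replicate. |
| Follower index | The index created by the replication. The index name must not already exist. |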
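
The three parameters in the table correspond to the create-follower request of the CCR API. A minimal sketch, where the helper name `follow_request` and the cluster and index names are hypothetical:

```python
import json


def follow_request(follower_index: str, remote_cluster: str, leader_index: str) -> tuple[str, dict]:
    """Request line and body that create a follower index for a leader index."""
    request_line = f"PUT /{follower_index}/_ccr/follow"
    body = {
        "remote_cluster": remote_cluster,  # alias registered as a remote cluster
        "leader_index": leader_index,      # index on the remote cluster to replicate
    }
    return request_line, body


line, body = follow_request("my-follower", "my_remote", "my-leader")
print(line)
print(json.dumps(body, indent=2))
```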