聚合示例
需求:计算每个studymodel下的商品数量
sql语句: select studymodel,count(*) from book group by studymodel
GET /book/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"group_by_model": {
"terms": { "field": "studymodel" }
}
}
}
需求:计算每个tags下的商品数量
设置字段"fielddata": true
PUT /book/_mapping/
{
"properties": {
"tags": {
"type": "text",
"fielddata": true
}
}
}
查询
GET /book/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"group_by_tags": {
"terms": { "field": "tags" }
}
}
}
需求:加上搜索条件,计算每个tags下的商品数量
GET /book/_search
{
"size": 0,
"query": {
"match": {
"description": "java程序员"
}
},
"aggs": {
"group_by_tags": {
"terms": { "field": "tags" }
}
}
}
需求:先分组,再算每组的平均值,计算每个tag下的商品的平均价格
GET /book/_search
{
"size": 0,
"aggs" : {
"group_by_tags" : {
"terms" : {
"field" : "tags"
},
"aggs" : {
"avg_price" : {
"avg" : { "field" : "price" }
}
}
}
}
}
需求:计算每个tag下的商品的平均价格,并且按照平均价格降序排序
GET /book/_search
{
"size": 0,
"aggs" : {
"group_by_tags" : {
"terms" : {
"field" : "tags",
"order": {
"avg_price": "desc"
}
},
"aggs" : {
"avg_price" : {
"avg" : { "field" : "price" }
}
}
}
}
}
需求:按照指定的价格范围区间进行分组,然后在每组内再按照tag进行分组,最后再计算每组的平均价格
GET /book/_search
{
"size": 0,
"aggs": {
"group_by_price": {
"range": {
"field": "price",
"ranges": [
{
"from": 0,
"to": 40
},
{
"from": 40,
"to": 60
},
{
"from": 60,
"to": 80
}
]
},
"aggs": {
"group_by_tags": {
"terms": {
"field": "tags"
},
"aggs": {
"average_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
}
}
两个核心概念:bucket和metric
bucket:一个数据分组
city name 北京 张三 北京 李四 天津 王五 天津 赵六
天津 王麻子
划分出来两个bucket,一个是北京bucket,一个是天津bucket 北京bucket:包含了2个人,张三,李四 上海bucket:包含了3个人,王五,赵六,王麻子
metric:对一个数据分组执行的统计
metric,就是对一个bucket执行的某种聚合分析的操作,比如说求平均值,求最大值,求最小值
select count() from book group studymodelbucket:group by studymodel --> 那些studymodel相同的数据,就会被划分到一个bucket中 metric:count(),对每个user_id bucket中所有的数据,计算一个数量。还有avg(),sum(),max(),min()
电视案例
创建索引及映射
PUT /tvs
PUT /tvs/_search
{
"properties": {
"price": {
"type": "long"
},
"color": {
"type": "keyword"
},
"brand": {
"type": "keyword"
},
"sold_date": {
"type": "date"
}
}
}
插入数据
POST /tvs/_bulk
{ "index": {}}
{ "price" : 1000, "color" : "红色", "brand" : "长虹", "sold_date" : "2019-10-28" }
{ "index": {}}
{ "price" : 2000, "color" : "红色", "brand" : "长虹", "sold_date" : "2019-11-05" }
{ "index": {}}
{ "price" : 3000, "color" : "绿色", "brand" : "小米", "sold_date" : "2019-05-18" }
{ "index": {}}
{ "price" : 1500, "color" : "蓝色", "brand" : "TCL", "sold_date" : "2019-07-02" }
{ "index": {}}
{ "price" : 1200, "color" : "绿色", "brand" : "TCL", "sold_date" : "2019-08-19" }
{ "index": {}}
{ "price" : 2000, "color" : "红色", "brand" : "长虹", "sold_date" : "2019-11-05" }
{ "index": {}}
{ "price" : 8000, "color" : "红色", "brand" : "三星", "sold_date" : "2020-01-01" }
{ "index": {}}
{ "price" : 2500, "color" : "蓝色", "brand" : "小米", "sold_date" : "2020-02-12" }
需求1 统计哪种颜色的电视销量最高
GET /tvs/_search
{
"size" : 0,
"aggs" : {
"popular_colors" : {
"terms" : {
"field" : "color"
}
}
}
}
查询条件解析
size:只获取聚合结果,而不要执行聚合的原始数据 aggs:固定语法,要对一份数据执行分组聚合操作 popular_colors:就是对每个aggs,都要起一个名字, terms:根据字段的值进行分组 field:根据指定的字段的值进行分组
返回
{
"took" : 18,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"popular_colors" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "红色",
"doc_count" : 4
},
{
"key" : "绿色",
"doc_count" : 2
},
{
"key" : "蓝色",
"doc_count" : 2
}
]
}
}
}
返回结果解析
hits.hits:我们指定了size是0,所以hits.hits就是空的 aggregations:聚合结果 popular_color:我们指定的某个聚合的名称 buckets:根据我们指定的field划分出的buckets key:每个bucket对应的那个值 doc_count:这个bucket分组内,有多少个数据 数量,其实就是这种颜色的销量
每种颜色对应的bucket中的数据的默认的排序规则:按照doc_count降序排序
需求2 统计每种颜色电视平均价格
GET /tvs/_search
{
"size" : 0,
"aggs": {
"colors": {
"terms": {
"field": "color"
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
在一个aggs执行的bucket操作(terms),平级的json结构下,再加一个aggs,这个第二个aggs内部,同样取个名字,执行一个metric操作,avg,对之前的每个bucket中的数据的指定的field,price field,求一个平均值
返回:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 8,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"colors" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "红色",
"doc_count" : 4,
"avg_price" : {
"value" : 3250.0
}
},
{
"key" : "绿色",
"doc_count" : 2,
"avg_price" : {
"value" : 2100.0
}
},
{
"key" : "蓝色",
"doc_count" : 2,
"avg_price" : {
"value" : 2000.0
}
}
]
}
}
}
buckets,除了key和doc_count avg_price:我们自己取的metric aggs的名字 value:我们的metric计算的结果,每个bucket中的数据的price字段求平均值后的结果
相当于sql: select avg(price) from tvs group by color
需求3 继续下钻分析
每个颜色下,平均价格及每个颜色下,每个品牌的平均价格
GET /tvs/_search
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "color"
},
"aggs": {
"color_avg_price": {
"avg": {
"field": "price"
}
},
"group_by_brand": {
"terms": {
"field": "brand"
},
"aggs": {
"brand_avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
}
}
需求4:更多的metric
count:bucket,terms,自动就会有一个doc_count,就相当于是count avg:avg aggs,求平均值 max:求一个bucket内,指定field值最大的那个数据 min:求一个bucket内,指定field值最小的那个数据 sum:求一个bucket内,指定field值的总和
GET /tvs/_search
{
"size" : 0,
"aggs": {
"colors": {
"terms": {
"field": "color"
},
"aggs": {
"avg_price": { "avg": { "field": "price" } },
"min_price" : { "min": { "field": "price"} },
"max_price" : { "max": { "field": "price"} },
"sum_price" : { "sum": { "field": "price" } }
}
}
}
}
需求5:划分范围 histogram
GET /tvs/_search
{
"size" : 0,
"aggs":{
"price":{
"histogram":{
"field": "price",
"interval": 2000
},
"aggs":{
"income": {
"sum": {
"field" : "price"
}
}
}
}
}
}
histogram:类似于terms,也是进行bucket分组操作,接收一个field,按照这个field的值的各个范围区间,进行bucket分组操作
"histogram":{
"field": "price",
"interval": 2000
}
interval:2000,划分范围,02000,20004000,40006000,60008000,8000~10000,buckets
bucket有了之后,一样的,去对每个bucket执行avg,count,sum,max,min,等各种metric操作,聚合分析
需求6:按照日期分组聚合
date_histogram,按照我们指定的某个date类型的日期field,以及日期interval,按照一定的日期间隔,去划分bucket
min_doc_count:即使某个日期interval,2017-01-01~2017-01-31中,一条数据都没有,那么这个区间也是要返回的,不然默认是会过滤掉这个区间的 extended_bounds,min,max:划分bucket的时候,会限定在这个起始日期,和截止日期内
GET /tvs/_search
{
"size" : 0,
"aggs": {
"sales": {
"date_histogram": {
"field": "sold_date",
"interval": "month",
"format": "yyyy-MM-dd",
"min_doc_count" : 0,
"extended_bounds" : {
"min" : "2019-01-01",
"max" : "2020-12-31"
}
}
}
}
}
需求7 统计每季度每个品牌的销售额
GET /tvs/_search
{
"size": 0,
"aggs": {
"group_by_sold_date": {
"date_histogram": {
"field": "sold_date",
"interval": "quarter",
"format": "yyyy-MM-dd",
"min_doc_count": 0,
"extended_bounds": {
"min": "2019-01-01",
"max": "2020-12-31"
}
},
"aggs": {
"group_by_brand": {
"terms": {
"field": "brand"
},
"aggs": {
"sum_price": {
"sum": {
"field": "price"
}
}
}
},
"total_sum_price": {
"sum": {
"field": "price"
}
}
}
}
}
}
需求8 :搜索与聚合结合,查询某个品牌按颜色销量
搜索与聚合可以结合起来。
select count(*)
from tvs
where brand like "%小米%"
group by color
es aggregation,scope,任何的聚合,都必须在搜索出来的结果数据中之行,搜索结果,就是聚合分析操作的scope
GET /tvs/_search
{
"size": 0,
"query": {
"term": {
"brand": {
"value": "小米"
}
}
},
"aggs": {
"group_by_color": {
"terms": {
"field": "color"
}
}
}
}
需求9 global bucket:单个品牌与所有品牌销量对比
aggregation,scope,一个聚合操作,必须在query的搜索结果范围内执行
出来两个结果,一个结果,是基于query搜索结果来聚合的; 一个结果,是对所有数据执行聚合的
GET /tvs/_search
{
"size": 0,
"query": {
"term": {
"brand": {
"value": "小米"
}
}
},
"aggs": {
"single_brand_avg_price": {
"avg": {
"field": "price"
}
},
"all": {
"global": {},
"aggs": {
"all_brand_avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
需求10:过滤+聚合:统计价格大于1200的电视平均价格
搜索+聚合
过滤+聚合
GET /tvs/_search
{
"size": 0,
"query": {
"constant_score": {
"filter": {
"range": {
"price": {
"gte": 1200
}
}
}
}
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
需求11 bucket filter:统计品牌最近一个月的平均价格
GET /tvs/_search
{
"size": 0,
"query": {
"term": {
"brand": {
"value": "小米"
}
}
},
"aggs": {
"recent_150d": {
"filter": {
"range": {
"sold_date": {
"gte": "now-150d"
}
}
},
"aggs": {
"recent_150d_avg_price": {
"avg": {
"field": "price"
}
}
}
},
"recent_140d": {
"filter": {
"range": {
"sold_date": {
"gte": "now-140d"
}
}
},
"aggs": {
"recent_140d_avg_price": {
"avg": {
"field": "price"
}
}
}
},
"recent_130d": {
"filter": {
"range": {
"sold_date": {
"gte": "now-130d"
}
}
},
"aggs": {
"recent_130d_avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
aggs.filter,针对的是聚合去做的
如果放query里面的filter,是全局的,会对所有的数据都有影响
但是,如果,比如说,你要统计,长虹电视,最近1个月的平均值; 最近3个月的平均值; 最近6个月的平均值
bucket filter:对不同的bucket下的aggs,进行filter
需求12 排序:按每种颜色的平均销售额降序排序
GET /tvs/_search
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "color",
"order": {
"avg_price": "asc"
}
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
相当于sql子表数据字段可以立刻使用。
需求13 排序:按每种颜色的每种品牌平均销售额降序排序
GET /tvs/_search
{
"size": 0,
"aggs": {
"group_by_color": {
"terms": {
"field": "color"
},
"aggs": {
"group_by_brand": {
"terms": {
"field": "brand",
"order": {
"avg_price": "desc"
}
},
"aggs": {
"avg_price": {
"avg": {
"field": "price"
}
}
}
}
}
}
}
}