# Elasticsearch Aggregations


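### Characteristics
	
+	Aggregations and searches use the same data structures, so they can run together.
	This means a single JSON request can search/filter and analyze the same data at the same time.
+	Buckets and metrics
	-	Bucket
	
		1. A bucket groups documents by some criterion but computes nothing itself, so a bucket usually has another kind of aggregation nested inside it: a metrics aggregation.
		
		2. Buckets can be nested inside other buckets.
		
		3. Commonly used ways to bucket documents:
		- Terms Aggregation: groups by term; documents whose term matches exactly fall into the same group
		- filter: a bucket that filters documents; it is used exactly like a filter in the main query
		- top_hits: returns the top few hits within a bucket, in the same format as the hits of the main query (the Elasticsearch reference classifies it as a metrics aggregation, but it is usually nested under a bucket)
		- Date Histogram Aggregation: groups by date interval; with an interval of one week, for example, each week becomes one group
		- Histogram Aggregation: groups by numeric interval, analogous to the date histogram
		- Range Aggregation: groups numbers or dates into ranges, each with a given start and end
	
	- 	Metrics
	
		Once documents are grouped, we usually compute something over each group, such as an average, maximum, minimum, or sum. Elasticsearch calls these computations metrics.

		Commonly used metrics aggregations:
		- Avg Aggregation: computes the average
		- Max Aggregation: computes the maximum
		- Min Aggregation: computes the minimum
		- Percentiles Aggregation: computes percentiles
		- Stats Aggregation: returns avg, max, min, sum, and count together
		- Sum Aggregation: computes the sum
		- Top hits Aggregation: returns the top few documents
		- Value Count Aggregation: counts the number of values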
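The split between buckets and metrics can be sketched in plain Python. This is only a model of the idea, not how Elasticsearch implements it: `histogram_buckets` and `stats` are hypothetical helpers, the documents are made up for illustration, and a real Histogram Aggregation may differ in details such as key formatting and empty buckets.

```python
from collections import defaultdict

# Hypothetical documents, not the ones indexed later in these notes.
docs = [{"price": 120}, {"price": 180}, {"price": 450}, {"price": 990}]


def histogram_buckets(docs, field, interval):
    """Bucket step: group documents by floor(value / interval) * interval."""
    buckets = defaultdict(list)
    for doc in docs:
        key = (doc[field] // interval) * interval
        buckets[key].append(doc)
    return dict(sorted(buckets.items()))


def stats(docs, field):
    """Metric step: reduce one bucket's documents to summary numbers."""
    values = [doc[field] for doc in docs]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": sum(values) / len(values),
        "sum": sum(values),
    }


# The bucket only groups; the metric nested under it does the computing.
result = {
    key: stats(group, "price")
    for key, group in histogram_buckets(docs, "price", 500).items()
}
print(result)
```

With an interval of 500, the first three documents land in the bucket keyed 0 and the last one in the bucket keyed 500; each bucket then gets its own statistics.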

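### Template for aggs
 
+   When query and aggs appear together, the main query runs first and produces a set of matching documents; only those documents are passed to aggs for aggregation.
	Note that directly under aggs comes a custom name for the aggregation, and only under that name comes the aggregation type to use.
	If you only care about the aggs results and not the search hits, set size to 0 so that no hits are returned, which speeds up the response.
	
+	One aggs block can hold many aggregations, and each is independent of the others: one can count documents, another can analyze the data, another can compute a standard deviation, and so on, so a single search does everything at once.
	
+	aggs can be nested inside other aggs; a nested bucket operates on the set of documents output by its enclosing bucket.

+	Template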
```
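GET /test/_search
{
    "query": { ... },
    "size": 0,
    "aggs": {
        "custom_name1": {  // a custom name comes directly under aggs
            "<bucket>": { ... }  // the aggregation type comes after the name
        },
        "custom_name2": {  // one aggs block can hold many aggregations
            "<bucket>": { ... }
        },
        "custom_name3": {
            "<bucket>": {
               .....
            },
            "aggs": {  // aggs can be nested inside another aggregation
                "in_name": { // a nested aggs also needs a custom name first
                    "<bucket>": { ... } // in_name operates on the documents in custom_name3's bucket
                }
            }
        }
    }
}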
```
+ Result template
```
  {
   "hits": {
       "total": 8,
       "max_score": 0,
       "hits": [] // size is 0, so no search hits are returned
   },
   "aggregations": {
       "custom_name1": {
           ...
       },
       "custom_name2": {
           ...
       },
       "custom_name3": {
           ... ,
           "in_name": {
              ....
           }
       }
   }
 }
```

### Data preparation
```
PUT /test
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "color": {
        "type": "keyword"
      },
      "price": {
        "type": "long"
      }
    }
  }
}

POST /test/_doc/1
{"color":"red","price":100}

POST /test/_doc/2
{"color":"green","price":500}

POST /test/_doc/3
{"color":["red","blue"],"price":1000}
```
	
### Examples
- terms bucket

	- Find the distinct colors and how many documents each one has
```
GET /test/_search
{
    "size": 0,
    "aggs": {
        "my_terms": {
            "terms": {
                "field": "color"
            }
        }
    }
}
```
Aggregation result
```
{
	"aggregations" : {
	    "my_terms" : {
	      "doc_count_error_upper_bound" : 0,
	      "sum_other_doc_count" : 0,
	      "buckets" : [
	        {
	          "key" : "red",
	          "doc_count" : 2
	        },
	        {
	          "key" : "blue",
	          "doc_count" : 1
	        },
	        {
	          "key" : "green",
	          "doc_count" : 1
	        }
	      ]
	    }
	  }
}
```
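The counts above add up to 4 even though only 3 documents were indexed, because document 3 holds two colors and is counted once in each. A minimal Python sketch of that behaviour, assuming buckets are ordered by `doc_count` descending with ties broken by key (which matches the order shown in the result); `field_values` and `terms_counts` are hypothetical helpers, not Elasticsearch APIs:

```python
from collections import Counter

# The three documents indexed in the data preparation step.
docs = [
    {"color": "red", "price": 100},
    {"color": "green", "price": 500},
    {"color": ["red", "blue"], "price": 1000},
]


def field_values(doc, field):
    """Return the field's distinct values; a scalar counts as one value."""
    value = doc[field]
    return set(value) if isinstance(value, list) else {value}


def terms_counts(docs, field):
    """Count how many documents contain each term of the field."""
    counts = Counter()
    for doc in docs:
        # A multi-valued document contributes one count to each of its terms.
        counts.update(field_values(doc, field))
    # Assumed ordering: doc_count descending, then key ascending.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(terms_counts(docs, "color"))
# [('red', 2), ('blue', 1), ('green', 1)]
```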
	- Building on the first example, compute the average and minimum price for each color group
```
GET /test/_search
{
    "size": 0,
    "aggs": {
        "my_terms": {
            "terms": {
                "field": "color"
            },
            "aggs": {  
                "my_avg_price": { 
                    "avg": {
                        "field": "price"
                    }
                },
                "my_min_price": { 
                    "min": {
                        "field": "price"
                    }
                }
            }
        }
    }
}	
```
Aggregation result
```
 {
	 "aggregations" : {
	     "my_terms" : {
	       "doc_count_error_upper_bound" : 0,
	       "sum_other_doc_count" : 0,
	       "buckets" : [
	         {
	           "key" : "red",
	           "doc_count" : 2,
	           "my_avg_price" : {
	             "value" : 550.0
	           },
	           "my_min_price" : {
	             "value" : 100.0
	           }
	         },
	         {
	           "key" : "blue",
	           "doc_count" : 1,
	           "my_avg_price" : {
	             "value" : 1000.0
	           },
	           "my_min_price" : {
	             "value" : 1000.0
	           }
	         },
	         {
	           "key" : "green",
	           "doc_count" : 1,
	           "my_avg_price" : {
	             "value" : 500.0
	           },
	           "my_min_price" : {
	             "value" : 500.0
	           }
	         }
	       ]
	     }
	 }
 }
```
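In this result the red bucket averages 550 because it contains documents 1 and 3: (100 + 1000) / 2. Client code usually has to walk this nested structure to get at the numbers. The sketch below shows one way to do that in Python on a trimmed copy of the response above; `summarize` is a hypothetical helper, and the response is assumed to be already parsed into a dict.

```python
# Trimmed copy of the response above, already parsed from JSON.
response = {
    "aggregations": {
        "my_terms": {
            "buckets": [
                {"key": "red", "doc_count": 2,
                 "my_avg_price": {"value": 550.0},
                 "my_min_price": {"value": 100.0}},
                {"key": "blue", "doc_count": 1,
                 "my_avg_price": {"value": 1000.0},
                 "my_min_price": {"value": 1000.0}},
                {"key": "green", "doc_count": 1,
                 "my_avg_price": {"value": 500.0},
                 "my_min_price": {"value": 500.0}},
            ]
        }
    }
}


def summarize(response, terms_name, metric_names):
    """Map each bucket key to the values of its nested metrics."""
    rows = {}
    for bucket in response["aggregations"][terms_name]["buckets"]:
        # Each sub-aggregation appears inside the bucket under its custom name.
        rows[bucket["key"]] = {
            name: bucket[name]["value"] for name in metric_names
        }
    return rows


summary = summarize(response, "my_terms", ["my_avg_price", "my_min_price"])
print(summary["red"])
# {'my_avg_price': 550.0, 'my_min_price': 100.0}
```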

- filter bucket
	Count only the documents whose color contains red
```
GET /test/_search
{
  "size": 0,
  "aggs": {
    "my_fliter": {
      "filter": {
        "bool": {
          "must": {
            "terms": {
              "color": [
                "red"
              ]
            }
          }
        }
      }
    }
  }
}
```	
Aggregation result
```	
{
	 "aggregations" : {
	    "my_fliter" : {
	      "doc_count" : 2
	    }
	  }
}
```	

	Nesting a terms bucket inside a filter bucket
	Keep only the documents that contain red, then group them by the colors they contain
```
GET /test/_search
{
  "size": 0,
  "aggs": {
    "my_fliter": {
      "filter": {
        "bool": {
          "must": {
            "terms": {
              "color": [
                "red"
              ]
            }
          }
        }
      },
      "aggs": {
        "my_trems": {
          "terms": {
            "field": "color"
          }
        }
      }
    }
  }
}
```
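Aggregation result
	- Because the terms bucket is nested inside the filter bucket, the documents matched by the query pass through the filter bucket first; only those that match the filter reach the terms bucket.
	- Only two documents pass the filter here, {"color": "red"} and {"color": ["red", "blue"]}, so the terms bucket groups only these two.
	- This is why the terms bucket has no green group: that document was stopped at the filter bucket.
	- Note that aggregations operate on the documents left after the query. If the query matched only green documents, the filter bucket would be empty and the terms bucket would return no groups.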
```
{
	"aggregations" : {
	    "my_fliter" : {
	      "doc_count" : 2,
	      "my_trems" : {
	        "doc_count_error_upper_bound" : 0,
	        "sum_other_doc_count" : 0,
	        "buckets" : [
	          {
	            "key" : "red",
	            "doc_count" : 2
	          },
	          {
	            "key" : "blue",
	            "doc_count" : 1
	          }
	        ]
	      }
	    }
	  }
}
```
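	The nesting can also be reversed, with a filter bucket inside a terms bucket; the documents are then grouped first and filtered within each group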
```
GET /test/_search
{
  "size": 0,
  "aggs": {
    "my_trems": {
      "terms": {
        "field": "color"
      },
      "aggs": {
        "my_fliter": {
          "filter": {
            "bool": {
              "must": {
                "terms": {
                  "color": [
                    "red"
                  ]
                }
              }
            }
          }
        }
      }
    }
  }
}
```
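Aggregation result
	- The filter is applied within each group, so my_fliter under green has a doc_count of 0.
	- The blue group has a my_fliter doc_count of 1 because its only document is {"color":["red","blue"]}, which also contains red.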
```
{
	
	"aggregations" : {
	    "my_trems" : {
	      "doc_count_error_upper_bound" : 0,
	      "sum_other_doc_count" : 0,
	      "buckets" : [
	        {
	          "key" : "red",
	          "doc_count" : 2,
	          "my_fliter" : {
	            "doc_count" : 2
	          }
	        },
	        {
	          "key" : "blue",
	          "doc_count" : 1,
	          "my_fliter" : {
	            "doc_count" : 1
	          }
	        },
	        {
	          "key" : "green",
	          "doc_count" : 1,
	          "my_fliter" : {
	            "doc_count" : 0
	          }
	        }
	      ]
	    }
	  }
}
```
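The two nesting orders can be compared side by side with a small Python model. It is only a sketch of the semantics described above, using the three documents from the data preparation step; `values`, `terms`, and `has_color` are hypothetical helpers.

```python
from collections import defaultdict

docs = [
    {"color": "red", "price": 100},
    {"color": "green", "price": 500},
    {"color": ["red", "blue"], "price": 1000},
]


def values(doc, field):
    """Return the field's values as a list; a scalar counts as one value."""
    value = doc[field]
    return value if isinstance(value, list) else [value]


def terms(docs, field):
    """Group documents by term; a multi-valued document joins several groups."""
    buckets = defaultdict(list)
    for doc in docs:
        for term in values(doc, field):
            buckets[term].append(doc)
    return buckets


def has_color(doc, color):
    return color in values(doc, "color")


# filter -> terms: only documents containing red are grouped at all.
red_docs = [doc for doc in docs if has_color(doc, "red")]
filter_then_terms = {
    key: len(group) for key, group in terms(red_docs, "color").items()
}

# terms -> filter: every color gets a group, then red is counted inside it.
terms_then_filter = {
    key: sum(has_color(doc, "red") for doc in group)
    for key, group in terms(docs, "color").items()
}

print(filter_then_terms)
print(terms_then_filter)
```

In the first order green never appears, while in the second it appears with a count of 0, matching the two results shown above.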
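- top_hits aggregation

	Returns the top few hits within a bucket, in the same format as the hits returned by the main query.
	
	Note that this aggregation cannot contain sub-aggregations; nesting one produces the error:
		Aggregator [my_top_hit] of type [top_hits] cannot accept sub-aggregations
	
	- Parameters supported by top_hits

	 - from, size
	 - sort: sets the order of the returned hits
	 
	 	Note that a sort set on the main query does not affect the data inside aggs: the documents matched by the main query are passed to aggs without being sorted first, so a sort on the main query has no effect on ordering inside aggs.
	 	To sort the hits returned by top_hits, set sort inside top_hits itself.
	 	If no sort is set, the hits are ordered by the _score computed by the main query.
	 - _source: selects which fields are returned

	Sort by price and take the first two documents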
```
GET /test/_search
{
  "size": 0,
  "aggs": {
    "my_top_hit": {
      "top_hits": {
        "size": 2,
        "sort": ["price"] # ascending (asc) by default
		# "sort": {"price":"desc"} also works
      }
    }
  }
}
```
Aggregation result
```
{
	"aggregations" : {
	    "my_top_hit" : {
	      "hits" : {
	        "total" : {
	          "value" : 3,
	          "relation" : "eq"
	        },
	        "max_score" : null,
	        "hits" : [
	          {
	            "_index" : "test",
	            "_type" : "_doc",
	            "_id" : "1",
	            "_score" : null,
	            "_source" : {
	              "color" : "red",
	              "price" : 100
	            },
	            "sort" : [
	              100
	            ]
	          },
	          {
	            "_index" : "test",
	            "_type" : "_doc",
	            "_id" : "2",
	            "_score" : null,
	            "_source" : {
	              "color" : "green",
	              "price" : 500
	            },
	            "sort" : [
	              500
	            ]
	          }
	        ]
	      }
	    }
	  }
}
```
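How sort, from, and size interact can be modelled in a few lines of Python. This is a sketch under simplifying assumptions (a single numeric sort field, no scoring); `top_hits` here is a hypothetical function named after the aggregation, and its defaults of `from_=0` and `size=3` mirror the aggregation's defaults.

```python
# The three indexed documents, with their ids added for readability.
docs = [
    {"_id": "1", "color": "red", "price": 100},
    {"_id": "2", "color": "green", "price": 500},
    {"_id": "3", "color": ["red", "blue"], "price": 1000},
]


def top_hits(docs, sort_field, order="asc", from_=0, size=3):
    """Sort the bucket's documents, skip from_ of them, and keep size."""
    ordered = sorted(
        docs, key=lambda doc: doc[sort_field], reverse=(order == "desc")
    )
    return ordered[from_:from_ + size]


print([doc["_id"] for doc in top_hits(docs, "price", size=2)])
# ['1', '2']
print([doc["_id"] for doc in top_hits(docs, "price", order="desc", size=2)])
# ['3', '2']
print([doc["_id"] for doc in top_hits(docs, "price", from_=1, size=2)])
# ['2', '3']
```

The first call corresponds to the request above and returns documents 1 and 2, the same two hits as the aggregation result.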