Elasticsearch: common document update operations

1. Start Elasticsearch

./bin/elasticsearch -d 
 

Check that the startup succeeded. Elasticsearch listens on port 9200 by default:

curl http://127.0.0.1:9200

output:
{
  "name" : "Christopher Summers",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.3.3",
    "build_hash" : "218bdf10790eef486ff2c41a3df5cfa32dadcfde",
    "build_timestamp" : "2016-05-17T15:40:04Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.0"
  },
  "tagline" : "You Know, for Search"
}
 

A response like this confirms that the Elasticsearch service started successfully.
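
To automate this check, a minimal Python sketch could parse the JSON returned by the root endpoint and pull out the node name and version. The function name summarize_root_response is my own, and the sample body is a shortened copy of the output above, so the block runs without a live node.

```python
import json


def summarize_root_response(body):
    """Return (node name, version number) from the JSON served on port 9200."""
    info = json.loads(body)
    return info["name"], info["version"]["number"]


# Shortened copy of the response shown above (assumption: same field layout).
sample = """{
  "name" : "Christopher Summers",
  "cluster_name" : "elasticsearch",
  "version" : { "number" : "2.3.3", "lucene_version" : "5.5.0" },
  "tagline" : "You Know, for Search"
}"""

name, version = summarize_root_response(sample)
print(name, version)
```

Against a running node, the body could come from urllib.request.urlopen("http://127.0.0.1:9200").read() instead of the sample string.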

2. List the indices in Elasticsearch

We can use the _cat API to list them (v adds the header row):

curl 'http://127.0.0.1:9200/_cat/indices?v'

output:

health status index    pri rep docs.count docs.deleted store.size pri.store.size 
yellow open   test       5   1          0            0       800b           800b 
yellow open   synctest   5   1          4            0     16.2kb         16.2kb 
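
If we want to process this table in a script, a small Python sketch like the following could turn it into dictionaries. parse_cat_table is a hypothetical helper, and it assumes that every column is populated in every row, as in the sample above.

```python
def parse_cat_table(text):
    """Turn _cat output requested with ?v into a list of dicts keyed by header."""
    lines = [line for line in text.splitlines() if line.strip()]
    headers = lines[0].split()
    # Assumption: no empty cells, so whitespace splitting lines up with headers.
    return [dict(zip(headers, line.split())) for line in lines[1:]]


sample = """\
health status index    pri rep docs.count docs.deleted store.size pri.store.size
yellow open   test       5   1          0            0       800b           800b
yellow open   synctest   5   1          4            0     16.2kb         16.2kb
"""

rows = parse_cat_table(sample)
non_empty = [row["index"] for row in rows if int(row["docs.count"]) > 0]
print(non_empty)
```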
 

The _cat APIs are a very useful way to monitor cluster state and performance. Calling the bare _cat endpoint lists everything that is available, if you want to explore further:

curl http://127.0.0.1:9200/_cat/

output:

=^.^=
/_cat/allocation
/_cat/shards
/_cat/shards/{index}
/_cat/master
/_cat/nodes
/_cat/indices
/_cat/indices/{index}
/_cat/segments
/_cat/segments/{index}
/_cat/count
/_cat/count/{index}
/_cat/recovery
/_cat/recovery/{index}
/_cat/health
/_cat/pending_tasks
/_cat/aliases
/_cat/aliases/{alias}
/_cat/thread_pool
/_cat/plugins
/_cat/fielddata
/_cat/fielddata/{fields}
/_cat/nodeattrs
/_cat/repositories
/_cat/snapshots/{repository}
 

We can also use _all to get information, including the mappings, for every index and type:

curl http://127.0.0.1:9200/_all
 

To view the mapping of one specific index, use:

curl http://127.0.0.1:9200/synctest/_mapping

output:
{
    "synctest":{
        "mappings":{
            "logs":{
                "properties":{
                    "@timestamp":{
                        "type":"date",
                        "format":"strict_date_optional_time||epoch_millis"
                    },
                    "@version":{
                        "type":"string"
                    },
                    "host":{
                        "type":"string"
                    },
                    "message":{
                        "type":"string"
                    }
                }
            },
            "article":{
                "properties":{
                    "@timestamp":{
                        "type":"date",
                        "format":"strict_date_optional_time||epoch_millis"
                    },
                    "@version":{
                        "type":"string"
                    },
                    "id":{
                        "type":"long"
                    },
                    "is_deleted":{
                        "type":"long"
                    },
                    "name":{
                        "type":"string"
                    },
                    "type":{
                        "type":"string"
                    },
                    "update_time":{
                        "type":"date",
                        "format":"strict_date_optional_time||epoch_millis"
                    },
                    "user_name":{
                        "type":"string"
                    }
                }
            }
        }
    }
}
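
A mapping response of this size is easier to scan once it is reduced to field names and types. The sketch below does that in Python; fields_by_type is my own helper name, and the sample is a trimmed version of the mapping above.

```python
import json


def fields_by_type(mapping_body, index):
    """Map each type name to a {field: datatype} dict for one index."""
    mappings = json.loads(mapping_body)[index]["mappings"]
    return {
        type_name: {
            field: spec.get("type")
            for field, spec in type_def.get("properties", {}).items()
        }
        for type_name, type_def in mappings.items()
    }


# Trimmed copy of the mapping output above.
sample = json.dumps({
    "synctest": {"mappings": {
        "article": {"properties": {
            "id": {"type": "long"},
            "name": {"type": "string"},
            "update_time": {
                "type": "date",
                "format": "strict_date_optional_time||epoch_millis",
            },
        }},
    }},
})

result = fields_by_type(sample, "synctest")
print(result["article"])
```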
 

To view the mapping of a specific type, use:

curl http://127.0.0.1:9200/synctest/article/_mapping
 

3. Create and update operations

Add (PUT)

We specify _id=4 in the URL and send the document to store:

curl -X PUT 127.0.0.1:9200/synctest/article/4 -d '{"id":4,"name":"Tom cat"}'

output:
{
    "_index":"synctest",
    "_type":"article",
    "_id":"4",
    "_version":1,
    "_shards":{
        "total":2,
        "successful":1,
        "failed":0
    },
    "created":true
}
 

Note that if a document with _id=4 already exists, the PUT overwrites the whole document:

curl -X PUT 'http://127.0.0.1:9200/synctest/article/4?pretty' -d '{"id":4,"cc":1}'

output:
{
    "_index":"synctest",
    "_type":"article",
    "_id":"4",
    "_version":2,
    "_shards":{
        "total":2,
        "successful":1,
        "failed":0
    },
    "created":false
}
 

Note the _version field. As the name implies, it is the version number of the document, and it increases by 1 every time the document is updated. It can be used for concurrency control in practice.

Appending pretty to the URL makes Elasticsearch return pretty-printed JSON.

Also note the created field in the response: it is false when the request updated an existing document.
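
To make the overwrite behaviour concrete, here is a toy in-memory model in Python. ToyIndex is a stand-in I made up for illustration; it is not an Elasticsearch client and only mimics the _version and created values seen in the two responses above.

```python
class ToyIndex:
    """In-memory stand-in for one index/type; NOT an Elasticsearch client."""

    def __init__(self):
        self.docs = {}  # _id -> (version, source)

    def put(self, doc_id, source):
        """Store the whole document, replacing any previous body."""
        old_version = self.docs.get(doc_id, (0, None))[0]
        self.docs[doc_id] = (old_version + 1, dict(source))
        return {
            "_id": doc_id,
            "_version": old_version + 1,
            "created": old_version == 0,
        }


index = ToyIndex()
first = index.put("4", {"id": 4, "name": "Tom cat"})
second = index.put("4", {"id": 4, "cc": 1})
print(first)
print(second)
print(index.docs["4"][1])  # the "name" field is gone after the second PUT
```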

Safer creation

The PUT method above creates data, but it has the side effect of overwriting a document that already exists. In a real working environment we often do not want to overwrite earlier data.

So is there an API that only creates, and refuses to write if the document already exists?

Of course there is!

We can append _create to the URL to force creation:

curl -X PUT http://127.0.0.1:9200/synctest/article/4/_create -d '{"id":4,"name":"heihei"}'

output:
{
  "error" : {
    "root_cause" : [ {
      "type" : "document_already_exists_exception",
      "reason" : "[article][4]: document already exists",
      "shard" : "2",
      "index" : "synctest"
    } ],
    "type" : "document_already_exists_exception",
    "reason" : "[article][4]: document already exists",
    "shard" : "2",
    "index" : "synctest"
  },
  "status" : 409
}
 
curl -X PUT 'http://127.0.0.1:9200/synctest/article/5/_create?pretty' -d '{"id":5,"name":"heihei"}'

output:
{
  "_index" : "synctest",
  "_type" : "article",
  "_id" : "5",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}
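
Client code usually has to tell these two replies apart. A possible Python sketch follows; create_outcome is a hypothetical helper, the bodies are shortened copies of the two responses above, and the 201 status for a successful create is what I would expect rather than something shown in the output.

```python
import json


def create_outcome(status, body):
    """Classify the reply to a PUT .../_create request."""
    reply = json.loads(body)
    if status == 409:
        return "exists: " + reply["error"]["reason"]
    if reply.get("created"):
        return "created version %d" % reply["_version"]
    return "unexpected status %d" % status


conflict_body = json.dumps({
    "error": {
        "type": "document_already_exists_exception",
        "reason": "[article][4]: document already exists",
    },
    "status": 409,
})
created_body = json.dumps({"_id": "5", "_version": 1, "created": True})

print(create_outcome(409, conflict_body))
print(create_outcome(201, created_body))  # 201 is an assumed status code
```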
 

Databases rely on transactions to keep concurrent writes consistent. Elasticsearch has no transactions, but it offers optimistic concurrency control instead.

Remember the version number we mentioned above? We can pass it with the request:

curl -X PUT 'http://127.0.0.1:9200/synctest/article/5?version=1' -d '{"id":5,"name":"heihei"}'

output:
{
    "error":{
        "root_cause":[
            {
                "type":"version_conflict_engine_exception",
                "reason":"[article][5]: version conflict, current [2], provided [1]",
                "shard":"1",
                "index":"synctest"
            }
        ],
        "type":"version_conflict_engine_exception",
        "reason":"[article][5]: version conflict, current [2], provided [1]",
        "shard":"1",
        "index":"synctest"
    },
    "status":409
}
 

The example above requires the current version of the document to be 1 for the update to succeed. Here the current version is 2, so the update fails with a 409 conflict.
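
The usual way to react to such a conflict is to read the document again and retry with the version just seen. The Python sketch below plays this through against a toy in-memory store; ToyVersionedStore and VersionConflict are names invented for the example and only mimic the version check, they are not part of any Elasticsearch client.

```python
class VersionConflict(Exception):
    """Raised by the toy store when the expected version is stale."""


class ToyVersionedStore:
    """In-memory stand-in that mimics the ?version= check; NOT a real client."""

    def __init__(self):
        self.docs = {}  # _id -> (version, source)

    def get(self, doc_id):
        return self.docs[doc_id]

    def put(self, doc_id, source, version=None):
        current = self.docs.get(doc_id, (0, None))[0]
        if version is not None and version != current:
            raise VersionConflict(
                "current [%d], provided [%d]" % (current, version))
        self.docs[doc_id] = (current + 1, dict(source))
        return current + 1


store = ToyVersionedStore()
store.put("5", {"id": 5, "name": "heihei"})    # becomes version 1
store.put("5", {"id": 5, "name": "changed"})   # another writer: version 2

try:
    store.put("5", {"id": 5, "name": "stale"}, version=1)
    outcome = "written"
except VersionConflict as exc:
    outcome = "conflict: %s" % exc

# Read again, then retry with the version we just saw.
version, source = store.get("5")
new_version = store.put("5", dict(source, name="fresh"), version=version)
print(outcome)
print(new_version)
```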

Partial document update

curl -X POST http://127.0.0.1:9200/synctest/article/4/_update -d '{"doc":{"views":1}}'

output:
{
    "_index":"synctest",
    "_type":"article",
    "_id":"4",
    "_version":7,
    "_shards":{
        "total":2,
        "successful":1,
        "failed":0
    }
}

Fetching the document afterwards (GET /synctest/article/4) shows the merged result:
{
    "_index":"synctest",
    "_type":"article",
    "_id":"4",
    "_version":7,
    "found":true,
    "_source":{
        "id":4,
        "cc":1,
        "views":1
    }
}
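
The merge that a partial update performs can be sketched in a few lines of Python. merge_partial is my own illustration of the idea (nested objects are merged recursively, other values are replaced); it is a model of the behaviour, not Elasticsearch code.

```python
def merge_partial(source, doc):
    """Return a copy of source with doc merged in; nested dicts merge recursively."""
    merged = dict(source)
    for key, value in doc.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)
        else:
            merged[key] = value  # scalars and lists are replaced outright
    return merged


before = {"id": 4, "cc": 1}
after = merge_partial(before, {"views": 1})
print(after)
```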
 

Updating with a script

We have added a new field, views, which holds the number of views. To increase it by 1 in a single API call, we can use a script (Groovy is the default scripting language in this version).

First, enable script support in elasticsearch.yml and restart Elasticsearch so that the settings take effect:

script.inline: on
script.indexed: on
 
curl -X POST http://127.0.0.1:9200/synctest/article/4/_update -d '{"script":"ctx._source.views+=1"}'

output:
{
    "_index":"synctest",
    "_type":"article",
    "_id":"4",
    "_version":12,
    "_shards":{
        "total":2,
        "successful":1,
        "failed":0
    }
}
 

This works because the document with _id=4 has a views field. If we run the same script against a document that has no views field, an error is reported:

curl -X POST http://127.0.0.1:9200/synctest/article/2/_update -d '{"script":"ctx._source.views+=1"}'

output:
{
    "error":{
        "root_cause":[
            {
                "type":"remote_transport_exception",
                "reason":"[Ranger][192.168.2.108:9300][indices:data/write/update[s]]"
            }
        ],
        "type":"illegal_argument_exception",
        "reason":"failed to execute script",
        "caused_by":{
            "type":"script_exception",
            "reason":"failed to run inline script [ctx._source.views+=1] using lang [groovy]",
            "caused_by":{
                "type":"null_pointer_exception",
                "reason":"Cannot execute null+1"
            }
        }
    },
    "status":400
}
 

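How do we handle this situation?

We can add an upsert body to the request. If the document does not exist yet, Elasticsearch indexes the upsert content as a new document instead of running the script. Note that upsert only covers a missing document; for a document that exists but lacks the views field, the script itself has to handle the null value.

{
    "script":"ctx._source.views+=1",
    "upsert":{
        "views":1
    }
}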
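
The decision between upsert and script can be modelled in Python as follows. scripted_upsert and increment_views are hypothetical names, and increment_views is a Python stand-in for the Groovy script that, unlike the script above, deliberately treats a missing views field as 0.

```python
def scripted_upsert(existing, upsert, script):
    """Return the stored source after an update carrying both script and upsert."""
    if existing is None:
        return dict(upsert)   # document missing: index the upsert body as-is
    updated = dict(existing)
    script(updated)           # document present: run the script on its source
    return updated


def increment_views(source):
    # Stand-in for the Groovy script; the .get() guard is my own addition.
    source["views"] = source.get("views", 0) + 1


print(scripted_upsert(None, {"views": 1}, increment_views))
print(scripted_upsert({"id": 4, "views": 1}, {"views": 1}, increment_views))
print(scripted_upsert({"id": 2}, {"views": 1}, increment_views))
```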
 

When several requests update the same document concurrently, an update can fail with a version conflict. The retry_on_conflict parameter sets how many times the update is retried after such a conflict; the default is 0. I have not used this parameter myself.

curl -X POST 'http://127.0.0.1:9200/synctest/article/4/_update?retry_on_conflict=5' -d '{"upsert":{"views":1},"script":"ctx._source.views+=1"}'
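
Elasticsearch performs these retries on the server side, but the idea can be illustrated with a small client-side analogue in Python. Conflict, update_with_retries and flaky_increment are all invented for this sketch; flaky_increment simply pretends to lose the race twice.

```python
class Conflict(Exception):
    """Stands in for a 409 version conflict."""


def update_with_retries(attempt, retry_on_conflict=0):
    """Call attempt() once, plus up to retry_on_conflict more times on Conflict."""
    for remaining in range(retry_on_conflict, -1, -1):
        try:
            return attempt()
        except Conflict:
            if remaining == 0:
                raise  # retries exhausted: surface the conflict


calls = {"count": 0}


def flaky_increment():
    # Hypothetical operation that loses the race twice, then succeeds.
    calls["count"] += 1
    if calls["count"] <= 2:
        raise Conflict()
    return "updated on attempt %d" % calls["count"]


result = update_with_retries(flaky_increment, retry_on_conflict=5)
print(result)
```

With the default of 0 the first conflict is raised to the caller immediately, which matches the 409 responses shown earlier.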
 

We can also use scripts to do more.

For example, decide whether a document should be deleted based on a condition (higher versions, > 6.0):

curl -X POST http://127.0.0.1:9200/synctest/article/4/_update -d '{"script":"ctx.op = ctx._source.views>3 ? \"delete\" : \"none\""}'
 

Or pass the threshold as a parameter:

{
    "script" : "ctx.op = ctx._source.views>count ? 'delete' : 'none'",
    "params" : {
        "count": 3
    }
}
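
Quoting a script that itself contains quotes is easy to get wrong on the command line, so it can help to let a JSON library build the body. The Python sketch below produces the parameterized request shown above; conditional_delete_body is a hypothetical helper name.

```python
import json


def conditional_delete_body(threshold):
    """Build an update body that deletes the document when views > threshold."""
    return json.dumps({
        "script": "ctx.op = ctx._source.views > count ? 'delete' : 'none'",
        "params": {"count": threshold},
    })


body = conditional_delete_body(3)
print(body)
```

The resulting string can be written to a file and sent with curl -d @file, which avoids shell quoting altogether.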
 

In addition, Elasticsearch supports bulk create, update, and delete operations (es 6.6). Each action and each document body goes on its own line, and the body must end with a newline:

curl -X POST http://127.0.0.1:9200/_bulk --data-binary '{"delete":{"_index":"synctest","_type":"article","_id":"4"}}
{"update":{"_index":"synctest","_type":"article","_id":"3"}}
{"doc":{"title":"bulk update"}}
'
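
Because the bulk format is newline-delimited JSON, generating the body in code is less error-prone than typing it by hand. A minimal Python sketch, with bulk_body as an invented helper that takes (action, metadata, payload) tuples:

```python
import json


def bulk_body(actions):
    """Serialize (action, metadata, payload-or-None) tuples as newline-delimited JSON."""
    lines = []
    for action, meta, payload in actions:
        lines.append(json.dumps({action: meta}))
        if payload is not None:  # delete has no payload line
            lines.append(json.dumps(payload))
    return "\n".join(lines) + "\n"  # the bulk API needs the final newline


body = bulk_body([
    ("delete", {"_index": "synctest", "_type": "article", "_id": "4"}, None),
    ("update", {"_index": "synctest", "_type": "article", "_id": "3"},
     {"doc": {"title": "bulk update"}}),
])
print(body, end="")
```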
 
