How to copy an Elasticsearch index from production to a local Docker container
I faced an issue with Elasticsearch last week, and in order to reproduce it, I wanted to have the full index on my development machine.
To do that, I had a few options:
- Use a backup: but I only want one index. Downloading the whole backup would be too heavy, and it would not fit on my laptop anyway;
- Create a new snapshot: I don’t want to touch the production configuration;
- Export the index as JSON and re-index it: it would require too much manual code, and it would be too slow;
- Use the reindex API with a remote source (production) and a local destination (my development environment): that’s what we’re going to use.
Our setup
In production
- The cluster runs on bare metal;
- The Elasticsearch cluster is not directly reachable over HTTP;
- The servers are not even reachable over SSH;
- We have a bastion to protect our infrastructure.
In development
- The Elasticsearch node is in a Docker container.
The reindex API
I will use the reindex API.
This API allows us to copy documents from one index to another. And what is cool is that it can copy data from a remote cluster.
The syntax looks like this:
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "my-index-000001"
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}
We will use the development node to initiate the reindex, which means we will run the HTTP request against the Docker container.
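In practice, such a request can be sent with curl. A minimal sketch, assuming the local node is reachable on localhost:9200 (the remote host and index names below are placeholders):

curl -X POST 'http://localhost:9200/_reindex' \
  -H 'Content-Type: application/json' \
  -d '{
    "source": {
      "remote": { "host": "http://otherhost:9200" },
      "index": "my-index-000001"
    },
    "dest": { "index": "my-new-index-000001" }
  }'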
How to expose the production to a local container
Since there are several layers of security to pass through, we will use an SSH tunnel to expose the production cluster to the local container.
So we need to:
- open an SSH connection to the production cluster: keys HostName and User;
- by going through the bastion: key ProxyJump;
- configure the bastion for the “proxy jump”: second part of the config;
- bind port 9200 (on production) to 9201 (on our host): key LocalForward;
- bind 0.0.0.0 on our host, instead of 127.0.0.1, to allow our container to reach the tunnel: key GatewayPorts;
- disable TTY allocation because it’s not needed: key RequestTTY;
- display a nice message when opening the connection: key RemoteCommand;
- use a nice name for the connection: main key Host.
All the configurations together:
# .ssh/config
Host project-prod-tunnel-es
    ProxyJump project-prod-bastion-1
    Hostname 10.20.0.243
    User debian
    LocalForward 9201 10.20.0.243:9200
    RequestTTY no
    GatewayPorts true
    RemoteCommand echo "curl http://127.0.0.1:9201" && cat

Host project-prod-bastion-1
    Hostname 1.2.3.4
    User debian
Note: Our Elasticsearch nodes are not listening on 127.0.0.1, but on the local IP. That’s why the LocalForward uses 10.20.0.243 and not 127.0.0.1.
WARNING: When you use LocalForward with GatewayPorts, you are opening a big security hole into your production cluster: every computer on your network, every container on your machine, and every application will be able to reach production. This risk should be treated very carefully!
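One way to narrow the exposure is to bind the forward to the Docker bridge address only, instead of all interfaces. A sketch, assuming your Docker gateway is 172.21.0.1 (you can find yours with the command shown further down):

# .ssh/config (alternative): bind the forward to the Docker bridge only
LocalForward 172.21.0.1:9201 10.20.0.243:9200

With an explicit bind address, GatewayPorts is no longer needed, and the forward is reachable from your host and your containers but not from the rest of your network.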
You will also need your SSH key installed on the servers. Then, to open the tunnel, run the following command:
ssh project-prod-tunnel-es
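Once the tunnel is up, you can check it from the host with the command printed by RemoteCommand:

curl http://127.0.0.1:9201

From inside a container, you would use the Docker gateway IP instead of 127.0.0.1 (see the next section).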
How to start the reindex
Configuration of the local node
The remote reindex server must be “whitelisted” in your Elasticsearch configuration. The remote host will be your Docker host IP, which is usually the container’s gateway. You can find it with the following command:
docker inspect -f '{{range .NetworkSettings.Networks}}{{.Gateway}}{{end}}' <container_id>
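For example, assuming the container is named es-dev (a hypothetical name), the command prints the gateway IP:

docker inspect -f '{{range .NetworkSettings.Networks}}{{.Gateway}}{{end}}' es-dev
172.21.0.1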
Once you get the IP, you must allow it in the configuration:
# /etc/elasticsearch/elasticsearch.yml
reindex.remote.whitelist: "172.21.0.1:9201"
Don’t forget to rebuild the image and start the container again.
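If your node’s configuration is baked into the image, this is a one-line change in the Dockerfile. A minimal sketch, assuming the official Elasticsearch image (the tag and config path are the official image’s defaults and may differ from your setup):

# Dockerfile (sketch)
FROM docker.elastic.co/elasticsearch/elasticsearch:7.17.9
# Allow remote reindex requests through the SSH tunnel exposed on the Docker gateway
RUN echo 'reindex.remote.whitelist: "172.21.0.1:9201"' >> /usr/share/elasticsearch/config/elasticsearch.yml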
Start the reindex
Once you are done, you can execute the following HTTP request to start the task:
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "my_index_to_debug",
    "remote": {
      "host": "http://172.21.0.1:9201"
    },
    "size": 10
  },
  "dest": {
    "index": "my_index_to_debug"
  }
}
This request will return a task ID. In my case: fLDgREJ0S46ETKfmPnRtHw:7330
- I disabled wait_for_completion, because the index is about 10Gb;
- You can monitor the progress thanks to the task API: GET _tasks/fLDgREJ0S46ETKfmPnRtHw:7330;
- I used a size of 10, because with a bigger value I hit a memory limit on my local node, as mentioned in the documentation;
- The bandwidth with this configuration is about 7.0MiB/s;
- It took about 1 hour to transfer 10Gb of data.
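For reference, the same monitoring can be done with curl from the host, and the task can be cancelled if something goes wrong (the task ID is the one returned above):

curl 'http://localhost:9200/_tasks/fLDgREJ0S46ETKfmPnRtHw:7330?pretty'
# If you need to abort the transfer:
curl -X POST 'http://localhost:9200/_tasks/fLDgREJ0S46ETKfmPnRtHw:7330/_cancel'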
Conclusion
With very little configuration (one line in the Dockerfile and a few lines in our .ssh/config) we managed to call the production cluster from a development container.
This blog post was about Elasticsearch, but it would be exactly the same for other TCP services, like PostgreSQL, RabbitMQ, or Redis.
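For example, here is a sketch of the same tunnel for PostgreSQL; the host name and IP are hypothetical:

# .ssh/config
Host project-prod-tunnel-pg
    ProxyJump project-prod-bastion-1
    Hostname 10.20.0.250
    User debian
    LocalForward 5433 10.20.0.250:5432
    RequestTTY no
    GatewayPorts true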
And don’t forget to close your tunnel once you’re done!