The first step in deploying my next cluster is building a bootstrap server. This bootstrap server needs to host a number of small static services that are used by the nodes in the cluster(s) and in the worker pool. Examples include traditional *nix services such as NTP, DNS and PXE/TFTP, plus discovery.etcd.io, which etcd needs in order to discover cluster membership.
    coreos:
      etcd2:
        # generate a new token for each unique cluster from https://discovery.etcd.io/new
        discovery: https://discovery.etcd.io/<discovery_token>
        # multi-region deployments, multi-cloud deployments, and Droplets without
        # private networking need to use $public_ipv4
        advertise-client-urls: http://$private_ipv4:2379,http://$private_ipv4:4001
        initial-advertise-peer-urls: http://$private_ipv4:2380
        # listen on the official ports 2379, 2380 and one legacy port 4001
        listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
        listen-peer-urls: http://$private_ipv4:2380
The CoreOS team developed and deployed a public version of the discovery tool and then made the code available. Unfortunately the tool itself needs to be deployed on top of a cluster of etcd servers. And so there are two open questions: (a) whether or not to use the public instance, and (b) whether to perform the discovery manually.
(a) The TTL means that the record and its UUID should not live long enough for someone to trick your cluster into believing (i) that one of your nodes does not belong and (ii) that the bad guy's node can replace it. I’m not an expert, but I imagine that one could validate the cluster peers against a list of IP addresses from a 3rd-party source; in my case the “retrieve droplets” API at Digital Ocean.
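The peer check could be sketched roughly as follows. This is only an illustration under assumptions: the function names are made up, the sample data is hypothetical, and the response shape (a "droplets" list whose entries carry "networks" with "v4" addresses) is my reading of the droplet listing, so verify it against the current API before relying on it. No network calls are made here; the response is passed in as a plain dict.

```python
from urllib.parse import urlparse


def droplet_ips(droplets_response):
    """Collect every IPv4 address from a droplet-listing response body."""
    ips = set()
    for droplet in droplets_response.get("droplets", []):
        for net in droplet.get("networks", {}).get("v4", []):
            ips.add(net["ip_address"])
    return ips


def unknown_peers(peer_urls, allowed_ips):
    """Return the peer URLs whose host is not in the allowed set."""
    return [url for url in peer_urls if urlparse(url).hostname not in allowed_ips]


# Hypothetical data, shaped like the droplet-listing response (assumption).
response = {
    "droplets": [
        {"networks": {"v4": [{"ip_address": "10.0.0.1", "type": "private"}]}},
        {"networks": {"v4": [{"ip_address": "10.0.0.2", "type": "private"}]}},
    ]
}
peers = ["http://10.0.0.1:2380", "http://10.0.0.9:2380"]
print(unknown_peers(peers, droplet_ips(response)))
```

Any URL that comes back from unknown_peers would be a peer that the provider does not know about, which is a reason to stop and look before letting the cluster form.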
(b) Before implementing a pared-down version of the discovery service (and to see why I think a lite version is required), read this doc, as it describes the discovery protocol and hints at how easy it might be.
The challenge is the design…
Create a private discovery service inside your firewall… but it requires an etcd cluster. And that cluster depends on tokens… so either you have to hand-stitch the token or use the public discovery service… which depends on an etcd cluster that was already clustered… and so on: follow the tokens and services recursively and you eventually reach an ops person who installed the tokens manually.
Because the TTL is fairly low, because the service only needs some temporary persistence, and because it’s not necessary to stay alive 24x7, it would make sense for the discovery service to be detached from the etcd service.
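To make the idea concrete, here is a minimal sketch of what a detached, lite discovery registry might look like if it kept its records in memory with a TTL instead of in etcd. It is not the real discovery protocol (which goes through etcd's HTTP key API) and not anything CoreOS ships; the class name, method names and default TTL are all assumptions made up for illustration.

```python
import time
import uuid


class LiteDiscovery:
    """In-memory registry: one short-lived record per cluster token."""

    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.clusters = {}  # token -> {"size", "expires", "members"}

    def new_token(self, size=3):
        """Create a token for a cluster that expects `size` members."""
        token = uuid.uuid4().hex
        self.clusters[token] = {
            "size": size,
            "expires": self.clock() + self.ttl,
            "members": {},
        }
        return token

    def register(self, token, name, peer_url):
        """Add a member; return the member map, or None if the token is unknown or expired."""
        self._expire()
        cluster = self.clusters.get(token)
        if cluster is None:
            return None
        members = cluster["members"]
        # Only the first `size` members join; later callers just see the list.
        if name in members or len(members) < cluster["size"]:
            members[name] = peer_url
        return dict(members)

    def _expire(self):
        """Drop every record whose TTL has passed."""
        now = self.clock()
        for token in [t for t, c in self.clusters.items() if c["expires"] <= now]:
            del self.clusters[token]


svc = LiteDiscovery(ttl_seconds=600)
token = svc.new_token(size=2)
svc.register(token, "node1", "http://10.0.0.1:2380")
print(svc.register(token, "node2", "http://10.0.0.2:2380"))
```

Once the TTL passes, the record is gone and the process can be shut down, which matches the point that the service only has to exist while the cluster is being constructed. Unlike an etcd-backed service, a single in-memory process like this gives no consistency guarantee if it is run as more than one instance.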
UPDATE: I received a response to a G+ question from Brandon @ CoreOS: “The discovery.etcd.io service only lasts for the purpose to construct the cluster. Nothing else.” So the de facto discovery service is probably safe. ON THE OTHER HAND, the CoreOS documentation clearly states:

Running Your Own Discovery Service

The public discovery service is just an etcd cluster made available to the public internet. Since the discovery service conducts and stores the result of the first leader election, it needs to be consistent. You wouldn’t want two machines in the same cluster to think they were both the leader.
Since etcd is designed to this type of leader election, it was an obvious choice to use it for everyone’s initial leader election. This means that it’s easy to run your own etcd cluster for this purpose.
If you’re interested in how discovery API works behind the scenes in etcd, read about etcd clustering.
So there is a bit of a miscommunication here. Furthermore, creating a discovery service requires an etcd cluster, which in turn has to be bootstrapped somehow. And while there is some documentation on bootstrapping an etcd cluster, it seems involved and should have been scripted.
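As a rough idea of what such a script might look like, the sketch below generates the command-line flags for a statically configured cluster, where every node is told about every other node up front and no discovery service is involved. The function name, the node names, the IP addresses and the cluster token value are all hypothetical; the flag names are the ones I understand etcd to use for static bootstrapping, so check them against the etcd version you actually run.

```python
def etcd_static_flags(nodes, cluster_token="etcd-cluster-1"):
    """Build etcd command-line flags for each node of a statically configured cluster.

    nodes maps a member name to its peer IP address.
    """
    # Every member receives the same, complete member list.
    initial_cluster = ",".join(
        f"{name}=http://{ip}:2380" for name, ip in sorted(nodes.items())
    )
    flags = {}
    for name, ip in nodes.items():
        flags[name] = [
            f"--name={name}",
            f"--initial-advertise-peer-urls=http://{ip}:2380",
            f"--listen-peer-urls=http://{ip}:2380",
            f"--listen-client-urls=http://{ip}:2379,http://127.0.0.1:2379",
            f"--advertise-client-urls=http://{ip}:2379",
            f"--initial-cluster-token={cluster_token}",
            f"--initial-cluster={initial_cluster}",
            "--initial-cluster-state=new",
        ]
    return flags


# Hypothetical three-node cluster.
nodes = {"node1": "10.0.0.1", "node2": "10.0.0.2", "node3": "10.0.0.3"}
for flag in etcd_static_flags(nodes)["node1"]:
    print(flag)
```

The trade-off is that the member list has to be known before the first node starts, which is exactly the manual step the discovery service is meant to remove.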