Much of this content comes from the official Consul documentation.
I tried to keep only the most useful parts and to keep it as short as possible.
Term definitions
Node :
A node is a physical or virtual machine with a Consul agent installed on it.
Service :
Services run on nodes and are registered in Consul through the agent of a node.
Datacenter and Cluster:
Clusters and datacenters are very close concepts. The two terms are often used interchangeably.
A datacenter/cluster is a networking environment that is private, low latency, and high bandwidth. It groups Consul nodes together with the client and server agents that run on them.
Agent :
An agent is the long running daemon on every member of the Consul cluster. It is started by running consul agent. The agent is able to run in either client or server mode. Since all nodes must be running an agent, it is simpler to refer to the node as being either a client or server, but there are other instances of the agent.
Agent in client mode or client :
A client is an agent that forwards all RPCs to a server. The client is relatively stateless. The only background activity a client performs is taking part in the LAN gossip pool. This has a minimal resource overhead and consumes only a small amount of network bandwidth.
Agent in server mode or server:
A server is an agent with an expanded set of responsibilities including participating in the Raft quorum, maintaining cluster state, responding to RPC queries, exchanging WAN gossip with other datacenters, and forwarding queries to leaders or remote datacenters.
local agent :
Not clear for now
Consensus :
Agreement upon the elected leader as well as agreement on the ordering of transactions.
Gossip :
Consul is built on top of Serf which provides a full gossip protocol that is used for multiple purposes.
Gossip involves random node-to-node communication, primarily over UDP. Serf uses it to provide membership, failure detection, and event broadcast.
LAN Gossip :
Refers to the LAN gossip pool which contains nodes that are all located on the same local area network or datacenter.
WAN Gossip :
Refers to the WAN gossip pool which contains only servers. These servers are primarily located in different datacenters and typically communicate over the internet or wide area network.
Architecture overview
The consul documentation provides an excellent diagram :
The takeaway is that Consul is, in the end, conceptually very simple.
We could summarize it as follows :
– a datacenter/cluster groups a set of Consul clients and servers in the same network (low latency).
– inside a datacenter, all agents communicate with each other for various reasons (queries, forwarding, gossip, replication): client to client, server to server, and client to server.
– inside a datacenter, one server is elected leader (by consensus). The leader has an extra duty: it processes all queries and transactions, while the main role of the other servers is to replicate the leader's data (for failover).
– between datacenters, communication happens only between servers.
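To make the consensus point a bit more concrete: Raft needs a majority (a quorum) of the servers to elect a leader and to commit a transaction. The sketch below is only an illustration of that arithmetic, not code taken from Consul, and the function names are my own.

```python
def quorum_size(servers: int) -> int:
    """Smallest majority of a Raft group with `servers` voting members."""
    if servers < 1:
        raise ValueError("a datacenter needs at least one server")
    return servers // 2 + 1


def failure_tolerance(servers: int) -> int:
    """Number of servers that can fail while a quorum is still reachable."""
    return servers - quorum_size(servers)


# Print: number of servers, quorum size, tolerated failures.
for n in (1, 3, 5):
    print(n, quorum_size(n), failure_tolerance(n))
```

This is why an odd number of servers is the usual choice: going from 3 to 4 servers raises the quorum without tolerating one more failure.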
Address terminology
In Consul, agents expose interfaces to their clients (mainly the HTTP and DNS interfaces), but they must also expose an endpoint to their peers so that agents can communicate with each other (gossip protocol).
These two kinds of communication have very different semantics and roles. That’s why Consul defines two kinds of address for every agent :
– The client address : used for client interfaces to the agent.
By default, this binds only to localhost.
– The cluster address : used for communication between Consul agents in a cluster.
Both are configurable.
Bootstrapping a Datacenter
Before a Consul cluster can begin to service requests, a server node must be elected leader. Bootstrapping is the process of joining these initial server nodes into a cluster/datacenter.
The recommended way to bootstrap the servers is to use the -bootstrap-expect configuration option for server agents.
This informs Consul of the expected number of server nodes and automatically bootstraps when that many servers are available.
To allow a server agent (or even a client agent) to join a server, the recommended way is the automatic one : the -join or -retry-join command-line option (start_join and retry_join in a configuration file).
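As an illustration of these options, here is a small sketch that generates a configuration file for a server agent. The keys (server, bootstrap_expect, retry_join, data_dir, datacenter) are the configuration-file counterparts of the command-line options; the IP addresses, the data directory and the helper name are placeholders I made up.

```python
import json


def server_config(datacenter, data_dir, expected_servers, peers):
    """Build a server-mode agent configuration as a plain dict."""
    return {
        "datacenter": datacenter,
        "data_dir": data_dir,
        "server": True,
        # Wait until this many servers are available before bootstrapping.
        "bootstrap_expect": expected_servers,
        # Addresses to join at startup, retried until one of them answers.
        "retry_join": list(peers),
    }


# Hypothetical three-server datacenter.
config = server_config("dc1", "/tmp/consul", 3, ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(json.dumps(config, indent=2))
```

The printed JSON could be saved to a file and given to each server with -config-file.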
Starting an agent
We can run an agent in dev mode or in production mode (the default).
Dev mode keeps the state in memory, so everything is lost when the agent (for example, its container) shuts down.
By default, this mode uses the loopback address as the cluster/advertise address.
In production mode we need to provide some configuration : at minimum the data directory (the -data-dir argument), provided that all the default values suit us and the machine has a single private IPv4 address.
Otherwise we need to set the relevant options explicitly.
For example, if we need to specify the advertised IPv4 address, we could write something like
consul agent -advertise=192.168.227.42 -data-dir=/tmp/consul
If we want to specify the other arguments in a configuration file, we could run :
consul agent -advertise=192.168.227.42 -config-file=myConfig.json
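When the agent is started from a script, the same command lines can be assembled programmatically. This is only a sketch: the helper name is mine, and it covers just the three flags used above.

```python
import shlex


def agent_command(data_dir=None, advertise=None, config_files=()):
    """Return the argument list for `consul agent` in production mode."""
    argv = ["consul", "agent"]
    if data_dir is not None:
        argv.append(f"-data-dir={data_dir}")
    if advertise is not None:
        argv.append(f"-advertise={advertise}")
    for path in config_files:
        # -config-file may be repeated to load several files.
        argv.append(f"-config-file={path}")
    return argv


cmd = agent_command(data_dir="/tmp/consul", advertise="192.168.227.42")
print(shlex.join(cmd))
```

The resulting list can be passed as is to subprocess.run, which avoids any shell quoting issue.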
Stopping an agent
An agent can be stopped in two ways: gracefully or forcefully.
Sending a SIGINT (2) signal to the process requests a graceful stop.
The agent first notifies the cluster that it is leaving, so the cluster state is updated immediately.
Sending a SIGKILL (9) signal to the process forces a stop. Here, the agent ends immediately, without notifying anyone. The cluster therefore treats it as a critical failure, and only after some seconds does it detect that the node has died and update the cluster state accordingly.
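The two ways of stopping an agent map directly to two signals. A minimal sketch for a Unix-like system, assuming we already know the PID of the agent process (the helper name is hypothetical):

```python
import os
import signal


def stop_agent(pid: int, graceful: bool = True) -> signal.Signals:
    """Send SIGINT (graceful stop) or SIGKILL (forced stop) to a process."""
    sig = signal.SIGINT if graceful else signal.SIGKILL
    os.kill(pid, sig)
    return sig
```

In practice stop_agent(pid) is what we want almost every time; the forced variant is mostly useful to test how the cluster reacts to a node failure.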
Register a service
There are several ways to do it :
– Service Definition
It is a static way to register services.
The service definition is a file with the .json or .hcl extension.
We can load it into Consul either by passing the service definition to the agent with the -config-file option or by placing the file inside the agent's -config-dir directory.
Check definitions can be updated by sending a SIGHUP to the agent.
– The client CLI
consul services register my-service.json
– The HTTP API
It is a dynamic way to register services. The method to use :
Method | Path | Produces |
---|---|---|
PUT | /agent/service/register | application/json |
To get more information, see the official doc.
– Making the app consul-aware
There are many flavors of this approach. The idea is to include in the application a library that registers the application in Consul as a service at startup.
For example, in the Java world, a Spring Boot application can be made Consul-aware with the appropriate configuration.
The advantage : minimal configuration.
– Running an independent process dedicated to registration
That process is separate from the application, but it registers the application as a service in Consul.
The advantage : it works even for legacy applications that cannot or should not be coupled to Consul.
– Service integration/orchestration tools
The list is long : Registrator, Docker Swarm, Kubernetes, Nomad.
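To illustrate the HTTP API option listed above, here is a sketch that builds the PUT request for the register endpoint with the Python standard library. The base URL assumes a local agent on the default port 8500, and the helper name is mine; only a few fields of the payload are shown.

```python
import json
import urllib.request


def build_register_request(name, port, tags=(), base_url="http://localhost:8500"):
    """Build (but do not send) a PUT request for /v1/agent/service/register."""
    payload = {"Name": name, "Port": port, "Tags": list(tags)}
    return urllib.request.Request(
        url=f"{base_url}/v1/agent/service/register",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_register_request("counting", 9001, tags=["go"])
print(request.get_method(), request.full_url)
# To actually register the service against a running agent:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect before it reaches an agent.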
Local and remote services
We can register local services or remote services.
Registering a service on its local agent is the preferred way, since it is simple and provides a desirable coupling : low latency for health checks, and consistency, since a node that is down necessarily means its services are down too.
When running a Consul agent on a node turns out to be complicated, we can instead register the service as an external (remote) service.
Example of a local service definition (« address » is not set) :
{
  "service": {
    "name": "counting",
    "tags": ["go"],
    "port": 9001,
    "check": {
      "id": "countingcheck",
      "name": "counting check",
      "http": "http://localhost:9001",
      "interval": "10s"
    }
  }
}
The remote service definition version (« address » is set to the remote IP or domain) :
{
  "service": {
    "name": "counting",
    "tags": ["go"],
    "port": 9001,
    "address": "ipOrDomain",
    "check": {
      "id": "countingcheck",
      "name": "counting check",
      "http": "http://ipOrDomain:9001",
      "interval": "10s"
    }
  }
}
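Since the two definitions differ only by the address, they can be produced by one small function. This is a sketch under my own naming; it reproduces the two definitions above and nothing more.

```python
import json


def service_definition(name, port, tags=(), address=None, interval="10s"):
    """Return a service definition; `address` is set only for remote services."""
    host = address or "localhost"
    service = {
        "name": name,
        "tags": list(tags),
        "port": port,
        "check": {
            "id": f"{name}check",
            "name": f"{name} check",
            # The health check targets the same host as the service itself.
            "http": f"http://{host}:{port}",
            "interval": interval,
        },
    }
    if address is not None:
        service["address"] = address
    return {"service": service}


local = service_definition("counting", 9001, tags=["go"])
remote = service_definition("counting", 9001, tags=["go"], address="ipOrDomain")
print(json.dumps(remote, indent=2))
```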
Relation between nodes and services
Services are registered on a node.
A node can register the same service multiple times, and the same service can also be registered on multiple nodes.
When a node is unreachable or in maintenance, none of its services are listed in the DNS any longer, although they are still present in the catalog, which is a static view.
The DNS Interface of Consul
The general idea
It is one of the primary interfaces to Consul. The DNS interface provided by Consul is an alternative to the HTTP API, and it has the great advantage of being highly portable and only loosely coupled to Consul.
Some DNS record types relevant in Consul
For node lookups, the records returned by default are A (IPv4 address), AAAA (IPv6 address), and TXT (text record / machine-readable data) containing the node_meta values of the node.
For service lookups, an A record is returned by default.
We can also specify SRV as the record type. In that case, the port that the service is registered on is returned as well.
Convention for the DNS syntax
Regarding lookup syntax, note that for both node and service lookups, consul. is the default domain resolved by the Consul DNS (this is configurable), and the datacenter part defaults to the datacenter of the agent that performs the lookup.
Node Lookup
The way to reference a node : <node>.node[.datacenter].<domain>.
Note that dc1 is the name of the default datacenter.
So, for example, if we have a node named client1 attached to the dc1 datacenter, we could reference the node as client1.node.dc1.consul.
Example with dig (without specifying the datacenter) :
dig @127.0.0.1 -p 8600 client1.node.consul
Output :
; <<>> DiG 9.11.3-1ubuntu1.11-Ubuntu <<>> @127.0.0.1 -p 8600 client1.node.consul
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14295
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 2
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;client1.node.consul. IN A
;; ANSWER SECTION:
client1.node.consul. 0 IN A 172.17.0.4
;; ADDITIONAL SECTION:
client1.node.consul. 0 IN TXT "consul-network-segment="
;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Wed Feb 05 19:39:35 CET 2020
;; MSG SIZE rcvd: 100
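The naming convention for nodes is easy to encode. A small sketch (the function name is mine) that builds the name to query, with the datacenter part left out when it is not given:

```python
def node_dns_name(node, datacenter=None, domain="consul"):
    """Build <node>.node[.datacenter].<domain> for a Consul node lookup."""
    labels = [node, "node"]
    if datacenter:
        labels.append(datacenter)
    labels.append(domain)
    return ".".join(labels)


print(node_dns_name("client1"))         # client1.node.consul
print(node_dns_name("client1", "dc1"))  # client1.node.dc1.consul
```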
Service Lookup
Service queries support two lookup methods: standard and strict RFC 2782.
Standard lookup
The way to reference a service : [tag.]<service>.service[.datacenter].<domain>.
For example, if we have a service named counting registered on a node of the datacenter dc1, we could reference the service as counting.service.dc1.consul.
Example with dig (by specifying the datacenter) :
dig @127.0.0.1 -p 8600 counting.service.dc1.consul SRV
Output :
; <<>> DiG 9.11.3-1ubuntu1.11-Ubuntu <<>> @127.0.0.1 -p 8600 counting.service.dc1.consul SRV
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64461
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 3
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;counting.service.dc1.consul. IN SRV
;; ANSWER SECTION:
counting.service.dc1.consul. 0 IN SRV 1 1 9001 ac110005.addr.dc1.consul.
;; ADDITIONAL SECTION:
ac110005.addr.dc1.consul. 0 IN A 172.17.0.5
client1.node.dc1.consul. 0 IN TXT "consul-network-segment="
;; Query time: 0 msec
;; SERVER: 127.0.0.1#8600(127.0.0.1)
;; WHEN: Wed Feb 05 19:51:06 CET 2020
;; MSG SIZE rcvd: 165
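In the SRV answer above, the fields after SRV are the priority, the weight, the port and the target. The first label of the target (ac110005) looks like the service IPv4 address written in hexadecimal, which matches the A record of the additional section. The sketch below parses such an answer line and decodes that label; the function names are mine, and it only handles the IPv4 form seen in this output.

```python
import ipaddress


def parse_srv_answer(line):
    """Split one SRV answer line from dig into its fields."""
    name, ttl, rr_class, rr_type, priority, weight, port, target = line.split()
    if rr_type != "SRV":
        raise ValueError(f"not an SRV record: {line!r}")
    return {
        "name": name,
        "priority": int(priority),
        "weight": int(weight),
        "port": int(port),
        "target": target,
    }


def decode_addr_label(target):
    """Decode the hex IPv4 label of a <hex>.addr.<dc>.<domain> target."""
    hex_label = target.split(".", 1)[0]
    return str(ipaddress.IPv4Address(bytes.fromhex(hex_label)))


record = parse_srv_answer(
    "counting.service.dc1.consul. 0 IN SRV 1 1 9001 ac110005.addr.dc1.consul."
)
print(record["port"], decode_addr_label(record["target"]))  # 9001 172.17.0.5
```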
RFC 2782 lookup
todo…
Consul : useful CLI command lines
Starting an agent
The command : consul agent
Flags :
-dev : dev mode
-join=value : Address of an agent to join at start time. Can be specified multiple times.
-advertise=fooIpAddress : The advertise address is the address that we advertise/communicate to other nodes in the cluster. It means that nodes use that address to communicate with the agent. By default, the -bind address is advertised.
-bind : The address that should be bound to for internal cluster communications. This is an IP address that should be reachable by all other nodes in the cluster. By default, this is « 0.0.0.0 », meaning Consul will bind to all addresses on the local machine and will advertise the private IPv4 address to the rest of the cluster. If there are multiple private IPv4 addresses available, Consul will exit with an error at startup.
-data-dir=fooDir : the data directory for the agent to store state (very important for server agents).
-config-file=fooConfigFile : a configuration file to load. This option can be specified multiple times to load multiple configuration files.
Reload the agent configuration
Usage: consul reload
This is an alternative to sending the SIGHUP signal to the agent process.
List members of an agent
Usage: consul members [options]
Flags :
-detailed : provides detailed information
Follow logs of a running Consul agent
Usage: consul monitor [options]
Flags :
-log-level=logLevel : log level of the messages to show. By default this is « info ». Other possible values are « trace », « debug », « warn », and « err ».
Validate consul configuration files
The command : consul validate fooJsonFileOrDirectory
Place a node or service into maintenance mode
The command to execute from the target node : consul maint [options]
Here are the options :
-disable : Disable maintenance mode.
-enable : Enable maintenance mode.
-reason=<string> : Text describing the maintenance reason.
-service=<string> : Control maintenance mode for a specific service ID.
List datacenters catalog
Usage: consul catalog datacenters [options]
List nodes catalog
Usage: consul catalog nodes [options]
Some useful flags :
-detailed : Output detailed information about the nodes including their addresses and metadata.
-filter=<string> : Filter to use with the request.
-service=<id or name> : Service `id or name` to filter nodes. Only nodes which are providing the given service will be returned.
List services catalog
Usage: consul catalog services [options]
Some useful flags :
-node=<id or name> : Node `id or name` for which to list services.
-tags : Display each service's tags as a comma-separated list beside each service entry.
Consul : useful webservice / HTTP API
General advice
Append ?pretty to the request URL to get pretty-printed output.
Agent API
Returns the members the agent sees in the cluster gossip pool (depends on the agent):
curl http://localhost:8500/v1/agent/members
Returns the services registered on the local agent:
curl http://localhost:8500/v1/agent/services
Returns the service configuration for a service registered on the local agent:
curl http://localhost:8500/v1/agent/service/:serviceId
Returns the configuration and member information of the local agent :
curl http://localhost:8500/v1/agent/self
Catalog API
Returns the list of all datacenters :
curl http://localhost:8500/v1/catalog/datacenters
Returns the nodes registered in a given datacenter :
curl http://localhost:8500/v1/catalog/nodes
Optional params :
dc (string: « ») : the dc to query. By default, the dc of the agent is used.
For example : v1/catalog/nodes?dc=dc1
Returns the services registered in a given datacenter :
curl http://localhost:8500/v1/catalog/services
Optional params :
dc (string: « ») : the dc to query.
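The catalog URLs and their optional parameters can be built with a small helper. This is a sketch with a name of my own; the base URL assumes a local agent on port 8500, as in the curl examples.

```python
from urllib.parse import urlencode


def catalog_url(resource, dc=None, pretty=False, base_url="http://localhost:8500"):
    """Build a /v1/catalog/<resource> URL with optional dc and pretty params."""
    params = {}
    if dc:
        params["dc"] = dc
    url = f"{base_url}/v1/catalog/{resource}"
    query = urlencode(params)
    if pretty:
        # ?pretty is a bare flag without a value.
        query = f"{query}&pretty" if query else "pretty"
    return f"{url}?{query}" if query else url


print(catalog_url("nodes", dc="dc1"))
print(catalog_url("services", pretty=True))
```

The returned URL can then be fetched with curl or with urllib.request.urlopen against a running agent.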
Consul environment variables
CONSUL_HTTP_ADDR : The HTTP API address of the local Consul agent (not the remote server).
For example : CONSUL_HTTP_ADDR=localhost:8500
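A script that talks to the HTTP API can honor the same variable. In this sketch the helper name is mine, and two things are assumptions on my side: the fallback to 127.0.0.1:8500 when the variable is unset, and plain HTTP when the value carries no scheme.

```python
import os


def consul_base_url(environ=os.environ, default="127.0.0.1:8500"):
    """Return the HTTP API base URL, honoring CONSUL_HTTP_ADDR when set."""
    addr = environ.get("CONSUL_HTTP_ADDR", default)
    # Assumption: a value without a scheme means plain HTTP.
    if "://" not in addr:
        addr = f"http://{addr}"
    return addr.rstrip("/")


print(consul_base_url({"CONSUL_HTTP_ADDR": "localhost:8500"}))  # http://localhost:8500
```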