I'm on Xubuntu with all the latest updates applied. I have set up the Docker repo and installed the latest Docker packages using apt-get, and I also installed docker-compose. I created the following docker-compose.yaml file:
```yaml
version: "3.3"

networks:
  cassandra-net:
    driver: bridge

services:

  cassandra-1:
    image: "cassandra:latest"
    container_name: "cassandra-1"
    ports:
      - "7000:7000"
      - "9042:9042"
    networks:
      - "cassandra-net"
    volumes:
      - ./volumes/cassandra-1:/var/lib/cassandra:rw

  cassandra-2:
    image: "cassandra:latest"
    container_name: "cassandra-2"
    environment:
      - "CASSANDRA_SEEDS=cassandra-1"
    networks:
      - "cassandra-net"
    depends_on:
      - "cassandra-1"
    volumes:
      - ./volumes/cassandra-2:/var/lib/cassandra:rw

  cassandra-3:
    image: "cassandra:latest"
    container_name: "cassandra-3"
    networks:
      - "cassandra-net"
    environment:
      - "CASSANDRA_SEEDS=cassandra-1"
    depends_on:
      - "cassandra-1"
    volumes:
      - ./volumes/cassandra-3:/var/lib/cassandra:rw
```
When I check the status of each of the three nodes (cassandra-1, cassandra-2, cassandra-3) with this bash script:
docker exec -it cassandra-1 nodetool status
docker exec -it cassandra-2 nodetool status
docker exec -it cassandra-3 nodetool status
I get the following output:
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.18.0.3 252.53 KiB 16 100.0% 640dbf8a-13bb-46e5-8b1e-4542ee3352c4 rack1
UN 172.18.0.2 189.34 KiB 16 100.0% 7807001a-1885-41d1-b661-bd6c7e0db239 rack1
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.18.0.4 178.24 KiB 16 100.0% fe8fb5fe-7342-4eeb-92bb-01a755ecd8ad rack1
UN 172.18.0.2 263.65 KiB 16 100.0% 7807001a-1885-41d1-b661-bd6c7e0db239 rack1
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.18.0.3 178.2 KiB 16 100.0% 640dbf8a-13bb-46e5-8b1e-4542ee3352c4 rack1
UN 172.18.0.2 263.65 KiB 16 100.0% 7807001a-1885-41d1-b661-bd6c7e0db239 rack1
I expected each node to see all three addresses (172.18.0.2, .3, .4), but each node only "sees" one other node, not both of the other two.
I had a similar issue using docker-compose to start up nodes on a local environment. I solved it with a combination of:

- Controlling the startup order of the nodes in the compose file: first make sure the seed node cassandra-1 is started and healthy, then start cassandra-2 and make sure it is started and healthy, then start cassandra-3, and so on. Basically, prevent the nodes from starting at the same time, especially the nodes after the seed node. When nodes start up simultaneously, it can lead to errors such as conflicting token ranges, which cause some nodes to fail to join the cluster.
- Using a snitch configuration that is closer to your production environment, which is usually necessary for multi-node, multi-cluster, or multi-datacenter setups. For example, you can use GossipingPropertyFileSnitch, which is also the snitch used in the Cassandra tutorial for initializing a multiple node cluster (with multiple datacenters).
- Explicitly setting the CASSANDRA_CLUSTER_NAME and CASSANDRA_DC environment variables, which set cluster_name in the cassandra.yaml configuration and the dc option in the cassandra-rackdc.properties file, respectively. This lets you explicitly tell the nodes to join the same datacenter and cluster. These options are only relevant for GossipingPropertyFileSnitch.
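For reference (a sketch based on the official image's documented behavior, not part of the original setup): with CASSANDRA_DC set, the image's entrypoint rewrites cassandra-rackdc.properties inside the container, so it should end up looking roughly like this (rack1 being the image default):

```properties
# /etc/cassandra/cassandra-rackdc.properties (inside the container)
dc=my-datacenter-1
rack=rack1
```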
With that, here is a modified version of your compose file:
```yaml
version: "3.3"

networks:
  cassandra-net:
    driver: bridge

services:

  cassandra-1:
    image: "cassandra:latest"  # cassandra:4.1.3
    container_name: "cassandra-1"
    ports:
      - 7000:7000
      - 9042:9042
    networks:
      - cassandra-net
    environment:
      - CASSANDRA_START_RPC=true       # default
      - CASSANDRA_RPC_ADDRESS=0.0.0.0  # default
      - CASSANDRA_LISTEN_ADDRESS=auto  # default, use IP addr of container # = CASSANDRA_BROADCAST_ADDRESS
      - CASSANDRA_CLUSTER_NAME=my-cluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_DC=my-datacenter-1
    volumes:
      - cassandra-node-1:/var/lib/cassandra:rw
    healthcheck:
      test: ["CMD-SHELL", "nodetool status"]
      interval: 2m
      start_period: 2m
      timeout: 10s
      retries: 3

  cassandra-2:
    image: "cassandra:latest"  # cassandra:4.1.3
    container_name: "cassandra-2"
    ports:
      - 9043:9042
    networks:
      - cassandra-net
    environment:
      - CASSANDRA_START_RPC=true       # default
      - CASSANDRA_RPC_ADDRESS=0.0.0.0  # default
      - CASSANDRA_LISTEN_ADDRESS=auto  # default, use IP addr of container # = CASSANDRA_BROADCAST_ADDRESS
      - CASSANDRA_CLUSTER_NAME=my-cluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_DC=my-datacenter-1
      - CASSANDRA_SEEDS=cassandra-1
    depends_on:
      cassandra-1:
        condition: service_healthy
    volumes:
      - cassandra-node-2:/var/lib/cassandra:rw
    healthcheck:
      test: ["CMD-SHELL", "nodetool status"]
      interval: 2m
      start_period: 2m
      timeout: 10s
      retries: 3

  cassandra-3:
    image: "cassandra:latest"  # cassandra:4.1.3
    container_name: "cassandra-3"
    ports:
      - 9044:9042
    networks:
      - cassandra-net
    environment:
      - CASSANDRA_START_RPC=true       # default
      - CASSANDRA_RPC_ADDRESS=0.0.0.0  # default
      - CASSANDRA_LISTEN_ADDRESS=auto  # default, use IP addr of container # = CASSANDRA_BROADCAST_ADDRESS
      - CASSANDRA_CLUSTER_NAME=my-cluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_DC=my-datacenter-1
      - CASSANDRA_SEEDS=cassandra-1
    depends_on:
      cassandra-2:
        condition: service_healthy
    volumes:
      - cassandra-node-3:/var/lib/cassandra:rw
    healthcheck:
      test: ["CMD-SHELL", "nodetool status"]
      interval: 2m
      start_period: 2m
      timeout: 10s
      retries: 3

volumes:
  cassandra-node-1:
  cassandra-node-2:
  cassandra-node-3:
```
The main thing here is the healthcheck block:
```yaml
healthcheck:
  test: ["CMD-SHELL", "nodetool status"]
  interval: 2m
  start_period: 2m
  timeout: 10s
  retries: 3
```
...and the updated depends_on on each node:
```yaml
depends_on:
  cassandra-2:
    condition: service_healthy
```
With the modified compose file, cassandra-3 starts only when cassandra-2 is healthy, and cassandra-2 starts only when cassandra-1 is healthy. In that compose file:

- nodetool status is called after 2 minutes (giving time for bootup/bootstrap)
- if it responds within 10s with exit code 0, the node is considered healthy
- the check is repeated every 2m, up to 3 retries
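One consequence of those settings (going by Docker's documented healthcheck semantics, not something stated in the original answer): failed checks during start_period don't count, and after that window it takes retries consecutive failures to flag the container. A quick back-of-the-envelope sketch of the worst-case time before a broken node is marked unhealthy:

```shell
# Worst-case time before Docker marks a container "unhealthy",
# using the values from the compose file above (all in seconds).
start_period=120   # 2m: failures in this window don't count
interval=120       # 2m between checks
retries=3          # consecutive failures needed after start_period
worst_case=$((start_period + retries * interval))
echo "worst case: ${worst_case}s"   # 480s = 8 minutes
```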
If you inspect docker container ls while the nodes come up, you'll see something like:
CONTAINER ID IMAGE ... STATUS PORTS NAMES
bce16c1b0de4 cassandra:latest ... Up About a minute (health: starting) ... cassandra-2
697fb8559c3c cassandra:latest ... Up 3 minutes (healthy) ... cassandra-1
...while the nodes start up one by one, beginning with node 1. In the example above, cassandra-3 is waiting for cassandra-2 to be "(healthy)" before it starts, which is why you don't see it yet.
Using nodetool status isn't the best healthcheck, but at least it waits for the node to finish bootstrapping. You could improve it by parsing the output and making sure the node is UN in the list. The 2m interval/period is also arbitrary; set it to whatever works for your system based on your test.
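As a sketch of such an improvement (hypothetical, not from the original setup): instead of relying only on the exit code of nodetool status, the test command could fail whenever any listed node is in a state other than UN. The awk logic is demonstrated below against a captured sample of the output; in the compose file it would replace the plain nodetool status inside the CMD-SHELL test:

```shell
# Stricter health test: succeed only if no node line reports a state
# other than UN (Up/Normal). Node lines start with a status letter
# (U/D) plus a state letter (N/L/J/M). In the healthcheck this would be:
#   test: ["CMD-SHELL", "nodetool status | awk '/^[UD][NLJM] / && $1 != \"UN\" {bad=1} END {exit bad}'"]
sample='Datacenter: my-datacenter-1
===========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns    Host ID  Rack
UN  172.27.0.2  109.41 KiB  16      64.7%   9434...  rack1
UN  172.27.0.3  75.19 KiB   16      59.3%   7060...  rack1'

if printf '%s\n' "$sample" | awk '/^[UD][NLJM] / && $1 != "UN" {bad=1} END {exit bad}'; then
  result=healthy
else
  result=unhealthy
fi
echo "$result"
```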
There are also some extra env vars in there:
```yaml
environment:
  - CASSANDRA_START_RPC=true       # default
  - CASSANDRA_RPC_ADDRESS=0.0.0.0  # default
  - CASSANDRA_LISTEN_ADDRESS=auto  # default, use IP addr of container # = CASSANDRA_BROADCAST_ADDRESS
```
...which may not be needed, since these are already the defaults of the cassandra Docker image (see the "Configuring Cassandra" section on its Docker Hub page). Basically, these explicitly set the container's IP address as the listen and broadcast address. I only note them here in case the defaults change.
Finally, since in your environment you run all the nodes on the same machine, you need to assign them different host ports:
```yaml
cassandra-1:
  ...
  ports:
    - 7000:7000
    - 9042:9042
cassandra-2:
  ...
  ports:
    - 9043:9042
cassandra-3:
  ...
  ports:
    - 9044:9042
```
...otherwise the containers may not start up correctly.
If all goes well:
$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
4f9f7459f8d5 cassandra:latest "docker-entrypoint.s…" 5 minutes ago Up About a minute (health: starting) 7000-7001/tcp, 7199/tcp, 9160/tcp, 0.0.0.0:9044->9042/tcp cassandra-3
05225ba91e5d cassandra:latest "docker-entrypoint.s…" 5 minutes ago Up 3 minutes (healthy) 7000-7001/tcp, 7199/tcp, 9160/tcp, 0.0.0.0:9043->9042/tcp cassandra-2
ca2882224274 cassandra:latest "docker-entrypoint.s…" 5 minutes ago Up 5 minutes (healthy) 7001/tcp, 0.0.0.0:7000->7000/tcp, 7199/tcp, 0.0.0.0:9042->9042/tcp, 9160/tcp cassandra-1
$ docker exec cassandra-1 nodetool status
Datacenter: my-datacenter-1
===========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.27.0.4 70.22 KiB 16 76.0% 5a4908f1-6e6f-42b1-88f2-8d5c6290b361 rack1
UN 172.27.0.3 75.19 KiB 16 59.3% 7060719b-d1db-4177-a2c3-1897320e6e33 rack1
UN 172.27.0.2 109.41 KiB 16 64.7% 94345229-fd00-424d-b16c-e1556fae7849 rack1
$ docker exec cassandra-2 nodetool status
Datacenter: my-datacenter-1
===========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.27.0.4 70.22 KiB 16 76.0% 5a4908f1-6e6f-42b1-88f2-8d5c6290b361 rack1
UN 172.27.0.3 75.19 KiB 16 59.3% 7060719b-d1db-4177-a2c3-1897320e6e33 rack1
UN 172.27.0.2 109.41 KiB 16 64.7% 94345229-fd00-424d-b16c-e1556fae7849 rack1
$ docker exec cassandra-3 nodetool status
Datacenter: my-datacenter-1
===========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.27.0.4 70.22 KiB 16 76.0% 5a4908f1-6e6f-42b1-88f2-8d5c6290b361 rack1
UN 172.27.0.3 75.19 KiB 16 59.3% 7060719b-d1db-4177-a2c3-1897320e6e33 rack1
UN 172.27.0.2 109.41 KiB 16 64.7% 94345229-fd00-424d-b16c-e1556fae7849 rack1
The main problem with this is that starting up the nodes takes a long time. In the example compose file, healthcheck.interval is 2m, and all 3 nodes take around 5 minutes to start up properly.