I'm on Xubuntu with all the latest updates applied.

I have set up the Docker repo and installed the latest Docker packages.

I installed docker-compose using apt-get.

I created the following docker-compose.yaml file:

version: "3.3"
 
networks:
  cassandra-net:
    driver: bridge
    
services:
 
  cassandra-1:
    image: "cassandra:latest"
    container_name: "cassandra-1"
    ports:
      - "7000:7000"
      - "9042:9042"
    networks:
      - "cassandra-net"
    volumes:
      - ./volumes/cassandra-1:/var/lib/cassandra:rw      

  cassandra-2:
    image: "cassandra:latest"
    container_name: "cassandra-2"
    environment:
      - "CASSANDRA_SEEDS=cassandra-1"
    networks:
      - "cassandra-net"
    depends_on:
      - "cassandra-1"
    volumes:
      - ./volumes/cassandra-2:/var/lib/cassandra:rw      
 
  cassandra-3:
    image: "cassandra:latest"
    container_name: "cassandra-3"
    networks:
      - "cassandra-net"
    environment:
      - "CASSANDRA_SEEDS=cassandra-1"
    depends_on:
      - "cassandra-1"
    volumes:
      - ./volumes/cassandra-3:/var/lib/cassandra:rw

When I check the status of each of the three nodes (cassandra-1, cassandra-2, cassandra-3) with this bash script:

docker exec -it cassandra-1 nodetool status
docker exec -it cassandra-2 nodetool status
docker exec -it cassandra-3 nodetool status

I get the following output:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack 
UN  172.18.0.3  252.53 KiB  16      100.0%            640dbf8a-13bb-46e5-8b1e-4542ee3352c4  rack1
UN  172.18.0.2  189.34 KiB  16      100.0%            7807001a-1885-41d1-b661-bd6c7e0db239  rack1

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack 
UN  172.18.0.4  178.24 KiB  16      100.0%            fe8fb5fe-7342-4eeb-92bb-01a755ecd8ad  rack1
UN  172.18.0.2  263.65 KiB  16      100.0%            7807001a-1885-41d1-b661-bd6c7e0db239  rack1

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack 
UN  172.18.0.3  178.2 KiB   16      100.0%            640dbf8a-13bb-46e5-8b1e-4542ee3352c4  rack1
UN  172.18.0.2  263.65 KiB  16      100.0%            7807001a-1885-41d1-b661-bd6c7e0db239  rack1

I expected every node to see all three addresses (172.18.0.2, .3 and .4), but each node only "sees" one other node instead of the other two.

Answer

I had a similar problem when using docker compose to start nodes in a local environment.

I solved it with a combination of the following:

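  • Controlling the startup order of the nodes in the compose file: first make sure the seed node cassandra-1 is up and healthy, then start cassandra-2 and make sure it is up and healthy, then start cassandra-3, and so on. Basically, prevent the nodes from starting concurrently, especially the nodes after the seed node. When nodes start at the same time, it can lead to errors such as token range conflicts, which cause some nodes to fail to join the cluster.
  • Using a snitch configuration that is closer to your production environment, which is usually necessary for multi-node, multi-cluster or multi-datacenter setups. For example, you can use GossipingPropertyFileSnitch, which is also the snitch used in the Cassandra tutorial for initializing a multi-node (multiple datacenter) cluster.
  • Explicitly setting the CASSANDRA_CLUSTER_NAME and CASSANDRA_DC environment variables, which set cluster_name in the cassandra.yaml config and the dc option in the cassandra-rackdc.properties file respectively. This lets you explicitly tell the nodes to join the same datacenter and cluster. These options are only relevant with GossipingPropertyFileSnitch.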

With that, here is a modified version of your compose file:

version: "3.3"

networks:
  cassandra-net:
    driver: bridge

services:

  cassandra-1:
    image: "cassandra:latest"  # cassandra:4.1.3
    container_name: "cassandra-1"
    ports:
      - 7000:7000
      - 9042:9042
    networks:
      - cassandra-net
    environment:
      - CASSANDRA_START_RPC=true       # default
      - CASSANDRA_RPC_ADDRESS=0.0.0.0  # default
      - CASSANDRA_LISTEN_ADDRESS=auto  # default, use IP addr of container # = CASSANDRA_BROADCAST_ADDRESS
      - CASSANDRA_CLUSTER_NAME=my-cluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_DC=my-datacenter-1
    volumes:
      - cassandra-node-1:/var/lib/cassandra:rw
    healthcheck:
      test: ["CMD-SHELL", "nodetool status"]
      interval: 2m
      start_period: 2m
      timeout: 10s
      retries: 3

  cassandra-2:
    image: "cassandra:latest"  # cassandra:4.1.3
    container_name: "cassandra-2"
    ports:
      - 9043:9042
    networks:
      - cassandra-net
    environment:
      - CASSANDRA_START_RPC=true       # default
      - CASSANDRA_RPC_ADDRESS=0.0.0.0  # default
      - CASSANDRA_LISTEN_ADDRESS=auto  # default, use IP addr of container # = CASSANDRA_BROADCAST_ADDRESS
      - CASSANDRA_CLUSTER_NAME=my-cluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_DC=my-datacenter-1
      - CASSANDRA_SEEDS=cassandra-1
    depends_on:
      cassandra-1:
        condition: service_healthy
    volumes:
      - cassandra-node-2:/var/lib/cassandra:rw
    healthcheck:
      test: ["CMD-SHELL", "nodetool status"]
      interval: 2m
      start_period: 2m
      timeout: 10s
      retries: 3

  cassandra-3:
    image: "cassandra:latest"  # cassandra:4.1.3
    container_name: "cassandra-3"
    ports:
      - 9044:9042
    networks:
      - cassandra-net
    environment:
      - CASSANDRA_START_RPC=true       # default
      - CASSANDRA_RPC_ADDRESS=0.0.0.0  # default
      - CASSANDRA_LISTEN_ADDRESS=auto  # default, use IP addr of container # = CASSANDRA_BROADCAST_ADDRESS
      - CASSANDRA_CLUSTER_NAME=my-cluster
      - CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch
      - CASSANDRA_DC=my-datacenter-1
      - CASSANDRA_SEEDS=cassandra-1
    depends_on:
      cassandra-2:
        condition: service_healthy
    volumes:
      - cassandra-node-3:/var/lib/cassandra:rw
    healthcheck:
      test: ["CMD-SHELL", "nodetool status"]
      interval: 2m
      start_period: 2m
      timeout: 10s
      retries: 3

volumes:
  cassandra-node-1:
  cassandra-node-2:
  cassandra-node-3:

The main thing here is the healthcheck

    healthcheck:
      test: ["CMD-SHELL", "nodetool status"]
      interval: 2m
      start_period: 2m
      timeout: 10s
      retries: 3

...and the updated depends_on on each node

    depends_on:
      cassandra-2:
        condition: service_healthy

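With the modified compose file, cassandra-3 starts only once cassandra-2 is healthy, and cassandra-2 starts only once cassandra-1 is healthy. In that compose file, the healthcheck:

  • calls nodetool status for the first time about 2 minutes after the container starts (to allow time for bootup/bootstrap)
  • treats the node as healthy if the command responds in under 10s with exit code 0
  • repeats the check every 2m, and marks the container unhealthy after 3 consecutive failures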

If you check docker container ls, you will see something like this:

CONTAINER ID   IMAGE              ...   STATUS                                  PORTS       NAMES
bce16c1b0de4   cassandra:latest   ...   Up About a minute (health: starting)    ...         cassandra-2
697fb8559c3c   cassandra:latest   ...   Up 3 minutes (healthy)                  ...         cassandra-1

...while the nodes start up one by one. In the example above, cassandra-3 is waiting for cassandra-2 to become "(healthy)" before it starts, which is why you don't see it yet.
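If you want a script to block until a given container reports healthy, rather than re-running docker container ls by hand, one option is to poll the health status that Docker already tracks. This is only a sketch: the function name wait_healthy, the DOCKER override variable and the default retry values are my own assumptions, not part of the compose file above. It relies on docker inspect with the .State.Health.Status template field.

```shell
#!/bin/sh
# Assumption: DOCKER can be overridden (e.g. for a dry run); defaults to the real CLI.
DOCKER="${DOCKER:-docker}"

# wait_healthy NAME [TRIES] [DELAY_SECONDS]
# Polls the container's health status until it is "healthy".
# Returns 0 on success, 1 if the container is still not healthy after TRIES polls.
wait_healthy() {
  name=$1
  tries=${2:-30}   # hypothetical default: 30 polls
  delay=${3:-10}   # hypothetical default: 10 seconds between polls
  i=0
  while [ "$i" -lt "$tries" ]; do
    status=$($DOCKER inspect --format '{{.State.Health.Status}}' "$name" 2>/dev/null)
    [ "$status" = "healthy" ] && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, wait_healthy cassandra-3 60 10 && docker exec cassandra-3 nodetool status would wait up to roughly ten minutes for the last node before querying it.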

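Using nodetool status isn't the best healthcheck, but it at least waits for the node to finish bootstrapping. You can improve it by parsing the output and making sure the node is listed as UN. The 2m interval/period is also arbitrary; set it, along with your test, to whatever works for your system.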
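As a rough sketch of the "parse the output" idea, the helper below reads nodetool status output on stdin and succeeds only if the row for a given address is in the UN (Up/Normal) state. The name node_is_un is hypothetical, and the parsing assumes the column layout shown in the outputs in this thread (state in the first column, address in the second).

```shell
#!/bin/sh
# node_is_un IP
# Reads `nodetool status` output on stdin.
# Exits 0 only if a row exists whose first column is "UN" and whose
# second column (the address) equals IP; exits 1 otherwise.
node_is_un() {
  awk -v ip="$1" '
    $1 == "UN" && $2 == ip { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

Inside a container this could be used as nodetool status | node_is_un "$(hostname -i)", assuming hostname -i is available in the image and returns the container's address on the compose network. I have not verified that for every image version, and the script would also need to be copied or mounted into the container before a healthcheck could call it.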

There are also some extra env vars in there

    environment:
      - CASSANDRA_START_RPC=true       # default
      - CASSANDRA_RPC_ADDRESS=0.0.0.0  # default
      - CASSANDRA_LISTEN_ADDRESS=auto  # default, use IP addr of container # = CASSANDRA_BROADCAST_ADDRESS

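...that may not be needed, since they are already the defaults on the cassandra Docker image (see the Configuring Cassandra section on the Docker Hub page). Basically, they explicitly set the container's IP address as the listen and broadcast address. I'm just noting them here in case the defaults change.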

Lastly, in our environment, if you are running all the nodes on the same machine, you need to map them to different host ports:

  cassandra-1:
    ...
    ports:
      - 7000:7000
      - 9042:9042

  cassandra-2:
    ...
    ports:
      - 9043:9042

  cassandra-3:
    ...
    ports:
      - 9044:9042

...otherwise, the containers might not start correctly.

If all goes well:

$ docker container ls 
CONTAINER ID   IMAGE              COMMAND                  CREATED         STATUS                                 PORTS                                                                          NAMES
4f9f7459f8d5   cassandra:latest   "docker-entrypoint.s…"   5 minutes ago   Up About a minute (health: starting)   7000-7001/tcp, 7199/tcp, 9160/tcp, 0.0.0.0:9044->9042/tcp                      cassandra-3
05225ba91e5d   cassandra:latest   "docker-entrypoint.s…"   5 minutes ago   Up 3 minutes (healthy)                 7000-7001/tcp, 7199/tcp, 9160/tcp, 0.0.0.0:9043->9042/tcp                      cassandra-2
ca2882224274   cassandra:latest   "docker-entrypoint.s…"   5 minutes ago   Up 5 minutes (healthy)                 7001/tcp, 0.0.0.0:7000->7000/tcp, 7199/tcp, 0.0.0.0:9042->9042/tcp, 9160/tcp   cassandra-1
$ docker exec cassandra-1 nodetool status
Datacenter: my-datacenter-1
===========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack 
UN  172.27.0.4  70.22 KiB   16      76.0%             5a4908f1-6e6f-42b1-88f2-8d5c6290b361  rack1
UN  172.27.0.3  75.19 KiB   16      59.3%             7060719b-d1db-4177-a2c3-1897320e6e33  rack1
UN  172.27.0.2  109.41 KiB  16      64.7%             94345229-fd00-424d-b16c-e1556fae7849  rack1
$ docker exec cassandra-2 nodetool status
Datacenter: my-datacenter-1
===========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack 
UN  172.27.0.4  70.22 KiB   16      76.0%             5a4908f1-6e6f-42b1-88f2-8d5c6290b361  rack1
UN  172.27.0.3  75.19 KiB   16      59.3%             7060719b-d1db-4177-a2c3-1897320e6e33  rack1
UN  172.27.0.2  109.41 KiB  16      64.7%             94345229-fd00-424d-b16c-e1556fae7849  rack1
$ docker exec cassandra-3 nodetool status
Datacenter: my-datacenter-1
===========================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load        Tokens  Owns (effective)  Host ID                               Rack 
UN  172.27.0.4  70.22 KiB   16      76.0%             5a4908f1-6e6f-42b1-88f2-8d5c6290b361  rack1
UN  172.27.0.3  75.19 KiB   16      59.3%             7060719b-d1db-4177-a2c3-1897320e6e33  rack1
UN  172.27.0.2  109.41 KiB  16      64.7%             94345229-fd00-424d-b16c-e1556fae7849  rack1
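
The same check can be scripted instead of comparing three tables by eye. The sketch below counts the UN rows each node reports and flags any node that sees fewer than expected. The names count_un and check_cluster, and the DOCKER override variable, are assumptions made for illustration only.

```shell
#!/bin/sh
# Assumption: DOCKER can be overridden (e.g. for a dry run); defaults to the real CLI.
DOCKER="${DOCKER:-docker}"

# count_un: reads `nodetool status` output on stdin and prints the number
# of rows whose first column is "UN" (Up/Normal).
count_un() {
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# check_cluster EXPECTED NODE...
# Runs `nodetool status` in each named container and compares the number of
# UN rows with EXPECTED. Returns 1 if any node reports a different count.
check_cluster() {
  expected=$1
  shift
  rc=0
  for node in "$@"; do
    seen=$($DOCKER exec "$node" nodetool status | count_un)
    if [ "$seen" -ne "$expected" ]; then
      echo "$node sees $seen of $expected nodes" >&2
      rc=1
    fi
  done
  return $rc
}
```

With the cluster above, check_cluster 3 cassandra-1 cassandra-2 cassandra-3 should succeed silently. With the original compose file from the question it would report that each node sees only 2 of 3.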

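The main problem with this approach is that the nodes take a long time to start. In the example compose file, healthcheck.interval is 2m, so all 3 nodes need about 5 minutes to start up properly.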