Health check fails after adding a slave to a Redis cluster

Problem description
 
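The scenario below can be reproduced on a local machine. Suppose there is a Redis cluster on ports 7000-7005: 6 instances in total, 3 masters and 3 slaves. After running redis-trib.rb add-node --slave 127.0.0.1:7006 127.0.0.1:7000 to add the instance on port 7006 as a slave, redis-trib.rb check reports the cluster as OK: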
C:\redis-cluster>redis-trib.rb check 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7000)
M: 0a74f64bbd606964c3ebd3f17d5f1285a466868f 127.0.0.1:7000
slots:0-5460 (5461 slots) master
2 additional replica(s)
M: 505f05f410d4b3763d7b85e9f62ea3777caefb9a 127.0.0.1:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: 84175a278ad60ed0c330575173f84e54d145c11a 127.0.0.1:7003
slots: (0 slots) slave
replicates 0a74f64bbd606964c3ebd3f17d5f1285a466868f
S: 344c00d6bcc3f19e845c08f7df7fe1c357ca29e4 127.0.0.1:7005
slots: (0 slots) slave
replicates ba31f68916e533d5baaee2d012efd809f048930a
S: 00ef7a7e6fd71f64ff2d5092a083516fed02f413 127.0.0.1:7006
slots: (0 slots) slave
replicates 0a74f64bbd606964c3ebd3f17d5f1285a466868f
S: f07f96b98d81fa9530f27bccff8cd67cb141bc97 127.0.0.1:7004
slots: (0 slots) slave
replicates 505f05f410d4b3763d7b85e9f62ea3777caefb9a
M: ba31f68916e533d5baaee2d012efd809f048930a 127.0.0.1:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
However, at this point the Spring Boot /health check fails, which takes the service instances that depend on Redis offline. The error is:
2017-08-14 10:14:25,903 WARN  [http-nio-10020-exec-31] org.springframework.boot.actuate.health.RedisHealthIndicator - Health check failed
java.lang.IllegalArgumentException: Node 127.0.0.1:7006 is unknown to cluster
at org.springframework.data.redis.connection.jedis.JedisClusterConnection$JedisClusterNodeResourceProvider.getResourceForSpecificNode(JedisClusterConnection.java:4133)
at org.springframework.data.redis.connection.jedis.JedisClusterConnection$JedisClusterNodeResourceProvider.getResourceForSpecificNode(JedisClusterConnection.java:4107)
at org.springframework.data.redis.connection.ClusterCommandExecutor.executeCommandOnSingleNode(ClusterCommandExecutor.java:145)
at org.springframework.data.redis.connection.ClusterCommandExecutor.executeCommandOnSingleNode(ClusterCommandExecutor.java:128)
at org.springframework.data.redis.connection.ClusterCommandExecutor.executeCommandOnArbitraryNode(ClusterCommandExecutor.java:116)
at org.springframework.data.redis.connection.jedis.JedisClusterConnection.clusterGetClusterInfo(JedisClusterConnection.java:3955)
at org.springframework.boot.actuate.health.RedisHealthIndicator.doHealthCheck(RedisHealthIndicator.java:56)
Stepping through in a debugger from the location in the stack trace shows that the failure occurs here:
public Jedis getResourceForSpecificNode(RedisClusterNode node) {

    JedisPool pool = getResourcePoolForSpecificNode(node);
    if (pool != null) {
        return pool.getResource();
    }

    throw new IllegalArgumentException(String.format("Node %s is unknown to cluster", node));
}

protected JedisPool getResourcePoolForSpecificNode(RedisNode node) {

    Assert.notNull(node, "Cannot get Pool for 'null' node!");

    Map<String, JedisPool> clusterNodes = cluster.getClusterNodes();
    if (clusterNodes.containsKey(node.asString())) {
        return clusterNodes.get(node.asString());
    }

    return null;
}
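The immediate problem is that the second function returns null. It returns null because it looks up the JedisPool using the 7006 node's address, and at that moment clusterNodes has no entry for 7006.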
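
The failure path can be reproduced without Redis. The sketch below is only an illustration: the class StaleNodeLookup and its methods are hypothetical, plain strings stand in for JedisPool, and it assumes (not confirmed from the Jedis source) that the node map is a snapshot taken before 7006 joined the cluster.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the lookup shown above; not Spring Data Redis or Jedis code.
public class StaleNodeLookup {

    // Builds a map of known nodes keyed by "host:port" for the given port range.
    // Assumption: this models a node map captured before a new node was added.
    public static Map<String, String> snapshot(int fromPort, int toPort) {
        Map<String, String> nodes = new HashMap<>();
        for (int port = fromPort; port <= toPort; port++) {
            String key = "127.0.0.1:" + port;
            nodes.put(key, "pool-for-" + key);
        }
        return nodes;
    }

    // Mirrors getResourcePoolForSpecificNode: returns null when the key is absent.
    public static String poolFor(Map<String, String> nodes, String key) {
        return nodes.containsKey(key) ? nodes.get(key) : null;
    }

    // Mirrors getResourceForSpecificNode: a missing pool becomes an exception.
    public static String resourceFor(Map<String, String> nodes, String key) {
        String pool = poolFor(nodes, key);
        if (pool != null) {
            return pool;
        }
        throw new IllegalArgumentException(
                String.format("Node %s is unknown to cluster", key));
    }

    public static void main(String[] args) {
        // Snapshot covers 7000-7005 only; 7006 was added afterwards.
        Map<String, String> nodes = snapshot(7000, 7005);
        System.out.println(resourceFor(nodes, "127.0.0.1:7000"));
        try {
            resourceFor(nodes, "127.0.0.1:7006");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If that assumption holds, any node chosen for the health check that is missing from the map would fail in the same way, even though the cluster itself reports OK.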
 
Since all of this state is maintained by Jedis, I am not sure whether this is a Jedis bug. Has anyone else run into it?

xiaobaxi - Fang Oba

Try running the following on the 7006 node:
cluster replicate 0a74f64bbd606964c3ebd3f17d5f1285a466868f