- 17 Cluster gossip protocol

1 Gossip protocol
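The previous article covered how the CLUSTER MEET command is implemented. Once the handshake has finished, node A spreads node B's information to the other nodes in the cluster through the gossip protocol, and those nodes then handshake with B as well. Given enough time, B ends up known to every node in the cluster.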
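Some background:
As the name suggests, the Gossip algorithm is inspired by office gossip: once one person starts a piece of gossip, everyone knows it within a bounded amount of time. Because this also resembles the way a virus spreads, Gossip goes by many other names, such as "idle-talk algorithm", "epidemic propagation algorithm", "virus infection algorithm", and "rumor-spreading algorithm".
Gossip is not a new idea, though. Earlier flooding search and routing algorithms all fall into this category. What Gossip adds is clear semantics, concrete implementation methods, and a convergence proof for this class of algorithms.
A gossip process is started by a seed node. When a seed node has state that needs to be propagated to the other nodes in the network, it randomly picks a few neighboring nodes and sends them the message. Every node that receives the message repeats the same process, until all nodes in the network have received it. This can take some time. There is no guarantee that every node has the message at any particular moment, but in theory every node eventually receives it, which makes Gossip an eventually consistent protocol.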
The following concrete example walks through a complete gossip propagation.
To keep the explanation clear, we first fix a few assumptions:
(1) Gossip spreads messages periodically, and the period is fixed at 1 second.
(2) An infected node randomly selects k neighboring nodes (the fan-out) and sends them the message. Here the fan-out is set to 3, so a node sends to at most 3 nodes per round.
(3) In each round, a node only picks targets it has not sent to before.
(4) A node that receives a message does not send it back to the node it came from. For example, after A -> B, B does not send to A when it spreads the message.
There are 16 nodes in total. Node 1 is the initially infected node, and through the gossip process every node ends up infected.
Reference: https://www.jianshu.com/p/8279d6fd65bb
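The four assumptions above can be turned into a small simulation. The following is only a sketch of this toy model, not of anything in Redis: the names gossip_rounds, sim_next, and SIM_MAX_NODES are made up for illustration, and a simple linear congruential generator stands in for real randomness so that runs are repeatable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIM_MAX_NODES 64   /* hypothetical upper bound for this sketch */

/* Small deterministic pseudo-random generator (an LCG), so runs are repeatable. */
static uint32_t sim_next(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state >> 16;
}

/* Simulate the toy model; return the number of rounds until all n nodes are infected. */
int gossip_rounds(int n, int fanout, uint32_t seed) {
    assert(n >= 1 && n <= SIM_MAX_NODES && fanout >= 1);
    bool infected[SIM_MAX_NODES] = {false};
    bool sent[SIM_MAX_NODES][SIM_MAX_NODES] = {{false}};
    int infected_count = 1, rounds = 0;
    infected[0] = true;                    /* node 1 of the article is index 0 */

    while (infected_count < n) {
        /* Only nodes infected before this round may send during it. */
        bool snapshot[SIM_MAX_NODES];
        for (int i = 0; i < n; i++) snapshot[i] = infected[i];

        for (int src = 0; src < n; src++) {
            if (!snapshot[src]) continue;
            for (int k = 0; k < fanout; k++) {
                /* Rule (3): collect the targets this node has not used yet. */
                int candidates[SIM_MAX_NODES], ncand = 0;
                for (int dst = 0; dst < n; dst++)
                    if (dst != src && !sent[src][dst]) candidates[ncand++] = dst;
                if (ncand == 0) break;
                int dst = candidates[sim_next(&seed) % (uint32_t)ncand];
                sent[src][dst] = true;
                sent[dst][src] = true;     /* rule (4): never send back */
                if (!infected[dst]) { infected[dst] = true; infected_count++; }
            }
        }
        rounds++;                          /* rule (1): one round per period */
    }
    return rounds;
}
```

With 16 nodes and a fan-out of 3, the infected set can at most quadruple per round, and the seed node runs out of unused targets after five rounds, so gossip_rounds(16, 3, seed) always lands between 2 and 5 whatever the seed.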
2 Redis implementation
How is node information propagated in Redis? The answer is that it travels inside the PING and PONG messages that nodes send to each other.
First, a look at how messages are abstracted in Redis Cluster. A message can be a PING, PONG, or MEET, or it can be an UPDATE, PUBLISH, FAIL, and so on. All of them share the clusterMsg type, which consists mainly of a message header and the message data.

The message header contains the signature, the total length of the message, the version, and information about the node that sends the message.

The message data is a union, union clusterMsgData, whose members are different structs used to build the different kinds of messages.

PING, PONG, and MEET belong to the same family: they use an array of clusterMsgDataGossip, which can carry information about several nodes. The struct is defined as follows:

/* Initially we don't know our "name", but we'll find it once we connect
 * to the first node, using the getsockname() function. Then we'll use this
 * address for all the next messages. */
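typedef struct {
    // Node name.
    // At the very beginning a node's name is random;
    // once the MEET message has been sent and answered, the cluster sets the node's real name.
    char nodename[CLUSTER_NAMELEN];
    // Timestamp of the last PING sent to this node
    uint32_t ping_sent;
    // Timestamp of the last PONG received from this node
    uint32_t pong_received;
    // IP address of the node
    char ip[NET_IP_STR_LEN];  /* IP address last time it was seen */
    // Port of the node
    uint16_t port;              /* port last time it was seen */
    // Flags of the node
    uint16_t flags;             /* node->flags copy */
    // Padding bytes, unused
    uint16_t notused1;          /* Some room for future improvements. */
    uint32_t notused2;
} clusterMsgDataGossip;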
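To see how a header, a union, and a gossip array fit together, here is a deliberately simplified sketch. Every name prefixed with sketch_ or SKETCH_ is hypothetical, the field list is much shorter than the real clusterMsg, and the count field is kept in host byte order for simplicity. Only the overall shape follows the description above: a fixed header followed by a union whose valid member depends on the message type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message types; the numeric values are illustrative only. */
enum sketch_type { SKETCH_PING, SKETCH_PONG, SKETCH_MEET, SKETCH_FAIL };

typedef struct { char nodename[40]; uint16_t port; uint16_t flags; } sketch_gossip;
typedef struct { char nodename[40]; } sketch_fail;

/* One union member per family of messages. */
union sketch_data {
    struct { sketch_gossip gossip[1]; } ping;  /* PING, PONG, MEET: array of entries */
    struct { sketch_fail about; } fail;        /* FAIL: the node that failed */
};

typedef struct {
    char sig[4];             /* signature */
    uint32_t totlen;         /* total length of the message */
    uint16_t ver;            /* protocol version */
    uint16_t type;           /* tells which union member is valid */
    uint16_t count;          /* number of gossip entries (PING, PONG, MEET only) */
    char sender[40];         /* name of the sending node */
    union sketch_data data;  /* message data */
} sketch_msg;

/* True when the message body is an array of gossip entries. */
bool sketch_carries_gossip(uint16_t type) {
    return type == SKETCH_PING || type == SKETCH_PONG || type == SKETCH_MEET;
}

/* Number of gossip entries; only meaningful for PING, PONG, and MEET. */
uint16_t sketch_entry_count(const sketch_msg *m) {
    assert(sketch_carries_gossip(m->type));
    return m->count;
}

/* Size of the fixed header, that is, everything before the union. */
size_t sketch_header_len(void) {
    return offsetof(sketch_msg, data);
}
```

Because the gossip member is declared with a single element, a message that carries more entries has to be allocated larger than sizeof(sketch_msg), which is why the sender computes the length by hand.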

In clusterSendPing(), information about randomly selected nodes is first added to the message. The code is as follows:

/* Send a PING or PONG packet to the specified node, making sure to add enough
 * gossip informations. */
// Send a MEET, PING, or PONG message to the specified node
void clusterSendPing(clusterLink *link, int type) {
    unsigned char *buf;
    clusterMsg *hdr;
    int gossipcount = 0; /* Number of gossip sections added so far. */
    int wanted; /* Number of gossip sections we want to append if possible. */
    int totlen; /* Total packet length. */
    /* freshnodes is the max number of nodes we can hope to append at all:
     * nodes available minus two (ourself and the node we are sending the
     * message to). However practically there may be less valid nodes since
     * nodes in handshake state, disconnected, are not considered. */
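    // freshnodes is a counter for the gossip entries that can still be sent.
    // Each time an entry is added, the program decrements freshnodes by one.
    // When freshnodes drops to 0 or below, the program stops adding gossip entries.
    // freshnodes starts as the number of nodes in the current nodes table minus 2.
    // The 2 stands for two nodes: one is myself (the node sending this message),
    // the other is the node that receives the gossip.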
    int freshnodes = dictSize(server.cluster->nodes)-2;

    /* How many gossip sections we want to add? 1/10 of the number of nodes
     * and anyway at least 3. Why 1/10?
     *
     * If we have N masters, with N/10 entries, and we consider that in
     * node_timeout we exchange with each other node at least 4 packets
     * (we ping in the worst case in node_timeout/2 time, and we also
     * receive two pings from the host), we have a total of 8 packets
     * in the node_timeout*2 failure reports validity time. So we have
     * that, for a single PFAIL node, we can expect to receive the following
     * number of failure reports (in the specified window of time):
     *
     * PROB * GOSSIP_ENTRIES_PER_PACKET * TOTAL_PACKETS:
     *
     * PROB = probability of being featured in a single gossip entry,
     *        which is 1 / NUM_OF_NODES.
     * ENTRIES = 10.
     * TOTAL_PACKETS = 2 * 4 * NUM_OF_MASTERS.
     *
     * If we assume we have just masters (so num of nodes and num of masters
     * is the same), with 1/10 we always get over the majority, and specifically
     * 80% of the number of nodes, to account for many masters failing at the
     * same time.
     *
     * Since we have non-voting slaves that lower the probability of an entry
     * to feature our node, we set the number of entries per packet as
     * 10% of the total nodes we have.
     *
     * Annotation: the 1/10 ratio is chosen so that, if a node goes down,
     * failure reports from most nodes arrive within 2 * node_timeout.
     * With N masters, in the worst case a PING is sent every node_timeout/2
     * and each PING is answered with a PONG, so within node_timeout a node
     * exchanges at least 4 packets with every other node, and 8 packets
     * within node_timeout*2. That gives 8*N packets in total, each carrying
     * entries for 1/10 of the nodes, so the expected number of failure
     * reports about one node is (1/N) * (N/10) * 8*N = N*80%.
     */
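    // wanted is one tenth of the number of cluster nodes, rounded down, and at least 3.
    // wanted is the number of other nodes' entries to include in the gossip section.
    wanted = floor(dictSize(server.cluster->nodes)/10);
    if (wanted < 3) wanted = 3;
    // wanted can be at most freshnodes.
    if (wanted > freshnodes) wanted = freshnodes;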

    /* Compute the maximum totlen to allocate our buffer. We'll fix the totlen
     * later according to the number of gossip sections we really were able
     * to put inside the packet. */
    // Compute the maximum space the message may need
    totlen = sizeof(clusterMsg)-sizeof(union clusterMsgData);
    totlen += (sizeof(clusterMsgDataGossip)*wanted);
    /* Note: clusterBuildMessageHdr() expects the buffer to be always at least
     * sizeof(clusterMsg) or more. */
    // The buffer is never smaller than one full message struct
    if (totlen < (int)sizeof(clusterMsg)) totlen = sizeof(clusterMsg);
    // Allocate the buffer
    buf = zcalloc(totlen);
    hdr = (clusterMsg*) buf;

    /* Populate the header. */
    // Record the time at which the PING is sent
    if (link->node && type == CLUSTERMSG_TYPE_PING)
        link->node->ping_sent = mstime();
    // Build the message header
    clusterBuildMessageHdr(hdr,type);

    /* Populate the gossip fields */
    int maxiterations = wanted*3;
    // Build the message body
    while(freshnodes > 0 && gossipcount < wanted && maxiterations--) {
        // Pick a random node from the cluster
        dictEntry *de = dictGetRandomKey(server.cluster->nodes);
        clusterNode *this = dictGetVal(de);
        clusterMsgDataGossip *gossip;
        int j;

        /* Don't include this node: the whole packet header is about us
         * already, so we just gossip about other nodes. */
        // 1. Skip the current node: never pick myself
        if (this == myself) continue;

        /* Give a bias to FAIL/PFAIL nodes. */
        // 2. During the first iterations, prefer nodes in FAIL or PFAIL state
        if (maxiterations > wanted*2 &&
            !(this->flags & (CLUSTER_NODE_PFAIL|CLUSTER_NODE_FAIL)))
            continue;

        /* In the gossip section don't include:
         * 1) Nodes in HANDSHAKE state.
         * 2) Nodes with the NOADDR flag set.
         * 3) Disconnected nodes if they don't have configured slots.
         */
        if (this->flags & (CLUSTER_NODE_HANDSHAKE|CLUSTER_NODE_NOADDR) ||
            (this->link == NULL && this->numslots == 0))
        {
            freshnodes--; /* Technically not correct, but saves CPU. */
            continue;
        }

        /* Check if we already added this node */
        // Look for this node among the entries already added to the gossip section
        for (j = 0; j < gossipcount; j++) {
            if (memcmp(hdr->data.ping.gossip[j].nodename,this->name,
                    CLUSTER_NAMELEN) == 0) break;
        }
        // If j is not equal to gossipcount, the node was already added: skip it
        if (j != gossipcount) continue;

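        /* Add it */
        // This node qualifies, so add it to the gossip section
        freshnodes--;
        // Point at the slot where this node's entry goes
        gossip = &(hdr->data.ping.gossip[gossipcount]);
        // Copy the node name
        memcpy(gossip->nodename,this->name,CLUSTER_NAMELEN);
        // Record when the last PING was sent
        gossip->ping_sent = htonl(this->ping_sent);
        // Record when the last PONG was received
        gossip->pong_received = htonl(this->pong_received);
        // Set the node's IP and port
        memcpy(gossip->ip,this->ip,sizeof(this->ip));
        gossip->port = htons(this->port);
        // Record the flags
        gossip->flags = htons(this->flags);
        gossip->notused1 = 0;
        gossip->notused2 = 0;
        // One more node has been added to the gossip section
        gossipcount++;
    }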

    /* Ready to send... fix the totlen field and queue the message in the
     * output buffer. */
    // Compute the actual total length of the message
    totlen = sizeof(clusterMsg)-sizeof(union clusterMsgData);
    totlen += (sizeof(clusterMsgDataGossip)*gossipcount);
    // Store the number of gossip entries in the header
    hdr->count = htons(gossipcount);
    // Store the total message length in the header
    hdr->totlen = htonl(totlen);
    // Send the message
    clusterSendMessage(link,buf,totlen);
    zfree(buf);
}

The gossip section carries information about wanted nodes, where wanted is one tenth of the number of cluster nodes, rounded down, with a minimum of 3 and a maximum of freshnodes. One tenth is chosen because in Redis Cluster a failure report stays valid for server.cluster_node_timeout * 2, and within that window a node that has gone offline should collect failure reports from most of the cluster. The author derives the ratio in the comment. With N nodes and wanted = N/10 entries per packet, the expected number of failure reports received about a single PFAIL node within that window is (1/N) * (N/10) * 8*N = 8*N/10, that is, 80% of N, so reports from most of the cluster nodes can be collected.
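The two calculations in this paragraph are small enough to check by hand. The sketch below restates them as standalone functions. The names sketch_wanted and sketch_expected_reports are made up for illustration, and the second function assumes, as the source comment does, that every node is a master.

```c
#include <assert.h>

/* Gossip entries per packet: nodes/10, at least 3, at most freshnodes. */
int sketch_wanted(int nodes) {
    assert(nodes >= 2);
    int freshnodes = nodes - 2;   /* exclude myself and the receiver */
    int wanted = nodes / 10;      /* integer division rounds down */
    if (wanted < 3) wanted = 3;
    if (wanted > freshnodes) wanted = freshnodes;
    return wanted;
}

/* Expected failure reports about one PFAIL node within node_timeout*2,
 * assuming n nodes that are all masters and n/10 entries per packet. */
double sketch_expected_reports(int n) {
    assert(n > 0);
    double prob = 1.0 / n;        /* chance of being featured in one entry */
    double entries = n / 10.0;    /* entries per packet */
    double packets = 8.0 * n;     /* 8 packets exchanged with each node */
    return prob * entries * packets;
}
```

For a 100-node cluster this gives 10 entries per packet and an expected 80 failure reports. For a 6-node cluster the minimum of 3 applies, and for a 4-node cluster the cap of freshnodes = 2 applies.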
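Next, the total size of the message, the totlen variable, is computed as the size of the message header plus room for wanted gossip entries. Space is allocated for the message, and clusterBuildMessageHdr() is called to build the header and fill in the sender's information. After the gossip entries have been filled in, totlen is recomputed from the number of entries actually added, and clusterSendMessage is called to send the message. Through the gossip protocol, each message delivers information about a subset of nodes to the target node, and every node does the same. Given enough time, in theory all nodes in the cluster come to know each other. The drawbacks of the gossip protocol are not discussed here.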
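The sizing logic can be isolated from the rest of the function. The sketch below takes the struct sizes as parameters instead of using sizeof on the real Redis types, so the function names and the numbers in the note afterwards are illustrative assumptions rather than the real values.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes to allocate up front: header plus room for `wanted` entries,
 * but never less than one full message struct. */
size_t sketch_alloc_len(size_t msg_size, size_t union_size,
                        size_t entry_size, int wanted) {
    assert(msg_size >= union_size && wanted >= 0);
    size_t len = (msg_size - union_size) + entry_size * (size_t)wanted;
    if (len < msg_size) len = msg_size;
    return len;
}

/* Bytes actually sent: header plus the entries that were really added. */
size_t sketch_final_len(size_t msg_size, size_t union_size,
                        size_t entry_size, int gossipcount) {
    assert(msg_size >= union_size && gossipcount >= 0);
    return (msg_size - union_size) + entry_size * (size_t)gossipcount;
}
```

With made-up sizes of 200 bytes for the message struct, 120 for the union, and 100 per entry, asking for 3 entries allocates 380 bytes. If only 2 entries end up in the packet, 280 bytes are sent. The two lengths differ because the loop may add fewer entries than wanted.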
clusterProcessGossipSection parses the gossip information carried by a message while connections are being established, and starts a handshake with any node it does not know yet.

/* Process the gossip section of PING or PONG packets.
 * Note that this function assumes that the packet is already sanity-checked
 * by the caller, not in the content of the gossip section, but in the
 * length. */
void clusterProcessGossipSection(clusterMsg *hdr, clusterLink *link) {
    // Number of gossip entries carried by this message
    uint16_t count = ntohs(hdr->count);
    // Address of the clusterMsgDataGossip array
    clusterMsgDataGossip *g = (clusterMsgDataGossip*) hdr->data.ping.gossip;
    // The node that sent the message
    clusterNode *sender = link->node ? link->node : clusterLookupNode(hdr->sender);

    // Walk through every gossip entry
    while(count--) {
        // Flags of the node described by this entry
        uint16_t flags = ntohs(g->flags);
        clusterNode *node;
        sds ci;

        if (server.verbosity == LL_DEBUG) {
            // Turn the flags into a comma-separated sds string, ci
            ci = representClusterNodeFlags(sdsempty(), flags);
            serverLog(LL_DEBUG,"GOSSIP %.40s %s:%d %s",
                g->nodename,
                g->ip,
                ntohs(g->port),
                ci);
            sdsfree(ci);
        }

        /* Update our state accordingly to the gossip sections */
        // Look the node up in the cluster by name
        node = clusterLookupNode(g->nodename);
        // The node is already known
        if (node) {
            /* We already know this node.
               Handle failure reports, only when the sender is a master. */
            // The sender is a known master and the entry is not about myself
            if (sender && nodeIsMaster(sender) && node != myself) {
                // The flags say the node is in FAIL or PFAIL state
                if (flags & (CLUSTER_NODE_FAIL|CLUSTER_NODE_PFAIL)) {
                    // Add the sender to the failure reports of node
                    if (clusterNodeAddFailureReport(node,sender)) {
                        serverLog(LL_VERBOSE,
                            "Node %.40s reported node %.40s as not reachable.",
                            sender->name, node->name);
                    }
                    // Check whether node should now be marked as FAIL
                    markNodeAsFailingIfNeeded(node);
                } else { // The flags say the node is working normally
                    // Remove the sender from the failure reports of node, if present
                    if (clusterNodeDelFailureReport(node,sender)) {
                        serverLog(LL_VERBOSE,
                            "Node %.40s reported node %.40s is back online.",
                            sender->name, node->name);
                    }
                }
            }

            /* If we already know this node, but it is not reachable, and
             * we see a different address in the gossip section of a node that
             * can talk with this other node, update the address, disconnect
             * the old link if any, so that we'll attempt to connect with the
             * new address. */
            // The node is known, but we consider it FAIL or PFAIL,
            // while the gossip entry says it is reachable and lists a different address.
            // This means the node has moved to a new address, so we reconnect there.
            if (node->flags & (CLUSTER_NODE_FAIL|CLUSTER_NODE_PFAIL) &&
                !(flags & CLUSTER_NODE_NOADDR) &&
                !(flags & (CLUSTER_NODE_FAIL|CLUSTER_NODE_PFAIL)) &&
                (strcasecmp(node->ip,g->ip) || node->port != ntohs(g->port)))
            {
                // Release the old cluster link
                if (node->link) freeClusterLink(node->link);
                // Set the node's address to the one in the gossip entry
                memcpy(node->ip,g->ip,NET_IP_STR_LEN);
                node->port = ntohs(g->port);
                // Clear the NOADDR flag
                node->flags &= ~CLUSTER_NODE_NOADDR;
            }
        } else { // The node is not known: it was not found in the current cluster
            /* If it's not in NOADDR state and we don't have it, we
             * start a handshake process against this IP/PORT pairs.
             *
             * Note that we require that the sender of this gossip message
             * is a well known node in our cluster, otherwise we risk
             * joining another cluster.
             */
            if (sender &&
                !(flags & CLUSTER_NODE_NOADDR) &&
                !clusterBlacklistExists(g->nodename))
            {
                // Start the handshake
                clusterStartHandshake(g->ip,ntohs(g->port));
            }
        }

        /* Next node */
        g++;
    }
}
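The branching in clusterProcessGossipSection is easier to follow when it is reduced to a pure decision function. The sketch below is not Redis code: the sk_ types, the SK_ flag values, and the boolean inputs are all hypothetical stand-ins for the real lookups (clusterLookupNode, nodeIsMaster, clusterBlacklistExists), and the address-update branch is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bits, not the real CLUSTER_NODE_* values. */
#define SK_PFAIL  (1u << 0)
#define SK_FAIL   (1u << 1)
#define SK_NOADDR (1u << 2)

typedef enum { SK_IGNORE, SK_ADD_REPORT, SK_DEL_REPORT, SK_HANDSHAKE } sk_action;

typedef struct {
    bool node_known;        /* the gossiped node is already in our table */
    bool node_is_myself;    /* the entry is about ourselves */
    bool sender_known;      /* the sender is a node of our cluster */
    bool sender_is_master;  /* failure reports only count from masters */
    bool blacklisted;       /* the gossiped node was recently forgotten */
    uint16_t flags;         /* flags from the gossip entry, host byte order */
} sk_entry;

/* Decide what one gossip entry asks us to do. */
sk_action sk_classify(const sk_entry *e) {
    assert(e != NULL);
    if (e->node_known) {
        if (!e->sender_known || !e->sender_is_master || e->node_is_myself)
            return SK_IGNORE;
        /* A master says the node is failing: record its report. Otherwise
         * withdraw any report this master filed earlier. */
        return (e->flags & (SK_FAIL | SK_PFAIL)) ? SK_ADD_REPORT : SK_DEL_REPORT;
    }
    /* Unknown node: handshake only if a known sender vouches for it. */
    if (e->sender_known && !(e->flags & SK_NOADDR) && !e->blacklisted)
        return SK_HANDSHAKE;
    return SK_IGNORE;
}
```

The SK_HANDSHAKE case is what lets node B from the introduction become known to the whole cluster: every node that hears about B from a known sender starts its own handshake with B.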

Reference:
https://blog.csdn.net/men_wen/article/details/72871618


You are free to share or copy this text, but please keep this document's URL as the reference URL.

Licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
