파 이 썬 구현 DFA 알고리즘 분석

12124 단어 Python DFA 알고리즘

개술
컴퓨터 운영 체제 의 프로 세 스 상태 와 전환 은 DFA 알고리즘 과 비슷 하 게 이해 할 수 있다.아래 그림 에서 보 듯 이 타원 은 상 태 를 나타 내 고 상태 간 의 연결선 은 사건 을 나타 내 며 프로 세 스 의 상태 와 사건 은 모두 확정 할 수 있 으 며 모두 궁 거 할 수 있다.
进程状态与切换图

DFA 알고리즘 은 다양한 응용 을 가지 고 있 으 며,여기 서 키워드 와 일치 하 는 분야 에서 의 응용 을 먼저 소개 한다.
2.일치 키워드
우 리 는 모든 텍스트 세 션 을 상태 로 할 수 있 습 니 다.예 를 들 어'일치 키워드'는'일치','일치 관계','일치 키'와'일치 키워드'다섯 개의 텍스트 세 션 으로 나 눌 수 있 습 니 다.
DFA示例1

【과정】:

초기 상태 가 비어 있 습 니 다.이벤트'필'을 촉발 할 때 상태'필'로 전환 합 니 다.

이벤트"배합"을 촉발 하여 상태"일치"로 전환 합 니 다.

마지막 상태 인'일치 키워드'로 전 환 될 때 까지 순서대로 유추 합 니 다.

'일치 알고리즘','일치 키워드','정보 추출'등 여러 키워드 의 상황 을 고려 합 시다.
DFA示例2

위의 그림 의 상태 도 는 트 리 구조 와 유사 한 것 을 볼 수 있 습 니 다.바로 이 구조 때문에 DFA 알고리즘 은 키워드 일치 에 있어 키워드 교체 방법(for 순환)보다 빠 릅 니 다.LeetCode 를 자주 사용 하 는 독자 들 은 나무 구조의 시간 복잡 도가 for 순환 의 시간 복잡 도보 다 작 아야 한 다 는 것 을 잘 알 아야 한다.
for 순환:


keyword_list = []

for keyword in ["    ", "     ", "    "]:
    if keyword in "DFA        ":
        keyword_list.append(keyword)

for 순환 은 키워드 표를 한 번 옮 겨 다 녀 야 합 니 다.키워드 표 가 확대 되면 서 필요 한 시간 도 점점 길 어 집 니 다.
DFA 알고리즘:'필'을 찾 을 때 이벤트 에 따라 특정한 순서 로 만 갑 니 다.예 를 들 어'일치 하 는 키워드'는'일치 하 는 알고리즘'으로 가지 않 기 때문에 반복 하 는 횟수 는 for 순환 보다 적 습 니 다.구체 적 인 실현 은 다음 글 에 놓 여 있다.
[문]:그러면 상태 도 에서 보 여 주 는 구 조 를 어떻게 구축 합 니까?
[답]:Python 에서 우 리 는 dict 데이터 구 조 를 사용 할 수 있 습 니 다.


state_event_dict = {
    " ": {
        " ": {
            " ": {
                " ": {
                    "is_end": True
                },
                "is_end": False
            },
            " ": {
                " ": {
                    " ": {
                        "is_end": True
                    },
                    "is_end": False
                },
                "is_end": False
            },
            "is_end": False
        },
        "is_end": False
    },
    " ": {
        " ": {
            " ": {
                " ": {
                    "is_end": True
                },
                "is_end": False
            },
            "is_end": False
        },
        "is_end": False
    }
}

새 겨 진 사전 으로 트 리 구조,key 를 이벤트 로 하고is_end필드 를 통 해 상태 가 마지막 상태 인지 여 부 를 판단 합 니 다.마지막 상태 라면 상태 전환 을 중단 하고 일치 하 는 키 워드 를 가 져 옵 니 다.
【문】:키워드 에 포함 관계 가 존재 한다 면,예 를 들 어"일치 키워드"와"일치"는 어떻게 처리 해 야 합 니까?
【답】:우 리 는is_end필드 로 키워드 의 끝 을 표시 할 수 있 고 새로운 필드 를 추가 할 수 있 습 니 다.예 를 들 어is_continue는 계속 일치 할 수 있 음 을 표시 합 니 다.이 밖 에 도is_end필드 를 제외 한 다른 필드 가 있 는 지 찾 아 계속 일치 하 는 지 여 부 를 판단 할 수 있다.예 를 들 어 아래 코드 의'배합'은is_end필드 외 에'닫 기'가 있 기 때문에 계속 일치 해 야 합 니 다.


state_event_dict = {
    " ": {
        " ": {
            " ": {
                " ": {
                    " ": {
                        "is_end": True
                    },
                    "is_end": False
                },
                "is_end": False
            },
            "is_end": True
        },
        "is_end": False
    }
}

다음 에 우 리 는 이 알고리즘 을 실현 할 것 이다.
3.알고리즘 실현
파 이 썬 3.6 버 전 을 사용 하여 이 루어 집 니 다.물론 파 이 썬 3.X 는 모두 실 행 될 수 있 습 니 다.
3.1 저장 구조 구축


def _generate_state_event_dict(keyword_list: list) -> dict:
    state_event_dict = {}

    #         
    for keyword in keyword_list:
        current_dict = state_event_dict
        length = len(keyword)

        for index, char in enumerate(keyword):
            if char not in current_dict:
                next_dict = {"is_end": False}
                current_dict[char] = next_dict
                current_dict = next_dict
            else:
                next_dict = current_dict[char]
                current_dict = next_dict

            if index == length - 1:
                current_dict["is_end"] = True

    return state_event_dict

상기 코드 에 대해 아직도 최적화 할 수 있 는 부분 이 적지 않다.예 를 들 어 먼저 키워드 목록 을 사전 순서에 따라 정렬 하면 같은 접 두 사 를 가 진 키 워드 를 한 곳 에 집중 시 켜 저장 구 조 를 구축 할 때 옮 겨 다 니 는 횟수 를 줄 일 수 있다.
3.2 일치 키워드


def match(state_event_dict: dict, content: str):
    match_list = []
    state_list = []
    temp_match_list = []

    for char_pos, char in enumerate(content):
        #           
        if char in state_event_dict:
            state_list.append(state_event_dict)
            temp_match_list.append({
                "start": char_pos,
                "match": ""
            })

        #             ，         
        for index, state in enumerate(state_list):
            if char in state:
                state_list[index] = state[char]
                temp_match_list[index]["match"] += char

                #           ，         
                if state[char]["is_end"]:
                    match_list.append(copy.deepcopy(temp_match_list[index]))

                    #       ，       
                    if len(state[char].keys()) == 1:
                        state_list.pop(index)
                        temp_match_list.pop(index)
            #            ，     
            else:
                state_list.pop(index)
                temp_match_list.pop(index)

    return match_list

3.3 전체 코드


import re
import copy


class DFA:

    def __init__(self, keyword_list: list):
        self.state_event_dict = self._generate_state_event_dict(keyword_list)

    def match(self, content: str):
        match_list = []
        state_list = []
        temp_match_list = []

        for char_pos, char in enumerate(content):
            if char in self.state_event_dict:
                state_list.append(self.state_event_dict)
                temp_match_list.append({
                    "start": char_pos,
                    "match": ""
                })

            for index, state in enumerate(state_list):
                if char in state:
                    state_list[index] = state[char]
                    temp_match_list[index]["match"] += char

                    if state[char]["is_end"]:
                        match_list.append(copy.deepcopy(temp_match_list[index]))

                        if len(state[char].keys()) == 1:
                            state_list.pop(index)
                            temp_match_list.pop(index)
                else:
                    state_list.pop(index)
                    temp_match_list.pop(index)

        return match_list

    @staticmethod
    def _generate_state_event_dict(keyword_list: list) -> dict:
        state_event_dict = {}

        for keyword in keyword_list:
            current_dict = state_event_dict
            length = len(keyword)

            for index, char in enumerate(keyword):
                if char not in current_dict:
                    next_dict = {"is_end": False}
                    current_dict[char] = next_dict
                    current_dict = next_dict
                else:
                    next_dict = current_dict[char]
                    current_dict = next_dict

                if index == length - 1:
                    current_dict["is_end"] = True

        return state_event_dict


if __name__ == "__main__":
    dfa = DFA(["     ", "    ", "    ", "  "])
    print(dfa.match("      DFA        ，    "))

출력:
[
{
'start': 0,
'match':'정보 추출'
}, {
'start': 12,
'match':'일치'
}, {
'start': 12,
'match':'키워드 일치'
}, {
'start': 18,
'match':'일치'
}, {
'start': 18,
'match':'일치 알고리즘'
}
]
다른 용법
4.1,마스크 추가
민감 한 단 어 를 식별 할 때 흔히 같은 유형의 문형 을 만 날 수 있다.예 를 들 어'이 바보 X'는 그 중에서 X 가 많 을 수 있다.우리 가 관건 적 인 단어 표 에 하나씩 추가 해 야 하 는 것 일 까?정규 표현 식 과 유사 한 방법 으로 식별 하 는 것 이 가장 좋다.간단 한 방법 은'*'로 모든 내용 과 일치 합 니 다.
어댑터 를 추가 하려 면 일치 하 는 키워드 과정 만 수정 해 야 합 니 다:


def match(self, content: str):
    match_list = []
    state_list = []
    temp_match_list = []

    for char_pos, char in enumerate(content):
        if char in self.state_event_dict:
            state_list.append(self.state_event_dict)
            temp_match_list.append({
                "start": char_pos,
                "match": ""
            })

        for index, state in enumerate(state_list):
            is_find = False
            state_char = None

            #     *        
            if "*" in state:
                state_list[index] = state["*"]
                state_char = state["*"]
                is_find = True

            if char in state:
                state_list[index] = state[char]
                state_char = state[char]
                is_find = True

            if is_find:
                temp_match_list[index]["match"] += char

                if state_char["is_end"]:
                    match_list.append(copy.deepcopy(temp_match_list[index]))

                    if len(state_char.keys()) == 1:
                        state_list.pop(index)
                        temp_match_list.pop(index)
            else:
                state_list.pop(index)
                temp_match_list.pop(index)

    return match_list

main()함수.


if __name__ == "__main__":
    dfa = DFA(["     ", "    ", "  * ", "  "])
    print(dfa.match("      DFA        ，    ，    "))

출력:
[
{
'start': 0,
'match':'정보 추출'
}, {
'start': 12,
'match':'일치'
}, {
'start': 12,
'match':'키워드 일치'
}, {
'start': 18,
'match':'일치'
}, {
'start': 18,
'match':'일치 알고리즘'
}, {
'start': 23,
'match':'정보 캡 처'
}
]
이상 은 Python 이 DFA 알고리즘 을 실현 하 는 상세 한 내용 을 분석 하 는 것 입 니 다.Python DFA 알고리즘 에 관 한 자 료 는 우리 의 다른 관련 글 을 주목 하 세 요!

이 내용에 흥미가 있습니까?

현재 기사가 여러분의 문제를 해결하지 못하는 경우 AI 엔진은 머신러닝 분석(스마트 모델이 방금 만들어져 부정확한 경우가 있을 수 있음)을 통해 가장 유사한 기사를 추천합니다:

Python의 None과 NULL의 차이점 상세 정보

그래서 대상 = 속성 + 방법 (사실 방법도 하나의 속성, 데이터 속성과 구별되는 호출 가능한 속성 같은 속성과 방법을 가진 대상을 클래스, 즉 Classl로 분류할 수 있다.클래스는 하나의 청사진과 같아서 하나의 ...

텍스트를 자유롭게 공유하거나 복사할 수 있습니다.하지만 이 문서의 URL은 참조 URL로 남겨 두십시오.

CC BY-SA 2.5, CC BY-SA 3.0 및 CC BY-SA 4.0에 따라 라이센스가 부여됩니다.

파 이 썬,바 이 두 AI 기반 이모 티 콘 캡 처

Python max 함수 에서 key 의 용법 및 원리 해석

좋은 웹페이지 즐겨찾기

개발자 우수 사이트 수집

개발자가 알아야 할 필수 사이트 100선 추천 우리는 당신을 위해 100개의 자주 사용하는 개발자 학습 사이트를 정리했습니다