A Detailed Introduction to re.sub() Usage


1. Preface

2. Function prototype

3. Usage examples

1. Matching a single digit or letter

2. Matching multiple digits or letters

3. Other matches

4. Thanks

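Preface
Regular expressions come up constantly when processing string data, and in Python they are provided by the re module. This article uses practical examples to walk through re.sub(), the function that replaces the parts of a string that match a pattern.
Function prototype
First, look at the function's prototype in the source code, including its parameters and what they mean.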

	def sub(pattern, repl, string, count=0, flags=0):
	    """Return the string obtained by replacing the leftmost
	    non-overlapping occurrences of the pattern in string by the
	    replacement repl.  repl can be either a string or a callable;
	    if a string, backslash escapes in it are processed.  If it is
	    a callable, it's passed the match object and must return
	    a replacement string to be used."""
	    return _compile(pattern, flags).sub(repl, string, count)

As the code above shows, re.sub() takes five parameters. Each is explained below.
(1) pattern: the regular-expression pattern to search for.
(2) repl: the replacement for each match of pattern. It can be a string or a function.
(3) string: the original string to be processed.
(4) count: optional. The maximum number of replacements; it must be a non-negative integer. The default is 0, which means every match is replaced.
(5) flags: optional. The matching mode used when the pattern is compiled (for example, ignoring case or multi-line mode), given as an integer flag value. The default is 0.
Usage examples
The examples below all use a single string that contains upper- and lower-case English letters, digits, Chinese and English punctuation, special symbols, and so on. The string is:
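The optional parameters are easiest to see side by side. The following is a minimal sketch; the sample string and the helper name double are assumptions made for illustration and are not part of the original article.

```python
import re

text = "a1b22c333"  # hypothetical sample input

# count=1 replaces only the leftmost match; count=0 (the default) replaces all.
first_only = re.sub(r"[0-9]+", "*", text, count=1)
all_matches = re.sub(r"[0-9]+", "*", text)

# flags=re.IGNORECASE lets [a-z] match uppercase letters as well.
ignore_case = re.sub(r"[a-z]", "*", "aBc", flags=re.IGNORECASE)

# repl as a callable: it receives the match object and returns a string.
def double(match):
    return str(int(match.group(0)) * 2)

doubled = re.sub(r"[0-9]+", double, text)

print(first_only)   # a*b22c333
print(all_matches)  # a*b*c*
print(ignore_case)  # ***
print(doubled)      # a2b44c666
```

Passing count and flags as keyword arguments, as done here, avoids mixing up their positions.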


	>>> s = "   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"

1. Matching a single digit or letter
(1) Matching single digits only


	>>> import re
	>>> s
	"   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[0-9]', '*', s)
	"   ，         。I 'm so glad to introduce myself, and I’m ** years old.   Today is ****/**/**. It is a wonderful DAY! @HHHHello,,,#***ComeHere***...**？AA？zz？——http://welcome.cn"

The call re.sub(r'[0-9]', '*', s) above matches one digit at a time, so every digit is replaced by its own asterisk.
(2) Matching single letters only


	>>> s
	"   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[a-z]', '*', s)
	"   ，         。I '* ** **** ** ********* ******, *** I’* 18 ***** ***.   T**** ** 2020/01/01. I* ** * ********* DAY! @HHHH****,,,#111C***H***222...66？AA？**？——****://*******.**"
	>>> re.sub(r'[A-Z]', '*', s)
	"   ，         。* 'm so glad to introduce myself, and *’m 18 years old.   *oday is 2020/01/01. *t is a wonderful ***! @****ello,,,#111*ome*ere222...66？**？zz？——http://welcome.cn"
	>>> re.sub(r'[A-Za-z]', '*', s)
	"   ，         。* '* ** **** ** ********* ******, *** *’* 18 ***** ***.   ***** ** 2020/01/01. ** ** * ********* ***! @********,,,#111********222...66？**？**？——****://*******.**"

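The call re.sub(r'[a-z]', '*', s) above matches one lowercase letter at a time and replaces each lowercase letter with an asterisk.
The call re.sub(r'[A-Z]', '*', s) matches one uppercase letter at a time and replaces each uppercase letter with an asterisk.
The call re.sub(r'[A-Za-z]', '*', s) matches one letter at a time, regardless of case, and replaces each letter with an asterisk.
(3) Matching single digits and letters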


	>>> s
	"   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[0-9A-Z]', '*', s)
	"   ，         。* 'm so glad to introduce myself, and *’m ** years old.   *oday is ****/**/**. *t is a wonderful ***! @****ello,,,#****ome*ere***...**？**？zz？——http://welcome.cn"
	>>> re.sub(r'[0-9a-z]', '*', s)
	"   ，         。I '* ** **** ** ********* ******, *** I’* ** ***** ***.   T**** ** ****/**/**. I* ** * ********* DAY! @HHHH****,,,#***C***H******...**？AA？**？——****://*******.**"
	>>> re.sub(r'[0-9A-Za-z]', '*', s)
	"   ，         。* '* ** **** ** ********* ******, *** *’* ** ***** ***.   ***** ** ****/**/**. ** ** * ********* ***! @********,,,#**************...**？**？**？——****://*******.**"

The call re.sub(r'[0-9A-Z]', '*', s) above matches one digit or uppercase letter at a time and replaces each with an asterisk.
The call re.sub(r'[0-9a-z]', '*', s) matches one digit or lowercase letter at a time and replaces each with an asterisk.
The call re.sub(r'[0-9A-Za-z]', '*', s) matches one digit or letter at a time and replaces each with an asterisk.
2. Matching multiple digits or letters
(1) Matching multiple digits
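The prototype shown earlier compiles the pattern and then calls the compiled object's sub() method. When the same pattern is applied to many strings, it can be compiled once up front. This is only a sketch; the name ALNUM and the sample inputs are assumptions.

```python
import re

# Compile once and reuse; ALNUM.sub(repl, string) behaves like
# re.sub(pattern, repl, string) with the same pattern.
ALNUM = re.compile(r"[0-9A-Za-z]")

samples = ["Room 101", "a-b_c"]  # hypothetical sample inputs
masked = [ALNUM.sub("*", sample) for sample in samples]

print(masked)  # ['**** ***', '*-*_*']
```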


	>>> s
	"   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[0-9]+', '*', s)
	"   ，         。I 'm so glad to introduce myself, and I’m * years old.   Today is */*/*. It is a wonderful DAY! @HHHHello,,,#*ComeHere*...*？AA？zz？——http://welcome.cn"

The call re.sub(r'[0-9]+', '*', s) above matches runs of one or more consecutive digits and replaces each whole run with a single asterisk.
(2) Matching multiple letters
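One way to see the difference between [0-9] and [0-9]+ is to count the replacements with re.subn(), which returns the new string together with the number of substitutions made. The short sample string below is an assumption chosen to keep the output easy to read.

```python
import re

date = "Today is 2020/01/01."  # fragment of the article's sample string

# re.subn returns a tuple: (new_string, number_of_replacements).
per_digit, n_digits = re.subn(r"[0-9]", "*", date)
per_run, n_runs = re.subn(r"[0-9]+", "*", date)

print(per_digit, n_digits)  # Today is ****/**/**. 8
print(per_run, n_runs)      # Today is */*/*. 3
```

Without the + each of the eight digits is a separate match; with it, each of the three runs of digits is one match.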


	>>> s
	"   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[a-z]+', '*', s)
	"   ，         。I '* * * * * *, * I’* 18 * *.   T* * 2020/01/01. I* * * * DAY! @HHHH*,,,#111C*H*222...66？AA？*？——*://*.*"
	>>> re.sub(r'[A-Z]+', '*', s)
	"   ，         。* 'm so glad to introduce myself, and *’m 18 years old.   *oday is 2020/01/01. *t is a wonderful *! @*ello,,,#111*ome*ere222...66？*？zz？——http://welcome.cn"
	>>> re.sub(r'[a-zA-Z]+', '*', s)
	"   ，         。* '* * * * * *, * *’* 18 * *.   * * 2020/01/01. * * * * *! @*,,,#111*222...66？*？*？——*://*.*"

The call re.sub(r'[a-z]+', '*', s) above matches runs of consecutive lowercase letters and replaces each run with a single asterisk.
The call re.sub(r'[A-Z]+', '*', s) matches runs of consecutive uppercase letters and replaces each run with a single asterisk.
The call re.sub(r'[a-zA-Z]+', '*', s) matches runs of consecutive letters of either case and replaces each run with a single asterisk.
(3) Matching multiple digits and letters


	>>> s
	"   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[0-9a-zA-Z]+', '*', s)
	"   ，         。* '* * * * * *, * *’* * * *.   * * */*/*. * * * * *! @*,,,#*...*？*？*？——*://*.*"

위 re.sub(r'[0-9A-Za-z]+', '*', s) 이 말 은 여러 개의 연속 적 인 숫자 와 자모 가 일치 하고 여러 개의 연속 적 인 숫자, 연속 적 인 자모, 연속 적 인 숫자 와 자 모 를 하나의 별표 로 대체 하 는 것 을 나타 낸다.
3. 다른 것 과 일치
(1) 비 숫자 일치


	>>> s
	"   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[^0-9]', '*', s)
	'********************************************************18***********************2020*01*01**************************************111********222***66**************************'
	>>> re.sub(r'[^0-9]+', '*', s)
	'*18*2020*01*01*111*222*66*'

The call re.sub(r'[^0-9]', '*', s) above matches one non-digit character at a time and replaces each with an asterisk.
The call re.sub(r'[^0-9]+', '*', s) matches runs of consecutive non-digit characters and replaces each run with a single asterisk.
(2) Matching non-letters
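A negated class is also a convenient way to keep only the characters of interest by deleting everything else. The sketch below uses a made-up input string and additionally shows that ^ only negates when it is the first character inside the brackets.

```python
import re

text = "Tel: 010-1234-5678 (ext. 9)"  # hypothetical input

# Deleting every non-digit leaves only the digits.
digits_only = re.sub(r"[^0-9]", "", text)

# '^' negates only at the start of the class; elsewhere it is a literal caret.
caret_literal = re.sub(r"[0-9^]", "*", "2^10")

print(digits_only)    # 010123456789
print(caret_literal)  # ****
```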


	>>> s
	"   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[^a-z]', '*', s)
	'*****************m*so*glad*to*introduce*myself**and***m****years*old*****oday*is**************t*is*a*wonderful***********ello********ome*ere************zz***http***welcome*cn'
	>>> re.sub(r'[^A-Z]', '*', s)
	'**************I*************************************I*******************T********************I*****************DAY***HHHH***********C***H************AA***********************'
	>>> re.sub(r'[^A-Za-z]', '*', s)
	'**************I**m*so*glad*to*introduce*myself**and*I*m****years*old****Today*is*************It*is*a*wonderful*DAY***HHHHello*******ComeHere*********AA*zz***http***welcome*cn'
	>>> re.sub(r'[^a-z]+', '*', s)
	'*m*so*glad*to*introduce*myself*and*m*years*old*oday*is*t*is*a*wonderful*ello*ome*ere*zz*http*welcome*cn'
	>>> re.sub(r'[^A-Z]+', '*', s)
	'*I*I*T*I*DAY*HHHH*C*H*AA*'
	>>> re.sub(r'[^A-Za-z]+', '*', s)
	'*I*m*so*glad*to*introduce*myself*and*I*m*years*old*Today*is*It*is*a*wonderful*DAY*HHHHello*ComeHere*AA*zz*http*welcome*cn'

The call re.sub(r'[^a-z]', '*', s) above matches one character that is not a lowercase letter and replaces each such character with an asterisk.
The call re.sub(r'[^A-Z]', '*', s) matches one character that is not an uppercase letter and replaces each with an asterisk.
The call re.sub(r'[^A-Za-z]', '*', s) matches one character that is not a letter and replaces each with an asterisk.
The call re.sub(r'[^a-z]+', '*', s) matches runs of consecutive characters that are not lowercase letters and replaces each run with a single asterisk.
The call re.sub(r'[^A-Z]+', '*', s) matches runs of consecutive characters that are not uppercase letters and replaces each run with a single asterisk.
The call re.sub(r'[^A-Za-z]+', '*', s) matches runs of consecutive non-letter characters and replaces each run with a single asterisk.
(3) Matching non-digits and non-letters


	>>> s
	"   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[^0-9A-Za-z]', '*', s)
	'**************I**m*so*glad*to*introduce*myself**and*I*m*18*years*old****Today*is*2020*01*01**It*is*a*wonderful*DAY***HHHHello****111ComeHere222***66*AA*zz***http***welcome*cn'
	>>> re.sub(r'[^0-9A-Za-z]+', '*', s)
	'*I*m*so*glad*to*introduce*myself*and*I*m*18*years*old*Today*is*2020*01*01*It*is*a*wonderful*DAY*HHHHello*111ComeHere222*66*AA*zz*http*welcome*cn'

The call re.sub(r'[^0-9A-Za-z]', '*', s) above matches one character that is neither a digit nor a letter and replaces each with an asterisk.
The call re.sub(r'[^0-9A-Za-z]+', '*', s) matches runs of consecutive characters that are neither digits nor letters and replaces each run with a single asterisk.
(4) Matching fixed patterns
a. Keep only letters and spaces. Setting repl to the empty string is enough.
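It can be tempting to shorten [^0-9A-Za-z] to \W, but the two are not the same: \W treats the underscore as a word character, and for str patterns it also treats non-ASCII letters as word characters unless re.ASCII is set. The sketch below uses a made-up input to show the difference.

```python
import re

text = "na\u00efve_user 42"  # hypothetical input: "naïve_user 42"

explicit = re.sub(r"[^0-9A-Za-z]+", "*", text)
unicode_w = re.sub(r"\W+", "*", text)                # Unicode-aware by default
ascii_w = re.sub(r"\W+", "*", text, flags=re.ASCII)  # ASCII-only word characters

print(explicit)   # na*ve*user*42
print(unicode_w)  # naïve_user*42
print(ascii_w)    # na*ve_user*42
```

The explicit class used in the article is therefore the stricter choice when only ASCII digits and letters should survive.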

	>>> s
	"   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[^a-z ]', '', s)
	' m so glad to introduce myself and m  years old   oday is  t is a wonderful  elloomeerezzhttpwelcomecn'
	>>> re.sub(r'[^a-z ]+', '', s)
	' m so glad to introduce myself and m  years old   oday is  t is a wonderful  elloomeerezzhttpwelcomecn'
	>>> re.sub(r'[^A-Za-z ]', '', s)
	'I m so glad to introduce myself and Im  years old   Today is  It is a wonderful DAY HHHHelloComeHereAAzzhttpwelcomecn'
	>>> re.sub(r'[^A-Za-z ]+', '', s)
	'I m so glad to introduce myself and Im  years old   Today is  It is a wonderful DAY HHHHelloComeHereAAzzhttpwelcomecn'

To keep the meaning and structure of the sentences intact, replace the other characters with a space instead (that is, set repl to a space) and then collapse the redundant spaces, as follows:


	>>> s1 = re.sub(r'[^A-Za-z ]+', ' ', s)
	>>> s1
	' I  m so glad to introduce myself  and I m   years old    Today is   It is a wonderful DAY   HHHHello ComeHere AA zz http welcome cn'
	>>> re.sub(r'[ ]+', ' ', s1)
	' I m so glad to introduce myself and I m years old Today is It is a wonderful DAY HHHHello ComeHere AA zz http welcome cn'
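The two steps can also be folded into one small helper. This is a sketch under the assumption that leading and trailing spaces are unwanted too; the function name keep_letters and the shortened input are illustrative only.

```python
import re

def keep_letters(text):
    """Replace each run of non-letters with one space, then trim the ends."""
    # The space is not in the class, so runs of spaces collapse in the same pass.
    collapsed = re.sub(r"[^A-Za-z]+", " ", text)
    return collapsed.strip()

cleaned = keep_letters("  It is a wonderful DAY!   @HHHHello,,,#111ComeHere222 ")
print(cleaned)  # It is a wonderful DAY HHHHello ComeHere
```

Because the space is left out of the character class here, a single re.sub() call handles both the unwanted characters and the repeated spaces.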

b. Remove English words that start with @


	>>> s
	"   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'@[A-Za-z]+', '', s)
	"   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! ,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"

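The pattern @[A-Za-z]+ stops at the first character that is not a letter. If the words after @ may contain digits or underscores, a broader class such as \w could be used instead. The input below is a hypothetical example, not taken from the article.

```python
import re

text = "DAY! @HHHHello,,, cc @user_01 thanks"  # hypothetical input

letters_only = re.sub(r"@[A-Za-z]+", "", text)  # stops before "_01"
word_chars = re.sub(r"@\w+", "", text)          # also consumes digits and "_"

print(letters_only)  # DAY! ,,, cc _01 thanks
print(word_chars)    # DAY! ,,, cc  thanks
```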
c. Remove English words and digits that end with ？ (this is the full-width Chinese question mark)


	>>> s
	"   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'[A-Za-z]+？', '', s)
	"   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？——http://welcome.cn"
	>>> re.sub(r'[0-9A-Za-z]+？', '', s)
	"   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...——http://welcome.cn"

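The full-width ？ is an ordinary character in a regular expression, so it needs no escaping. The ASCII ? is a quantifier, however, and must be escaped to be matched literally. The sketch below, on a made-up ASCII variant of the string, shows what happens in each case.

```python
import re

text = "66?AA?zz?--end"  # hypothetical ASCII variant of the sample

# Unescaped: "+?" is a lazy quantifier, so this removes letters one at a time.
lazy = re.sub(r"[A-Za-z]+?", "", text)

# Escaped: "\?" matches a literal question mark.
escaped = re.sub(r"[A-Za-z]+\?", "", text)

# re.escape builds the escaped form from a plain string.
via_escape = re.sub(r"[A-Za-z]+" + re.escape("?"), "", text)

print(lazy)        # 66???--
print(escaped)     # 66?--end
print(via_escape)  # 66?--end
```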
d. Remove the URL from the original string


	>>> s
	"   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——http://welcome.cn"
	>>> re.sub(r'http[:.]+\S+', '', s)
	"   ，         。I 'm so glad to introduce myself, and I’m 18 years old.   Today is 2020/01/01. It is a wonderful DAY! @HHHHello,,,#111ComeHere222...66？AA？zz？——"

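The pattern http[:.]+\S+ requires a colon or dot directly after "http", so it does not match URLs that start with "https". A slightly broader pattern is sketched below; the example URLs are placeholders chosen for illustration.

```python
import re

text = "docs at https://example.org/a?b=1 and http://welcome.cn now"  # hypothetical input

original = re.sub(r"http[:.]+\S+", "", text)  # the article's pattern
broader = re.sub(r"https?://\S+", "", text)   # "s?" makes the "s" optional

print(original)  # docs at https://example.org/a?b=1 and  now
print(broader)   # docs at  and  now
```

Note that \S+ runs to the next whitespace character, so punctuation that immediately follows a URL is removed along with it.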
Thanks
That concludes this example-driven introduction to re.sub(). Thank you for reading. If you found it useful, please give it a like, and if you have any questions, leave a comment below.


You may freely share or copy this text, but please keep the URL of this document as the reference URL.

Licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
