Pseudo-Randomness and a Hint of Regex


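We are nearly finished with the Python crash course, and the rest of Data Science from Scratch (by Joel Grus) covers the fun material. In this post we take a brief look at pseudo-randomness and regular expressions.
The random module is used extensively in data science, particularly when we need random numbers but want reproducible results the next time we run a model. Seeding the generator gives us that: random.seed(x) in Python, set.seed(x) in R, where x is any integer we choose (we only need to be consistent and use the same value when we revisit the model).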

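Technically, the module produces deterministic results, which is why the numbers are called pseudo-random. Here is an example that highlights how this randomness is deterministic: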

import random
random.seed(10) # say we use 10

# this variable is from the book
four_randoms = [random.random() for _ in range(4)]

# call four_randoms - same result from Data Science from Scratch
# because the book also uses random.seed(10)
[0.5714025946899135,
 0.4288890546751146,
 0.5780913011344704,
 0.20609823213950174]

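# calling random.random() four more times without reseeding continues the
# sequence, so a different set of four "random" numbers is generated
# (using x instead of underscore has no effect on the values)
another_four_randoms = [random.random() for x in range(4)]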

[0.81332125135732, 
 0.8235888725334455, 
 0.6534725339011758, 
 0.16022955651881965]

A brief detour on _

Reading other sources, you will find that the underscore "_" is used in a for loop when we do not care about the loop variable and have no plans to use it. For example:

# prints 'hello' five times
for _ in range(5):
    print("hello")

# we could use x as well
for x in range(5):
    print("hello")

In the example above we could have used either _ or x, and it makes no real difference. We can technically read the value of _ inside the loop, but this is considered bad practice:

# bad practice, but prints 0, 1, 2, 3, 4
for _ in range(5):
    print(_)

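It may look as though the choice between _ and x matters for pseudo-randomness, because the two comprehensions below give different results. The variable name plays no part, though: the second comprehension simply continues the sequence where the first one stopped.

import random
random.seed(10)

# these two yield different results because the generator's state advances
# between them, not because one uses _ and the other uses x
four_randoms = [random.random() for _ in range(4)]
another_four_randoms = [random.random() for x in range(4)]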
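
A quick way to see what actually drives the difference is to reseed before each comprehension. The following is a minimal sketch; the names with_underscore, with_x and continued are ours, not the book's.

```python
import random

# Reseed before each comprehension so both start from the same state.
random.seed(10)
with_underscore = [random.random() for _ in range(4)]

random.seed(10)
with_x = [random.random() for x in range(4)]

# Same seed, same starting state: the lists match whatever the variable is called.
print(with_underscore == with_x)  # True

# Without reseeding, the generator carries on to the next four numbers.
continued = [random.random() for _ in range(4)]
print(continued == with_underscore)  # False
```

The loop variable is never passed to random.random(), so it cannot influence the values; only the seed and the number of calls made since seeding do.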

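Returning to determinism, or pseudo-randomness: to get a different set of numbers we change the value passed to random.seed(), and to get the earlier numbers back we restore the previous seed, like so: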

# new random.seed()
random.seed(11)

# reset four_randoms
four_randoms = [random.random() for _ in range(4)]
[0.4523795535098186,
 0.559772386080496,
 0.9242105840237294,
 0.4656500700997733]

# change to previous random.seed()
random.seed(10)

# reset four_randoms (again)
four_randoms = [random.random() for _ in range(4)]

# get previous result (see above)
[0.5714025946899135,
 0.4288890546751146,
 0.5780913011344704,
 0.20609823213950174]
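
If reseeding the module-level generator over and over feels fragile, one alternative is a dedicated random.Random instance, which keeps its own state. This is a sketch under that assumption, not something the book does; the names rng_a and rng_b are illustrative.

```python
import random

# Two independent generators created with the same seed.
rng_a = random.Random(10)
rng_b = random.Random(10)

first_a = [rng_a.random() for _ in range(4)]

# Using the module-level generator does not disturb rng_b's state.
random.random()

first_b = [rng_b.random() for _ in range(4)]
print(first_a == first_b)  # True
```

Because each instance tracks its own position in the sequence, unrelated calls elsewhere in a program cannot shift the numbers a model sees.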

Other features of the random module include random.randrange, random.shuffle, random.choice and random.sample.

random.randrange(3,6) # choose randomly between [3,4,5]

# random shuffle
one_to_ten = [1,2,3,4,5,6,7,8,9,10]
random.shuffle(one_to_ten)
print(one_to_ten)  # example: [8, 7, 9, 3, 5, 2, 10, 1, 6, 4]
random.shuffle(one_to_ten) # again
print(one_to_ten)  # example: [3, 10, 8, 6, 9, 2, 7, 1, 4, 5]

# random choice
list_of_people = ["Bush", "Clinton", "Obama", "Biden", "Trump"]
random.choice(list_of_people) # first time, 'Clinton'
random.choice(list_of_people) # second time, 'Biden'

# random sample
lottery_numbers = range(60) # get a range of 60 numbers
winning_numbers = random.sample(lottery_numbers, 6) # get a random sample of 6 numbers
winning_numbers # example: [39, 24, 2, 37, 0, 15]

# each call advances the generator, so if you want a different set of
# 6 numbers, sample again and reassign winning_numbers
winning_numbers = random.sample(lottery_numbers, 6)
winning_numbers # a different set of numbers [8, 12, 19, 34, 23, 49]
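
To make a sample repeatable, the same seeding idea applies: seed first, then draw. Below is a small sketch in which draw_lottery is a hypothetical helper of ours and not part of the random module.

```python
import random

def draw_lottery(seed, pool_size=60, picks=6):
    """Return `picks` distinct numbers from range(pool_size), reproducible per seed."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return rng.sample(range(pool_size), picks)

first_draw = draw_lottery(seed=10)
second_draw = draw_lottery(seed=10)
print(first_draw == second_draw)  # True: same seed, same draw

# randrange(3, 6) excludes the upper bound, so only 3, 4 or 5 can come out.
rolls = {random.randrange(3, 6) for _ in range(100)}
print(rolls <= {3, 4, 5})  # True
```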

Regular expressions

Whole books can be written about regular expressions, so the author briefly highlights a few functions that may come in handy: re.match, re.search, re.split and re.sub.

import re

re_examples = [
    not re.match("a", "cat"),                   # re.match checks whether 'cat' starts with 'a' (it does not)
    re.search("a", "cat"),                      # re.search checks whether 'cat' contains 'a' anywhere
    not re.search("c", "dog"),                  # 'dog' does not contain 'c'
    3 == len(re.split("[ab]", "carbs")),        # splitting "carbs" on 'a' or 'b' gives ['c', 'r', 's']
    "R-D-" == re.sub("[0-9]", "-", "R2D2")      # replace each digit in 'R2D2' with a hyphen "-"
    ]

# test that all examples are true
assert all(re_examples), "all the regex examples should be True"

The last line reviews our understanding of testing (assert) and truthiness (all) applied to the regex examples.
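
It can help to look at what each regex call returns before all() collapses the list: re.match and re.search return a match object (truthy) or None (falsy), while the comparisons return plain booleans. The sketch below repeats the calls from the example; the variable names are ours.

```python
import re

match_at_start = re.match("a", "cat")   # None: 'cat' does not start with 'a'
found_anywhere = re.search("a", "cat")  # match object: 'a' is at index 1
pieces = re.split("[ab]", "carbs")      # split on every 'a' or 'b'
masked = re.sub("[0-9]", "-", "R2D2")   # replace each digit with '-'

print(match_at_start)         # None
print(found_anywhere.span())  # (1, 2)
print(pieces)                 # ['c', 'r', 's']
print(masked)                 # R-D-

# all() relies on truthiness, so a None in the list would make this fail.
assert all([not match_at_start, found_anywhere, len(pieces) == 3, masked == "R-D-"])
```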


Reference

More on this topic (Pseudo-Randomness and a Hint of Regex) can be found at https://dev.to/paulapivat/pseudo-randomness-and-a-hint-of-regex-348c
