0. Overview

I would love to rest on the weekend..😂 but I still have a lot to learn, so I can't afford to. Let's get started😎

1. What I Studied

Python

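Lecture 4-1: Python Object Oriented Programming

Overview of object-oriented programming

Object-Oriented Programming, OOP
Object: a kind of thing from real life
- It has attributes and actions.
OOP expresses this concept of an object in a program
- Attributes are expressed as variables, and actions as functions (methods).
- Python is also an object-oriented programming language
OOP distinguishes the class, which is the blueprint, from the instance, which is the actual realization of it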
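To make the class/instance split concrete, here is a minimal sketch. The Dog class and its attribute names are made up for illustration and are not from the lecture.

```python
# Hypothetical example: Dog is an illustrative class, not from the lecture.
class Dog:
    """Blueprint (class): describes what every dog has and can do."""

    def __init__(self, name, age):
        # Attributes are stored as variables on the instance.
        self.name = name
        self.age = age

    def bark(self):
        # Actions are expressed as methods.
        return f"{self.name} says woof"


# Two instances built from the same blueprint hold independent state.
first = Dog("Bori", 3)
second = Dog("Choco", 5)
print(first.bark())
print(second.age)
```

Changing an attribute on one instance leaves the other untouched, which is the point of separating the blueprint from its instances.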

Adding attributes

Attributes are added with __init__ and self
__init__ is the reserved method that initializes an object

class SoccerPlayer(object):
    def __init__(self, name, position, back_number):
        self.name = name
        self.position = position
        self.back_number = back_number

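The meaning of __ in Python

__ marks special reserved functions and variables, and is also used for name mangling

e.g. __main__, __add__, __str__, __eq__

class SoccerPlayer(object):
    def __init__(self, name, position, back_number):
        self.name = name
        self.position = position
        self.back_number = back_number

    def __str__(self):
        # print() calls __str__ to get the text to display
        return "Hello, My name is %s. I play in %s in center " % \
            (self.name, self.position)

jinhyun = SoccerPlayer("Jinhyun", "MF", 10)
print(jinhyun)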

Implementing methods

A method (action) is written like an ordinary function, but it must take self as its first parameter to work as an instance method of the class

class SoccerPlayer(object):
    def change_back_number(self, new_number):
        print("Changing the player's back number: From %d to %d" %
              (self.back_number, new_number))
        self.back_number = new_number

Characteristics of object-oriented languages

Modeling the real world

Inheritance

Creating a child class that receives the attributes and methods of a parent class

class Person(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age
class Korean(Person):
    pass
first_korean = Korean("Sungchul", 35)
print(first_korean.name)

class Person(object):  # declare the parent class Person
    def __init__(self, name, age, gender):
        self.name = name
        self.age = age
        self.gender = gender

    def about_me(self):  # declare a method
        print("My name is", self.name, "and I am", str(self.age), "years old.")

class Employee(Person):  # inherits from the parent class Person
    def __init__(self, name, age, gender, salary, hire_date):
        super().__init__(name, age, gender)  # reuse the parent initializer
        self.salary = salary
        self.hire_date = hire_date  # add new attributes

    def do_work(self):  # add a new method
        print("Working hard.")

    def about_me(self):  # override the parent class method
        super().about_me()  # call the parent class method
        print("My salary is", self.salary, "won and my hire date is", self.hire_date, ".")

Polymorphism

Methods with the same name are written with different internal logic

Because of dynamic typing, in Python this mostly arises among classes that inherit from the same parent class

class Animal:
    def __init__(self, name):  # initializer
        self.name = name

    def talk(self):
        raise NotImplementedError("Subclass must implement abstract method")

class Cat(Animal):
    def talk(self):
        return "Meow!"

class Dog(Animal):
    def talk(self):
        return 'Woof! Woof!'
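A short sketch of how these classes might be used. The animal names are arbitrary assumptions; the classes are repeated so the snippet runs on its own.

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def talk(self):
        raise NotImplementedError("Subclass must implement abstract method")


class Cat(Animal):
    def talk(self):
        return "Meow!"


class Dog(Animal):
    def talk(self):
        return "Woof! Woof!"


# The caller uses the same method name; each subclass supplies its own logic.
animals = [Cat("Missy"), Cat("Mr. Mistoffelees"), Dog("Lassie")]
sounds = [f"{animal.name}: {animal.talk()}" for animal in animals]
for line in sounds:
    print(line)
```

The loop never checks the concrete type; it only relies on every element having a talk() method.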

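Visibility

Controlling the level at which an object's information can be seen

Not everyone needs to see every variable inside an object

Users of the object could otherwise modify its information arbitrarily

There is no need to access information that is not needed

Protecting the source

class Product(object):
    pass
class Inventory(object):
    def __init__(self):
        self.__items = []  # declared as a private variable; other objects cannot access it by this name

class Inventory(object):
    def __init__(self):
        self.__items = []  # declared as a private variable; other objects cannot access it by this name
    @property  # property decorator: exposes the hidden variable as a read-only attribute
    def items(self):
        return self.__items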
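A minimal sketch of what the double underscore actually does. add_item is a hypothetical helper added only for this example.

```python
class Inventory:
    def __init__(self):
        self.__items = []  # stored under the mangled name _Inventory__items

    def add_item(self, item):  # hypothetical helper for this sketch
        self.__items.append(item)

    @property
    def items(self):
        return self.__items


inventory = Inventory()
inventory.add_item("keyboard")

try:
    inventory.__items  # direct access from outside the class fails
    direct_access_worked = True
except AttributeError:
    direct_access_worked = False

print(direct_access_worked)
print(inventory.items)               # access through the property
print(inventory._Inventory__items)   # the mangled name is still reachable
```

As the last line shows, name mangling discourages outside access rather than strictly preventing it.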

Lecture 4-2: Module and Project

Skipped

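Lecture 5-1: Exception / File / Log Handling

Exception handling in Python

try ~ except syntax

try:
    code that may raise an exception
except <Exception Type>:
    code that responds when the exception occurs

Handling the exception raised when dividing a number by 0

for i in range(10):
    try:
        print(10/i)
    except ZeroDivisionError:
        print("Not divided by 0")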

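Built-in Exception: exceptions provided by default

Exception name	Description
IndexError	When an index is outside the range of a list
NameError	When a variable that does not exist is referenced
ZeroDivisionError	When a number is divided by 0
ValueError	When a string or number that cannot be converted is converted
FileNotFoundError	When a file that does not exist is opened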

Displaying exception information

for i in range(10):
    try:
        print(10/i)
    except ZeroDivisionError as e:
        print(e)
        print("Not divided by 0")

try ~ except ~ else

try:
    code that may raise an exception
except <Exception Type>:
    code that runs when the exception occurs
else:
    code that runs when no exception occurs

for i in range(10):
    try:
        result = 10 / i
    except ZeroDivisionError:
        print("Not divided by 0")
    else:
        print(10 / i)

try ~ except ~ finally

try:
    code that may raise an exception
except <Exception Type>:
    code that runs when the exception occurs
finally:
    runs whether or not an exception occurred

for i in range(10):
    try:
        result = 10 / i
    except ZeroDivisionError:
        print("Not divided by 0")
    finally:
        print("Finished.")

raise statement
- Raises an exception deliberately when needed

while True:
    value = input("Enter an integer value to convert: ")
    for digit in value:
        if digit not in "0123456789":
            raise ValueError("You did not enter a number")
    print("Number converted to an integer -", int(value))

assert statement
- Raises an exception (AssertionError) when a given condition is not satisfied

def get_binary_number(decimal_number):
    assert isinstance(decimal_number, int)
    return bin(decimal_number)

print(get_binary_number(10))

File Handling

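File types

Files are broadly divided into text files and binary files

The computer converts text files to binary form in order to process them (e.g. pyc files)

Every text file is in fact also a binary file; it is stored as a set of ASCII/Unicode characters, so people can read it

Binary file	Text file
Stored in binary format, a form only the computer can understand	Stored as character strings, a form people can also understand
Contents usually look garbled when opened in Notepad	Contents can be read when opened in Notepad
Excel files, Word files, etc.	Files saved from Notepad, HTML files, Python source files, etc.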

Python File I/O

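Python uses the built-in function open() to handle files

f = open("<file name>", "<access mode>")
f.close()

with open("i_have_a_dream.txt", "r") as f:
    contents = f.read()
# With a with block, there is no need to call close() explicitly.

File open mode	Description
r	Read mode - used when only reading a file
w	Write mode - used when writing content to a file
a	Append mode - used when adding new content to the end of a file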
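The three modes in the table can be tried out with a small sketch like the one below. The file name memo.txt is an assumption, and a temporary directory is used so nothing is left on disk.

```python
import os
import tempfile

# Work in a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "memo.txt")  # hypothetical file name

    with open(path, "w", encoding="utf8") as f:  # "w" creates or truncates
        f.write("first line\n")

    with open(path, "a", encoding="utf8") as f:  # "a" appends to the end
        f.write("second line\n")

    with open(path, "r", encoding="utf8") as f:  # "r" reads only
        lines = f.readlines()

print(lines)
```

Opening the same path with "w" a second time would have discarded the first line, which is the main difference from "a".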

Handling directories in Python

Use the os module to work with directories

import os
os.mkdir("log")

Checking whether a directory exists

if not os.path.isdir("log"):
    os.mkdir("log")

Copying a file

import os
import shutil

source = "i_have_a_dream.txt"
dest = os.path.join("abc", "yg.txt")
shutil.copy(source, dest)

Recently, the pathlib module is used to handle paths as objects
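A rough sketch of the same directory handling with pathlib. The directory and file names are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    log_dir = Path(tmp_dir) / "log"   # "/" joins path components
    log_dir.mkdir(exist_ok=True)      # no error if it already exists

    log_file = log_dir / "app.log"    # hypothetical file name
    log_file.write_text("started\n", encoding="utf8")

    is_dir = log_dir.is_dir()
    suffix = log_file.suffix
    contents = log_file.read_text(encoding="utf8")

print(is_dir, suffix, contents)
```

Compared with os.path.join and os.path.isdir, the path itself carries the methods, so there is less string handling.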

Pickle

A built-in module for persisting Python objects
Stores runtime information such as data and objects -> load it later and use it
Widely used for information that must be saved, computed results (models), etc.

import pickle
f = open("list.pickle", "wb")
test = [1,2,3,4,5]
pickle.dump(test, f)
f.close()

del test

f = open("list.pickle", "rb")
test_pickle = pickle.load(f)
test_pickle
f.close()

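Leaving logs - Logging

Characteristics

Recording what happens while a program is running

User access, program exceptions, use of particular functions

Printing to the console, writing to a file, writing to a DB, etc.

Recorded logs can be analyzed to derive meaningful results

Records to keep at run time, and records to keep at development time

print vs logging

Records can be left with print, but records that exist only in the console cannot be used for analysis

Sometimes separate logs need to be kept per level or per module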

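Python's standard log management module

import logging

# With the default configuration only WARNING and above are printed
logging.debug("That's wrong!")
logging.info("Check this")
logging.warning("Be careful!")
logging.error("An error occurred!!!")
logging.critical("It's all over...")

Level	Summary	Examples
debug	Log information that needs to be recorded during development	- Function A is called next - Variable A is changed to some value
info	Reports information while processing is in progress	- The server has started - The server has shut down - User A connected to the program
warning	Reports input the user entered incorrectly, or input that can be processed but was not intended at development time	- Expected str input but received int -> handled by casting to str - Expected a two-dimensional list as a function argument but received a one-dimensional list -> converted to two dimensions before processing
error	Reports that an error occurred because of faulty processing, but the program can keep running	- A record must be written to a file but the file does not exist -> handle the exception and notify the user - Cannot connect to an external service
critical	Reports that faulty processing caused data loss or that the program can no longer run	- A file was deleted through improper access - Forced termination by the user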
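How the levels filter records can be sketched as follows. The logger name week1_demo is an assumption, and an in-memory StringIO stream stands in for a log file.

```python
import io
import logging

# Hypothetical logger name; a StringIO stream stands in for a log file.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

logger = logging.getLogger("week1_demo")
logger.setLevel(logging.INFO)   # records below INFO are dropped
logger.addHandler(handler)
logger.propagate = False        # keep the root logger out of this sketch

logger.debug("not recorded")
logger.info("server started")
logger.error("could not reach external service")

log_lines = stream.getvalue().splitlines()
print(log_lines)
```

Swapping StreamHandler for logging.FileHandler would send the same records to a file instead.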

Configuring a real program

configparser

Stores the program's runtime settings in a file

Uses a configuration file laid out as sections, keys, and values

The configuration is loaded and accessed like a dict

import configparser

config = configparser.ConfigParser()

config.read('example.cfg')
print(config.sections())

for key in config['SectionTwo']:
    value = config['SectionTwo'][key]
    print("{0} : {1}".format(key, value))
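Since example.cfg itself is not shown in these notes, here is a self-contained sketch that reads the configuration from a string instead. The section contents are made up for illustration.

```python
import configparser

# Hypothetical contents standing in for example.cfg.
config_text = """
[SectionOne]
status = single
name = Derek

[SectionTwo]
favorite_color = green
"""

config = configparser.ConfigParser()
config.read_string(config_text)

sections = config.sections()
color = config["SectionTwo"]["favorite_color"]
# Values are strings by default; getint converts, and fallback covers a missing key.
age = config["SectionOne"].getint("age", fallback=0)
print(sections, color, age)
```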

argparse

Holds the setting values given when a program is run from the console

Provided by default in almost every console-based Python program

Specialized modules also exist (TF), but argparse is generally used

These are called command-line options

import argparse

parser = argparse.ArgumentParser(
    description='Sum two integers.')

parser.add_argument(
    '-a', "--a_value",
    dest="a", help="A integers", type=int,
    required=True)

parser.add_argument(
    '-b', "--b_value",
    dest="b", help="B integers", type=int,
    required=True)

args = parser.parse_args()
print(args)
print(args.a)
print(args.b)
print(args.a + args.b)
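To try the parser without a terminal, the argument list can be passed to parse_args directly. This is only a sketch; the script name sum_two.py in the comment is hypothetical.

```python
import argparse

parser = argparse.ArgumentParser(description="Sum two integers.")
parser.add_argument("-a", "--a_value", dest="a", type=int, required=True)
parser.add_argument("-b", "--b_value", dest="b", type=int, required=True)

# Passing a list simulates: python sum_two.py -a 10 --b_value 5
args = parser.parse_args(["-a", "10", "--b_value", "5"])
total = args.a + args.b
print(total)
```

Because type=int is set, the strings from the command line arrive as integers and can be added directly.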

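Lecture 5-2: Python data handling

CSV (Comma Separated Values)

CSV: a text file whose fields are separated by commas (,)
A data format for using spreadsheet-style data regardless of the program

Processing CSV with the csv module

When data is processed as plain text, commas that appear inside a field need preprocessing

Python provides the csv module to process CSV files easily

import csv
# f is a file object that has already been opened
reader = csv.reader(f,
        delimiter=',', quotechar='"',
        quoting=csv.QUOTE_ALL)

Attribute	Default	Meaning
delimiter	,	Character that separates fields
lineterminator	\r\n	Line break sequence
quotechar	"	Character that encloses a string
quoting	QUOTE_MINIMAL	Controls which fields are enclosed in quotechar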
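The comma-inside-a-field problem mentioned above can be seen in a small sketch. The data is made up, and io.StringIO stands in for an opened file.

```python
import csv
import io

# Hypothetical data: the second field of the first record contains a comma.
raw = 'name,address\n"Kim","Seoul, Korea"\n"Lee","Busan"\n'

# A plain split on "," cuts the quoted address in two.
naive = raw.splitlines()[1].split(",")

# csv.reader respects quotechar and keeps the field whole.
rows = list(csv.reader(io.StringIO(raw), delimiter=",", quotechar='"'))

print(naive)
print(rows)
```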

Web

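How the web works
1. Request: web address, Form, Header, etc.
2. Processing: handling the request, e.g. database processing
3. Response: returning the result as HTML, XML, etc.
4. Rendering: displaying the HTML or XML

HTML (Hyper Text Markup Language)

A language for representing information on the web in a structured way

Tags are used to mark up elements such as headings, paragraphs, and links

Every HTML document has a tree-shaped containment structure

Why do we need to know the web?

A great deal of data is shared through the web

HTML also follows rules, much like a program; pages are generated according to rules

Data can be extracted by analyzing those rules

Various analyses are possible based on the extracted data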

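Regular expressions

Regular expression

A formula of characters that defines a complex string pattern

Extracts the set of strings that follow a particular rule

Regular expression reference material
Regular expression practice page

Basic regex syntax

1) Character class [ ]: matches any one of the characters between [ and ]
e.g. [abc] <- the text contains one of a, b, or c ("a" and "before" match; "deep", "dud", and "sunset" do not)

2) Using "-": specifies a range
e.g. [a-zA-Z] - all letters of the alphabet, [0-9] - all digits

3) Metacharacters: characters that lose their literal meaning and serve a special purpose in a regular expression

. ^ $ * + ? { } [ ] \ | ( )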
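A few of these pieces in action, as a small sketch with Python's re module. The sample text is made up for illustration.

```python
import re

text = "Order A12 shipped 2021-01-23, order b7 shipped 2021-02-03."  # made-up text

# Character class: any single vowel.
vowels = re.findall(r"[aeiou]", "sunset")

# Ranges plus "+": one letter followed by one or more digits.
codes = re.findall(r"[a-zA-Z][0-9]+", text)

# "{n}" repeats the preceding item exactly n times.
dates = re.findall(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", text)

# "^" anchors the match at the start of the string.
starts_with_order = bool(re.search(r"^Order", text))

print(vowels, codes, dates, starts_with_order)
```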

import re
import urllib.request

url = "http://www.google.com/googlebooks/uspto-patents-grants-text.html"
# set the url

html = urllib.request.urlopen(url)  # open the url
html_contents = str(html.read().decode("utf8"))
# read the html and convert it to a string

url_list = re.findall(r"(http)(.+)(zip)", html_contents)

for url in url_list:
    print("".join(url))  # join each tuple of captured groups into one str

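XML (eXtensible Markup Language)

A language that uses tags (markup) to describe the structure and meaning of data

<?xml version="1.0"?>
<cat>
  <name>Nabi</name>
  <breed>Siamese</breed>
  <age>6</age>
  <neutered>yes</neutered>
  <declawed>no</declawed>
  <registration_number>Izz138bod</registration_number>
  <owner>Lee Gangju</owner>
</cat>

XML is a structured markup language, like HTML
It can be parsed with regular expressions
However, more convenient tools have been developed
Here it is parsed with beautifulsoup, one of the most widely used parsers

BeautifulSoup

A representative tool for scraping markup languages such as HTML and XML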

JSON (JavaScript Object Notation)

Originally the data object notation of JavaScript, the language of the web
Its conciseness makes it easy for both machines and people to understand
Data size is small, and conversion to code is easy
For these reasons it is widely used as an alternative to XML
Similar to Python's dict type; data is expressed as key:value pairs

import json

# read
with open("json_example.json", "r", encoding="utf8") as f:
    contents = f.read()
    json_data = json.loads(contents)
print(type(json_data))

for employee in json_data["employee"]:
    print(employee)

# write
dict_data = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}

with open("data.json", "w") as f:
    json.dump(dict_data, f)

Lecture 6: numpy

Characteristics of NumPy

Faster and more memory-efficient than a plain list
Supports operations on data arrays without loops
Provides a variety of linear algebra functionality
Can be integrated with languages such as C, C++, and Fortran

import

import numpy as np

array creation

test_array = np.array([1, 4, 5, 8], float)
print(test_array)
type(test_array[3])

A numpy array can hold only one data type
shape: returns the dimensions of the numpy array (type: tuple)
dtype: returns the data type of the numpy array
ndim: number of dimensions (rank)
size: number of data items (number of elements)
nbytes: returns the memory size of the ndarray's data in bytes

Handling Shape

reshape: changes the shape of an array; the number of elements stays the same
flatten: converts a multi-dimensional array into a 1-dimensional array
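The attributes and shape functions listed above can be checked with a short sketch. The array values are arbitrary, and dtype is fixed to int64 so that nbytes is predictable.

```python
import numpy as np

matrix = np.array([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=np.int64)

# 8 elements of 8 bytes each -> nbytes is 64
print(matrix.shape, matrix.ndim, matrix.size, matrix.dtype, matrix.nbytes)

reshaped = matrix.reshape(4, 2)    # same 8 elements, new shape
inferred = matrix.reshape(-1, 2)   # -1 lets numpy infer that dimension
flat = matrix.flatten()            # returns a 1-D copy
print(reshaped.shape, inferred.shape, flat)
```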

Indexing & Slicing

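Indexing

Unlike a list, a two-dimensional array supports the [0,0] notation

For a matrix, the first index is the row and the second is the column

Slicing

Unlike a list, rows and columns can be sliced separately

Useful for extracting a subset of a matrix

a = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]], int)
a[:,2:] # all rows, columns 2 and above
a[1,1:3] # row 1, columns 1 to 2
a[1:3] # all of rows 1 to 2 (only row 1 exists here)
a[:,::2] # every second column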

creation function

arange

Creates an array of values over a specified range, similar to range for lists

import numpy as np

np.arange(30)
np.arange(0, 10, 0.5)
np.arange(30).reshape(5, 6)

ones, zeros and empty

zeros: creates an ndarray filled with 0

ones: creates an ndarray filled with 1

empty: creates an ndarray with only the shape given and no values set (memory is not initialized)

something_like

Returns an array of ones, zeros, or empty with the same shape as an existing ndarray

test_matrix = np.arange(30).reshape(5,6)
np.ones_like(test_matrix)

identity

Creates an identity matrix (I)

np.identity(n=3, dtype=np.int8)

eye

A matrix with 1s on a diagonal; the starting index of the diagonal can be shifted with k

np.eye(3)
np.eye(3,5,k=2)
np.eye(N=3, M=5, dtype=np.int8)

diag

Extracts the values on a diagonal of a matrix

matrix = np.arange(12).reshape(3, 4)
np.diag(matrix, k=2)
np.diag(matrix, k=1)

random sampling

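Creates an array by sampling from a data distribution

# uniform distribution
np.random.uniform(0,1,10).reshape(2,5)
# normal distribution
np.random.normal(0,1,10).reshape(2,5)

operation functions

sum, mean, std

Many others are available as well, such as sqrt and exp

axis

The dimension axis along which an operation function is applied

axis = 0 (the operation runs down the rows, giving one result per column)

axis = 1 (the operation runs across the columns, giving one result per row)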
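The axis argument is easy to mix up, so here is a small sketch on a 3x4 matrix. The values are arbitrary.

```python
import numpy as np

matrix = np.arange(1, 13).reshape(3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]

total = matrix.sum()              # all elements
per_column = matrix.sum(axis=0)   # collapses the rows: one value per column
per_row = matrix.sum(axis=1)      # collapses the columns: one value per row
row_means = matrix.mean(axis=1)

print(total, per_column, per_row, row_means)
```

A quick check is the result shape: summing a (3, 4) array over axis=0 leaves 4 values, and over axis=1 leaves 3.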

concatenate

Functions that join (stack) numpy arrays

# vstack
a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
np.vstack((a, b))

# hstack
a = np.array([ [1], [2], [3]])
b = np.array([ [2], [3], [4]])
np.hstack((a, b))

# concatenate
a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
np.concatenate((a,b), axis=0)

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
np.concatenate((a, b.T), axis=1)

Adding an axis

When increasing the dimension from 1-D to 2-D
b = b[np.newaxis, :]

array operation

numpy supports basic arithmetic operations between arrays

element-wise operation

Operations applied element by element when the arrays have the same shape

+, -, *, /, etc.

Dot product

The basic matrix product

test_a = np.arange(1, 7).reshape(2, 3)
test_b = np.arange(7, 13).reshape(3, 2)
test_a.dot(test_b)

transpose

Use transpose() or the T attribute

test_a.transpose()
test_a.T

broadcasting

A feature that supports operations between arrays of different shapes
test_matrix = np.arange(1,13).reshape(4,3)
test_vector = np.arange(10,40,10)
test_matrix+test_vector

numpy performance

timeit: a tool for measuring code performance in a Jupyter environment (the %timeit magic)

comparisons

All & Any

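Returns whether all (and) or any (or) of the array's elements satisfy a condition

import numpy as np
a = np.arange(10)
a < 4
a < 1
a < 10
np.any(a > 5) # True if at least one element satisfies the condition (or)
np.all(a < 10) # True if every element satisfies the condition (and)

When arrays have the same size, numpy returns the result of element-wise comparison as Boolean values

a = np.array([1, 3, 0], float)
b = np.array([True,False,True], bool)
c = np.array([False,True,False], bool) # second operand, added so the example runs
np.logical_and(a > 0, a < 3)
np.logical_or(b, c)
np.logical_not(b)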

np.where

# replace with 3 where True, 2 where False
np.where(a > 0, 3, 2)

# return the index values
a = np.arange(10)
np.where(a>5)

# Not a number
a = np.array([1, np.nan, np.inf], float)
np.isnan(a)

# isfinite
np.isfinite(a)

argmax & argmin

Returns the index of the maximum or minimum value in an array

a = np.array([1,2,3,4,8,78,23,3])

# indices that sort the values in ascending order
a.argsort()

# index of the largest and smallest value
np.argmax(a), np.argmin(a)

# 2-dimension
a = np.array([[1,2,4,7], [9,88,6,45], [9,76,3,4]])
np.argmax(a, axis=1), np.argmin(a, axis=0)

boolean & fancy index

boolean index

Extracts the values that satisfy a condition as an array

All comparison operation functions can be used as well

test_array = np.array([1, 4, 0, 2, 3, 8, 9, 7], float)
test_array > 3
test_array.shape

condition = test_array < 3
test_array[condition]

fancy index

numpy can use an array as index values to extract elements

This also works for matrix-shaped data

a = np.array([2, 4, 6, 8], float)
b = np.array([0, 0, 1, 3, 2, 1], int)
a[b]

a.take(b)

numpy data i/o

loadtxt & savetxt

Reads and saves data in text format

import numpy as np
a = np.loadtxt("./populations.txt", delimiter="\t")
a_int = a.astype(int)
a_int[:3]

np.savetxt("int_data_2.csv", a_int, fmt="%.2e", delimiter=",")

np.save("npy_test", arr=a_int)
a_test = np.load(file="npy_test.npy")
a_test

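Lecture 7-1: pandas 1

A Python library that supports processing of structured data
panel data -> pandas
Integrated with numpy, the high-performance array computation library, it provides powerful "spreadsheet" processing
Provides indexing, computation functions, preprocessing functions, etc.
Used for data processing and statistical analysis

Loading data

import pandas as pd
data_url = "https://archive~~"
df_data = pd.read_csv(data_url, sep=r'\s+', header=None)
# load the csv-type data; the separator is whitespace, and there is no header row

df_data.head()
df_data.columns = ["abc", "def", "ghi"]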

series

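pandas is built around two structures, Series and DataFrame.

series

An object holding the data of a single column of a DataFrame

An object that represents a column vector

The index can be numbers or strings, and the data type is fixed as well.

dataframe

A data table made by gathering series = 2-dimensional by default
Each column can have a different type.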

indexing

loc selects by index label, iloc selects by index position

df.loc[:3]
df.loc[:, ["first_name", "seconde_name"]]
df["age"].iloc[1:]

selection & drop

# basic (column names and index numbers)
df[["name", "street"]][:2]

# loc (column names and index labels)
df.loc[[211829, 320563], ["name", "street"]]

# iloc (column numbers and index numbers)
df.iloc[:2, :2]

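Resetting the index

df.reset_index(inplace=True, drop=True)

data drop

df.drop([0, 1, 2, 3])
df.drop("city", axis=1)

dataframe operations

series operation

Operations are performed by aligning on the index

Where an index label has no match, the result is NaN

dataframe operation

A DataFrame takes both columns and index into account

With the add operation, values missing on one side can be treated as 0 instead of producing NaN

Operation types: add, sub, div, mul

fill_value is used to compensate for NaN.

series + dataframe

Row broadcasting is performed along the given axis

df.add(s2, axis=0)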
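Index alignment and fill_value can be seen in a minimal sketch with two Series. The labels and values are made up for illustration.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

# "a" and "d" exist on one side only, so plain addition yields NaN there.
plain = s1 + s2

# With fill_value=0 the missing side is treated as 0 before adding.
filled = s1.add(s2, fill_value=0)

print(plain)
print(filled)
```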

lambda, map, apply

map can also be used on pandas Series data
A dict or sequence-like object can be passed in place of a function

s1 = pd.Series(np.arange(10))
s1.head(5)
s1.map(lambda x: x**2).head(5)

df = pd.read_csv("wages.csv")
df.head()
df.sex.unique()

df["sex_code"] = df.sex.map({"male":0, "female":1})
df.head(5)

# the same can be done with replace
df.sex.replace(
    {"male":0, "female":1}
).head()

df.sex.replace(
    ["male", "female"],
    [0, 1], inplace=True)
df.head(5)

apply for dataframe

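Unlike map, applies the function to an entire series

The function receives each series as its input and can handle it as a whole

The same effect is obtained when using built-in aggregation functions

mean, std, etc. can be used

applymap for dataframe

Applied to every individual value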
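The difference between map (per element) and apply (per column) can be sketched like this. The small DataFrame is a made-up stand-in for wages.csv.

```python
import pandas as pd

# Hypothetical data standing in for wages.csv.
df = pd.DataFrame({"earn": [100.0, 200.0, 300.0],
                   "height": [160.0, 170.0, 180.0]})

# apply: the function receives each column as a whole Series.
ranges = df.apply(lambda col: col.max() - col.min())

# Applying a reduction gives the same result as the built-in method.
means_via_apply = df.apply(lambda col: col.mean())
means_builtin = df.mean()

# map on a Series: the function receives one element at a time.
doubled = df["earn"].map(lambda x: x * 2)

print(ranges)
print(doubled)
```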

pandas built-in functions

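describe: shows summary information for numeric data

unique: returns the unique values of a series as an array

sum: supports basic operations over column or row values

sub, mean, min, max, count, median, mad, var, etc. (mad was removed in pandas 2.0)

isnull: returns a Boolean mask marking where column or row values are NaN (null)
df.isnull().sum()
sort_values: sorts the data by column values

Correlation & Covariance: functions for computing the correlation coefficient and covariance

corr, cov, corrwith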

Lecture 7-2: pandas 2

Groupby

Same as the SQL groupby command
The computation goes through the steps
split -> apply -> combine

df.groupby("Team")["Points"].sum()
df.groupby("Team")["Points"].std()
df.groupby("Team")["Points"].mean()

h_index = df.groupby(["Team", "Year"])["Points"].sum()

# unstack spreads the data out into matrix form.
h_index.unstack()
h_index.reset_index()

# The index levels can be swapped as well.
h_index.swaplevel()

# sort
h_index.sort_index(level=0)
h_index.sort_values()

# still a Series, even with two index levels
type(h_index)

# Because it is a Series, series operations work once a level is specified
h_index.groupby(level=1).std()  # written as h_index.std(level=1) before pandas 2.0

The split state produced by groupby can be extracted
- Each group comes out as a tuple of the group's key and its value

grouped = df.groupby("Team")
for name, group in grouped:
    print(name, group)

Only the data with a specific key can be extracted

grouped.get_group("Devils")

Three kinds of apply are possible on the extracted group data

Aggregation: extracts summarized statistics

Transformation: transforms the data

Filtration: a filtering function that removes certain data from what is shown

# aggregation
grouped.agg(max)
grouped.agg(np.mean)
df.describe().T

# transformation
score = lambda x: (x.max())
grouped.transform(score)

score = lambda x: (x - x.mean()) / x.std()
grouped.transform(score)

# filter
df.groupby('Team').filter(lambda x: x["Rank"].sum() > 2)

pivot table

df_phone.pivot_table(
    values=["duration"],
    index=[df_phone.month, df_phone.item],
    columns=df_phone.network,
    aggfunc="sum",
    fill_value=0,
)

# the same pivot table can be produced with groupby
df_phone.groupby(["month", "item", "network"])["duration"].sum().unstack()

Crosstab

pd.crosstab(
    index=df_movie.critic,
    columns=df_movie.title,
    values=df_movie.rating,
    aggfunc="first",
).fillna(0)

Merge & Concat

merge

The same functionality as the joins commonly used in SQL

Combines two data sets into one

pd.merge(df_a, df_b, on="subject_id")

INNER JOIN: rows whose subject_id value exists on both sides
LEFT JOIN: based on the left table
RIGHT JOIN: based on the right table
FULL JOIN: matching rows are joined, and non-matching rows are kept separately

pd.merge(df_a, df_b, on="subject_id", how="left")
pd.merge(df_a, df_b, on="subject_id", how="right")
pd.merge(df_a, df_b, on="subject_id", how="outer")
pd.merge(df_a, df_b, on="subject_id", how="inner")

pd.merge(df_a, df_b, right_index=True, left_index=True)
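The join types can be compared on two tiny tables. This is only a sketch; the contents of df_a and df_b are made up, since the original data is not shown in these notes.

```python
import pandas as pd

# Hypothetical tables sharing the key column subject_id.
df_a = pd.DataFrame({"subject_id": [1, 2, 3], "name": ["Alex", "Amy", "Allen"]})
df_b = pd.DataFrame({"subject_id": [2, 3, 4], "score": [90, 80, 70]})

inner = pd.merge(df_a, df_b, on="subject_id", how="inner")  # keys on both sides
left = pd.merge(df_a, df_b, on="subject_id", how="left")    # every key of df_a
outer = pd.merge(df_a, df_b, on="subject_id", how="outer")  # every key of either

print(inner)
print(left)
print(outer)
```

Rows without a partner on the other side are kept by the left and outer joins, with NaN in the columns that have no match.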

Concat

An operation that attaches data of the same form

df_new = pd.concat([df_a, df_b])
df_new.reset_index(drop=True)

df_a.append(df_b)  # removed in pandas 2.0; use pd.concat([df_a, df_b]) there

df_new = pd.concat([df_a, df_b], axis=1)
df_new.reset_index(drop=False)

persistence

Database connections

Provides a db connection feature for loading data

import sqlite3  # pymysql must be installed separately if it is used instead

conn = sqlite3.connect("./data/flights.db")
cur = conn.cursor()
cur.execute("select * from airlines limit 5;")
results = cur.fetchall()
results

Use openpyxl or XlsxWriter as the Excel engine

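2. Assignment Process

Worked through optional assignments 1~3 and studied the solutions to prepare for Monday's optional assignment review

3. Retrospective

If I want to rest on the weekend, I should work hard on weekdays!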

Author And Source

Source: AI Tech Week 1 Supplementary Study (numpy, pandas), https://velog.io/@f2f42012/AI-Tech-Week-1-보충-학습-numpy-pandas
