[DM] Frequent Itemsets, Association rules

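1. Association rule discovery

Managing supermarket shelf layout → Market-basket model
Goal : find items that are bought together
Approach : collect sales data with barcode scanners to find dependencies among items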

2. Market-basket model

A large set of items : the products sold in the store
A large set of baskets (transactions) : the items bought together in one purchase
Each basket is a small subset of the items
Association rules
👉 People who bought {x, y, z} tend to also buy {v, w}

3. Applications

items = products
baskets = sets of products someone bought in one trip to the store

In practice, stores keep data on which products customers buy together
👉 They sometimes use tricks ⇒ if diapers and beer sell well together, put diapers on sale and raise the price of beer

4. Frequent itemsets

Find itemsets that frequently appear together in baskets
Support of itemset I : the number of baskets that contain all items in I

💡 Support is sometimes also expressed as a fraction of the total number of baskets

Support threshold s
: itemsets that appear in at least s baskets are called frequent itemsets
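
The support definition can be sketched in a few lines of Python. The four baskets and the helper names `support` and `is_frequent` are made up for illustration only.

```python
# Hypothetical toy data: each basket is a set of single-letter item names.
baskets = [
    {"m", "c", "b"},
    {"m", "p", "j"},
    {"m", "b"},
    {"c", "j"},
]

def support(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket)

def is_frequent(itemset, baskets, s):
    """True if `itemset` appears in at least `s` baskets."""
    return support(itemset, baskets) >= s

count = support({"m", "b"}, baskets)   # support as an absolute count
fraction = count / len(baskets)        # support as a fraction of all baskets
print(count, fraction, is_frequent({"m", "b"}, baskets, s=2))
```

The subset test `itemset <= basket` is what "contains all items in I" means here.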

Example)
Items = {milk, coke, pepsi, beer, juice}

Support threshold = 3 baskets

B1 = {m, c, b}
B2 = {m, p, j}
B3 = {m, b}
B4 = {c, j}
B5 = {m, p, b}
B6 = {m, c, b, j}
B7 = {c, b, j}
B8 = {b, c}

Frequent itemsets : {m}, {c}, {b}, {j}, {m, b}, {b, c}, {c, j}
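
The list above can be checked with a brute-force sketch that tests every possible itemset against the threshold. This is only practical for tiny examples like this one and is not an efficient mining algorithm. The function name `frequent_itemsets` is an assumption for illustration.

```python
from itertools import combinations

baskets = [
    {"m", "c", "b"},       # B1
    {"m", "p", "j"},       # B2
    {"m", "b"},            # B3
    {"c", "j"},            # B4
    {"m", "p", "b"},       # B5
    {"m", "c", "b", "j"},  # B6
    {"c", "b", "j"},       # B7
    {"b", "c"},            # B8
]

def frequent_itemsets(baskets, s):
    """Brute force: count every non-empty itemset, keep those with support >= s."""
    items = sorted(set().union(*baskets))
    result = {}
    for k in range(1, len(items) + 1):
        for candidate in combinations(items, k):
            count = sum(1 for basket in baskets if set(candidate) <= basket)
            if count >= s:
                result[candidate] = count
    return result

# Keys are sorted tuples of items, values are support counts.
for itemset, count in frequent_itemsets(baskets, s=3).items():
    print(itemset, count)
```

With s = 3 this yields the four single items m, c, b, j and the three pairs {m, b}, {b, c}, {c, j}. Pepsi (p) appears in only two baskets and drops out.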

5. Association rules, Confidence

if-then rules about the contents of baskets

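$\{i_1, i_2, \dots, i_k\} \rightarrow j$ means: if a basket contains all of $i_1, \dots, i_k$, then it is likely to contain $j$ as well
In practice there are many such rules, and we need to find the important, interesting ones!
👉 so we define confidence

📌 Confidence
The confidence of an association rule is the probability of $j$ given $I = \{i_1, i_2, \dots, i_k\}$:

$conf(I \rightarrow j) = \frac{support(I \cup \{j\})}{support(I)}$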

📌 Problem

A high confidence value does not always mean the rule is interesting!
👉 ex) The rule X → milk may have high confidence simply because milk sells well regardless of X (independent of X).
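
A small sketch of this problem, using the eight example baskets from section 4. The rule m → b is chosen here only as an illustration. Its confidence looks high, but b is in most baskets anyway.

```python
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]

def support(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    return sum(1 for basket in baskets if set(itemset) <= basket)

# Confidence of m -> b: baskets with both m and b, over baskets with m.
conf_m_b = support({"m", "b"}, baskets) / support({"m"}, baskets)
# Fraction of all baskets that contain b, with or without m.
pr_b = support({"b"}, baskets) / len(baskets)

print(conf_m_b)  # 0.8
print(pr_b)      # 0.75
```

Knowing that a basket contains m raises the chance of seeing b only from 0.75 to 0.8, so the high confidence mostly reflects how common b is.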

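📌 Interest
The interest of an association rule I → j is its confidence minus the fraction of baskets that contain j.

$Interest(I→j) = conf(I → j) - Pr[j]$
Interesting rules are those with a large positive or negative interest value (usually above 0.5)
The absolute value is usually used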

📝 Example

B1 = {m, c, b}
B2 = {m, p, j}
B3 = {m, b}
B4 = {c, j}
B5 = {m, p, b}
B6 = {m, c, b, j}
B7 = {c, b, j}
B8 = {b, c}

💡 Association rule : {m, b} → c

Confidence = 2/4 = 0.5

Interest = 0.5 - 5/8 = -1/8
👉 Item c appears in 5/8 of the baskets
👉 So this rule is not very interesting
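
The same numbers can be reproduced with a short sketch. The function names `support`, `confidence` and `interest` are assumptions chosen to mirror the definitions above.

```python
baskets = [
    {"m", "c", "b"},       # B1
    {"m", "p", "j"},       # B2
    {"m", "b"},            # B3
    {"c", "j"},            # B4
    {"m", "p", "b"},       # B5
    {"m", "c", "b", "j"},  # B6
    {"c", "b", "j"},       # B7
    {"b", "c"},            # B8
]

def support(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    return sum(1 for basket in baskets if set(itemset) <= basket)

def confidence(lhs, j, baskets):
    """conf(I -> j) = support(I plus j) / support(I)."""
    return support(set(lhs) | {j}, baskets) / support(lhs, baskets)

def interest(lhs, j, baskets):
    """Interest(I -> j) = conf(I -> j) - Pr[j]."""
    return confidence(lhs, j, baskets) - support({j}, baskets) / len(baskets)

print(confidence({"m", "b"}, "c", baskets))  # 0.5
print(interest({"m", "b"}, "c", baskets))    # -0.125
```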

💡 Finding association rules

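We need to find all association rules with support ≥ s and confidence ≥ c (s and c are chosen by the user)
👉 This requires finding the frequent itemsets
👉 If $\{i_1, i_2, \dots, i_k\} \rightarrow j$ has high support and confidence, then both $\{i_1, i_2, \dots, i_k\}$ and $\{i_1, i_2, \dots, i_k, j\}$ will be frequent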
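
A one-line justification, assuming the support of a rule $I \rightarrow j$ is taken to be the support of all items in it (left and right side); the source does not state which convention it uses. Every basket that contains $I \cup \{j\}$ also contains $I$, so:

$$
support(I) \;\ge\; support(I \cup \{j\}) \;\ge\; s
$$

Both itemsets therefore clear the threshold s whenever the rule does.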

💡 Mining association rules

1. Find all frequent itemsets I

⇒ Covered in the next chapter

2. Rule generation
Output the rules whose confidence is at or above the given confidence threshold

⭐ Since I is frequent, every subset A of I is also frequent ⭐
For every subset A of I, generate the rule A → I\A

ex) I : {a, b, c}, A : {b, c}
A → I\A
👉 {b, c} → {a}

confidence(A, B → C, D) = support(A, B, C, D) / support(A, B)
If A, B, C → D is below the confidence threshold, then so is A, B → C, D.
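
The reason is that moving an item from the left side to the right side can only enlarge the denominator. Since every basket containing A, B, C also contains A, B, we have $support(A, B, C) \le support(A, B)$, and therefore:

$$
conf(A, B, C \rightarrow D) = \frac{support(A, B, C, D)}{support(A, B, C)} \;\ge\; \frac{support(A, B, C, D)}{support(A, B)} = conf(A, B \rightarrow C, D)
$$

So if the rule on the left is already below the threshold, the rule on the right is too.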

📝 Example (here B3 also contains c and a new item n)

B1 = {m, c, b}
B2 = {m, p, j}
B3 = {m, c, b, n}
B4 = {c, j}
B5 = {m, p, b}
B6 = {m, c, b, j}
B7 = {c, b, j}
B8 = {b, c}

💡 Support threshold s = 3, confidence c = 0.75

1) Frequent itemsets:
{b, m}, {b, c}, {c, m}, {c, j}, {m, c, b}

2) Generate rules:
b → m : conf = 4/6
b → c : conf = 5/6
b, c → m : conf = 3/5
m → b : conf = 4/5
b, m → c : conf = 3/4
b → c, m : conf = 3/6
...
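
A minimal sketch of the rule-generation step. It assumes B3 = {m, c, b, n}, which is the basket content that the confidences listed above (for example b → c = 5/6) correspond to. The function name `generate_rules` is hypothetical, and the frequent itemsets are passed in by hand rather than mined.

```python
from itertools import combinations

baskets = [
    {"m", "c", "b"},       # B1
    {"m", "p", "j"},       # B2
    {"m", "c", "b", "n"},  # B3 (assumed contents, see the note above)
    {"c", "j"},            # B4
    {"m", "p", "b"},       # B5
    {"m", "c", "b", "j"},  # B6
    {"c", "b", "j"},       # B7
    {"b", "c"},            # B8
]

def support(itemset, baskets):
    """Number of baskets that contain every item of `itemset`."""
    return sum(1 for basket in baskets if set(itemset) <= basket)

def generate_rules(frequent, baskets, min_conf):
    """For each frequent itemset I and each non-empty proper subset A,
    keep the rule A -> (I minus A) if its confidence is at least min_conf."""
    rules = []
    for itemset in frequent:
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                rhs = tuple(sorted(set(itemset) - set(lhs)))
                conf = support(itemset, baskets) / support(lhs, baskets)
                if conf >= min_conf:
                    rules.append((lhs, rhs, conf))
    return rules

# Frequent itemsets of size 2 or more, as listed in step 1).
frequent = [{"b", "m"}, {"b", "c"}, {"c", "m"}, {"c", "j"}, {"m", "c", "b"}]

for lhs, rhs, conf in generate_rules(frequent, baskets, min_conf=0.75):
    print(lhs, "->", rhs, round(conf, 3))
```

With c = 0.75, the rules that survive are m → b, b → c, c → b, j → c, {b, m} → c and {c, m} → b.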

Author And Source

Source for this post ([DM] Frequent Itemsets, Association rules): https://velog.io/@jiyeah3108/DM-Frequent-Itemsets-Association-rules

