요약

추천시스템 분야에서 기존에 쓰이던 MF에서 벗어나 DNN을 활용한 논문이다.

NCF 는 GMF와 MLP를 앙상블한 모델이다.
여기서 GMF는 MF에 non-linear activation function을 더해 표현력을 높인 MF이고,
MLP는 임베딩된 user Id와 itemID를 Input으로 받아 0~1의 $\hat y$

Matrix Factorization의 문제점

참고 : Matrix Factorization 논문 리뷰 & 구현

MF는 두 벡터( $p_u$

단순 선형 결합을 통해 $\hat y$

(b)를 보면 $u_4$

이러한 모순을 해결하기 위해 latent space에 축을 하나더 추가 할 수 있지만, overfitting 을 야기 할 수 있다.

따라서 이런 단점을 해결하기 위해 선형결합 대신 Deep Neural Networks를 사용한다.

Input

그림의 Input Layer에서 볼 수 있듯 유저와 아이템의 ID를 one-hot-encoding 하여 sparse한 vector로 변환하여 Input으로 사용한다.

예시)

각 User u 와 Item i 의 id를 one-hot-encoding
User u = [0,0,0,0,1,0,0,0]
Item i = [0,0,0,0,0,0,1,0,0,0,0]

그리고 Embedding Layer에서 Sparse한 one-hot-encoding 데이터를 Dense한 벡터로 바꿔준다.
이렇게 얻어진 dense vector은 Latent vector로도 볼 수 있다.

Model

모델에서 나오는 예측값 $\hat y_{u,i}$

y_{u,i} = \begin{cases} 1, \ \ \ \ if\ interaction\ (user\ u, item\ i)\ is\ observed; \\ 0, \ \ \ \ otherwise. \end{cases}

Implicit Feedback이기 때문에 1인 경우 interaction이 있었을 뿐 선호도를 반영하지 않고, 0인 경우 마찬가지로 item을 좋아하지 않는다는 것을 나타내지 않는다.

MLP

Input data는 user, item을 각각 one-hot-encoding된 값이다. (Input Layer)
Input을 Latent Vector로 각각 임베딩한다(Embedding Layer)
임베딩된 P (user Latent vector), Q (item Latent vector) 를 쌓여진 Layer에 올려 feed forward 한다 (Neural CF Layer)
최종 결과값( $\hat y_{ui}$

위 과정을 수식으로 나타내면 다음과 같다.

\begin{aligned} \mathbf{z}_{1} &=\phi_{1}\left(\mathbf{p}_{u}, \mathbf{q}_{i}\right)=\left[\begin{array}{c} \mathbf{p}_{u} \\ \mathbf{q}_{i} \end{array}\right] \\ \phi_{2}\left(\mathbf{z}_{1}\right) &=a_{2}\left(\mathbf{W}_{2}^{T} \mathbf{z}_{1}+\mathbf{b}_{2}\right) \\ \cdots & \\ \phi_{L}\left(\mathbf{z}_{L-1}\right) &=a_{L}\left(\mathbf{W}_{L}^{T} \mathbf{z}_{L-1}+\mathbf{b}_{L}\right) \\ \hat{y}_{u i} &=\sigma\left(\mathbf{h}^{T} \phi_{L}\left(\mathbf{z}_{L-1}\right)\right) \end{aligned}

GMF

MF가 NCF의 특별한 케이스가 됨을 보여준다.
앞서 Input에서 User와 Item 의 ID를 임베딩하여 Latent vector로 볼 수 있다.

첫번째 레이어를 맵핑하면 다음의 식과 같다.

$\phi_{1}\left(\mathbf{p}_{u}, \mathbf{q}_{i}\right)=\mathbf{p}_{u} \odot \mathbf{q}_{i}$

⊙는 성분곱
$\mathbf{p}_{u} = \mathbf{P}^{T} \mathbf{v}_{u}^{U}$
$\mathbf{q}_{i} = \mathbf{Q}^{T} \mathbf{v}_{i}^{I}$
qi=QTviI

이때 P, Q 는 latent vector , v는 one-hot-encoding 된 vector

위 식을 output layer로 보내면 다음의 최종식이 나오게 된다.

$\hat{y}_{u i}=a_{o u t}\left(\mathbf{h}^{T}\left(\mathbf{p}_{u} \odot \mathbf{q}_{i}\right)\right)$

$\mathbf{h}$
$a_{o u t}$

이렇게 만들어진 GMF식은 non-linear activation 을 추가하였기 때문에 기존의 MF보다 좋은 표현력을 가지게 된다.

※ $\mathbf{h}$

GMF & MLP 혼합

두 모델을 앙상블하는 간단한 솔루션은 GMF와 MLP가 동일한 Embedding Layer를 공유하게 하고, 그들의 인터랙션 기능들의 출력들을 결합하는 것이다. 그러나 GMF및 MLP의 Embedding을 공유하는 것은 융합된 모델의 성능을 제한할 수 있다.

이 논문에서는 융합된 모델에 flexibility를 주기 위해 GMF와 MLP를 서로 다른 embedding layer 사용하여
별도로 학습하고 마지막 은닉층을 concat하는 방법을 제안한다.

식으로 표현하면 다음과 같다.

\begin{aligned} \phi^{G M F} &=\mathbf{p}_{u}^{G} \odot \mathbf{q}_{i}^{G}, \\ \phi^{M L P} &=a_{L}\left(\mathbf{W}_{L}^{T}\left(a_{L-1}\left(\ldots a_{2}\left(\mathbf{W}_{2}^{T}\left[\begin{array}{l} \mathbf{p}_{u}^{M} \\ \mathbf{q}_{i}^{M} \end{array}\right]+\mathbf{b}_{2}\right) \ldots\right)\right)+\mathbf{b}_{L}\right), \\ \hat{y}_{u i} &=\sigma\left(\mathbf{h}^{T}\left[\begin{array}{l} \phi^{G M F} \\ \phi^{M L P} \end{array}\right]\right), \end{aligned}

$\mathbf{p}_{u}^{G}$
$\mathbf{p}_{u}^{M}$

Loss

likelihood function

Implicit feedback은 0,1로 이루어져 있기 때문에 $\hat y_{ui}$

p\left(\mathcal{Y}, \mathcal{Y}^{-} \mid \mathbf{P}, \mathbf{Q}, \Theta_{f}\right)=\prod_{(u, i) \in \mathcal{Y}} \hat{y}_{u i} \prod_{(u, j) \in \mathcal{Y}^{-}}\left(1-\hat{y}_{u j}\right)

$\mathcal Y$
$\mathcal Y^-$

Binaray cross-entropy loss

likelihood function 에 $-log$

\begin{aligned} L &=-\sum_{(u, i) \in \mathcal{Y}} \log \hat{y}_{u i}-\sum_{(u, j) \in \mathcal{Y}^{-}} \log \left(1-\hat{y}_{u j}\right) \\ &=-\sum_{(u, i) \in \mathcal{Y} \cup \mathcal{Y}^{-}} y_{u i} \log \hat{y}_{u i}+\left(1-y_{u i}\right) \log \left(1-\hat{y}_{u i}\right) \end{aligned}

NCF에서는 이 Binaray cross-entropy loss를 가지고 Loss를 계산하게 된다.

Contribution

추천시스템에 Deep Neural Network 적용
MF는 NCF의 특별한 케이스가 됨을 증명
여러 실험을 통한 NCF의 효율성 증명

논문 모르는 용어

Point-wise loss : 한번에 1개의 아이템만 고려

기존 Explicit Data에서 사용 ( True_rating - Predicted_ratin )

Pair-wise loss : 한번에 2개의 아이템을 고려

(Positive item - 관측된 데이터, Negative item - 관측되지 않은 데이터)

Sparse Representation : 아이템 가지수 = 차원의 수 (원핫인코딩)

ex) a = [1, 0, 0, 0, 0, 0, 0, 0, 0], b = [0, 1, 0, 0, 0, 0, 0, 0, 0] (0 or 1)

Dense Representation : 아이템 가지수 >> 차원의 수

ex) a = [0.2,1.4,-0.4], b = [0.1, 0.5, 1.1] (실수값으로 이루어짐)

Author And Source

이 문제에 관하여([추천시스템] NCF : Neural Collaborative Filtering 논문 리뷰), 우리는 이곳에서 더 많은 자료를 발견하고 링크를 클릭하여 보았다 https://velog.io/@minchoul2/추천시스템-Neural-Collaborative-Filtering-논문-리뷰

우수한 개발자 콘텐츠 발견에 전념 (Collection and Share based on the CC Protocol.)

[추천시스템] NCF : Neural Collaborative Filtering 논문 리뷰

요약