Wanted Pre-Onboarding AI/ML Course Week 1


1. My goals for taking this course

Landing a job through the program's job placement, plus project experience
Deeper study of NLP
To experience how real companies actually work

2. Summary of two NLP subtasks selected from Papers with Code

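Problem definition
- What problem does the task aim to solve? (Question Answering)
  - Determining, during a chatbot-based consultation, whether a user's question is one that cannot be answered

Data
- What data can be used to solve the task?
  - SQuAD (Stanford Question Answering Dataset)
- What does the data look like?
  - A collection of question-answer pairs derived from Wikipedia articles. SQuAD 2.0 combines the roughly 100,000 questions of SQuAD 1.1 with about 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones.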
  - Data Instances
    - train
```
{
    "answers": {
        "answer_start": [1],
        "text": ["This is a test text"]
    },
    "context": "This is a test context.",
    "id": "1",
    "question": "Is this a test?",
    "title": "train test"
}
```
  - Data Fields
    - id: a string feature.
    - title: a string feature.
    - context: a string feature.
    - question: a string feature.
    - answers: a dictionary feature containing:
      - text: a string feature.
      - answer_start: an int32 feature.
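
The field layout above can be explored with a short Python sketch. It assumes the SQuAD 2.0 convention that an unanswerable question carries empty `text` and `answer_start` lists. The two records and the helper names (`first_answer`, `answer_is_aligned`) are hypothetical and only follow the structure listed above.

```python
from typing import Optional


def first_answer(record: dict) -> Optional[str]:
    """Return the first gold answer, or None when the question is unanswerable."""
    texts = record["answers"]["text"]
    return texts[0] if texts else None


def answer_is_aligned(record: dict) -> bool:
    """Check that each answer text occurs in the context at its answer_start offset."""
    context = record["context"]
    answers = record["answers"]
    return all(
        context[start:start + len(text)] == text
        for start, text in zip(answers["answer_start"], answers["text"])
    )


# Hypothetical context used only for illustration.
CONTEXT = "The Normans gave their name to Normandy, a region in France."

answerable = {
    "id": "a1",
    "title": "Normandy",
    "context": CONTEXT,
    "question": "In what country is Normandy located?",
    # answer_start is a character offset into the context.
    "answers": {"text": ["France"], "answer_start": [CONTEXT.index("France")]},
}

unanswerable = {
    "id": "a2",
    "title": "Normandy",
    "context": CONTEXT,
    "question": "Who gave their name to Brittany?",
    # Assumed SQuAD 2.0 convention: empty lists mark an unanswerable question.
    "answers": {"text": [], "answer_start": []},
}

for record in (answerable, unanswerable):
    print(record["id"], first_answer(record), answer_is_aligned(record))
```

Checking the offset alignment is useful before training an extractive model, because the model is trained to predict character (or token) spans rather than free text.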

SOTA (state-of-the-art) model (one representative model)
- What is the SOTA model for the task?
  - Retro-Reader
- What are the main keywords in the abstract of the model's paper?
  - reading and verification
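
The general idea behind "reading and verification" can be illustrated with a simple threshold check: a reader proposes an answer span, and a verification step decides whether to keep it or to abstain. The sketch below is a simplified illustration of that idea, not the Retro-Reader's actual implementation. The function name, the scores, and the threshold are all hypothetical.

```python
def answer_or_abstain(
    best_span: str,
    span_score: float,
    null_score: float,
    threshold: float = 0.0,
) -> str:
    """Return the span if it beats the no-answer score by more than `threshold`.

    An empty string stands for "no answer", which is how SQuAD 2.0
    predictions mark a question as unanswerable.
    """
    if span_score - null_score > threshold:
        return best_span
    return ""


# Hypothetical scores: the span clearly beats the no-answer option.
print(repr(answer_or_abstain("France", span_score=7.5, null_score=2.0)))

# Hypothetical scores: the no-answer option wins, so the model abstains.
print(repr(answer_or_abstain("France", span_score=1.0, null_score=4.0)))
```

In practice the threshold would be tuned on a development set, trading off wrongly answering unanswerable questions against wrongly abstaining on answerable ones.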

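Problem definition
- What problem does the task aim to solve? (Sentiment Analysis)
  - Supporting decision-making by analyzing the sentiment of posts and reviews

Data
- What data can be used to solve the task?
  - SST (Stanford Sentiment Treebank)
- What does the data look like?
  - A corpus of 11,855 single sentences extracted from movie reviews, plus 215,154 unique phrases, each annotated by 3 human judges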
  - Data Instances
    - default
```
{'label': 0.7222200036048889,
 'sentence': 'Yet the act is still charming here .',
 'tokens': 'Yet|the|act|is|still|charming|here|.',
 'tree': '15|13|13|10|9|9|11|12|10|11|12|14|14|15|0'}
```
    - dictionary
```
{'label': 0.7361099720001221, 
'phrase': 'still charming'}
```
    - ptb
```
{'ptb_tree': '(3 (2 Yet) (3 (2 (2 the) (2 act)) (3 (4 (3 (2 is) (3 (2 still) (4 charming))) (2 here)) (2 .))))'}
```
  - Data Fields
    - sentence: a complete sentence expressing an opinion about a film
    - label: the degree of "positivity" of the opinion, on a scale between 0.0 and 1.0
    - tokens: a sequence of tokens that form a sentence
    - tree: a sentence parse tree formatted as a parent pointer tree
    - phrase: a sub-sentence of a complete sentence
    - ptb_tree: a sentence parse tree formatted in Penn Treebank-style, where each component's degree of positive sentiment is labelled on a scale from 0 to 4
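
The `default` instance shown above can be decoded with a few lines of Python. This is a minimal sketch that assumes the pipe-separated encoding seen in the example and the usual SST five-class cutoffs at 0.2, 0.4, 0.6, and 0.8. The helper names are hypothetical.

```python
from typing import List


def parse_tokens(tokens: str) -> List[str]:
    """Split the pipe-separated token string into a list of tokens."""
    return tokens.split("|")


def parse_parents(tree: str) -> List[int]:
    """Parse the parent pointer tree; entry i-1 holds the parent of node i (0 = root)."""
    return [int(p) for p in tree.split("|")]


def children_of(parents: List[int], node: int) -> List[int]:
    """Return the 1-based ids of all nodes whose parent is `node`."""
    return [i for i, p in enumerate(parents, start=1) if p == node]


def to_five_class(label: float) -> int:
    """Map a score in [0, 1] to classes 0 (very negative) .. 4 (very positive)."""
    for cls, upper in enumerate((0.2, 0.4, 0.6, 0.8)):
        if label <= upper:
            return cls
    return 4


# The `default` instance shown above.
example = {
    "label": 0.7222200036048889,
    "sentence": "Yet the act is still charming here .",
    "tokens": "Yet|the|act|is|still|charming|here|.",
    "tree": "15|13|13|10|9|9|11|12|10|11|12|14|14|15|0",
}

tokens = parse_tokens(example["tokens"])
parents = parse_parents(example["tree"])
root = parents.index(0) + 1  # the node whose parent pointer is 0

print(len(tokens), "tokens,", len(parents), "tree nodes")
print("root node:", root, "children:", children_of(parents, root))
print("five-class label:", to_five_class(example["label"]))
```

The first eight tree nodes correspond to the eight tokens, and the remaining nodes are the internal phrase nodes of the binary parse.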

- SOTA (state-of-the-art) model (one representative model)

  * What is the SOTA model for the task?
    * **[MUPPET Roberta Large](https://paperswithcode.com/paper/muppet-massive-multi-task-representations)**
  * What are the main keywords in the abstract of the model's paper?
    * pre-finetuning consistently improves performance for pretrained discriminators

Author And Source

Original post (Wanted Pre-Onboarding AI/ML Course Week 1): https://velog.io/@watemelon_0718/원티드-프리온보딩-AIML-코스-Week1
