[🤗 Course 2.2] Behind the pipeline

This course assumes the use of PyTorch by default.

Let's start with a complete example and look at what happened behind the scenes when we executed the following code in Chapter 1:

from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier(
    [
        "I've been waiting for a HuggingFace course my whole life.",
        "I hate this so much!",
    ]
)

1์žฅ์—์„œ ๋ณด์•˜๋“ฏ์ด ์ด ํŒŒ์ดํ”„๋ผ์ธ์€ ์ „์ฒ˜๋ฆฌ(preprocessing), ๋ชจ๋ธ๋กœ ์ž…๋ ฅ ์ „๋‹ฌ ๋ฐ ํ›„์ฒ˜๋ฆฌ(postprocessing)์˜ 3๋‹จ๊ณ„๋ฅผ ํ•œ๋ฒˆ์— ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค.

์ด๋“ค ๊ฐ๊ฐ์— ๋Œ€ํ•ด ๋น ๋ฅด๊ฒŒ ์‚ดํŽด๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค.

Preprocessing with a tokenizer

Like other neural networks, Transformer models can't process raw text directly, so the first step of the pipeline is to convert the text inputs into numbers that the model can make sense of. To do this we use a tokenizer, which is responsible for the following (a short sketch after this list illustrates the first two steps):

  • ์ž…๋ ฅ์„ ํ† ํฐ(token) ์ด๋ผ๊ณ  ๋ถ€๋ฅด๋Š” ๋‹จ์–ด(word), ํ•˜์œ„ ๋‹จ์–ด(subword) ๋˜๋Š” ๊ธฐํ˜ธ(symbol)(์˜ˆ: ๊ตฌ๋‘์ )๋กœ ๋ถ„ํ• 

  • ๊ฐ ํ† ํฐ(token)์„ ์ •์ˆ˜(integer)๋กœ ๋งคํ•‘(mapping)

  • ๋ชจ๋ธ์— ์œ ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ๋ถ€๊ฐ€์ ์ธ ์ž…๋ ฅ(additional inputs)์„ ์ถ”๊ฐ€

All of this preprocessing needs to be done in exactly the same way as when the model was pretrained, so we first need to download that information from the Model Hub. To do this, we use the AutoTokenizer class and its from_pretrained() method. Using the checkpoint name of our model, it automatically fetches and caches the data associated with the model's tokenizer; the information is therefore downloaded only the first time you run the code below.

sentiment-analysis ํŒŒ์ดํ”„๋ผ์ธ์˜ ๋””ํดํŠธ ์ฒดํฌํฌ์ธํŠธ(default checkpoint)๋Š” distilbert-base-uncased-finetuned-sst-2-english(์ด ๋ชจ๋ธ์— ๋Œ€ํ•œ model card๋Š” ์—ฌ๊ธฐ์—์„œ ํ™•์ธ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค)์ด๋ฏ€๋กœ ๋‹ค์Œ์„ ์‹คํ–‰ํ•ฉ๋‹ˆ๋‹ค.

from transformers import AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

์ผ๋‹จ ์œ„์™€ ๊ฐ™์ด ํ† ํฌ๋‚˜์ด์ €(tokenizer)๋ฅผ ์ƒ์„ฑํ•˜๋ฉด, ์•„๋ž˜์˜ ์ฝ”๋“œ์—์„œ ๋ณด๋Š” ๊ฒƒ์ฒ˜๋Ÿผ, ์ด ํ† ํฌ๋‚˜์ด์ €์— ๋ฌธ์žฅ์„ ์ž…๋ ฅํ•˜์—ฌ ๋ชจ๋ธ์— ๋ฐ”๋กœ ์ „๋‹ฌํ•  ์ˆ˜ ์žˆ๋Š” ํŒŒ์ด์ฌ ๋”•์…”๋„ˆ๋ฆฌ(dictionary) ์ •๋ณด๋ฅผ ๊ตฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค! ์ดํ›„ ํ•ด์•ผํ•  ์ผ์€ input IDs ๋ฆฌ์ŠคํŠธ๋ฅผ ํ…์„œ(tensors)๋กœ ๋ณ€ํ™˜ํ•˜๋Š” ๊ฒƒ๋ฟ์ž…๋‹ˆ๋‹ค.

์—ฌ๋Ÿฌ๋ถ„๋“ค์€ PyTorch, TensorFlow ๋˜๋Š” Flax ๋“ฑ, ์ด๋“ค ์ค‘ ์–ด๋–ค ML ํ”„๋ ˆ์ž„์›Œํฌ๊ฐ€ ๋ฐฑ์—”๋“œ(backend)๋กœ ์‚ฌ์šฉ๋˜๋Š”์ง€ ๊ฑฑ์ •ํ•  ํ•„์š”๊ฐ€ ์—†์ด ๐Ÿค—Transformers๋ฅผ ๋งˆ์Œ๋Œ€๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ Transformer ๋ชจ๋ธ์€ ํ…์„œ(tensor) ์ž…๋ ฅ๋งŒ ๋ฐ›์Šต๋‹ˆ๋‹ค. ๋งŒ์ผ ์—ฌ๋Ÿฌ๋ถ„์ด ํ…์„œ(tensor)์— ๋Œ€ํ•ด ์ฒ˜์Œ ์ ‘ํ•œ๋‹ค๋ฉด, NumPy ๋ฐฐ์—ด(array)์„ ์ƒ๊ฐํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค. NumPy ๋ฐฐ์—ด์€ ์Šค์นผ๋ผ(0D), ๋ฒกํ„ฐ(1D), ํ–‰๋ ฌ(2D) ํ˜น์€ ๋” ๋งŽ์€ ์ฐจ์›์„ ๊ฐ€์งˆ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๊ฒƒ์€ ์‚ฌ์‹ค์ƒ ํ…์„œ์ž…๋‹ˆ๋‹ค. ๋‹ค๋ฅธ ๊ธฐ๊ณ„ํ•™์Šต ํ”„๋ ˆ์ž„์›Œํฌ์˜ ํ…์„œ๋„ ๋น„์Šทํ•˜๊ฒŒ ๋™์ž‘ํ•˜๋ฉฐ, ์ผ๋ฐ˜์ ์œผ๋กœ NumPy ๋ฐฐ์—ด๋งŒํผ ๊ฐ„๋‹จํ•˜๊ฒŒ ์ƒ์„ฑ(instantiate)ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

ํ† ํฌ๋‚˜์ด์ €๊ฐ€ ๋ฐ˜ํ™˜ํ•˜๋Š” ํ…์„œ์˜ ์œ ํ˜•(PyTorch, TensorFlow ๋˜๋Š” ์ผ๋ฐ˜ NumPy)์„ ์ง€์ •ํ•˜๋ ค๋ฉด return_tensors ์ธ์ˆ˜(argument)๋ฅผ ์‚ฌ์šฉํ•˜๋ฉด ๋ฉ๋‹ˆ๋‹ค.

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="pt")
print(inputs)
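
With the distilbert checkpoint above, the printed dictionary looks roughly like this (the exact IDs come from that tokenizer's vocabulary):

{
    'input_ids': tensor([
        [  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172, 2607,  2026,  2878,  2166,  1012,   102],
        [  101,  1045,  5223,  2023,  2061,  2172,   999,   102,     0,     0,     0,     0,     0,     0,     0,     0]
    ]),
    'attention_mask': tensor([
        [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    ])
}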

์•„์ง ํŒจ๋”ฉ(padding)๊ณผ truncation์— ๋Œ€ํ•ด ์‹ ๊ฒฝ์“ฐ์ง€ ๋งˆ์„ธ์š”. ๋‚˜์ค‘์— ์„ค๋ช…ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค. ์—ฌ๊ธฐ์„œ ๊ธฐ์–ตํ•ด์•ผ ํ•  ์ฃผ์š” ์‚ฌํ•ญ์€ ๋‹จ์ผ ๋ฌธ์žฅ ๋˜๋Š” ๋‹ค์ค‘ ๋ฌธ์žฅ ๋ฆฌ์ŠคํŠธ๋ฅผ ํ† ํฌ๋‚˜์ด์ € ํ•จ์ˆ˜๋กœ ์ „๋‹ฌํ•  ์ˆ˜ ์žˆ์„ ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ ์ถœ๋ ฅ ํ…์„œ ์œ ํ˜•์„ ์ง€์ •ํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ํ…์„œ ์œ ํ˜•์ด ์ง€์ •๋˜์ง€ ์•Š์œผ๋ฉด ๊ฒฐ๊ณผ๋กœ ์ด์ค‘ ๋ฆฌ์ŠคํŠธ(list of list)๊ฐ€ ํ‘œ์‹œ๋ฉ๋‹ˆ๋‹ค.

PyTorch ํ…์„œ ์œ ํ˜•์˜ ๊ฒฐ๊ณผ๋Š” ์œ„์™€ ๊ฐ™์Šต๋‹ˆ๋‹ค. ์œ„ ๊ฒฐ๊ณผ์—์„œ ๋ณด๋“ฏ์ด, ์ถœ๋ ฅ์€ ๋‘ ๊ฐœ์˜ ํ‚ค(key) ์ฆ‰, input_ids ๋ฐ attention_mask๋ฅผ ๊ฐ€์ง€๋Š” ํŒŒ์ด์ฌ ๋”•์…”๋„ˆ๋ฆฌ์ž…๋‹ˆ๋‹ค. input_ids์—๋Š” ๊ฐ ๋ฌธ์žฅ์— ์žˆ๋Š” ํ† ํฐ์˜ ๊ณ ์œ  ์‹๋ณ„์ž๋กœ ๊ตฌ์„ฑ๋œ ๋‘ ํ–‰์˜ ์ •์ˆ˜(๊ฐ ๋ฌธ์žฅ์— ํ•˜๋‚˜์”ฉ)๊ฐ€ ๊ฐ’(value)์œผ๋กœ ๋“ค์–ด๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด ์žฅ์˜ ๋’ท๋ถ€๋ถ„์—์„œ attention_mask ๊ฐ€ ๋ฌด์—‡์ธ์ง€ ์„ค๋ช…ํ•ฉ๋‹ˆ๋‹ค.

Going through the model

ํ† ํฌ๋‚˜์ด์ €์™€ ๋™์ผํ•œ ๋ฐฉ์‹์œผ๋กœ ์‚ฌ์ „ ํ•™์Šต๋œ ๋ชจ๋ธ(pretrained model)์„ ๋‹ค์šด๋กœ๋“œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๐Ÿค—Transformers๋Š” ์œ„์˜ AutoTokenizer ํด๋ž˜์Šค์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ, from_pretrained() ๋ฉ”์„œ๋“œ๊ฐ€ ํฌํ•จ๋œ AutoModel ํด๋ž˜์Šค๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.

from transformers import AutoModel

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModel.from_pretrained(checkpoint)

In this code snippet, we downloaded the same checkpoint we used in our pipeline before (it should actually be cached already) and instantiated a model with it.

ํ•ด๋‹น ์•„ํ‚คํ…์ฒ˜์—๋Š” ๊ธฐ๋ณธ Transformer ๋ชจ๋“ˆ๋งŒ ํฌํ•จ๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค. ๋”ฐ๋ผ์„œ, ์ž…๋ ฅ์ด ์ฃผ์–ด์ง€๋ฉด ์ž์งˆ(feature) ์ด๋ผ๊ณ ๋„ ๋ถˆ๋ฆฌ๋Š” hidden states ๋ฅผ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค. ๊ฐ ๋ชจ๋ธ ์ž…๋ ฅ์— ๋Œ€ํ•ด Transformer ๋ชจ๋ธ์— ์˜ํ•ด์„œ ์ˆ˜ํ–‰๋œ ํ•ด๋‹น ์ž…๋ ฅ์˜ ๋ฌธ๋งฅ์  ์ดํ•ด(contextual understanding) ๊ฒฐ๊ณผ ๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ๊ณ ์ฐจ์› ๋ฒกํ„ฐ(high-dimensional vector)๋ฅผ ๊ฐ€์ ธ์˜ต๋‹ˆ๋‹ค.

์ด ๋ถ€๋ถ„์ด ์ดํ•ด๊ฐ€ ๊ฐ€์ง€ ์•Š๋”๋ผ๋„ ๊ฑฑ์ •ํ•˜์ง€ ๋งˆ์„ธ์š”. ๋‚˜์ค‘์— ๋ชจ๋‘ ์„ค๋ช…ํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.

While these hidden states can be useful on their own, they're usually the input to another part of the model, known as the head. In Chapter 1, the different tasks could have been performed with the same architecture, but each of these tasks has a different head associated with it.

A high-dimensional vector?

Transformer ๋ชจ๋“ˆ์˜ ๋ฒกํ„ฐ ์ถœ๋ ฅ์€ ์ผ๋ฐ˜์ ์œผ๋กœ ๊ทœ๋ชจ๊ฐ€ ํฝ๋‹ˆ๋‹ค. ์ผ๋ฐ˜์ ์œผ๋กœ ์„ธ ๊ฐ€์ง€ ์ฐจ์›์ด ์žˆ์Šต๋‹ˆ๋‹ค:

  • ๋ฐฐ์น˜ ํฌ๊ธฐ(Batch size): ํ•œ ๋ฒˆ์— ์ฒ˜๋ฆฌ๋˜๋Š” ์‹œํ€€์Šค(sequence)์˜ ๊ฐœ์ˆ˜(์œ„์˜ ์˜ˆ์ œ์—์„œ๋Š” 2๊ฐœ).

  • ์‹œํ€€์Šค ๊ธธ์ด(Sequence length): ์‹œํ€€์Šค ์ˆซ์ž ํ‘œํ˜„์˜ ๊ธธ์ด(์ด ์˜ˆ์—์„œ๋Š” 16).

  • ์€๋‹‰ ํฌ๊ธฐ(Hidden size): ๊ฐ ๋ชจ๋ธ ์ž…๋ ฅ์˜ ๋ฒกํ„ฐ ์ฐจ์›.

์œ„์—์„œ ๋งˆ์ง€๋ง‰ ๊ฐ’ ๋•Œ๋ฌธ์— "๊ณ ์ฐจ์›(high-dimensional)" ๋ฒกํ„ฐ๋ผ๊ณ  ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. Hidden size๋Š” ๋งค์šฐ ํด ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค(768์€ ์ž‘์€ ๋ชจ๋ธ์— ์ผ๋ฐ˜์ ์ด๊ณ  ํฐ ๋ชจ๋ธ์—์„œ๋Š” 3072 ์ด์ƒ์ผ ์ˆ˜๋„ ์žˆ์Œ).

We can see this if we feed the preprocessed inputs to our model:

outputs = model(**inputs)
print(outputs.last_hidden_state.shape)
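
With our two input sentences of 16 tokens each, this prints:

torch.Size([2, 16, 768])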

๐Ÿค—Transformers ๋ชจ๋ธ์˜ ์ถœ๋ ฅ์€ namedtuple ๋˜๋Š” ๋”•์…”๋„ˆ๋ฆฌ(dictionary)์ฒ˜๋Ÿผ ๋™์ž‘ํ•ฉ๋‹ˆ๋‹ค. ์š”์†Œ์— ์ ‘๊ทผํ•˜๊ธฐ ์œ„ํ•ด์„œ ์†์„ฑ ๋˜๋Š” ํ‚ค(outputs["last_hidden_state"])๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๋˜ํ•œ, ์ฐพ๊ณ  ์žˆ๋Š” ํ•ญ๋ชฉ์ด ์–ด๋””์— ์žˆ๋Š”์ง€ ์ •ํ™•ํžˆ ์•Œ๊ณ  ์žˆ๋Š” ๊ฒฝ์šฐ ์ธ๋ฑ์Šค(outputs[0])๋กœ๋„ ์•ก์„ธ์Šคํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Model heads: Making sense out of numbers

๋ชจ๋ธ ํ—ค๋“œ(model head)๋Š” hidden states์˜ ๊ณ ์ฐจ์› ๋ฒกํ„ฐ(high-dimensional vector)๋ฅผ ์ž…๋ ฅ์œผ๋กœ ๋ฐ›์•„ ๋‹ค๋ฅธ ์ฐจ์›์— ํˆฌ์˜(project)ํ•ฉ๋‹ˆ๋‹ค. ์ผ๋ฐ˜์ ์œผ๋กœ ํ—ค๋“œ(head)๋Š” ํ•˜๋‚˜ ๋˜๋Š” ๋ช‡ ๊ฐœ์˜ ์„ ํ˜• ๋ ˆ์ด์–ด(linear layers)๋กœ ๊ตฌ์„ฑ๋ฉ๋‹ˆ๋‹ค.

Transformer ๋ชจ๋ธ์˜ ์ถœ๋ ฅ์€ ์ฒ˜๋ฆฌํ•  ๋ชจ๋ธ ํ—ค๋“œ(model head)๋กœ ์ง์ ‘ ์ „๋‹ฌ๋ฉ๋‹ˆ๋‹ค.

์œ„ ๊ทธ๋ฆผ์—์„œ ๋ชจ๋ธ์€ ์ž„๋ฒ ๋”ฉ ๋ ˆ์ด์–ด(embeddings layer)์™€ ํ›„์† ๋ ˆ์ด์–ด(subsequent layers)๋กœ ํ‘œํ˜„๋ฉ๋‹ˆ๋‹ค. ์ž„๋ฒ ๋”ฉ ๋ ˆ์ด์–ด(embeddings layer)๋Š” ํ† ํฐํ™”๋œ ์ž…๋ ฅ(tokenized input)์˜ ๊ฐ ์ž…๋ ฅ ID๋ฅผ ํ•ด๋‹น ํ† ํฐ์„ ๋‚˜ํƒ€๋‚ด๋Š” ๋ฒกํ„ฐ(embeddings vector)๋กœ ๋ณ€ํ™˜ํ•ฉ๋‹ˆ๋‹ค. ๊ทธ ์ดํ›„์˜ ํ›„์† ๋ ˆ์ด์–ด๋Š” ์ฃผ์˜ ๋ฉ”์ปค๋‹ˆ์ฆ˜(attention mechanism)์„ ์‚ฌ์šฉํ•˜์—ฌ ์ด๋“ค ์ž„๋ฒ ๋”ฉ ๋ฒกํ„ฐ(embeddings vector)๋ฅผ ์กฐ์ž‘ํ•˜์—ฌ ๋ฌธ์žฅ์˜ ์ตœ์ข… ํ‘œํ˜„(final representation)์„ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค.

๐Ÿค—Transformers์—๋Š” ๋‹ค์–‘ํ•œ ์•„ํ‚คํ…์ฒ˜๊ฐ€ ์žˆ์œผ๋ฉฐ ๊ฐ ์•„ํ‚คํ…์ฒ˜๋Š” ํŠนํ™”๋œ ์ž‘์—…์„ ์ฒ˜๋ฆฌํ•˜๋„๋ก ์„ค๊ณ„๋˜์—ˆ์Šต๋‹ˆ๋‹ค. ๋‹ค์Œ์€ ์ผ๋ถ€ ์•„ํ‚คํ…์ฒ˜๋ฅผ ๋ณด์—ฌ์ฃผ๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค:

  • *Model (returns the hidden states)

  • *ForCausalLM

  • *ForMaskedLM

  • *ForMultipleChoice

  • *ForQuestionAnswering

  • *ForSequenceClassification

  • *ForTokenClassification

  • and others 🤗

์ด ์„น์…˜์—์„œ์˜ ์˜ˆ์‹œ์—์„œ๋Š” ์‹œํ€€์Šค ๋ถ„๋ฅ˜ ํ—ค๋“œ(sequence classification head)๊ฐ€ ํฌํ•จ๋˜์–ด ์žˆ๋Š” ๋ชจ๋ธ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค(๋ฌธ์žฅ์„ ๊ธ์ • ๋˜๋Š” ๋ถ€์ •์œผ๋กœ ๋ถ„๋ฅ˜ํ•˜๊ธฐ ์œ„ํ•ด์„œ). ๋”ฐ๋ผ์„œ ์‹ค์ œ๋กœ AutoModel ํด๋ž˜์Šค๋ฅผ ์‚ฌ์šฉํ•˜์ง€ ์•Š๊ณ  ๋Œ€์‹  AutoModelForSequenceClassification๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค:

from transformers import AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)

์ด์ œ ์ถœ๋ ฅ์˜ ๋ชจ์–‘(shape)์„ ๋ณด๋ฉด ์ฐจ์›์ด ํ›จ์”ฌ ๋‚ฎ์•„์ง‘๋‹ˆ๋‹ค. ๋ชจ๋ธ ํ—ค๋“œ(model head)๋Š” ๊ณ ์ฐจ์› ๋ฒกํ„ฐ๋ฅผ ์ž…๋ ฅ์œผ๋กœ ์‚ฌ์šฉํ•˜๊ณ  ๋‘ ๊ฐœ์˜ ๊ฐ’(๋ ˆ์ด๋ธ”๋‹น ํ•˜๋‚˜์”ฉ)์„ ํฌํ•จํ•˜๋Š” ๋ฒกํ„ฐ๋ฅผ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค.

print(outputs.logits.shape)
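
This prints:

torch.Size([2, 2])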

๋‘ ๊ฐœ์˜ ๋ฌธ์žฅ๊ณผ ๋‘ ๊ฐœ์˜ ๋ ˆ์ด๋ธ”๋งŒ ์žˆ๊ธฐ ๋•Œ๋ฌธ์—, ๋ชจ๋ธ์—์„œ ์–ป์€ ๊ฒฐ๊ณผ์˜ ๋ชจ์–‘(shape)์€ 2 x 2์ž…๋‹ˆ๋‹ค.

Postprocessing the output

๋ชจ๋ธ์—์„œ ์ถœ๋ ฅ์œผ๋กœ ์–ป์€ ๊ฐ’์€ ๋ฐ˜๋“œ์‹œ ๊ทธ ์ž์ฒด๋กœ ์˜๋ฏธ๊ฐ€ ์žˆ๋Š” ๊ฒƒ์€ ์•„๋‹™๋‹ˆ๋‹ค. ๋‹ค์Œ์„ ํ•œ๋ฒˆ ๋ณด์‹œ์ง€์š”.

print(outputs.logits)
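
With this checkpoint, the output looks something like:

tensor([[-1.5607,  1.6123],
        [ 4.1692, -3.3464]], grad_fn=<AddmmBackward>)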

์šฐ๋ฆฌ ๋ชจ๋ธ์€ ์ฒซ ๋ฒˆ์งธ ๋ฌธ์žฅ์— ๋Œ€ํ•ด [-1.5607, 1.6123], ๋‘ ๋ฒˆ์งธ ๋ฌธ์žฅ์— ๋Œ€ํ•ด [4.1692, -3.3464]๋ฅผ ์˜ˆ์ธกํ–ˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ํ™•๋ฅ ์ด ์•„๋‹ˆ๋ผ ๋ชจ๋ธ์˜ ๋งˆ์ง€๋ง‰ ๊ณ„์ธต์—์„œ ์ถœ๋ ฅ๋œ ์ •๊ทœํ™”๋˜์ง€ ์•Š์€ ์›์‹œ ์ ์ˆ˜์ธ logits ์ž…๋‹ˆ๋‹ค. ์ด๋“ค ๊ฐ’์„ ํ™•๋ฅ ๋กœ ๋ณ€ํ™˜ํ•˜๋ ค๋ฉด SoftMax ๊ณ„์ธต์„ ํ†ต๊ณผํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ๋ชจ๋“  ๐Ÿค—Transformers ๋ชจ๋ธ์€ ์ด logits ๊ฐ’์„ ์ถœ๋ ฅํ•ฉ๋‹ˆ๋‹ค. ๊ทธ ์ด์œ ๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ ํ•™์Šต์„ ์œ„ํ•œ ์†์‹ค ํ•จ์ˆ˜(loss function)๋Š” ์ตœ์ข… ํ™œ์„ฑํ™” ํ•จ์ˆ˜(activation function, e.g., SoftMax)์™€ ์‹ค์ œ ์†์‹ค ํ•จ์ˆ˜(actual loss function, e.g., cross entropy)๋ฅผ ๋ชจ๋‘ ์‚ฌ์šฉํ•˜์—ฌ ๊ตฌํ˜„๋˜๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค.

import torch

predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)
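
This should print probabilities along these lines:

tensor([[4.0195e-02, 9.5980e-01],
        [9.9946e-01, 5.4418e-04]], grad_fn=<SoftmaxBackward>)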

์ด์ œ ๋ชจ๋ธ์ด ์ฒซ ๋ฒˆ์งธ ๋ฌธ์žฅ์— ๋Œ€ํ•ด [0.0402, 0.9598], ๋‘ ๋ฒˆ์งธ ๋ฌธ์žฅ์— ๋Œ€ํ•ด [0.9995, 0.0005]๋ฅผ ์˜ˆ์ธกํ–ˆ์Œ์„ ์•Œ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋“ค์€ ์šฐ๋ฆฌ๊ฐ€ ์ดํ•ดํ•  ์ˆ˜ ์žˆ๋Š” ํ™•๋ฅ  ์ ์ˆ˜์ž…๋‹ˆ๋‹ค.

๊ฐ ์œ„์น˜์— ํ•ด๋‹นํ•˜๋Š” ๋ ˆ์ด๋ธ”์„ ๊ฐ€์ ธ์˜ค๊ธฐ ์œ„ํ•ด, model.config์˜ id2label ์†์„ฑ๊ฐ’์„ ํ™•์ธํ•ฉ๋‹ˆ๋‹ค. ๋” ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋‹ค์Œ ์„น์…˜์—์„œ ๋‹ค๋ฃน๋‹ˆ๋‹ค.

model.config.id2label
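
For this checkpoint, it returns:

{0: 'NEGATIVE', 1: 'POSITIVE'}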

์ด์ œ ๋ชจ๋ธ์ด ์•„๋ž˜ ๋‚ด์šฉ์„ ์˜ˆ์ธกํ–ˆ๋‹ค๋Š” ๊ฒฐ๋ก ์„ ๋‚ด๋ฆด ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค:

  • First sentence: NEGATIVE: 0.0402, POSITIVE: 0.9598

  • Second sentence: NEGATIVE: 0.9995, POSITIVE: 0.0005

์ง€๊ธˆ๊นŒ์ง€ ํŒŒ์ดํ”„๋ผ์ธ(pipeline)์˜ ๋‚ด๋ถ€์—์„œ ์‹คํ–‰๋˜๋Š” 3๋‹จ๊ณ„์ธ ํ† ํฌ๋‚˜์ด์ €๋ฅผ ์‚ฌ์šฉํ•œ ์ „์ฒ˜๋ฆฌ(preprocessing), ๋ชจ๋ธ์„ ํ†ตํ•œ ์ž…๋ ฅ ์ „๋‹ฌ(passing the inputs through the model) ๋ฐ ํ›„์ฒ˜๋ฆฌ(postprocessing)๋ฅผ ์„ฑ๊ณต์ ์œผ๋กœ ์‹คํ–‰ํ•ด๋ดค์Šต๋‹ˆ๋‹ค.

โœ๏ธ Try it out! ๋ณธ์ธ์ด ๊ฐ€์ง€๊ณ  ์žˆ๋Š” ํ…์ŠคํŠธ๋ฅผ ๋‘ ๊ฐœ(๋˜๋Š” ๊ทธ ์ด์ƒ) ์„ ํƒํ•˜๊ณ  sentiment analysis ํŒŒ์ดํ”„๋ผ์ธ์„ ํ†ตํ•ด ์‹คํ–‰ํ•ด ๋ด…์‹œ๋‹ค. ๊ทธ๋Ÿฐ ๋‹ค์Œ ์—ฌ๊ธฐ์—์„œ ์„ค๋ช…ํ•œ ๋Œ€๋กœ ์ง์ ‘ ์‹คํ–‰ํ•ด๋ณด๊ณ , ๋™์ผํ•œ ๊ฒฐ๊ณผ๋ฅผ ์–ป๋Š”์ง€ ํ™•์ธํ•ด๋ณด์„ธ์š”!

์ข‹์€ ์›นํŽ˜์ด์ง€ ์ฆ๊ฒจ์ฐพ๊ธฐ