[🤗 Course 2.5] Handling Multiple Sequences

์ด์ „ ์„น์…˜์—์„œ ์šฐ๋ฆฌ๋Š” ๊ฐ€์žฅ ๊ฐ„๋‹จํ•œ ํ™œ์šฉ ์‚ฌ๋ก€๋ฅผ ์‚ดํŽด๋ณด์•˜์Šต๋‹ˆ๋‹ค. ์ฆ‰, ๊ธธ์ด๊ฐ€ ์งง์€ ๋‹จ์ผ ์‹œํ€€์Šค์— ๋Œ€ํ•œ ์ถ”๋ก ์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋ช‡ ๊ฐ€์ง€ ์˜๋ฌธ์ ์ด ๋ฒŒ์จ ๋จธ๋ฆฌ์†์— ๋‚จ์Šต๋‹ˆ๋‹ค:

  • How do we handle multiple sequences?

  • How do we handle multiple sequences of different lengths?

  • Are vocabulary indices the only inputs we need to provide for the model to work well?

  • Is there such a thing as a sequence that is too long for the model?

์œ„ ์งˆ๋ฌธ๋“ค์ด ์–ด๋–ค ๋ฌธ์ œ๋ฅผ ์ผ์œผํ‚ค๊ณ  ์ด๋ฅผ ๐Ÿค—Transformers API๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•ด๊ฒฐํ•  ์ˆ˜ ์žˆ๋Š”์ง€ ์•Œ์•„๋ด…์‹œ๋‹ค.

๋ชจ๋ธ(model)์€ ์ž…๋ ฅ์˜ ๋ฐฐ์น˜(batch) ํ˜•ํƒœ๋ฅผ ์š”๊ตฌํ•œ๋‹ค.

์ด์ „ ์„น์…˜์—์„œ ์‹œํ€€์Šค๊ฐ€ ์ˆซ์ž ๋ฆฌ์ŠคํŠธ๋กœ ๋ณ€ํ™˜๋˜๋Š” ๋ฐฉ๋ฒ•์„ ๋ณด์•˜์Šต๋‹ˆ๋‹ค. ์ด ์ˆซ์ž ๋ฆฌ์ŠคํŠธ๋ฅผ ํ…์„œ(tensor)๋กœ ๋ณ€ํ™˜ํ•˜๊ณ  ๋ชจ๋ธ์— ์ž…๋ ฅํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

sequence = "I've been waiting for a HuggingFace course my whole life."

tokens = tokenizer.tokenize(sequence)
ids = tokenizer.convert_tokens_to_ids(tokens)
input_ids = torch.tensor(ids)
# This line will fail
model(input_ids)
---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

/tmp/ipykernel_9651/1126667217.py in <module>
     12 input_ids = torch.tensor(ids)
     13 # This line will fail
---> 14 model(input_ids)


~/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102             return forward_call(*input, **kwargs)
   1103         # Do not call functions when jit is used
   1104         full_backward_hooks, non_full_backward_hooks = [], []


~/anaconda3/lib/python3.8/site-packages/transformers/models/distilbert/modeling_distilbert.py in forward(self, input_ids, attention_mask, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
    727         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
    728 
--> 729         distilbert_output = self.distilbert(
    730             input_ids=input_ids,
    731             attention_mask=attention_mask,


~/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102             return forward_call(*input, **kwargs)
   1103         # Do not call functions when jit is used
   1104         full_backward_hooks, non_full_backward_hooks = [], []


~/anaconda3/lib/python3.8/site-packages/transformers/models/distilbert/modeling_distilbert.py in forward(self, input_ids, attention_mask, head_mask, inputs_embeds, output_attentions, output_hidden_states, return_dict)
    548 
    549         if inputs_embeds is None:
--> 550             inputs_embeds = self.embeddings(input_ids)  # (bs, seq_length, dim)
    551         return self.transformer(
    552             x=inputs_embeds,


~/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1100         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1101                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1102             return forward_call(*input, **kwargs)
   1103         # Do not call functions when jit is used
   1104         full_backward_hooks, non_full_backward_hooks = [], []


~/anaconda3/lib/python3.8/site-packages/transformers/models/distilbert/modeling_distilbert.py in forward(self, input_ids)
    117         embeddings)
    118         """
--> 119         seq_length = input_ids.size(1)
    120 
    121         # Setting the position-ids to the registered buffer in constructor, it helps


IndexError: Dimension out of range (expected to be in range of [-1, 0], but got 1)

์ด๋Ÿฐ! ๋ญ๊ฐ€ ๋ฌธ์ œ์ผ๊นŒ์š”? ์œ„ ์ฝ”๋“œ์—์„œ ์šฐ๋ฆฌ๋Š” ์„น์…˜ 2์—์„œ์˜ ํŒŒ์ดํ”„๋ผ์ธ ๋‹จ๊ณ„๋ฅผ ๊ทธ๋Œ€๋กœ ๋”ฐ๋ž์Šต๋‹ˆ๋‹ค.

The problem is that we sent a single sequence to the model, whereas 🤗 Transformers models expect multiple sentences (sequences) by default. Here we tried to reproduce everything the tokenizer does behind the scenes when we apply it to a sequence. But if you look closely at the code below, you will see that the tokenizer did not just convert the list of input IDs into a tensor, it also added a dimension on top of it:

tokenized_inputs = tokenizer(sequence, return_tensors="pt")
print(tokenized_inputs["input_ids"])
tensor([[  101,  1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,
          2607,  2026,  2878,  2166,  1012,   102]])
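To make that extra dimension explicit, we can compare the shape of the tensor we built by hand with the shape of the tensor the tokenizer returns (a minimal check reusing the ids and tokenized_inputs defined above; the tokenizer output is also longer because it adds the special [CLS] and [SEP] tokens):

print(torch.tensor(ids).shape)              # torch.Size([14])    -> a single sequence, no batch dimension
print(tokenized_inputs["input_ids"].shape)  # torch.Size([1, 16]) -> batch dimension of 1, plus [CLS]/[SEP]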

์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•œ ์ฝ”๋“œ์—์„œ input_ids์— ์ƒˆ๋กœ์šด ์ฐจ์›์„ ํ•˜๋‚˜ ์ถ”๊ฐ€ํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

sequence = "I've been waiting for a HuggingFace course my whole life."

tokens = tokenizer.tokenize(sequence)
ids = tokenizer.convert_tokens_to_ids(tokens)

input_ids = torch.tensor([ids])
print("Input IDs:", input_ids)

output = model(input_ids)
print("Logits:", output.logits)
Input IDs: tensor([[ 1045,  1005,  2310,  2042,  3403,  2005,  1037, 17662, 12172,  2607,
          2026,  2878,  2166,  1012]])
Logits: tensor([[-2.7276,  2.8789]], grad_fn=<AddmmBackward0>)

์œ„ ์ฝ”๋“œ์—์„œ๋Š” ์ž…๋ ฅ ์‹๋ณ„์ž(input IDs)์™€ ๊ทธ ๊ฒฐ๊ณผ ๋กœ์ง“(logit) ๊ฐ’์„ ์ถœ๋ ฅํ•˜๊ณ  ์žˆ์Šต๋‹ˆ๋‹ค.

Batching ์ด๋ž€ ๋ชจ๋ธ์„ ํ†ตํ•ด ํ•œ๋ฒˆ์— ์—ฌ๋Ÿฌ ๋ฌธ์žฅ์„ ์ž…๋ ฅํ•˜๋Š” ๋™์ž‘์ž…๋‹ˆ๋‹ค. ๋ฌธ์žฅ์ด ํ•˜๋‚˜๋งŒ ์žˆ๋Š” ๊ฒฝ์šฐ ์•„๋ž˜์™€ ๊ฐ™์ด ๋‹จ์ผ ์‹œํ€€์Šค๋กœ ๋ฐฐ์น˜(batch) ๋ฅผ ๋นŒ๋“œํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

batched_ids = [ids, ids]

์ด๊ฒƒ์€ ๋™์ผํ•œ ๋‘ ์‹œํ€€์Šค๋กœ ๊ตฌ์„ฑ๋œ ๋ฐฐ์น˜(batch) ์ž…๋‹ˆ๋‹ค!

Batching allows the model to work when you feed it multiple sentences at once. Using multiple sequences is just as simple as building a batch with a single sequence. There is a second issue, though: when you try to batch together two (or more) sentences, they might be of different lengths. If you have ever worked with tensors before, you know that they need to have a rectangular shape, so in that case you cannot convert the list of input IDs into a tensor directly. To work around this problem, we usually pad the inputs.

์ž…๋ ฅ์„ ํŒจ๋”ฉ(padding)ํ•˜๊ธฐ

๋‹ค์Œ ๋ฆฌ์ŠคํŠธ์˜ ๋ฆฌ์ŠคํŠธ(ํ˜น์€ ์ด์ค‘ ๋ฆฌ์ŠคํŠธ)๋Š” ํ…์„œ๋กœ ๋ณ€ํ™˜ํ•  ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค.

batched_ids = [
    [200, 200, 200],
    [200, 200],
]
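If we try to convert this nested list to a tensor directly, PyTorch refuses because the inner lists have different lengths. A quick illustration (the exact error message may differ between PyTorch versions):

import torch

batched_ids = [
    [200, 200, 200],
    [200, 200],
]

try:
    torch.tensor(batched_ids)
except ValueError as e:
    print(e)  # e.g. "expected sequence of length 3 at dim 1 (got 2)"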

To work around this, we use padding to give our tensors a rectangular shape. Padding makes sure all our sentences have the same length by adding a special word called the padding token to the shorter sentences. For example, if you have 10 sentences of 10 words and 1 sentence of 20 words, padding ensures that all the sentences end up with 20 words. Padding the batched_ids above, the resulting tensor looks like this:

padding_id = 100

batched_ids = [
    [200, 200, 200],
    [200, 200, padding_id],
]

ํŒจ๋”ฉ ํ† ํฐ(padding token)์˜ ์‹๋ณ„์ž(ID)๋Š” tokenizer.pad_token_id์— ์ง€์ •๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๋‘ ๊ฐœ์˜ ์‹œํ€€์Šค๋ฅผ ํ•œ๋ฒˆ์€ ๊ฐœ๋ณ„์ ์œผ๋กœ ๋˜ ํ•œ๋ฒˆ์€ ๋ฐฐ์น˜(batch) ํ˜•ํƒœ๋กœ ๋ชจ๋ธ์— ์ž…๋ ฅํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค:

model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

sequence1_ids = [[200, 200, 200]]
sequence2_ids = [[200, 200]]
batched_ids = [
    [200, 200, 200],
    [200, 200, tokenizer.pad_token_id],
]

print(model(torch.tensor(sequence1_ids)).logits)
print(model(torch.tensor(sequence2_ids)).logits)
print(model(torch.tensor(batched_ids)).logits)
tensor([[ 1.5694, -1.3895]], grad_fn=<AddmmBackward0>)
tensor([[ 0.5803, -0.4125]], grad_fn=<AddmmBackward0>)
tensor([[ 1.5694, -1.3895],
        [ 1.3374, -1.2163]], grad_fn=<AddmmBackward0>)

๋ฐฐ์น˜ ์ฒ˜๋ฆฌ๋œ ์˜ˆ์ธก ๊ฒฐ๊ณผ์˜ ๋กœ์ง“(logits)์— ๋ฌธ์ œ๊ฐ€ ์žˆ์Šต๋‹ˆ๋‹ค. ๋‘ ๋ฒˆ์งธ ํ–‰์€ ๋‘ ๋ฒˆ์งธ ๋ฌธ์žฅ์˜ ๋กœ์ง“(logits)๊ณผ ๊ฐ™์•„์•ผ ํ•˜์ง€๋งŒ ์™„์ „ํžˆ ๋‹ค๋ฅธ ๊ฐ’์„ ๊ฐ–์Šต๋‹ˆ๋‹ค!

์ด๋Š” ํŠธ๋žœ์Šคํฌ๋จธ(Transformer) ๋ชจ๋ธ์˜ ํ•ต์‹ฌ์ ์ธ ํŠน์ง•์ด ๊ฐ ํ† ํฐ์„ ์ปจํ…์ŠคํŠธํ™”(contextualize) ํ•˜๋Š” ์–ดํ…์…˜ ๋ ˆ์ด์–ด(attention layers)๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค๋Š” ์‚ฌ์‹ค์ด๊ธฐ ๋•Œ๋ฌธ์ž…๋‹ˆ๋‹ค. ์–ดํ…์…˜ ๋ ˆ์ด์–ด(attention layers)๋Š” ์‹œํ€€์Šค์˜ ๋ชจ๋“  ํ† ํฐ์— ์ฃผ์˜ ์ง‘์ค‘(paying attention)์„ ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ํŒจ๋”ฉ ํ† ํฐ๋„ ์—ญ์‹œ ๊ณ ๋ คํ•ฉ๋‹ˆ๋‹ค. ๋ชจ๋ธ์— ๊ธธ์ด๊ฐ€ ๋‹ค๋ฅธ ๊ฐœ๋ณ„ ๋ฌธ์žฅ๋“ค์„ ์ž…๋ ฅํ•  ๋•Œ๋‚˜ ๋™์ผํ•œ ๋ฌธ์žฅ์œผ๋กœ ๊ตฌ์„ฑ๋œ ํŒจ๋”ฉ์ด ์ ์šฉ๋œ ๋ฐฐ์น˜(batch)๋ฅผ ์ž…๋ ฅํ•  ๋•Œ ๋™์ผํ•œ ๊ฒฐ๊ณผ๋ฅผ ์–ป๊ธฐ ์œ„ํ•ด์„œ๋Š” ํ•ด๋‹น ์–ดํ…์…˜ ๋ ˆ์ด์–ด(attention layers)๊ฐ€ ํŒจ๋”ฉ ํ† ํฐ์„ ๋ฌด์‹œํ•˜๋„๋ก ์ง€์‹œํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค. ์ด๋Š” ์–ดํ…์…˜ ๋งˆ์Šคํฌ(attention mask)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

์–ดํ…์…˜ ๋งˆ์Šคํฌ (attention masks)

์–ดํ…์…˜ ๋งˆ์Šคํฌ(attention mask)๋Š” 0๊ณผ 1๋กœ ์ฑ„์›Œ์ง„ ์ž…๋ ฅ ์‹๋ณ„์ž(input IDs) ํ…์„œ(tensor)์™€ ํ˜•ํƒœ๊ฐ€ ์ •ํ™•ํ•˜๊ฒŒ ๋™์ผํ•œ ํ…์„œ(tensor)์ž…๋‹ˆ๋‹ค. 1์€ ํ•ด๋‹น ํ† ํฐ์— ์ฃผ์˜๋ฅผ ๊ธฐ์šธ์—ฌ์•ผ ํ•จ์„ ๋‚˜ํƒ€๋‚ด๊ณ  0์€ ํ•ด๋‹น ํ† ํฐ์„ ๋ฌด์‹œํ•ด์•ผ ํ•จ์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค. ์ฆ‰, ๋ชจ๋ธ์˜ ์–ดํ…์…˜ ๋ ˆ์ด์–ด(attention layers)์—์„œ ๋ฌด์‹œํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค.

์–ดํ…์…˜ ๋งˆ์Šคํฌ(attention mask)๋กœ ์ด์ „ ์˜ˆ์ œ๋ฅผ ์™„์„ฑํ•ด ๋ณด๊ฒ ์Šต๋‹ˆ๋‹ค:

batched_ids = [
    [200, 200, 200],
    [200, 200, tokenizer.pad_token_id],
]

attention_mask = [
    [1, 1, 1],
    [1, 1, 0],
]

outputs = model(torch.tensor(batched_ids), attention_mask=torch.tensor(attention_mask))
print(outputs.logits)
tensor([[ 1.5694, -1.3895],
        [ 0.5803, -0.4125]], grad_fn=<AddmmBackward0>)

์ด์ œ ๋ฐฐ์น˜(batch)์˜ ๋‘ ๋ฒˆ์งธ ๋ฌธ์žฅ์— ๋Œ€ํ•ด ๋™์ผํ•œ ๋กœ์ง“(logits) ๊ฐ’์„ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

๋‘ ๋ฒˆ์งธ ์‹œํ€€์Šค์˜ ๋งˆ์ง€๋ง‰ ๊ฐ’์ด ํŒจ๋”ฉ ์‹๋ณ„์ž(padding ID)์ด๊ณ  ์ด์— ํ•ด๋‹นํ•˜๋Š” ์–ดํ…์…˜ ๋งˆ์Šคํฌ(attention mask)์˜ ๊ฐ’์ด 0์ธ ์ ์„ ์ฃผ์˜ํ•˜์„ธ์š”.

๊ธธ์ด๊ฐ€ ๋” ๊ธด ์‹œํ€€์Šค๋“ค

ํŠธ๋žœ์Šคํฌ๋จธ(Transformer) ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•  ๋•Œ, ๋ชจ๋ธ์— ์ž…๋ ฅํ•  ์ˆ˜ ์žˆ๋Š” ์‹œํ€€์Šค์˜ ๊ธธ์ด์— ์ œํ•œ์ด ์žˆ์Šต๋‹ˆ๋‹ค. ๋Œ€๋ถ€๋ถ„์˜ ๋ชจ๋ธ์€ ์ตœ๋Œ€ 512๊ฐœ ๋˜๋Š” 1024๊ฐœ์˜ ํ† ํฐ ์‹œํ€€์Šค๋ฅผ ์ฒ˜๋ฆฌํ•˜๋ฉฐ, ๊ทธ๋ณด๋‹ค ๋” ๊ธด ์‹œํ€€์Šค๋ฅผ ์ฒ˜๋ฆฌํ•˜๋ผ๋Š” ์š”์ฒญ์„ ๋ฐ›์œผ๋ฉด ์˜ค๋ฅ˜๋ฅผ ๋ฐœ์ƒ์‹œํ‚ต๋‹ˆ๋‹ค. ์ด ๋ฌธ์ œ์— ๋Œ€ํ•œ ๋‘ ๊ฐ€์ง€ ์†”๋ฃจ์…˜์ด ์žˆ์Šต๋‹ˆ๋‹ค.

  • ๊ธธ์ด๊ฐ€ ๋” ๊ธด ์‹œํ€€์Šค๋ฅผ ์ง€์›ํ•˜๋Š” ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜์‹ญ์‹œ์˜ค.

  • ์‹œํ€€์Šค๋ฅผ ์ ˆ๋‹จํ•ฉ๋‹ˆ๋‹ค(truncation).

๋ชจ๋ธ ๋ณ„๋กœ ์ง€์›๋˜๋Š” ์‹œํ€€์Šค ๊ธธ์ด๊ฐ€ ๋‹ค๋ฅด๋ฉฐ ์ผ๋ถ€ ๋ชจ๋ธ์€ ๋งค์šฐ ๊ธด ์‹œํ€€์Šค ์ฒ˜๋ฆฌ์— ํŠนํ™”๋˜์–ด ์žˆ์Šต๋‹ˆ๋‹ค. Longformer๊ฐ€ ํ•˜๋‚˜์˜ ์˜ˆ์ด๊ณ  ๋‹ค๋ฅธ ํ•˜๋‚˜๋Š” LED์ž…๋‹ˆ๋‹ค. ๋งค์šฐ ๊ธด ์‹œํ€€์Šค๋ฅผ ํ•„์š”๋กœ ํ•˜๋Š” ํƒœ์Šคํฌ๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒฝ์šฐ ํ•ด๋‹น ๋ชจ๋ธ์„ ์‚ดํŽด๋ณด๋Š” ๊ฒƒ์ด ์ข‹์Šต๋‹ˆ๋‹ค.

Otherwise, we recommend truncating your sequences by specifying the max_sequence_length parameter:

max_sequence_length = 512

sequence = sequence[:max_sequence_length]
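Alternatively, the tokenizer can truncate at the token level rather than the character level. A small sketch using the truncation and max_length arguments (max_length here counts tokens, including the special tokens):

# Let the tokenizer truncate to at most 512 tokens instead of slicing the raw string
model_inputs = tokenizer(sequence, truncation=True, max_length=512, return_tensors="pt")
print(model_inputs["input_ids"].shape)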

์ข‹์€ ์›นํŽ˜์ด์ง€ ์ฆ๊ฒจ์ฐพ๊ธฐ