Model subclasses
DistilBERT custom class
- class src.training.subclasses.multilabel_distilbert_for_sequence_classification.DistilBertForMultiLabelClassification(config, loss='bce', loss_args=None)
Bases: DistilBertForSequenceClassification
Custom implementation for multilabel DistilBERT classification.
- forward(input_ids=None, attention_mask=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)
The [DistilBertForSequenceClassification] forward method overrides the __call__ special method.
<Tip>
Although the recipe for the forward pass needs to be defined within this function, one should call the [Module] instance afterwards rather than this method directly, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.
</Tip>
- Args:
- input_ids (torch.LongTensor of shape (batch_size, sequence_length)):
Indices of input sequence tokens in the vocabulary.
Indices can be obtained using [AutoTokenizer]. See [PreTrainedTokenizer.encode] and [PreTrainedTokenizer.__call__] for details.
[What are input IDs?](../glossary#input-ids)
- attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional):
Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:
1 for tokens that are not masked,
0 for tokens that are masked.
[What are attention masks?](../glossary#attention-mask)
- head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional):
Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:
1 indicates the head is not masked,
0 indicates the head is masked.
- inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional):
Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.
- output_attentions (bool, optional):
Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.
- output_hidden_states (bool, optional):
Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.
- return_dict (bool, optional):
Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.
- labels (torch.LongTensor of shape (batch_size,), optional):
Labels for computing the sequence classification/regression loss. Indices should be in [0, …, config.num_labels - 1]. If config.num_labels == 1, a regression loss is computed (mean-squared error); if config.num_labels > 1, a classification loss is computed (cross-entropy).
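The arguments above can also be prepared by hand. The following is a minimal plain-Python sketch, not part of this project's API: `pad_batch` and `multi_hot` are hypothetical helpers. It assumes the padding token ID is 0, and it assumes that labels for this multilabel subclass are float multi-hot vectors of shape (batch_size, config.num_labels), as in the multi-label example in this section, rather than the class indices of shape (batch_size,) described in the inherited docstring.

```python
from typing import List, Sequence, Tuple

PAD_ID = 0  # assumption: ID of the padding token in the vocabulary


def pad_batch(
    batch: Sequence[Sequence[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token ID sequences and build the matching attention mask.

    Mask convention from the docstring: 1 = real token (attended), 0 = padding (masked).
    """
    max_len = max(len(seq) for seq in batch)
    input_ids: List[List[int]] = []
    attention_mask: List[List[int]] = []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


def multi_hot(label_ids: Sequence[int], num_labels: int) -> List[float]:
    """Encode the active label indices of one example as a float multi-hot vector."""
    vec = [0.0] * num_labels
    for idx in label_ids:
        if not 0 <= idx < num_labels:
            raise ValueError(f"label index {idx} is outside [0, {num_labels - 1}]")
        vec[idx] = 1.0
    return vec


# Illustrative token IDs for two sequences of different length
ids, mask = pad_batch([[101, 7592, 102], [101, 102]])
print(ids)   # [[101, 7592, 102], [101, 102, 0]]
print(mask)  # [[1, 1, 1], [1, 1, 0]]

# Labels 0 and 2 active out of config.num_labels == 4
labels = [multi_hot([0, 2], num_labels=4)]
print(labels)  # [[1.0, 0.0, 1.0, 0.0]]
```

To pass these to forward(), wrap each list with torch.tensor(...). In practice, calling the tokenizer with padding=True and return_tensors="pt" already returns input_ids and attention_mask in this form.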
- Returns:
[transformers.modeling_outputs.SequenceClassifierOutput] or tuple(torch.FloatTensor): A [transformers.modeling_outputs.SequenceClassifierOutput] or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration ([DistilBertConfig]) and inputs.
loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Classification (or regression if config.num_labels==1) loss.
logits (torch.FloatTensor of shape (batch_size, config.num_labels)) – Classification (or regression if config.num_labels==1) scores (before SoftMax).
hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).
Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.
attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).
Attention weights after the attention softmax, used to compute the weighted average in the self-attention heads.
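The returned logits are unnormalized scores. A minimal plain-Python sketch of turning one example's logits into predictions: softmax plus argmax for single-label classification, and an independent sigmoid per label with a threshold for multi-label classification. The helper names are hypothetical, and the 0.5 threshold matches the multi-label example in this section; it is an assumption, not a tuned value.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid math.exp overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_single_label(logits: Sequence[float]) -> int:
    """Index of the most probable class (softmax preserves the argmax)."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def predict_multi_label(logits: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Indices of all labels whose independent sigmoid probability exceeds `threshold`."""
    return [i for i, x in enumerate(logits) if sigmoid(x) > threshold]


logits = [2.0, -1.0, 0.5]  # one example, config.num_labels == 3
print(predict_single_label(logits))  # 0
print(predict_multi_label(logits))   # [0, 2]
```

Because sigmoid(x) > 0.5 exactly when x > 0, a 0.5 threshold can equivalently be applied to the raw logits.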
Example of single-label classification:
```python
>>> import torch
>>> from transformers import AutoTokenizer, DistilBertForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_class_id = logits.argmax().item()

>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=num_labels)

>>> labels = torch.tensor([1])
>>> loss = model(**inputs, labels=labels).loss
```
Example of multi-label classification:
```python
>>> import torch
>>> from transformers import AutoTokenizer, DistilBertForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", problem_type="multi_label_classification")

>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze(dim=0) > 0.5]

>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = DistilBertForSequenceClassification.from_pretrained(
...     "distilbert-base-uncased", num_labels=num_labels, problem_type="multi_label_classification"
... )

>>> labels = torch.sum(
...     torch.nn.functional.one_hot(predicted_class_ids[None, :].clone(), num_classes=num_labels), dim=1
... ).to(torch.float)
>>> loss = model(**inputs, labels=labels).loss
```
- mlb_losses = {'asl': <class 'src.training.losses.asymetric.AsymmetricLossOptimized'>, 'bce': <class 'torch.nn.modules.loss.BCEWithLogitsLoss'>}
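mlb_losses maps the loss constructor argument ('bce' or 'asl') to a loss class. Presumably the class looks up mlb_losses[loss] and passes loss_args as keyword arguments, but that is an assumption not confirmed by this page. The plain-Python sketch below shows, for a single example, what the two registered options compute: binary cross-entropy on logits (the same math as BCEWithLogitsLoss with its default mean reduction) and the asymmetric loss (ASL) of Ridnik et al. The names LOSSES, build_loss, and asymmetric_loss are hypothetical, and the ASL defaults (gamma_pos=1, gamma_neg=4, clip=0.05) and summed reduction are assumptions taken from the ASL authors' reference implementation, not from src.training.losses.asymetric, whose exact behavior may differ.

```python
import math
from functools import partial
from typing import Callable, Dict, Optional, Sequence


def _sigmoid(x: float) -> float:
    """Logistic sigmoid that avoids math.exp overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_with_logits(logits: Sequence[float], targets: Sequence[float]) -> float:
    """Mean binary cross-entropy on raw logits (same math as BCEWithLogitsLoss, reduction='mean')."""
    total = 0.0
    for x, y in zip(logits, targets):
        # Stable rewrite of -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))]
        total += max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


def asymmetric_loss(
    logits: Sequence[float],
    targets: Sequence[float],
    gamma_pos: float = 1.0,
    gamma_neg: float = 4.0,
    clip: float = 0.05,
    eps: float = 1e-8,
) -> float:
    """Summed ASL for one example; assumes hard 0/1 targets (hyperparameters are assumptions)."""
    total = 0.0
    for x, y in zip(logits, targets):
        p = _sigmoid(x)
        if y >= 0.5:
            # Positive label: focal-style weight (1 - p)^gamma_pos on log(p)
            total += (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Negative label: shift the probability down by `clip`, so very easy
            # negatives (p <= clip) contribute nothing, then weight by p_m^gamma_neg
            p_m = max(p - clip, 0.0)
            total += p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return -total


# Hypothetical stand-in for mlb_losses: name -> loss function
LOSSES: Dict[str, Callable[..., float]] = {"bce": bce_with_logits, "asl": asymmetric_loss}


def build_loss(name: str, loss_args: Optional[dict] = None) -> Callable[..., float]:
    """Assumed selection logic: look up `name` and bind `loss_args` as keyword arguments."""
    return partial(LOSSES[name], **(loss_args or {}))


logits = [2.0, -1.0, 0.5]
targets = [1.0, 0.0, 0.0]
print(build_loss("bce")(logits, targets))
print(build_loss("asl", {"gamma_neg": 2.0})(logits, targets))
```

With gamma_pos = gamma_neg = 0 and clip = 0, the asymmetric loss reduces to summed binary cross-entropy; the focusing exponents and the clip margin are what down-weight easy negatives, which dominate in multilabel data with many inactive labels.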