Model subclasses

DistilBERT custom class

class src.training.subclasses.multilabel_distilbert_for_sequence_classification.DistilBertForMultiLabelClassification(config, loss='bce', loss_args=None)[source]

Bases: DistilBertForSequenceClassification

Custom implementation of DistilBERT for multi-label sequence classification.
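
A minimal usage sketch. The checkpoint name, num_labels value, and the multi-hot label format are illustrative assumptions; the model here is built from a config (randomly initialised head and encoder) rather than loaded from fine-tuned weights:

```python
>>> import torch
>>> from transformers import AutoTokenizer, DistilBertConfig
>>> from src.training.subclasses.multilabel_distilbert_for_sequence_classification import (
...     DistilBertForMultiLabelClassification,
... )

>>> # Build a config carrying the number of labels for the multi-label task
>>> config = DistilBertConfig.from_pretrained("distilbert-base-uncased", num_labels=5)
>>> model = DistilBertForMultiLabelClassification(config, loss="bce")

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> labels = torch.tensor([[1.0, 0.0, 1.0, 0.0, 0.0]])  # multi-hot float targets (assumed format)
>>> outputs = model(**inputs, labels=labels)
>>> logits, loss = outputs.logits, outputs.loss
```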

forward(input_ids=None, attention_mask=None, head_mask=None, inputs_embeds=None, labels=None, output_attentions=None, output_hidden_states=None, return_dict=None)[source]

The [DistilBertForSequenceClassification] forward method overrides the __call__ special method.

<Tip>

Although the recipe for the forward pass needs to be defined within this function, one should call the [Module] instance afterwards instead of this one, since the former takes care of running the pre- and post-processing steps while the latter silently ignores them.

</Tip>
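
A minimal illustration of the point above, using the parent DistilBertForSequenceClassification and the same checkpoint as the examples further down:

```python
>>> import torch
>>> from transformers import AutoTokenizer, DistilBertForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")

>>> # Preferred: calling the instance goes through nn.Module.__call__ and runs its hooks
>>> with torch.no_grad():
...     logits = model(**inputs).logits

>>> # Calling forward() directly skips those pre/post-processing steps
>>> with torch.no_grad():
...     logits = model.forward(**inputs).logits
```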

Args:
input_ids (torch.LongTensor of shape (batch_size, sequence_length)):

Indices of input sequence tokens in the vocabulary.

Indices can be obtained using [AutoTokenizer]. See [PreTrainedTokenizer.encode] and [PreTrainedTokenizer.__call__] for details.

[What are input IDs?](../glossary#input-ids)

attention_mask (torch.FloatTensor of shape (batch_size, sequence_length), optional):

Mask to avoid performing attention on padding token indices. Mask values selected in [0, 1]:

  • 1 for tokens that are not masked,

  • 0 for tokens that are masked.

[What are attention masks?](../glossary#attention-mask)

head_mask (torch.FloatTensor of shape (num_heads,) or (num_layers, num_heads), optional):

Mask to nullify selected heads of the self-attention modules. Mask values selected in [0, 1]:

  • 1 indicates the head is not masked,

  • 0 indicates the head is masked.

inputs_embeds (torch.FloatTensor of shape (batch_size, sequence_length, hidden_size), optional):

Optionally, instead of passing input_ids you can choose to directly pass an embedded representation. This is useful if you want more control over how to convert input_ids indices into associated vectors than the model’s internal embedding lookup matrix.

output_attentions (bool, optional):

Whether or not to return the attentions tensors of all attention layers. See attentions under returned tensors for more detail.

output_hidden_states (bool, optional):

Whether or not to return the hidden states of all layers. See hidden_states under returned tensors for more detail.

return_dict (bool, optional):

Whether or not to return a [~utils.ModelOutput] instead of a plain tuple.

labels (torch.LongTensor of shape (batch_size,), optional):

Labels for computing the sequence classification/regression loss. Indices should be in [0, …, config.num_labels - 1]. If config.num_labels == 1, a regression loss is computed (Mean-Square loss); if config.num_labels > 1, a classification loss is computed (Cross-Entropy). Note that for this multi-label subclass, labels are typically passed as a multi-hot float tensor of shape (batch_size, config.num_labels) so that a loss such as BCEWithLogitsLoss can be applied; see the multi-label example below.

Returns:

[transformers.modeling_outputs.SequenceClassifierOutput] or tuple(torch.FloatTensor): A [transformers.modeling_outputs.SequenceClassifierOutput] or a tuple of torch.FloatTensor (if return_dict=False is passed or when config.return_dict=False) comprising various elements depending on the configuration ([DistilBertConfig]) and inputs.

  • loss (torch.FloatTensor of shape (1,), optional, returned when labels is provided) – Classification (or regression if config.num_labels==1) loss.

  • logits (torch.FloatTensor of shape (batch_size, config.num_labels)) – Classification (or regression if config.num_labels==1) scores (before SoftMax).

  • hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings, if the model has an embedding layer, + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

    Hidden-states of the model at the output of each layer plus the optional initial embedding outputs.

  • attentions (tuple(torch.FloatTensor), optional, returned when output_attentions=True is passed or when config.output_attentions=True) – Tuple of torch.FloatTensor (one for each layer) of shape (batch_size, num_heads, sequence_length, sequence_length).

    Attentions weights after the attention softmax, used to compute the weighted average in the self-attention heads.

Example of single-label classification:

```python
>>> import torch
>>> from transformers import AutoTokenizer, DistilBertForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> with torch.no_grad():
...     logits = model(**inputs).logits
>>> predicted_class_id = logits.argmax().item()
>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=num_labels)
>>> labels = torch.tensor([1])
>>> loss = model(**inputs, labels=labels).loss
```

Example of multi-label classification:

```python
>>> import torch
>>> from transformers import AutoTokenizer, DistilBertForSequenceClassification

>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased", problem_type="multi_label_classification")
>>> inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
>>> with torch.no_grad():
...     logits = model(**inputs).logits
>>> predicted_class_ids = torch.arange(0, logits.shape[-1])[torch.sigmoid(logits).squeeze(dim=0) > 0.5]
>>> # To train a model on `num_labels` classes, you can pass `num_labels=num_labels` to `.from_pretrained(...)`
>>> num_labels = len(model.config.id2label)
>>> model = DistilBertForSequenceClassification.from_pretrained(
...     "distilbert-base-uncased", num_labels=num_labels, problem_type="multi_label_classification"
... )
>>> labels = torch.sum(
...     torch.nn.functional.one_hot(predicted_class_ids[None, :].clone(), num_classes=num_labels), dim=1
... ).to(torch.float)
>>> loss = model(**inputs, labels=labels).loss
```

mlb_losses = {'asl': <class 'src.training.losses.asymetric.AsymmetricLossOptimized'>, 'bce': <class 'torch.nn.modules.loss.BCEWithLogitsLoss'>}
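
A sketch of selecting the configured loss via the mlb_losses mapping above, under the assumptions that loss takes one of its keys and that loss_args is a dict of keyword arguments forwarded to the chosen loss class; the ASL keyword names shown (gamma_neg, gamma_pos, clip) follow the common AsymmetricLossOptimized implementation and may differ in this codebase:

```python
>>> from transformers import DistilBertConfig
>>> from src.training.subclasses.multilabel_distilbert_for_sequence_classification import (
...     DistilBertForMultiLabelClassification,
... )

>>> config = DistilBertConfig.from_pretrained("distilbert-base-uncased", num_labels=5)

>>> # Default: binary cross-entropy with logits (torch.nn.BCEWithLogitsLoss)
>>> model_bce = DistilBertForMultiLabelClassification(config, loss="bce")

>>> # Asymmetric loss; keyword names are assumptions, see the note above
>>> model_asl = DistilBertForMultiLabelClassification(
...     config, loss="asl", loss_args={"gamma_neg": 4, "gamma_pos": 1, "clip": 0.05}
... )
```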