Trainers

MultilabelTrainer

class src.training.trainers.multilabel_trainer.MultilabelTrainer(**kwargs)[source]

Bases: Trainer

Trainer is a simple but feature-complete training and eval loop for PyTorch, optimized for 🤗 Transformers.

Args:
model ([PreTrainedModel] or torch.nn.Module, optional):

The model to train, evaluate or use for predictions. If not provided, a model_init must be passed.

Tip: [Trainer] is optimized to work with the [PreTrainedModel] provided by the library. You can still use your own models defined as torch.nn.Module as long as they work the same way as the 🤗 Transformers models.

args ([TrainingArguments], optional):

The arguments to tweak for training. Will default to a basic instance of [TrainingArguments] with the output_dir set to a directory named tmp_trainer in the current directory if not provided.

data_collator (DataCollator, optional):

The function to use to form a batch from a list of elements of train_dataset or eval_dataset. Will default to [default_data_collator] if no tokenizer is provided, an instance of [DataCollatorWithPadding] otherwise.

train_dataset (torch.utils.data.Dataset or torch.utils.data.IterableDataset, optional):

The dataset to use for training. If it is a [~datasets.Dataset], columns not accepted by the model.forward() method are automatically removed.

Note that if it’s a torch.utils.data.IterableDataset with some randomization and you are training in a distributed fashion, your iterable dataset should either use an internal attribute generator that is a torch.Generator for the randomization that must be identical on all processes (and the Trainer will manually set the seed of this generator at each epoch), or have a set_epoch() method that internally sets the seed of the RNGs used.

eval_dataset (Union[torch.utils.data.Dataset, Dict[str, torch.utils.data.Dataset]], optional):

The dataset to use for evaluation. If it is a [~datasets.Dataset], columns not accepted by the model.forward() method are automatically removed. If it is a dictionary, it will evaluate on each dataset prepending the dictionary key to the metric name.

tokenizer ([PreTrainedTokenizerBase], optional):

The tokenizer used to preprocess the data. If provided, will be used to automatically pad the inputs to the maximum length when batching inputs, and it will be saved along the model to make it easier to rerun an interrupted training or reuse the fine-tuned model.

model_init (Callable[[], PreTrainedModel], optional):

A function that instantiates the model to be used. If provided, each call to [~Trainer.train] will start from a new instance of the model as given by this function.

The function may have zero arguments, or a single one containing the optuna/Ray Tune/SigOpt trial object, so that it can choose different architectures according to hyperparameters (such as layer count, sizes of inner layers, dropout probabilities, etc.).

compute_metrics (Callable[[EvalPrediction], Dict], optional):

The function that will be used to compute metrics at evaluation. Must take an [EvalPrediction] and return a dictionary mapping metric names (strings) to metric values.

callbacks (List of [TrainerCallback], optional):

A list of callbacks to customize the training loop. Will add those to the list of default callbacks detailed [here](callback).

If you want to remove one of the default callbacks used, use the [Trainer.remove_callback] method.

optimizers (Tuple[torch.optim.Optimizer, torch.optim.lr_scheduler.LambdaLR], optional):

A tuple containing the optimizer and the scheduler to use. Will default to an instance of [AdamW] on your model and a scheduler given by [get_linear_schedule_with_warmup] controlled by args.

preprocess_logits_for_metrics (Callable[[torch.Tensor, torch.Tensor], torch.Tensor], optional):

A function that preprocesses the logits right before caching them at each evaluation step. Must take two tensors, the logits and the labels, and return the logits once processed as desired. The modifications made by this function will be reflected in the predictions received by compute_metrics; a combined sketch with compute_metrics follows this argument list.

Note that the labels (second parameter) will be None if the dataset does not have them.
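To make the last two hooks concrete, here is a minimal multi-label sketch combining preprocess_logits_for_metrics and compute_metrics. It assumes multi-hot label vectors, a sigmoid over the logits, a 0.5 decision threshold, and scikit-learn's f1_score; none of these choices are mandated by MultilabelTrainer itself.

    import torch
    from sklearn.metrics import f1_score
    from transformers import EvalPrediction

    def preprocess_logits_for_metrics(logits, labels):
        # Turn raw logits into independent per-label probabilities
        # before they are cached for evaluation.
        return torch.sigmoid(logits)

    def compute_metrics(eval_pred: EvalPrediction):
        probs, labels = eval_pred.predictions, eval_pred.label_ids
        preds = (probs > 0.5).astype(int)  # 0.5 threshold is an illustrative assumption
        return {"f1_micro": f1_score(labels, preds, average="micro")}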

Important attributes:

  • model – Always points to the core model. If using a transformers model, it will be a [PreTrainedModel] subclass.

  • model_wrapped – Always points to the most external model in case one or more other modules wrap the original model. This is the model that should be used for the forward pass. For example, under DeepSpeed, the inner model is wrapped in DeepSpeed and then again in torch.nn.DistributedDataParallel. If the inner model hasn’t been wrapped, then self.model_wrapped is the same as self.model.

  • is_model_parallel – Whether or not a model has been switched to a model parallel mode (different from data parallelism, this means some of the model layers are split on different GPUs).

  • place_model_on_device – Whether or not to automatically place the model on the device. It will be set to False if model parallelism or DeepSpeed is used, or if the default TrainingArguments.place_model_on_device is overridden to return False.

  • is_in_train – Whether or not a model is currently running train (e.g. when evaluate is called while in train).

compute_loss(model, inputs, return_outputs=False)[source]

How the loss is computed by Trainer. By default, all models return the loss in the first element.

Subclass and override for custom behavior.
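As a rough sketch of what such an override looks like in a multi-label setting, the snippet below replaces the default loss with BCEWithLogitsLoss so that each label is scored independently. It is an illustration under the usual 🤗 Transformers conventions (a labels tensor of multi-hot floats, outputs.logits), not a verbatim copy of this class's implementation; the class name is hypothetical.

    import torch
    from transformers import Trainer

    class ExampleMultilabelTrainer(Trainer):  # hypothetical name for illustration
        def compute_loss(self, model, inputs, return_outputs=False):
            # Pop the multi-hot labels so the model's own forward pass
            # does not compute a single-label loss.
            labels = inputs.pop("labels")
            outputs = model(**inputs)
            logits = outputs.logits
            # Binary cross-entropy treats each label independently.
            loss_fct = torch.nn.BCEWithLogitsLoss()
            loss = loss_fct(logits, labels.float())
            return (loss, outputs) if return_outputs else loss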

SetFitTrainer

class src.training.trainers.setfit.CustomSetFitTrainer(model: SetFitModel | None = None, train_dataset: Dataset | None = None, eval_dataset: Dataset | None = None, model_init: Callable[[], SetFitModel] | None = None, metric: str | Callable[[Dataset, Dataset], Dict[str, float]] = 'accuracy', metric_kwargs: Dict[str, Any] | None = None, loss_class=<class 'sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss'>, num_iterations: int = 20, num_epochs: int = 1, learning_rate: float = 2e-05, batch_size: int = 16, seed: int = 42, column_mapping: Dict[str, str] | None = None, use_amp: bool = False, warmup_proportion: float = 0.1, distance_metric: Callable = <function BatchHardTripletLossDistanceFunction.cosine_distance>, margin: float = 0.25, samples_per_label: int = 2)[source]

Bases: SetFitTrainer

Trainer to train a SetFit model.

Args:
model (SetFitModel, optional):

The model to train. If not provided, a model_init must be passed.

train_dataset (Dataset):

The training dataset.

eval_dataset (Dataset, optional):

The evaluation dataset.

model_init (Callable[[], SetFitModel], optional):

A function that instantiates the model to be used. If provided, each call to [~SetFitTrainer.train] will start from a new instance of the model as given by this function when a trial is passed.

metric (str or Callable, optional, defaults to “accuracy”):

The metric to use for evaluation. If a string is provided, we treat it as the metric name and load it with default settings. If a callable is provided, it must take two arguments (y_pred, y_test).

metric_kwargs (Dict[str, Any], optional):

Keyword arguments passed to the evaluation function if metric is an evaluation string like “f1”. This is useful, for example, for providing an averaging strategy when computing f1 in a multi-label setting.

loss_class (nn.Module, optional, defaults to CosineSimilarityLoss):

The loss function to use for contrastive training.

num_iterations (int, optional, defaults to 20):

The number of iterations to generate sentence pairs for. This argument is ignored if triplet loss is used. It is only used in conjunction with CosineSimilarityLoss.

num_epochs (int, optional, defaults to 1):

The number of epochs to train the Sentence Transformer body for.

learning_rate (float, optional, defaults to 2e-5):

The learning rate to use for contrastive training.

batch_size (int, optional, defaults to 16):

The batch size to use for contrastive training.

seed (int, optional, defaults to 42):

Random seed that will be set at the beginning of training. To ensure reproducibility across runs, use the [~SetFitTrainer.model_init] function to instantiate the model if it has some randomly initialized parameters.

column_mapping (Dict[str, str], optional):

A mapping from the column names in the dataset to the column names expected by the model. The expected format is a dictionary of the form {“text_column_name”: “text”, “label_column_name”: “label”}.

use_amp (bool, optional, defaults to False):

Use Automatic Mixed Precision (AMP). Only for PyTorch >= 1.6.0.

warmup_proportion (float, optional, defaults to 0.1):

Proportion of the warmup in the total training steps. Must be greater than or equal to 0.0 and less than or equal to 1.0.

distance_metric (Callable, defaults to BatchHardTripletLossDistanceFunction.cosine_distance):

Function that returns a distance between two embeddings. It is set for the triplet loss and is ignored for CosineSimilarityLoss and SupConLoss.

margin (float, defaults to 0.25):

Margin for the triplet loss. Negative samples should be at least margin further apart from the anchor than the positive. This is ignored for CosineSimilarityLoss, BatchHardSoftMarginTripletLoss and SupConLoss.

samples_per_label (int, defaults to 2):

Number of consecutive, random, and unique samples drawn per label. This is only relevant for triplet loss and ignored for CosineSimilarityLoss. The batch size should be a multiple of samples_per_label.
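A minimal end-to-end sketch of using CustomSetFitTrainer, assuming a tiny in-memory 🤗 datasets.Dataset and a publicly available Sentence Transformers checkpoint; the checkpoint name, column names, and metric settings are illustrative assumptions rather than defaults of this class.

    from datasets import Dataset
    from setfit import SetFitModel

    from src.training.trainers.setfit import CustomSetFitTrainer

    train_ds = Dataset.from_dict({"sentence": ["great service", "terrible app"], "label": [1, 0]})
    eval_ds = Dataset.from_dict({"sentence": ["works fine"], "label": [1]})

    model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-MiniLM-L3-v2")

    trainer = CustomSetFitTrainer(
        model=model,
        train_dataset=train_ds,
        eval_dataset=eval_ds,
        metric="f1",
        metric_kwargs={"average": "micro"},  # averaging strategy, see metric_kwargs above
        num_iterations=20,
        batch_size=16,
        column_mapping={"sentence": "text", "label": "label"},  # map dataset columns to the expected names
    )

    trainer.train()
    print(trainer.evaluate())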