After spending a while developing roleplaying models for the competition, I realized there are three key ingredients for reaching a high position:

  1. The right base model
  2. Appropriate combination of datasets
  3. A lot of luck

Base model

Even though Mistral-based models are dominating Llama-2-based ones, I still feel they are not as good as some existing 13B models at RP, specifically MythoMax-L2-13b. I have tested several Mistral models, but they usually ended up at around a 77% thumbs_up rate, whereas MythoMax can easily reach 80%+.

Datasets

My first principle for choosing datasets is quality. I prefer GPT-generated data for its consistency. At first I focused on RP datasets only, but I got a lot of feedback complaining that the model did not make sense or did not answer questions. I therefore decided to add one assistant dataset, HH-RLHF, so my dataset combination is as follows:

  1. SODA
  2. Augmental
  3. HH-RLHF

Let’s talk about the SODA dataset. It’s a huge dataset with 1M samples, and of course we wouldn’t want to train on that much data; it would take forever. My cleaning step is simple: keep only conversations with more than 20 messages (10 turns). This results in a subset of around 18k samples, which is reasonable for finetuning. For the Augmental dataset, I use all samples, since it has only around 8k conversations. For HH-RLHF, I take the helpful-chosen subset: https://huggingface.co/datasets/QingyiSi/Alpaca-CoT/blob/main/hh-rlhf/helpful-online_chosen_context.json.
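The SODA cleaning step above can be sketched in a few lines of Python. This assumes each sample stores its messages as a list under a "dialogue" key (as in the allenai/soda dataset on the Hugging Face Hub; the field name is an assumption here):

```python
# Keep only conversations with more than 20 messages (10 turns).
# Field name "dialogue" is an assumption about the sample schema.
def keep_long_conversations(samples, min_messages=21):
    return [s for s in samples if len(s["dialogue"]) >= min_messages]

# Toy example: only the second conversation survives the filter.
samples = [
    {"dialogue": ["hi", "hello"] * 5},   # 10 messages -> dropped
    {"dialogue": ["hi", "hello"] * 12},  # 24 messages -> kept
]
filtered = keep_long_conversations(samples)
print(len(filtered))  # prints 1
```

With the real dataset you would apply the same predicate via `datasets.Dataset.filter` instead of a list comprehension.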

All datasets are formatted in Alpaca format for training.
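As a concrete illustration, here is one way a multi-turn conversation could be flattened into an Alpaca-style record ({"instruction", "input", "output"}). The exact mapping below, history into "input" and the final reply into "output", is my assumption, not necessarily the author's script:

```python
import json

def to_alpaca(messages, system_prompt=""):
    # Put the conversation history in "input" and the last reply,
    # which the model should learn to produce, in "output".
    return {
        "instruction": system_prompt,
        "input": "\n".join(messages[:-1]),
        "output": messages[-1],
    }

record = to_alpaca(
    ["Hi!", "Hello, how can I help?", "Tell me a joke."],
    system_prompt="Continue the conversation.",
)
print(json.dumps(record, indent=2))
```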

Training

I finetune with a BF16 LoRA, simply because 4-bit and 8-bit are slower. Axolotl is the way to go. Here is the config:

sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true

adapter: lora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true

gradient_accumulation_steps: 8
micro_batch_size: 2
eval_batch_size: 2
num_epochs: 4
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.000001

Final thoughts

The metrics can vary greatly between submissions. I was lucky to get very strong scores on my top submission. Make sure to submit your model several times at different times of the day and monitor each submission closely. Good luck!