Finetuning
Hygge LLM
Furniture store poetry
Role
Dataset, finetuning, testing
Website
huggingface.co/spaces/5to9/hygge_inference
01
Intro
Challenge: Brand Voice
As a copywriter, I have always been fascinated by the power of a consistent brand voice that connects a brand across multiple channels. The challenge of finetuning an efficient language model to capture such a voice led me to "Hygge."
"Hygge" is a small, 8B-class language model built to run on modest setups. My goal was to create a German-language model focused on a consistent brand voice, prioritizing creativity over rigid accuracy. The project was ambitious: build a model that handles short prompts and generates longer content while staying cost- and energy-efficient.
02
Size
Why the heck a small model?
In an age of large-scale models from OpenAI and Google, opting for a smaller model like "Hygge" offers real advantages. Smaller models allow greater control over data, optimization, and inference behavior, and unlike larger, vendor-controlled platforms, "Hygge" runs on modest hardware.
Moreover, the training process was fully under my control, allowing me to monitor, tweak, and improve the model at every stage. This level of granularity enabled me to focus on the linguistic details needed to capture a specific brand voice, rather than relying on general-purpose models that might miss nuances.
03
Inputs
Dataset and methodology
The dataset was purpose-built for this task: around 1,000 synthetic instruction-response pairs, each manually refined to simulate brand-specific dialogue. The pairs were short, about 300 tokens each, representing just two or three sentences of tight, brand-aligned communication.
I faced several challenges along the way. Training a model on such a small dataset increases the risk of overfitting. To counter this, I worked through more than 35 finetuning iterations, balancing enough training to solidify the tone against leaving the model room to "think" creatively.
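To make the format concrete, here is a minimal sketch of what one such pair could look like. The wording, product name, and field names are purely illustrative, not actual training data.

```python
import json

# Illustrative instruction-response pair in the style of the dataset
# (fictional example, not taken from the real training set).
example = {
    "instruction": "Write a short product blurb for a compact oak bookshelf, "
                   "aimed at small-apartment living.",
    "response": "Kleine Wohnung, große Geschichten: Das Bücherregal BOKHYLLA "
                "bringt Ordnung in dein Zuhause und Gemütlichkeit in jeden "
                "Winkel. Aus massiver Eiche, gebaut für viele Kapitel.",
}

# The full dataset would then be a JSONL file with roughly 1,000 of these pairs.
with open("hygge_pairs.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example, ensure_ascii=False) + "\n")
```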
The base model, Meta's Llama 3 8B, was selected for its general language abilities, but training it heavily on German brought complications: while Llama and Mistral models are designed for multilingual use, focusing them solely on German can lead to imbalances. For this reason, I built on Disco Research's community-driven "DiscoLeo Instruct 8B", a small language model finetuned entirely on German datasets.
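For the curious, this is roughly how a pair gets rendered with the base model's own chat template before training. The repo ID and the example text are assumptions for illustration; check Hugging Face for the exact DiscoLeo Instruct model name.

```python
from transformers import AutoTokenizer

# Repo ID assumed; verify the exact DiscoLeo Instruct 8B name on Hugging Face.
BASE_MODEL = "DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)

# Render one (fictional) pair with the model's chat template, so the training
# data matches the Llama 3 instruct format the base model already knows.
messages = [
    {"role": "user", "content": "Schreibe zwei Sätze über das Sofa SÖMNIG."},
    {"role": "assistant", "content": "SÖMNIG ist mehr als ein Sofa: Es ist "
                                     "dein Lieblingsplatz für lange Abende. "
                                     "Weich, robust und bereit für jedes "
                                     "Nickerchen."},
]
print(tokenizer.apply_chat_template(messages, tokenize=False))
```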
04
Let’s start
Finetuning and overfitting
One of the most challenging parts of the process was dealing with overfitting, especially given the small dataset size. Overfitting causes the model to lose flexibility, resulting in repetitive gibberish that may ignore prompts altogether. Finding the right balance between teaching the model enough about the tone of voice and preventing it from becoming stuck in a loop was quite a challenge.
Using tools like Weights & Biases, I was able to closely monitor the model's progress across iterations. This helped me adjust training parameters, allowing for more controlled improvements and early detection of issues.
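Below is a minimal sketch of what such a finetuning run can look like with LoRA adapters and Weights & Biases logging. The hyperparameters are illustrative rather than the values actually used across the 35+ iterations, and the trl API shifts between versions, so treat it as a starting point.

```python
# LoRA finetuning sketch with W&B logging (illustrative hyperparameters).
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Turn each instruction-response pair into the chat format trl expects.
raw = load_dataset("json", data_files="hygge_pairs.jsonl", split="train")
dataset = raw.map(
    lambda ex: {"messages": [
        {"role": "user", "content": ex["instruction"]},
        {"role": "assistant", "content": ex["response"]},
    ]},
    remove_columns=raw.column_names,
)

peft_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

args = SFTConfig(
    output_dir="hygge-lora",
    num_train_epochs=3,               # kept low to limit overfitting
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    logging_steps=10,
    report_to="wandb",                # stream loss curves to Weights & Biases
)

trainer = SFTTrainer(
    model="DiscoResearch/Llama3-DiscoLeo-Instruct-8B-v0.1",  # repo ID assumed
    train_dataset=dataset,
    peft_config=peft_config,
    args=args,
)
trainer.train()
```

Watching the W&B loss curves across runs like this is what made it possible to spot overfitting early and stop or rebalance a run before the tone collapsed into loops.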
05
Quant
Quantization. How low can you go?
With the core model performing well enough, I turned my focus to quantization, reducing the model's size while trying to maintain quality. This is where GGUF and 4-bit techniques came into play. My goal was to make "Hygge" run on a consumer CPU, but quantization presented its own set of problems.
While many models handle 4-bit quantization well, "Hygge" saw a noticeable degradation in output quality. The smaller I made it, the less useful it became. This is a key area for future improvements, as it limits the model's real-world applicability on lower-powered hardware.
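As an aside, one quick way to probe this degradation is to load the finetuned model in 4-bit via bitsandbytes and compare its output against the full-precision version. This is a different path than the GGUF exports produced with llama.cpp's tooling, and the model path below is hypothetical, but the idea is the same.

```python
# Illustrative 4-bit loading with bitsandbytes (NF4), for quality smoke tests.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_DIR = "hygge-merged"  # hypothetical path to the merged finetuned model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_DIR, quantization_config=bnb_config, device_map="auto"
)

# Generate with the 4-bit model and compare against full-precision output.
prompt = "Schreibe einen kurzen Claim für ein gemütliches Wohnzimmer-Sofa."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=80)[0]))
```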
06
Results
Results and learnings
After numerous iterations, "Hygge" learned to generate content in a brand-consistent tone, imitating the sentence structure and vocabulary of a fictional Swedish furniture store. It also proved good at making sensible assumptions when a short prompt left details open.
However, my experiments with quantization revealed limitations. Even though I had hoped to run "Hygge" on a basic CPU, 4-bit quantization degraded the output too much, making it impractical for this use case, at least for now.
Stay tuned!