Install Llama on a GPU server
Busy testing the GPU servers per the @crunchbits thread, so I jotted down some notes on how to get a fresh Ubuntu server talking to a Llama model. Note this is on a 16 GB GPU. If you're on a smaller one you'll need to change the q8_0 part to q4_0 or even q3_k.
Also note that here I'm downloading an fp16 model and converting it to a q8 GGUF. In practice you can skip those steps and just download ready-made quantized GGUF models from TheBloke's Hugging Face repo, i.e. you'd modify the download step to point to a quantized GGUF model and skip the generate and quantize steps after that.
This assumes Ubuntu 22.04 - you may need to do stuff like install python3 if you're on a different distro
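To make the q8_0 / q4_0 / q3_k choice above a bit more concrete, here's a rough back-of-the-envelope sketch in Python. The bits-per-weight figures for q8_0 and q4_0 follow from llama.cpp's block layout (32 weights plus one fp16 scale per block); the q3_k figure is only approximate because that type mixes precisions across tensors. The 1.5 GiB headroom for context and CUDA overhead and the 7B/13B parameter counts are assumptions, not measurements, so treat the result as a hint rather than a guarantee.

```python
# Approximate bits per weight for a few llama.cpp quantization types.
# q8_0: 34 bytes per 32 weights, q4_0: 18 bytes per 32 weights.
# q3_k is a rough average (assumption), since it mixes tensor precisions.
BITS_PER_WEIGHT = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5, "q3_k": 3.9}


def model_size_gib(n_params: float, quant: str) -> float:
    """Rough size of the weights alone, in GiB."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 2**30


def pick_quant(n_params: float, vram_gib: float, headroom_gib: float = 1.5):
    """Return the highest-precision type whose weights fit in VRAM, or None.

    headroom_gib is an assumed allowance for the KV cache and CUDA overhead.
    """
    for quant in ("f16", "q8_0", "q4_0", "q3_k"):
        if model_size_gib(n_params, quant) + headroom_gib <= vram_gib:
            return quant
    return None


if __name__ == "__main__":
    for n_params in (7e9, 13e9):  # hypothetical model sizes
        for vram in (8, 16):      # hypothetical GPU sizes in GiB
            choice = pick_quant(n_params, vram)
            print(f"{n_params / 1e9:.0f}B params, {vram} GiB VRAM -> {choice}")
```

If it returns None, nothing in the list fits fully on the GPU and you'd have to offload fewer layers or pick a smaller model.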
Check that we have a GPU
apt update && apt upgrade
apt install hwinfo -y
hwinfo --gfxcard --short
Set up nvidia driver and SDK
apt install nvidia-driver-535-server nvidia-dkms-535-server nvidia-cuda-toolkit -y
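Once the driver is in (a reboot may be needed before it shows up), a quick sanity check is to ask nvidia-smi what it can see. The sketch below is just one way to do that from Python. It relies on nvidia-smi's --query-gpu option, and the helper names parse_gpus and list_gpus are made up for this example.

```python
import shutil
import subprocess

# Ask for one CSV line per GPU: "name, total memory in MiB".
QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"]


def parse_gpus(csv_text: str):
    """Turn 'name, MiB' lines into (name, total_mib) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        if "," not in line:
            continue  # skip blank or unexpected lines
        name, mem = line.rsplit(",", 1)
        if mem.strip().isdigit():
            gpus.append((name.strip(), int(mem.strip())))
    return gpus


def list_gpus():
    """Return the GPUs nvidia-smi reports, or [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []  # e.g. the driver is installed but not loaded yet
    return parse_gpus(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("No NVIDIA GPU visible; check the driver install or reboot.")
    for name, mib in gpus:
        print(f"{name}: {mib / 1024:.1f} GiB")
```

The memory figure it prints is the number to compare against the quantized model size.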
Grab llama.cpp and build it
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build && cd build
apt install cmake -y
cmake .. -DLLAMA_CUBLAS=ON
cmake --build . --config Release
Download a model
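mkdir -p /root/llama.cpp/models/llama2-fp16
python3 -m pip install huggingface_hub
python3
from huggingface_hub import snapshot_download
snapshot_download(repo_id="<fp16 model repo id>", local_dir="/root/llama.cpp/models/llama2-fp16")  # put the repo id of the fp16 model you want here
exit()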
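If you go the ready-made route mentioned at the top, the download step could look something like the sketch below instead. It uses hf_hub_download from huggingface_hub to fetch a single .gguf file into the models directory. The script name fetch_gguf.py and the helpers in it are hypothetical, and the repo id and filename are left as arguments so you can pick whichever quantization fits your GPU.

```python
import sys
from pathlib import Path

MODELS_DIR = Path("/root/llama.cpp/models")  # models directory used in these notes


def fetch_gguf(repo_id: str, filename: str, models_dir: Path = MODELS_DIR) -> Path:
    """Download one file from a Hugging Face repo and return its local path."""
    # Imported here so the usage message still works before the pip install.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=repo_id, filename=filename, local_dir=models_dir))


def main(argv) -> int:
    """Validate the arguments, then download; returns a shell-style exit code."""
    if len(argv) != 2:
        print("usage: fetch_gguf.py <repo_id> <filename.gguf>")
        return 2
    repo_id, filename = argv
    if not filename.endswith(".gguf"):
        print("expected a .gguf file, got:", filename)
        return 2
    print(fetch_gguf(repo_id, filename))
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, python3 fetch_gguf.py TheBloke/Llama-2-7B-Chat-GGUF llama-2-7b-chat.Q4_K_M.gguf should work as long as that file is still listed in the repo (check the repo's file list first). The printed path is what you'd pass to -m, and the convert and quantize steps can be skipped.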
Generate GGUF file
cd /root/llama.cpp
python3 -m pip install gguf sentencepiece
python3 convert.py ./models/llama2-fp16/
cd build/bin
./quantize ../../models/llama2-fp16/ggml-model-f16.gguf ../../models/llama2-q8.gguf q8_0
Run the model
./main -m ../../models/llama2-q8.gguf -ngl 99 --color -p "Tell me a story about a unicorn!"
Tell me a story about a unicorn!
Once upon a time, in a far-off land of rolling hills and sparkling streams, there lived a beautiful unicorn named Luna. She had a shimmering coat of silver and white, and her horn was as bright as the stars in the night sky.
Luna lived a peaceful life, roaming the forests and meadows, and making friends with all the creatures she met. She loved to play with the butterflies and dance with the flowers, and she could make the most beautiful music with her horn.
One day, a wicked witch cast a spell on the land, causing all the plants and animals to become sick and tired. The unicorns were especially affected, and their beautiful coats became dull and lifeless.
Luna knew that she had to do something to save her friends and the land they lived in. She set out on a journey to find the witch and break her spell.
As she traveled through the forest, Luna met many creatures who were suffering from the witch's spell. She used her horn to heal them and bring them back to life. She also met a brave knight who had been searching for the witch for many years. Together, they journeyed on, determined to defeat the wicked witch and bring peace back to the land.
Finally, after many days of traveling, they came to the witch's castle. It was a dark and gloomy place, surrounded by a moat of swirling black water. But Luna was not afraid. She knew that her horn could break any spell, no matter how powerful.
She and the knight entered the castle, ready to face whatever dangers lay inside. As they made their way deeper into the castle, they came across the witch herself. She was a terrifying sight, with warts and a crooked nose, and a cackle that sent chills down your spine.
But Luna was not afraid. She raised her horn and pointed it at the witch, ready to break the spell. The witch laughed and tried to stop her, but Luna's horn was too powerful. With one blast of magic, the spell was broken, and the land was once again filled with light and life.
The creatures who had been turned to stone were returned to their true forms, and they cheered and celebrated as Luna and the knight emerged from the castle. The witch was banished from the land forever, and peace was restored.
And Luna, the little unicorn with the powerful horn, lived happily ever after, knowing that she had saved her homeland from the evil witch's spell. The end.