How to (Ab)Use your KS-LE-B for LLM Models
So, you got one of these KS-LE-B and want to run some LLM models?
Smol short guide.
Grab the dependencies we need.
apt-get install pciutils build-essential cmake curl libcurl4-openssl-dev git ccache python3-pip python3.13-venv -y
Add a new user which will run the LLM models.
adduser llm
Log on as the new user.
su llm
Grab llama.cpp
cd
git clone https://github.com/ggml-org/llama.cpp.git
Grab the Hugging Face CLI.
curl -LsSf https://hf.co/cli/install.sh | bash
export PATH="/home/llm/.local/bin:$PATH"
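That export only lasts for the current shell session. To keep it across logins, append it to the llm user's ~/.bashrc (assuming bash is the login shell, which is the Debian/Ubuntu default):
echo 'export PATH="/home/llm/.local/bin:$PATH"' >> ~/.bashrc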
I have made a smol script for the initial build and later updates of llama.cpp: https://pastebin.com/raw/gKYBcXqc
wget -O update.sh https://pastebin.com/raw/gKYBcXqc
chmod +x update.sh
bash update.sh
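If you'd rather not pipe a pastebin script into bash, this is roughly what such a build/update script can look like. A minimal sketch, not necessarily identical to the one linked above; it assumes the llama.cpp checkout sits in /home/llm/llama.cpp:
#!/usr/bin/env bash
# Pull the latest llama.cpp and rebuild it (CPU-only build; ccache is picked up automatically if installed).
set -e
cd "$HOME/llama.cpp"
git pull
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j "$(nproc)"
# Copy the binaries into the repo root so the commands below can call llama.cpp/llama-cli directly.
cp build/bin/llama-cli build/bin/llama-server build/bin/llama-bench .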
Let's download our first model.
hf download unsloth/GLM-4.7-Flash-GGUF --include "*Q4_K_M*" --local-dir models/
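The GGUF lands in models/. A quick check that the download finished and how big the file actually is:
ls -lh models/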
You can run llama.cpp from the CLI.
llama.cpp/llama-cli --jinja --model models/GLM-4.7-Flash-Q4_K_M.gguf
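That gives you an interactive chat in the terminal. On CPU it can be worth setting the thread count and context size explicitly; the values below are just examples, match -t to your physical core count:
llama.cpp/llama-cli --jinja --model models/GLM-4.7-Flash-Q4_K_M.gguf -t 8 -c 8192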
Or use the web interface.
llama.cpp/llama-server --jinja --host 127.0.0.1 --port 8888 --models-dir models/
This includes a model autoloader, so you can pick the model from the web interface.
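llama-server also exposes an OpenAI-compatible API, so you can script against it instead of clicking around. A quick test from the box itself (the model name is just an example, use whatever the web interface lists):
curl http://127.0.0.1:8888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "GLM-4.7-Flash-Q4_K_M", "messages": [{"role": "user", "content": "Hello!"}]}'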
Add an nginx reverse proxy and you're set.
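A minimal sketch of such an nginx vhost, assuming nginx is already installed; the server_name is a placeholder and you still want TLS and some auth (basic auth or an IP allowlist) in front of it:
server {
    listen 80;
    server_name llm.example.com;  # placeholder, use your own hostname

    location / {
        proxy_pass http://127.0.0.1:8888;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_buffering off;       # don't buffer the streamed token output
        proxy_read_timeout 300s;   # CPU generations can take a while
    }
}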

Comments
It's fast enough to chat with, but not blazing fast on CPU.

That's pretty cool!
Before I forget to mention this:
Try to get Q4 or higher; Q4 is a good balance.
The model mentioned above needs 64GB of RAM. If you have a KS-LE-B with less, try a smoler model.
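A quick way to check how much memory your box actually has before grabbing a big GGUF:
free -h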
To optimize performance and results, always check the guide for the model,
e.g. https://unsloth.ai/docs/models/glm-4.7-flash
Nice! I messed around with https://github.com/mudler/LocalAI on my OVH server a long time back; I think it's time to try it again. That one puts everything in a Docker container and you get an API, which is cool.
I run it on bare metal for maximum performance; when running on CPU, everything counts.
I used OpenWebUI before, but ditched it for llama.cpp: same functionality without the cloud shit.
Oh, you really take that much of a performance hit running within Docker? Do you know how many tokens per second you were getting? I can compare.
I didn't bench it. A container shouldn't cause a big performance loss, but it's going to cost you a little bit.
I just run it on bare metal.
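If anyone wants hard numbers: llama.cpp ships llama-bench, which reports prompt processing and generation speed in tokens per second, so bare metal vs. Docker would be easy to compare (adjust the path if your binaries are still in build/bin/):
llama.cpp/llama-bench -m models/GLM-4.7-Flash-Q4_K_M.gguf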