somik
About
- Username
- somik
- Joined
- Visits
- 4,287
- Last Active
- Roles
- Member, OG
- Thanked
- 2281
- About Me:
- I don’t spin up instances, I raise servers like my children.
Comments
-
(Quote) https://www.youtube.com/watch?v=sdyC1BrQd6g
-
(Quote) (Image)
-
Why are @AuroraZero and @FrankZ never online at the same time and @FrankZ and @VirMach always online and offline together?
-
(Quote) Hey, I was reading that! :lol:
-
(Quote) The settings should be the same for all LLMs running on llama.cpp, right? Which model are you running now? Gemma 4?
-
(Quote) For now, I'm still using a locally running LLM, but I'm starting to run into context size issues locally. If I provide unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF:Q4_K_M a 26 KB JSON file (about 400 lines, 26k chars total) it can parse it but when modify…
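A rough way to sanity-check whether a file like that fits in the context window is to estimate tokens from the character count. This is only a sketch: the 4 characters per token figure is a common rule of thumb rather than a real tokenizer count (JSON, with all its brackets and quotes, often tokenizes worse), and the function names here are made up for illustration.

```python
import math


def estimate_tokens(n_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token count from a character count (rule of thumb, not a tokenizer)."""
    return math.ceil(n_chars / chars_per_token)


def fits_in_context(prompt_chars: int, reply_chars: int, n_ctx: int,
                    chars_per_token: float = 4.0) -> bool:
    """Check whether prompt plus expected reply fit in a context of n_ctx tokens."""
    needed = (estimate_tokens(prompt_chars, chars_per_token)
              + estimate_tokens(reply_chars, chars_per_token))
    return needed <= n_ctx


if __name__ == "__main__":
    # 26k chars in, and roughly the same again out if the model has to
    # write the whole modified file back.
    print(estimate_tokens(26_000))
    print(fits_in_context(26_000, 26_000, n_ctx=8192))
    print(fits_in_context(26_000, 26_000, n_ctx=16384))
```

The point is that a request to modify the file costs roughly double what parsing it costs, since the reply has to fit in the same window as the prompt.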
-
(Quote) Oh, I see the thumb and stocking. Can't see the eyes no matter how hard I try... (Quote) It's scary how closely you have to inspect an image before you can pick up on those small imperfections... I feel old... (Quote) Well, it should be bet…
-
(Quote) Erm, right or left thumb? You mean the black spot? What am I missing in the eyes and stockings? EDIT: I remember the days when AI generated 10 fingers on each hand, 3 hands on each body...
-
AI generated images are getting damn scary... I can no longer tell what is AI and what is real... Link: https://imageperl.com/i/CXFbPDjJ5e.png Need experts like @Neoon @terrorgen @havoc @rpqu @PulsedMedia to tell me what the signs are that this imag…
-
Breaking this into 2 because a single post becomes too long... (Quote) 16 GB VRAM seems to be enough to run most smaller models, quantized models, or MoE models. I saw 16 GB variants of the RTX 5060 at around $900 SGD (about 700 USD). Howe…
-
(Quote) Does the 120B model, even if not smart, still show enough intelligence, more than the full 12GB dense models? If it does, then that's a good compromise for people with $7k... (Quote) How does this work? I already have Open WebUI running on…
-
(Quote) So it's like me; knows a lot but when asked, can't remember shit :lol:
-
(Quote) I want to disagree... (Image)
-
(Quote) "God" bless your "balls" :lol:
-
Seems like these AI mini PCs can run some very specific LLMs (gpt-oss 120B) very well, at up to 40 tokens/s. However, the support from AMD just isn't there. Missing drivers and unsupported frameworks make running most models nearly impossible with…
-
(Quote) Is that why Nvidia tried to buy out ARM in 2020?
-
(Quote) So it's not an actual AI machine, just one of those "hype"-powered laptops...
-
(Quote) Apple's M5 MacBook Pro (Max?) costs 7k and runs AI at twice the speed of my server. AMD's Ryzen AI Max+ 395 mini PC is also sitting at 7k. Nvidia's current mini PC offering is at 7k too. And all of these are considered last gen. S…
-
Using LM Studio on Windows, I was running an LLM on my Asus NUC mini PC; it has a laptop-class Intel Core Ultra 7 155H CPU with an integrated Intel Arc GPU. The token rate is about 14~17 tokens/s, which is amazing given that the mini PC is much lower powered t…
-
Can someone else running llama.cpp on CPU help me verify the performance of the following two? unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF:Q4_K_M and unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF:Q3_K_S. For me, the bigger Q4_K_M running on CPU gives 70% highe…
-
(Quote) If you are using any Qwen VL model, image input works fine out of the box, right? I mean, almost all of the Qwen instruct models I use are also vision capable.
-
(Quote) From my browser history: WARNING! This guy keeps shaking his head. If you are triggered by it, listen to the video, don't watch it :lol: https://www.youtube.com/watch?v=UngVdAsQEiU
-
(Quote) You can get the GGUF model and run it on CPU using llama.cpp. This model is quite well optimized compared to the other Qwen 3 models I tried. (Quote) I am still running llama.cpp (CPU only) on my server. I had to build it from source but it…
-
(Quote) Are you using Qwen for photo input (OCR) or picture generation?
-
(Quote) Up vote, up vote, down vote, down vote, left vote, right vote, left vote, right vote, B vote, A vote. Did that unlock something?
-
(Quote) I'm running a refined unsloth/Qwen3-VL-30B-A3B-Instruct model; it is 12 GB and fits completely in my 16 GB GPU. I'm getting 70 to 80 tokens per second, basically instant screen-filling responses. Realising that now the current limitation is my…
-
(Quote) Your VM went to more countries than I have :lol:
-
(Quote) Right... Mini PCs run laptop hardware so they are able to build AI-related, buzzword-filled "pro" grade laptops, targeted at companies! (Quote) If you are OK with anything below 14B models or use a mixture-of-experts model, 1 recen…
-
(Quote) I'm sorry to hear that. Are you seeing a doctor about your symptoms? They might be able to give you meds to last longer :lol: BTW, Nvidia is now churning out vibe-coded drivers full of bugs. So best avoid newer drivers if you're running nvi…
-
(Quote) > (Quote) From what I understand, context window is something we explicitly control in llama.cpp via the -c parameter. And I’ve actually run into the practical limits of this myself: when testing a Qwen 3 VL 8B Instruct model with a 32K…
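For a feel of why a big -c value hits practical limits, here is a back-of-the-envelope KV cache estimate. It is a sketch using the standard formula for a transformer with grouped-query attention (keys plus values, per layer, per KV head, per position). The layer count, KV head count and head size below are illustrative assumptions, not confirmed specs for any particular Qwen model, and an unquantized f16 cache (2 bytes per element) is assumed.

```python
def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size for a given context length.

    The factor of 2 covers keys and values; bytes_per_elem=2 assumes f16.
    """
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical 8B-class shape: 36 layers, 8 KV heads, head size 128.
    for n_ctx in (8192, 32768):
        size = kv_cache_bytes(n_layers=36, n_ctx=n_ctx, n_kv_heads=8, head_dim=128)
        print(f"n_ctx={n_ctx}: {size / 2**30:.3f} GiB")
```

Under those assumptions the cache grows linearly with context, so going from 8K to 32K quadruples it, and that memory comes on top of the model weights.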