OpenChat 3.5 LLM+Oobabooga issues

Hey dudes.

Finally got around to trying some self-hosted LLM stuff. Got Oobabooga installed, downloaded the openchat_3.5.Q5_K_M.gguf model, and loaded it up, thinking that was all I needed to rock and roll (like Stable-Diffusion-WebUI), but it doesn't seem that's the case.

After loading up the model (successfully), I go to the "chat" portion, type in a prompt, and get a completely random response that has nothing to do with what I prompted for.

Is there another part I need to get configured or file I need to download for it to work like it does on https://openchat.team/?

Any help would be greatly appreciated!!

Comments

  • havoc OG Content Writer

    Oh I know this...

    In general, the flow is: find the model on TheBloke's page on Hugging Face, find the prompt template section on the model card, and set that under ooba > Parameters > Instruction template. Then go to the Chat tab, scroll down, and select chat-instruct. If you don't tell it to use chat-instruct, it doesn't factor in your template choice.

    As for the model: that one has a garbage template that frankly makes me wonder wtf they were smoking. The local-model user role is "GPT4 Correct User:"?!?!
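
    For reference, TheBloke's openchat_3.5-GGUF card lists the prompt format as something like this (quoting from memory, so double-check the card itself):

        GPT4 Correct User: {prompt}<|end_of_turn|>GPT4 Correct Assistant: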

    I'd just drop that model entirely. If you're looking for easy and good, the Dolphin Mistrals in GGUF are where it's at. There's a 7B version if you have a smaller GPU, and an 8x7B if you have a bigger card (i.e. 24 GB).

    https://huggingface.co/TheBloke/dolphin-2.6-mixtral-8x7b-GGUF

    Note that if you select the above repo in ooba, you have to put in the branch you want as well, else it's gonna download a fk load of data - i.e. all the quants. Has to be like so:
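
    For a GGUF repo that basically means pointing it at the specific quant file rather than the whole repo - roughly like this in the Model tab's download boxes (repo in the first box, file name in the second; the quant file name here is just an example, check the repo's file list for the real names):

        TheBloke/dolphin-2.6-mixtral-8x7b-GGUF
        dolphin-2.6-mixtral-8x7b.Q4_K_M.gguf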

    If you really need to do that "GPT4 Correct User" bullshit... you'd need to add it as a manual edit in the templates subfolder of the text-gen folder. The correct answer, though, is to give the creators of that messed-up template a bitchslap.
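
    If you do go that route, the custom template file sits next to the built-in ones, i.e. somewhere like this (file name made up, and double-check what format ooba currently expects in there, since it has changed between versions):

        text-generation-webui\instruction-templates\OpenChat.yaml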

  • edited December 2023

    @havoc said: As for the model: that one has a garbage template that frankly makes me wonder wtf they were smoking. The local-model user role is "GPT4 Correct User:"?!?!

    Hahaha figures.

    I was trying to figure out where to start. From what I had read, this seemed to be the bee's knees. I gave the demo a run and was pretty blown away. I've never used LLMs in any context before, so I didn't have a baseline, but I was expecting a whole lot less from something that costs nothing.

    @havoc said: I'd just drop that model entirely. If you're looking for easy and good, the Dolphin Mistrals in GGUF are where it's at. There's a 7B version if you have a smaller GPU, and an 8x7B if you have a bigger card (i.e. 24 GB).

    10-4. I'll get that downloading now. For some reason or another, my Web-UI deal gives me an error when trying to download.

        Traceback (most recent call last):
          File "C:\Users\user\Desktop\oobabooga_windows\text-generation-webui\modules\ui_model_menu.py", line 243, in download_model_wrapper
            model, branch = downloader.sanitize_model_and_branch_names(repo_id, None)
          File "C:\Users\user\Desktop\oobabooga_windows\text-generation-webui\download-model.py", line 39, in sanitize_model_and_branch_names
            if model[-1] == '/':
        IndexError: string index out of range

    But I'll snag that manually and load it on up.
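
    (From a quick look at the traceback, it seems the downloader just got handed an empty model string - which is what you'd get if the repo box were blank when hitting Download. Indexing an empty string is exactly what raises that error:)

        >>> model = ""
        >>> model[-1]
        Traceback (most recent call last):
          File "<stdin>", line 1, in <module>
        IndexError: string index out of range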

    So when I get this new model that you recommended... after getting it loaded up, is this the thing I need to worry about?

        <|im_start|>system
        {system_message}<|im_end|>
        <|im_start|>user
        {prompt}<|im_end|>
        <|im_start|>assistant
    
  • havoc OG Content Writer

    If it's GGUF format, you can literally just download the file and drop it into the \models\ folder.

    That looks like ChatML, which is indeed what most of the Dolphin Mistral variants are tuned on.

    If you get a quant size that fits into your GPU's VRAM it should be really fast (you can check Windows Task Manager to confirm whether you're out of GPU VRAM), and if the template matches and you selected chat-instruct etc. then it should be coherent and fast.
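
    i.e. the downloaded file just ends up somewhere like this (quant name illustrative):

        text-generation-webui\models\dolphin-2.6-mixtral-8x7b.Q4_K_M.gguf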

  • edited December 2023

    @havoc said:
    If it's GGUF format, you can literally just download the file and drop it into the \models\ folder.

    That looks like ChatML, which is indeed what most of the Dolphin Mistral variants are tuned on.

    If you get a quant size that fits into your GPU's VRAM it should be really fast (you can check Windows Task Manager to confirm whether you're out of GPU VRAM), and if the template matches and you selected chat-instruct etc. then it should be coherent and fast.

    Perfect. That seemed to be the way to roll, so at least I didn't screw up and end up with a GGML right out of the gate or something. Didn't realize how big that bitch was. You know of any other models that are maybe no more than 10-15 gigs? Not a ton of VRAM, but a good amount of CPU horsepower and 64 gigs of RAM. I'd at least like to dip my toes in the water here. Been storming around here on/off today and Starlink has been spotty.

    Thank you for your responses too by the way. Crazy how many things you have to know just to figure out where to start lol.
