GPT4All - The better AI chatbot?

Bitblazer · Post by **Bitblazer** » Sun Jun 04, 2023 5:32 pm

I just want to spread the word about GPT4All because I think it is a project that deserves help and is also a much better AI bot than the other projects that are currently being hyped so much by various companies.

This one seems to be fully local (no internet required) and open source. It may not beat the commercial ones today, but let's make sure it will be the better solution within a year

Here is a German article about it and a German video.

Check it out, use it, create products, have fun, get rich or just informed

It is open source, does not require any internet connection and uses the MIT license.

PS: I wonder if this could be the base of the ultimate PureBasic help bot.

Who has the time to train it with a few thousand sources and websites?

BarryG · Post by **BarryG** » Sun Jun 04, 2023 10:09 pm

Interesting. Thanks! Have downloaded it to play later.

[Edit] Not using it; didn't realise it needed to download 3 GB+ of data.

Post by **Fred** » Mon Jun 05, 2023 5:53 am

Is it possible to get a clean bot and feed it the PB doc + forums ?

Post by **idle** » Mon Jun 05, 2023 6:10 am

Thanks for sharing that. I think it deserves some attention and study
I've built it and am testing it, but it's crashing in the prompt call from PB. I will upload it before I go this evening.

This is what I get with the API before it crashes:

gptj_model_load: loading model from 'C:\Users\idle\AppData\Local\nomic.ai\GPT4All\ggml-gpt4all-j-v1.3-groovy.bin' - please wait ...
gptj_model_load: n_vocab = 50400
gptj_model_load: n_ctx = 2048
gptj_model_load: n_embd = 4096
gptj_model_load: n_head = 16
gptj_model_load: n_layer = 28
gptj_model_load: n_rot = 64
gptj_model_load: f16 = 2
gptj_model_load: ggml ctx size = 5401.45 MB
gptj_model_load: kv self size = 896.00 MB
gptj_model_load: ................................... done
gptj_model_load: model size = 3609.38 MB / num tensors = 285
GGML_ASSERT: C:\Users\idle\gpt4all\gpt4all-backend\llama.cpp-230511\ggml.c:5586: !ggml_is_transposed(a)

If you want to build it on Windows, it's easy, but you need git, MinGW and CMake.
Then, in PowerShell:

Code:

git clone --recurse-submodules https://github.com/nomic-ai/gpt4all
cd gpt4all/gpt4all-backend/
mkdir build
cd build
cmake -G "MinGW Makefiles"  ..
cmake --build . --parallel

Sure, that took me half a day to work out, and then it crashed, so who knows?

And here is the import from the zip:

Code:

Structure llmodel_error 
    *message;            // Human readable error description; Thread-local; guaranteed to survive until next llmodel C API call
    code.l;             // errno; 0 if none
EndStructure ;

Structure arFloat 
  e.f[0] 
EndStructure 
Structure arLong 
  e.l[0] 
EndStructure   

Structure llmodel_prompt_context
    *logits.arfloat;      // logits of current context
    logits_size.i;        // the size of the raw logits vector
    *tokens.arlong;       // current tokens in the context window
    tokens_size.i;        // the size of the raw tokens vector
    n_past.l;             // number of tokens in past conversation
    n_ctx.l;              // number of tokens possible in context window
    n_predict.l;          // number of tokens to predict
    top_k.l;              // top k logits to sample from
    top_p.f;              // nucleus sampling probability threshold
    temp.f;               // temperature to adjust model's output distribution
    n_batch.l;            // number of predictions to generate in parallel
    repeat_penalty.f;     // penalty factor for repeated tokens
    repeat_last_n.l;      // last n tokens to penalize (a token count, so an integer)
    context_erase.f;      // percent of context to erase if we exceed the context window
EndStructure 

PrototypeC llmodel_prompt_callback(token_id.l);
PrototypeC llmodel_response_callback(token_id.l,*response); 
PrototypeC llmodel_recalculate_callback(is_recalculating.l)      ;

ImportC  "libllmodel.dll.a" 
  llmodel_model_create(model_path.p-utf8);
  llmodel_model_create2(model_path.p-utf8,build_variant.p-utf8,*error.llmodel_error); 
  llmodel_model_destroy(model.i)                                                      ;
  llmodel_loadModel(model.i,model_path.p-utf8)                                       ;
  llmodel_isModelLoaded(model.i)                                                      ;
  llmodel_get_state_size(model.i)                                                   ;
  llmodel_save_state_data(model.i,*dest.Ascii)                                      ;
  llmodel_restore_state_data(model.i,*src.Ascii)                                      ;
  llmodel_prompt(model,prompt.p-utf8,*prompt_callback,*response_callback,*recalculate_callback,*ctx.llmodel_prompt_context); *prompt.p-utf8
  llmodel_setThreadCount(model,n_threads.l)                                                                                 ;
  llmodel_threadCount.l(model.i)                                                                                              ;
  llmodel_set_implementation_search_path(path.p-utf8)                                                                        ;
  llmodel_get_implementation_search_path()                                                                                    ; return string peeks 
EndImport 

Global ctx.llmodel_prompt_context
Global err.llmodel_error

OpenConsole()  

ProcedureC CBResponse(token_id.l,*response);   
  
  PrintN("CB Response") 
    
  ProcedureReturn #True 
  
EndProcedure 

ProcedureC CBPrompt(token.l) 
  
  PrintN("CB Prompt") 
  
  ProcedureReturn #True 
  
EndProcedure   

ProcedureC CBRecalc(is_recalculating.l) 
  
  PrintN("Is Calc") 
   
  ProcedureReturn #True 
  
EndProcedure   

path.s = "C:\Users\idle\AppData\Local\nomic.ai\GPT4All\ggml-gpt4all-j-v1.3-groovy.bin"
model =  llmodel_model_create2(path,"avxonly",@err); pass the address of the global err; *err would be a separate, unset pointer variable
If model  
  llmodel_setThreadCount(model,12)    
  If llmodel_loadModel(model,path)
     prompt.s = "hello"
     PrintN(prompt) 
     x = llmodel_prompt(model,prompt,@CBPrompt(),@CBResponse(),@CBRecalc(),@ctx); 
     PrintN("after prompt")  
     Input() 
     llmodel_model_destroy(model) 
  EndIf 
EndIf
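
One thing that may be worth checking in the snippet above: ctx is passed to llmodel_prompt without any of its fields being set, so n_ctx, n_predict, n_batch and the sampling parameters are all zero. Whether that has anything to do with the assert is an open question. Below is a minimal sketch of filling the context first. InitPromptContext is a hypothetical helper name, every number except n_ctx (taken from the load log) is an illustrative assumption rather than a documented default, and repeat_last_n is declared as a long on the assumption that it is a token count.

```purebasic
EnableExplicit

; Same field order as the structure posted above.
; Assumption: repeat_last_n is an integer token count, so it is declared .l here.
Structure llmodel_prompt_context
  *logits             ; logits of current context
  logits_size.i       ; size of the raw logits vector
  *tokens             ; current tokens in the context window
  tokens_size.i       ; size of the raw tokens vector
  n_past.l
  n_ctx.l
  n_predict.l
  top_k.l
  top_p.f
  temp.f
  n_batch.l
  repeat_penalty.f
  repeat_last_n.l
  context_erase.f
EndStructure

; Hypothetical helper: fills the context with placeholder values.
; Only n_ctx comes from the load log (n_ctx = 2048); the rest are guesses to tune.
Procedure InitPromptContext(*c.llmodel_prompt_context)
  *c\n_past         = 0
  *c\n_ctx          = 2048
  *c\n_predict      = 128
  *c\top_k          = 40
  *c\top_p          = 0.9
  *c\temp           = 0.7
  *c\n_batch        = 8
  *c\repeat_penalty = 1.1
  *c\repeat_last_n  = 64
  *c\context_erase  = 0.5
EndProcedure

Define ctx.llmodel_prompt_context
InitPromptContext(@ctx)
Debug ctx\n_ctx
```

In the posted program this would mean calling InitPromptContext(@ctx) once before llmodel_prompt, with the values adjusted to whatever the model actually expects.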

Here's the binary build; extract the contents to the bin folder "C:\Users\idle\gpt4all\gpt4all-backend\build\bin".
I also installed gpt4all in c:\users\idle\gtp4all

https://dnscope.io/idlefiles/gpt4all_pb.zip

Bitblazer · Post by **Bitblazer** » Mon Jun 05, 2023 8:06 am

I didn't try to build it; I just installed the software plus the snoozy set and am playing with it. What I'm still missing is an ARM-based version to use on Android smartphones, so I asked the bot itself, but that was a royal fail.

You should be able to train the AI with PureBasic content to teach it the language, but I haven't started that yet, because I would prefer the bot to learn the language from current, modern elements like modules rather than from old examples that sometimes have annoying problems. So I guess what I need to do first is to grab a collection of clean, modern sources that always use "EnableExplicit" and know modules.

The first basic training input (to me) is obviously the PureBasic manual.

I don't think there is a need to rebuild the source of the AI, and I expect the source to change a lot in the near future anyway.

We should really define the rules for teaching it (always EnableExplicit would be my #1 rule) and then collect sources to teach the AI. My #2 rule would be the use of modules, but I don't know if it would be possible to teach the AI to use modules where it makes sense.

Fred wrote: Mon Jun 05, 2023 5:53 am Is it possible to get a clean bot and feed it the PB doc + forums ?

Yes. But how clean do you want it? I don't know yet if you can use multiple models at once. So I currently use the biggest freely usable dataset and try to extend it with PureBasic knowledge.

Time to ask the AI

Maybe a clean new purebasic only model would make sense.

Another downside of just putting the forum into the AI is that the forum contains a lot of buggy PureBasic examples, where people ask why something does not work.

Post by **Fred** » Mon Jun 05, 2023 9:15 am

We should feed it only the Tricks 'n' Tips forum IMHO, as it should be mostly OK. The more data the better IIRC, so the manual alone wouldn't be enough. We can also feed it Andre's examples and the official examples if needed. Of course, all examples should be compiled with 6.02 beforehand to be sure they work.

BTW, by clean I mean not tainted by other languages like C#, Python etc. I know nothing about how it works, so feel free to correct me if it doesn't work like that.

Post by **idle** » Mon Jun 05, 2023 9:22 am

Fred wrote: Mon Jun 05, 2023 9:15 am We should feed it only the Tricks 'n' Tips forum IMHO, as it should be mostly OK. The more data the better IIRC, so the manual alone wouldn't be enough. We can also feed it Andre's examples and the official examples if needed. Of course, all examples should be compiled with 6.02 beforehand to be sure they work.

The first thing would be to find a model that's specifically trained on C; then you can train it with the PB C backend output and the corresponding PB input. Then it will know how to generate PB from C.

I updated the build instructions in my previous post and also posted a bug report with gpt4all.

Bitblazer · Post by **Bitblazer** » Mon Jun 05, 2023 9:23 am

In the past, I have seen some pretty awesome bots in ircII coding chats that were able to answer little questions with code snippets (a lot like the examples in the PureBasic manual) and even deliver full, large sources as examples. A combination of Discord bot maker and a remote GPT4All installation running a PureBasic model and sitting on Discord / WhatsApp / the web would be a really nice long-term project.

Post by **Fred** » Mon Jun 05, 2023 9:25 am

idle wrote: Mon Jun 05, 2023 9:22 am
Fred wrote: Mon Jun 05, 2023 9:15 am We should feed it only the Tricks 'n' Tips forum IMHO, as it should be mostly OK. The more data the better IIRC, so the manual alone wouldn't be enough. We can also feed it Andre's examples and the official examples if needed. Of course, all examples should be compiled with 6.02 beforehand to be sure they work.
The first thing would be to find a model that's specifically trained on C; then you can train it with the PB C backend output and the corresponding PB input. Then it will know how to generate PB from C.

Oh, I didn't think it worked backward like that, but if it does the job, why not.

Bitblazer · Post by **Bitblazer** » Mon Jun 05, 2023 9:29 am

Fred wrote: Mon Jun 05, 2023 9:15 am We should feed it only the Tricks 'n' Tips forum IMHO, as it should be mostly OK. The more data the better IIRC, so the manual alone wouldn't be enough. We can also feed it Andre's examples and the official examples if needed. Of course, all examples should be compiled with 6.02 beforehand to be sure they work.

That should be a nice start, good idea. Mid- to long-term, I would build the model from a pool of PB sources that follow rules like:

#1 - use EnableExplicit
#2 - organise (bigger) sources in modules - where it makes sense
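
A rough first pass at these two rules could be automated with a plain text search. The sketch below is only an illustration: UsesEnableExplicit and UsesModules are hypothetical helper names, and a keyword search also matches text inside comments or strings, so it is a coarse filter, not a parser.

```purebasic
EnableExplicit

; Hypothetical helper: #True if the source text mentions EnableExplicit (any case).
Procedure.i UsesEnableExplicit(source.s)
  ProcedureReturn Bool(FindString(source, "EnableExplicit", 1, #PB_String_NoCase) > 0)
EndProcedure

; Hypothetical helper: #True if the source text declares at least one module.
Procedure.i UsesModules(source.s)
  ProcedureReturn Bool(FindString(source, "DeclareModule", 1, #PB_String_NoCase) > 0)
EndProcedure

; Tiny in-memory sample instead of a real file, to keep the sketch self-contained.
Define sample.s = "EnableExplicit" + #LF$ + "DeclareModule Demo" + #LF$ + "EndDeclareModule"

If UsesEnableExplicit(sample) And UsesModules(sample)
  Debug "sample passes both checks"
EndIf
```

Since rule #2 only applies "where it makes sense", the module check is probably better used as a hint for sorting sources than as a hard requirement.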

Post by **Fred** » Mon Jun 05, 2023 9:32 am

Well, both Module and EnableExplicit are optional, so I don't see the point of feeding the bot only with this. I personally don't use EnableExplicit and make little use of modules, so I guess it's a matter of taste or habit, whatever you want to call it.

Bitblazer · Post by **Bitblazer** » Mon Jun 05, 2023 9:36 am

Fred wrote: Mon Jun 05, 2023 9:32 am I personally don't use EnableExplicit and make little use of modules, so I guess it's a matter of taste or habit, whatever you want to call it.

True, I come from the C -> C++ -> C# side and I really liked the evolution into C#, so that's why I like those elements.

NicTheQuick · Post by **NicTheQuick** » Mon Jun 05, 2023 9:56 am

Sounds interesting. But isn't that model so big you need literally hundreds of big graphics cards and a few weeks to train it, resulting in a big electricity bill?

Or how do you want to casually train it?

Correct me if I'm wrong with that.

Bitblazer · Post by **Bitblazer** » Mon Jun 05, 2023 10:22 am

So the rules for all sources to be accepted into the training set would be:

has to compile with the latest PureBasic version
has to compile with the ASM and C backend
has to compile / work with all available library subsystems (DX9 / DX11 on Windows, gtk2 / qt on Linux)

Did i miss anything?

Post by **idle** » Mon Jun 05, 2023 10:26 am

NicTheQuick wrote: Mon Jun 05, 2023 9:56 am Sounds interesting. But isn't that model so big you need literally hundreds of big graphics cards and a few weeks to train it, resulting in a big electricity bill? Or how do you want to casually train it? Correct me if I'm wrong with that.

No, there are hundreds of models to play with, and you can undoubtedly take one and add to it.

Going back 20 years or so (beyond the statute of limitations, I hope), I once had the task of doing a pre-sale presentation of a speech recognition suite to the board of directors and partners of a large law firm. There were about 30 people in the meeting and I gave them a stunning show. I never told the boss what I did or how I did it, but being technically savvy, I nuked the user data model and trained it on the one and only document I was going to recite, so that was effectively all it knew. I then turned up to the meeting suitably hungover and strutted around the conference room with a wireless mic, parroting off my speech, which was appearing on the projection screen 100% correct. Even with the flourishes of Scottish accent and prose, it still got it right. It was amazing! They of course signed up, I got a nice bonus and the boss made a tidy profit.
