Prompt for AI analysis of PB source codes

Post by dige »

Hi guys,

I’ve been collecting a lot of PureBasic code examples and I’m currently thinking about how to better organize and retrieve them.

My idea:
- Send all code snippets to ChatGPT via API
- Automatically generate meta information (e.g. headline, description, keywords, PureBasic version, recognized algorithms, difficulty, etc.)
- Store everything in a database
- Access and search through it via a simple web interface
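
The "store everything in a database" and "search" steps above could be sketched roughly as follows. This is only a minimal illustration using Python's built-in sqlite3 module; the table name `snippets`, its columns, and the helper names `open_db`, `store`, and `search` are assumptions made up for this example, and the input dictionary is expected to follow the JSON schema from the prompt draft.

```python
import json
import sqlite3

def open_db(path=":memory:"):
    """Create the (hypothetical) snippets table if it does not exist yet."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS snippets ("
        " file_name TEXT PRIMARY KEY,"
        " title TEXT,"
        " short_description TEXT,"
        " tags TEXT,"
        " meta_json TEXT)"
    )
    return con

def store(con, meta):
    """Insert or replace one metadata record as returned by the model."""
    con.execute(
        "INSERT OR REPLACE INTO snippets VALUES (?, ?, ?, ?, ?)",
        (
            meta["source_file_name"],
            meta.get("title", ""),
            meta.get("short_description", ""),
            " ".join(meta.get("tags", [])),
            json.dumps(meta),  # keep the full record for the web interface
        ),
    )
    con.commit()

def search(con, term):
    """Return file names whose title, description, or tags contain the term."""
    like = f"%{term}%"
    rows = con.execute(
        "SELECT file_name FROM snippets"
        " WHERE title LIKE ? OR short_description LIKE ? OR tags LIKE ?"
        " ORDER BY file_name",
        (like, like, like),
    )
    return [row[0] for row in rows]
```

Keeping the complete JSON in one column means the schema can still change later without a database migration; only the few columns used for searching are fixed.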

I’d like to get some input from you:
1. What information should the AI generate from the code to make it really useful?
2. What would be a good universal prompt for this task?
3. Does something like this already exist? (I don’t want to reinvent the wheel if there’s already a tool.)

Here’s my current prompt draft:

Code:

Analyze the attached source code. Determine the programming language used, frameworks or libraries, implemented algorithms, data structures, core concepts, functional categories, file types, target platforms, and the estimated level of difficulty. Also extract, if available, title, short description, long description, version, date, author, license, and external sources from comments in the code. Additionally, determine the file size in bytes, the total number of lines, and the number of non-empty lines.

Return the result exclusively as a valid JSON object that matches exactly the following schema:
{
  "title": "string",
  "short_description": "string",
  "long_description": "string",
  "language": "string",
  "frameworks": ["string"],
  "algorithms": ["string"],
  "data_structures": ["string"],
  "concepts": ["string"],
  "categories": ["string"],
  "file_types": ["string"],
  "platforms": ["string"],
  "complexity": "Beginner | Intermediate | Expert",
  "tags": ["string"],
  "version": "string",
  "date": "string",
  "author": "string",
  "license": "string",
  "source_references": ["string"],
  "source_file_name": "string",
  "source_file_bytes": 0,
  "total_lines": 0,
  "non_empty_lines": 0
}
No explanations, no markdown, no extra fields — only the JSON output.

For example, here is the analysis result for Outlook_DragAndDrop.pbi (viewtopic.php?t=32234&hilit=Outlook_Dra ... p&start=75):

Code:

{
"title": "Outlook_DragAndDrop.pbi",
"short_description": "PureBasic module enabling drag & drop of files and Outlook attachments into a gadget.",
"long_description": "Implements a user-friendly IDropTarget in PureBasic to accept both regular file drops (#CF_HDROP) and Outlook attachments via FILEDESCRIPTOR/FILECONTENTS, reading them through IStream/IStorage, writing to a target folder, and reporting results back via a custom window event. Includes an example with a ListView gadget and event handling loop.",
"language": "PureBasic",
"frameworks": ["Windows API (Win32)", "OLE/COM", "Microsoft Outlook Drag-and-Drop"],
"algorithms": ["Drag-and-drop event handling", "COM interface (IDropTarget) with reference counting", "Stream copying via IStream/IStorage", "File copy and buffering", "Clipboard format probing"],
"data_structures": ["_IDropTarget (vTable, refCount, flags, hWnd)", "FILEDESCRIPTOR", "FILEGROUPDESCRIPTOR", "STATSTG", "ODD_MAP (mapping gadget to target/events)"],
"concepts": ["COM/IDL interfaces", "VTable dispatch", "IDataObject/GetData", "Clipboard formats (#CF_HDROP, CFSTR_FILEDESCRIPTOR, CFSTR_FILECONTENTS)", "IStream/IStorage/ILockBytes", "GUIDs/IIDs", "Windows messaging & PostEvent", "Memory management (Allocate/FreeMemory)", "File I/O", "GUI gadgets"],
"categories": ["Desktop GUI", "Drag & Drop", "Outlook/Email integration", "File processing", "Windows interop", "Example/utility module"],
"file_types": ["pbi", "Email attachments (various)", "RFC822 email (reference)"],
"platforms": ["Windows", "Microsoft Outlook"],
"complexity": "Expert",
"tags": ["PureBasic", "IDropTarget", "IDataObject", "Outlook", "DragAndDrop", "Win32", "COM", "IStream", "Clipboard"],
"version": "1.00",
"date": "2025-03-24",
"author": "unknown (original by mesozorn; ideas by srod)",
"license": "",
"source_references": ["http://www.purebasic.fr/english/viewtopic.php?p=460210#p460210", "http://www.purebasic.fr/english/viewtopic.php?p=438986#p438986", "https://www.purebasic.fr/english/viewtopic.php?p=460437&sid=26be0bc2f4cf794347e2130dedd273b2#p460437"],
"source_file_name": "Outlook_DragAndDrop.pbi",
"source_file_bytes": 13557,
"total_lines": 493,
"non_empty_lines": 395
}
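
Before a reply like the one above goes into the database, it could be checked against the schema from the prompt. The following is a minimal sketch under stated assumptions: the field lists are copied from the prompt draft, while the helper names `validate` and `count_lines` are hypothetical. The file size and the two line counts can be measured exactly in code, so the sketch computes them locally rather than relying on the model for them.

```python
import json

# Field names and types taken from the schema in the prompt draft.
LIST_FIELDS = ["frameworks", "algorithms", "data_structures", "concepts",
               "categories", "file_types", "platforms", "tags",
               "source_references"]
STRING_FIELDS = ["title", "short_description", "long_description", "language",
                 "complexity", "version", "date", "author", "license",
                 "source_file_name"]
INT_FIELDS = ["source_file_bytes", "total_lines", "non_empty_lines"]
COMPLEXITY = {"Beginner", "Intermediate", "Expert"}

def validate(reply_text):
    """Parse the model reply and return (meta, list of problems found)."""
    meta = json.loads(reply_text)
    if not isinstance(meta, dict):
        return None, ["reply is not a JSON object"]
    problems = []
    expected = set(LIST_FIELDS + STRING_FIELDS + INT_FIELDS)
    for name in sorted(expected - set(meta)):
        problems.append(f"missing field: {name}")
    for name in sorted(set(meta) - expected):
        problems.append(f"extra field: {name}")
    for name in STRING_FIELDS:
        if name in meta and not isinstance(meta[name], str):
            problems.append(f"not a string: {name}")
    for name in LIST_FIELDS:
        value = meta.get(name, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"not a list of strings: {name}")
    for name in INT_FIELDS:
        if name in meta and not isinstance(meta[name], int):
            problems.append(f"not an integer: {name}")
    level = meta.get("complexity")
    if isinstance(level, str) and level not in COMPLEXITY:
        problems.append(f"unknown complexity: {level}")
    return meta, problems

def count_lines(source_bytes):
    """Measure size and line counts locally from the raw file content."""
    lines = source_bytes.decode("utf-8", errors="replace").splitlines()
    return {
        "source_file_bytes": len(source_bytes),
        "total_lines": len(lines),
        "non_empty_lines": sum(1 for line in lines if line.strip()),
    }
```

A caller could run `validate` on each reply, retry the request when the problem list is not empty, and then overwrite the three numeric fields with the output of `count_lines`.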

Post by Skipper »

Hi Dige,

I like your idea a lot. I suggest adding a 'depends on' field, since in some cases no 'framework' may be detectable. I'd also add the compiler version number if at all possible, since quite a bit has improved with every major version update.

Unfortunately, I have no experience with prompting AIs, so I cannot help with that.

Cheers
Skipper

Post by idle »

It's a good idea, though I would do it with a local LLM.
If you have it return JSON, it would be easy to add to a DB.
But you don't need embeddings to retrieve code, as a full-text search is easy enough to do; it's just that the index becomes larger than the data in that case, as in my codedb example.
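
As a rough illustration of the full-text search idea, a word-level inverted index can be built in a few lines. This is only a sketch and not the codedb implementation; the function names `tokenize`, `build_index`, and `search` are made up for this example. A real index would typically also store positions or substrings, which is where the extra size comes from.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9_]+", text.lower())

def build_index(docs):
    """Map every word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index

def search(index, query):
    """Return the sorted names of documents containing all query words."""
    words = tokenize(query)
    if not words:
        return []
    # Intersect the posting sets; an unknown word yields an empty result.
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)
```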

The descriptions could go into vector embeddings, which
would make the code easier to find from natural-language LLM queries.
I've been pondering that myself, at least how to cluster the vectors so you don't have to do a multiplication over the whole lot to find the closest matches. Though you could just use an embedding DB if you don't care about bloat.
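
One way to picture the clustering idea is a coarse two-step search: compare the query with a few cluster centroids first, then only with the vectors in the closest cluster. The sketch below assumes the centroids already exist (for example from a k-means run, which is not shown) and uses made-up helper names; it trades accuracy for speed, since a good match that sits in a neighbouring cluster is missed.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest_centroid(vec, centroids):
    """Index of the centroid most similar to the given vector."""
    return max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))

def build_clusters(vectors, centroids):
    """Assign every named vector to its most similar centroid."""
    clusters = {i: [] for i in range(len(centroids))}
    for name, vec in vectors.items():
        clusters[nearest_centroid(vec, centroids)].append(name)
    return clusters

def search(query, vectors, centroids, clusters, top_k=3):
    """Compare the query only against members of the closest cluster."""
    best = nearest_centroid(query, centroids)
    scored = [(cosine(query, vectors[name]), name) for name in clusters[best]]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]
```

With k clusters of roughly equal size, a query costs about k comparisons plus one cluster's worth, instead of one comparison per stored snippet.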