Can a structure be defined dynamically by csv headers

Just starting out? Need help? Post your questions and find answers here.
craig7
User
User
Posts: 35
Joined: Thu Sep 23, 2021 2:49 pm

Re: Can a structure be defined dynamically by csv headers

Post by craig7 »

Thanks all for the help, it is great to get insights when you are just starting on a language. My mind is a bit blown by how much code has to be written in PureBasic to do this, because in R or Python it is a one liner. For example in Python Panda:

df2 = pd.read_csv(filepath,encoding = "latin1")

In R reading in a CSV to a dataframe is in the base language, so even quicker than Python.
tored
User
User
Posts: 86
Joined: Wed Feb 16, 2022 12:47 pm
Location: Sweden

Re: Can a structure be defined dynamically by csv headers

Post by tored »

craig7 wrote: Thu Mar 07, 2024 11:44 am Thanks all for the help, it is great to get insights when you are just starting on a language. My mind is a bit blown by how much code has to be written in PureBasic to do this, because in R or Python it is a one liner. For example in Python Panda:

df2 = pd.read_csv(filepath,encoding = "latin1")

In R reading in a CSV to a dataframe is in the base language, so even quicker than Python.
That is because those languages have CSV support in their respective standard libraries where PureBasic has not. And also Python and R are both interpreted languages, thus have virtual machine and a garbage collector, where PureBasic is a compiled language without a garbage collector, which naturally leads to more code.

Another option is to use an existing C library and use that together with the C backend.

However there is an argument to be made that CSV support ought be included in the PureBasic standard library, but the maintainers are pretty much occupied with more important tasks.
tored
User
User
Posts: 86
Joined: Wed Feb 16, 2022 12:47 pm
Location: Sweden

Re: Can a structure be defined dynamically by csv headers

Post by tored »

infratec wrote: Wed Mar 06, 2024 9:50 pm A quick implementation with an user function:
Nice. You can simplify it even more if you have the CSV file on disk by letting sqlite import it after you created the table.

Code: Select all

.mode csv
.import myfile.csv mytable
User avatar
Tenaja
Addict
Addict
Posts: 1959
Joined: Tue Nov 09, 2010 10:15 pm

Re: Can a structure be defined dynamically by csv headers

Post by Tenaja »

My mind is a bit blown by how much code has to be written in PureBasic to do this
That is because those languages have CSV support in their respective standard libraries where PureBasic has not.
I agree... CSV support would be nice.

Keep in mind that there are many libraries native to pb that do not exist in python or r natively. You just happened to stumble across one that is absent here. But should you try to convert every pb file into another language, you would find similar deficiencies that pb has supported for many many years.

One nice thing about pb is the breadth of user-written libraries, as well as the helpful & friendly community! Sure, those exist elsewhere, but the large languages also come with a lot of chaff... More, in my experience, unfortunately.
infratec
Always Here
Always Here
Posts: 7619
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: Can a structure be defined dynamically by csv headers

Post by infratec »

tored wrote: Thu Mar 07, 2024 3:17 pm Nice. You can simplify it even more if you have the CSV file on disk by letting sqlite import it after you created the table.

Code: Select all

.mode csv
.import myfile.csv mytable

This is only possible with the command line tool.
So you have to use RunProgram()
craig7
User
User
Posts: 35
Joined: Thu Sep 23, 2021 2:49 pm

Re: Can a structure be defined dynamically by csv headers

Post by craig7 »

Tenaja wrote: Thu Mar 07, 2024 4:32 pm
My mind is a bit blown by how much code has to be written in PureBasic to do this
That is because those languages have CSV support in their respective standard libraries where PureBasic has not.
I agree... CSV support would be nice.

Keep in mind that there are many libraries native to pb that do not exist in python or r natively. You just happened to stumble across one that is absent here. But should you try to convert every pb file into another language, you would find similar deficiencies that pb has supported for many many years.

One nice thing about pb is the breadth of user-written libraries, as well as the helpful & friendly community! Sure, those exist elsewhere, but the large languages also come with a lot of chaff... More, in my experience, unfortunately.
Yes Python library distribution sites like pypi are a bit like the wild west. Only widely used packages like Pandas are well maintained and documented. R is different, in that Cran curates the libraries / packages to a high standard. However, unless you use virtual environments, OR stick to base language which is very stable, your code will break every few months due to changes in the packages when they update... especially if has anything to do with the tidyverse, where the lead dev. once stated: "I prefer to have a more utopian approach where I can be quite aggressive about making backward-incompatible changes while trying to figure out a better API."
It is nice the PB has so many built in libraries, and that everything is in one exe, as long as the libraries respect backwards compatibility.
User avatar
mk-soft
Always Here
Always Here
Posts: 6248
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: Can a structure be defined dynamically by csv headers

Post by mk-soft »

Nice function ...

Added error text if csv-file not found :wink:

Code: Select all

EnableExplicit

Structure IndexArray
  *Index[0]
EndStructure


PrototypeC.l Prototype_CallBack(*pCtx, argc.l, *argv.IndexArray, *colnames.IndexArray)

ImportC "sqlite3.lib" 
  sqlite3_create_function16(DatabaseID, zFunctionName.s, nArg.l, eTextRep.l, *pApp, *xFunc, *xStep, *xFinal)
  sqlite3_value_text16(*sqlite3_value)
  sqlite3_context_db_handle.i(*context)
  sqlite3_exec(*db, sql.p-utf8, callback.Prototype_CallBack, *firstArg, *errmsg)
  sqlite3_result_error16(*context, msg.s, bytes)
EndImport

#SQLITE_UTF16 = 4




ProcedureC createTableFromCSV(*context, argc.i, *argv.IndexArray)
  
  Protected Filename$, File.i, Line$, FieldCount.i, Table$, SQL$, Field$, *DB, i.i
  
  
  Filename$ = PeekS(sqlite3_value_text16(*argv\Index[0]))
  File = ReadFile(#PB_Any, Filename$)
  If File
    Table$ = GetFilePart(Filename$, #PB_FileSystem_NoExtension)
    
    Line$ = ReadString(File)
    ;Debug Line$
    
    SQL$ = "CREATE TABLE " + Table$ + "("
    
    FieldCount = 1
    Repeat
      Field$ = StringField(Line$, FieldCount, ",")
      If Field$ <> ""
        SQL$ + Trim(Field$) + " TEXT,"
      EndIf
      FieldCount + 1
    Until Field$ = ""
    SQL$ = RTrim(SQL$, ",") + ")"
    ;Debug SQL$
    
    *DB = sqlite3_context_db_handle(*context)
    
    sqlite3_exec(*DB, SQL$, #Null, #Null, #Null)
    
    While Not Eof(File)
      Line$ = ReadString(File)
      SQL$ = "INSERT INTO " + Table$ + " VALUES ("
      
      For i = 1 To FieldCount
        Field$ = StringField(Line$, i, ",")
        SQL$ + Trim(Field$) + ","
      Next i
      SQL$ = RTrim(SQL$, ",") + ")"
      ;Debug SQL$
      
      sqlite3_exec(*DB, SQL$, #Null, #Null, #Null)
    Wend
    
    CloseFile(File)
  Else
    sqlite3_result_error16(*context, "Error: CSV-File Not Found!", -1)
  EndIf
  
EndProcedure


UseSQLiteDatabase()


If OpenDatabase(0, ":memory:", "", "")
  
  sqlite3_create_function16(DatabaseID(0), "createTableFromCSV",  1, #SQLITE_UTF16, #Null, @createTableFromCSV(), #Null, #Null)
  
  If Not DatabaseUpdate(0, "SELECT createTableFromCSV('c:\tmp\PureBasic\csvtest\Test.csv')")
    Debug DatabaseError()
  Else
    If DatabaseQuery(0, "SELECT * FROM Test")
      While NextDatabaseRow(0)
        Debug GetDatabaseString(0, 0) + " " + GetDatabaseString(0, 1) + " " + GetDatabaseString(0, 2)
      Wend
      FinishDatabaseQuery(0)
    Else
      Debug DatabaseError()
    EndIf
  EndIf
  CloseDatabase(0)
EndIf
[code-pb]
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
User avatar
ChrisR
Addict
Addict
Posts: 1466
Joined: Sun Jan 08, 2017 10:27 pm
Location: France

Re: Can a structure be defined dynamically by csv headers

Post by ChrisR »

Nicely done with sqlite and probably the most appropriate way for csv

Another possible way without using sqllite:

Code: Select all

CompilerIf #PB_Compiler_IsMainFile
  EnableExplicit
CompilerEndIf

Structure MyFieldTable
  Array Field$(0)
EndStructure
Global NewList MyTable.MyFieldTable()

; mk-soft SplitString to List or Array with option double-quotes (CSV) (Array here): https://www.purebasic.fr/english/viewtopic.php?t=69557
XIncludeFile "SplitString.pbi" 

CompilerIf #PB_Compiler_IsMainFile
  ; GetArtistNameFromID is only for the Test example below
  Procedure.s GetArtistNameFromID(ID)
    Protected ArtistName$
    PushListPosition(MyTable())
    ForEach MyTable()
      If MyTable()\Field$(0) = Str(ID)
        ArtistName$ = MyTable()\Field$(1)
        Break
      EndIf
    Next
    PopListPosition(MyTable())
    ProcedureReturn ArtistName$
  EndProcedure
  
  ; GetBioFromID is only for the Test example below
  Procedure.s GetBioFromID(ID)
    Protected Bio$
    PushListPosition(MyTable())
    ForEach MyTable()
      If MyTable()\Field$(0) = Str(ID)
        Bio$ = MyTable()\Field$(2)
        Break
      EndIf
    Next
    PopListPosition(MyTable())
    ProcedureReturn Bio$
  EndProcedure
CompilerEndIf

Procedure.s GetColValueFromID(Col, ID)
  Protected Value$
  If Col < 0 : ProcedureReturn : EndIf
  PushListPosition(MyTable())
  FirstElement(MyTable())
  If Col <= ArraySize(MyTable()\Field$())
    ForEach MyTable()
      If MyTable()\Field$(0) = Str(ID)
        Value$ = MyTable()\Field$(Col)
        Break
      EndIf
    Next
  EndIf
  PopListPosition(MyTable())
  ProcedureReturn Value$
EndProcedure

Procedure.s GetValueFromID(Field$, ID)
  Protected Value$, ID$, CountField, Col = #PB_Default, I
  PushListPosition(MyTable())
  FirstElement(MyTable())
  CountField = ArraySize(MyTable()\Field$())
  ; Check case-insensitive header fields to make life easier
  Field$ = UCase(Field$) 
  For I = 0 To CountField
    If UCase(MyTable()\Field$(I)) = Field$
      Col = I
      Break
    EndIf
  Next
  If Col <> #PB_Default
    ID$ = Str(ID)
    ForEach MyTable()
      If MyTable()\Field$(0) = ID$
        Value$ = MyTable()\Field$(Col)
        Break
      EndIf
    Next
  EndIf
  PopListPosition(MyTable())
  ProcedureReturn Value$
EndProcedure

Procedure LoadCSV(Filename$)
  Protected File, Line$, NbLine, CountHeaderField, CountField, ReturnVal = #True, I
  Protected Dim StringArray.s(0)
  
  File = ReadFile(#PB_Any, Filename$)
  If File
    NbLine + 1
    Line$ = ReadString(File)
    CountHeaderField = SplitStringArray(Line$, ",", StringArray(), #True) - 1
    AddElement(MyTable())
    ReDim MyTable()\Field$(CountHeaderField)
    For I = 0 To CountHeaderField
      MyTable()\Field$(I) = StringArray(I)
    Next
    
    While Not Eof(File)
      NbLine + 1
      Line$ = ReadString(File)
      CountField = SplitStringArray(Line$, ",", StringArray(), #True) - 1
      If CountField = CountHeaderField
        AddElement(MyTable())
        ReDim MyTable()\Field$(CountField)
        For I = 0 To CountField
          MyTable()\Field$(I) = StringArray(I)
        Next
      Else
        If MessageRequester("Warning", "Line " + Str(NbLine) +", Number of fields = " + Str(CountField+1) + ", does Not match header fields count = " + Str(CountHeaderField+1) +
                                       #CRLF$+#CRLF$+ "Continue loading the csv table, ignoring this line ?", #PB_MessageRequester_YesNo | #PB_MessageRequester_Warning) = #PB_MessageRequester_No
          ClearList(MyTable())
          ReturnVal = #False
          Break
        EndIf
      EndIf
    Wend
    
    CloseFile(File)
    ProcedureReturn ReturnVal
  EndIf
EndProcedure

CompilerIf #PB_Compiler_IsMainFile
  Define Filename$ = "E:\Temp\Test.csv"
  Define OutPut$, CountField, I
  
  If LoadCSV(Filename$)
    
    ForEach MyTable()
      If CountField = 0 : CountField = ArraySize(MyTable()\Field$()) : EndIf
      OutPut$ = ""
      For I = 0 To CountField
        If I = 0 
          OutPut$ + MyTable()\Field$(I)
        Else
          OutPut$ + " | " + MyTable()\Field$(I)
        EndIf
      Next
      Debug OutPut$
    Next
    Debug "-----"
    Debug "ArtistName From ID = 8 : " + GetArtistNameFromID(8)
    Debug "Bio From ID = 8 : " + GetBioFromID(8)
    Debug "Value ArtistName From ID = 8 : " + GetValueFromID("ArtistName", 8)
    Debug "Column 2 Value From ID = 8 : " + GetColValueFromID(2, 8)
  EndIf
  
  ; Test.csv
  ; ArtistId,ArtistName,Bio
  ; 1,"Joe Satriani",""
  ; 2,"Steve Vai",""
  ; 3,"The Tea Party",""
  ; 4,"Noiseworks",""
  ; 5,"Wayne Jury",""
  ; 6,"Mr Percival",""
  ; 7,"Iron Maiden",""
  ; 8,"Atmasphere","Australian jazz band centred around polyrhythms."
  ; 9,"Ian Moss",""
  ; 10,"Magnum",""
  ; 13,"Primus",""
  ; 14,"Pat Metheny",""
  ; 15,"Frank Gambale",""
  ; 16,"Mothers of Invention","Like its name"
CompilerEndIf
Last edited by ChrisR on Sat Mar 09, 2024 11:54 am, edited 4 times in total.
User avatar
mk-soft
Always Here
Always Here
Posts: 6248
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: Can a structure be defined dynamically by csv headers

Post by mk-soft »

With doublequotes this is a little more complicated. There can be separators in the field. Hence the doublequotes in the field.

Link: SplitString to list and array with option double-quotes
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
User avatar
ChrisR
Addict
Addict
Posts: 1466
Joined: Sun Jan 08, 2017 10:27 pm
Location: France

Re: Can a structure be defined dynamically by csv headers

Post by ChrisR »

Yes, it's much better with your SplitString CSV to Array.
I've updated my previous post.
Thanks
Post Reply