Page 2 of 2

Re: Can a structure be defined dynamically by csv headers

Posted: Thu Mar 07, 2024 11:44 am
by craig7
Thanks all for the help, it is great to get insights when you are just starting on a language. My mind is a bit blown by how much code has to be written in PureBasic to do this, because in R or Python it is a one liner. For example in Python Panda:

df2 = pd.read_csv(filepath,encoding = "latin1")

In R reading in a CSV to a dataframe is in the base language, so even quicker than Python.

Re: Can a structure be defined dynamically by csv headers

Posted: Thu Mar 07, 2024 2:58 pm
by tored
craig7 wrote: Thu Mar 07, 2024 11:44 am Thanks all for the help, it is great to get insights when you are just starting on a language. My mind is a bit blown by how much code has to be written in PureBasic to do this, because in R or Python it is a one liner. For example in Python Panda:

df2 = pd.read_csv(filepath,encoding = "latin1")

In R reading in a CSV to a dataframe is in the base language, so even quicker than Python.
That is because those languages have CSV support in their respective standard libraries where PureBasic has not. And also Python and R are both interpreted languages, thus have virtual machine and a garbage collector, where PureBasic is a compiled language without a garbage collector, which naturally leads to more code.

Another option is to use an existing C library and use that together with the C backend.

However there is an argument to be made that CSV support ought be included in the PureBasic standard library, but the maintainers are pretty much occupied with more important tasks.

Re: Can a structure be defined dynamically by csv headers

Posted: Thu Mar 07, 2024 3:17 pm
by tored
infratec wrote: Wed Mar 06, 2024 9:50 pm A quick implementation with an user function:
Nice. You can simplify it even more if you have the CSV file on disk by letting sqlite import it after you created the table.

Code: Select all

.mode csv
.import myfile.csv mytable

Re: Can a structure be defined dynamically by csv headers

Posted: Thu Mar 07, 2024 4:32 pm
by Tenaja
My mind is a bit blown by how much code has to be written in PureBasic to do this
That is because those languages have CSV support in their respective standard libraries where PureBasic has not.
I agree... CSV support would be nice.

Keep in mind that there are many libraries native to pb that do not exist in python or r natively. You just happened to stumble across one that is absent here. But should you try to convert every pb file into another language, you would find similar deficiencies that pb has supported for many many years.

One nice thing about pb is the breadth of user-written libraries, as well as the helpful & friendly community! Sure, those exist elsewhere, but the large languages also come with a lot of chaff... More, in my experience, unfortunately.

Re: Can a structure be defined dynamically by csv headers

Posted: Thu Mar 07, 2024 5:06 pm
by infratec
tored wrote: Thu Mar 07, 2024 3:17 pm Nice. You can simplify it even more if you have the CSV file on disk by letting sqlite import it after you created the table.

Code: Select all

.mode csv
.import myfile.csv mytable

This is only possible with the command line tool.
So you have to use RunProgram()

Re: Can a structure be defined dynamically by csv headers

Posted: Thu Mar 07, 2024 5:46 pm
by craig7
Tenaja wrote: Thu Mar 07, 2024 4:32 pm
My mind is a bit blown by how much code has to be written in PureBasic to do this
That is because those languages have CSV support in their respective standard libraries where PureBasic has not.
I agree... CSV support would be nice.

Keep in mind that there are many libraries native to pb that do not exist in python or r natively. You just happened to stumble across one that is absent here. But should you try to convert every pb file into another language, you would find similar deficiencies that pb has supported for many many years.

One nice thing about pb is the breadth of user-written libraries, as well as the helpful & friendly community! Sure, those exist elsewhere, but the large languages also come with a lot of chaff... More, in my experience, unfortunately.
Yes Python library distribution sites like pypi are a bit like the wild west. Only widely used packages like Pandas are well maintained and documented. R is different, in that Cran curates the libraries / packages to a high standard. However, unless you use virtual environments, OR stick to base language which is very stable, your code will break every few months due to changes in the packages when they update... especially if has anything to do with the tidyverse, where the lead dev. once stated: "I prefer to have a more utopian approach where I can be quite aggressive about making backward-incompatible changes while trying to figure out a better API."
It is nice the PB has so many built in libraries, and that everything is in one exe, as long as the libraries respect backwards compatibility.

Re: Can a structure be defined dynamically by csv headers

Posted: Fri Mar 08, 2024 9:59 am
by mk-soft
Nice function ...

Added error text if csv-file not found :wink:

Code: Select all

EnableExplicit

Structure IndexArray
  *Index[0]
EndStructure


PrototypeC.l Prototype_CallBack(*pCtx, argc.l, *argv.IndexArray, *colnames.IndexArray)

ImportC "sqlite3.lib" 
  sqlite3_create_function16(DatabaseID, zFunctionName.s, nArg.l, eTextRep.l, *pApp, *xFunc, *xStep, *xFinal)
  sqlite3_value_text16(*sqlite3_value)
  sqlite3_context_db_handle.i(*context)
  sqlite3_exec(*db, sql.p-utf8, callback.Prototype_CallBack, *firstArg, *errmsg)
  sqlite3_result_error16(*context, msg.s, bytes)
EndImport

#SQLITE_UTF16 = 4




ProcedureC createTableFromCSV(*context, argc.i, *argv.IndexArray)
  
  Protected Filename$, File.i, Line$, FieldCount.i, Table$, SQL$, Field$, *DB, i.i
  
  
  Filename$ = PeekS(sqlite3_value_text16(*argv\Index[0]))
  File = ReadFile(#PB_Any, Filename$)
  If File
    Table$ = GetFilePart(Filename$, #PB_FileSystem_NoExtension)
    
    Line$ = ReadString(File)
    ;Debug Line$
    
    SQL$ = "CREATE TABLE " + Table$ + "("
    
    FieldCount = 1
    Repeat
      Field$ = StringField(Line$, FieldCount, ",")
      If Field$ <> ""
        SQL$ + Trim(Field$) + " TEXT,"
      EndIf
      FieldCount + 1
    Until Field$ = ""
    SQL$ = RTrim(SQL$, ",") + ")"
    ;Debug SQL$
    
    *DB = sqlite3_context_db_handle(*context)
    
    sqlite3_exec(*DB, SQL$, #Null, #Null, #Null)
    
    While Not Eof(File)
      Line$ = ReadString(File)
      SQL$ = "INSERT INTO " + Table$ + " VALUES ("
      
      For i = 1 To FieldCount
        Field$ = StringField(Line$, i, ",")
        SQL$ + Trim(Field$) + ","
      Next i
      SQL$ = RTrim(SQL$, ",") + ")"
      ;Debug SQL$
      
      sqlite3_exec(*DB, SQL$, #Null, #Null, #Null)
    Wend
    
    CloseFile(File)
  Else
    sqlite3_result_error16(*context, "Error: CSV-File Not Found!", -1)
  EndIf
  
EndProcedure


UseSQLiteDatabase()


If OpenDatabase(0, ":memory:", "", "")
  
  sqlite3_create_function16(DatabaseID(0), "createTableFromCSV",  1, #SQLITE_UTF16, #Null, @createTableFromCSV(), #Null, #Null)
  
  If Not DatabaseUpdate(0, "SELECT createTableFromCSV('c:\tmp\PureBasic\csvtest\Test.csv')")
    Debug DatabaseError()
  Else
    If DatabaseQuery(0, "SELECT * FROM Test")
      While NextDatabaseRow(0)
        Debug GetDatabaseString(0, 0) + " " + GetDatabaseString(0, 1) + " " + GetDatabaseString(0, 2)
      Wend
      FinishDatabaseQuery(0)
    Else
      Debug DatabaseError()
    EndIf
  EndIf
  CloseDatabase(0)
EndIf
[code-pb]

Re: Can a structure be defined dynamically by csv headers

Posted: Fri Mar 08, 2024 4:19 pm
by ChrisR
Nicely done with sqlite and probably the most appropriate way for csv

Another possible way without using sqllite:

Code: Select all

CompilerIf #PB_Compiler_IsMainFile
  EnableExplicit
CompilerEndIf

Structure MyFieldTable
  Array Field$(0)
EndStructure
Global NewList MyTable.MyFieldTable()

; mk-soft SplitString to List or Array with option double-quotes (CSV) (Array here): https://www.purebasic.fr/english/viewtopic.php?t=69557
XIncludeFile "SplitString.pbi" 

CompilerIf #PB_Compiler_IsMainFile
  ; GetArtistNameFromID is only for the Test example below
  Procedure.s GetArtistNameFromID(ID)
    Protected ArtistName$
    PushListPosition(MyTable())
    ForEach MyTable()
      If MyTable()\Field$(0) = Str(ID)
        ArtistName$ = MyTable()\Field$(1)
        Break
      EndIf
    Next
    PopListPosition(MyTable())
    ProcedureReturn ArtistName$
  EndProcedure
  
  ; GetBioFromID is only for the Test example below
  Procedure.s GetBioFromID(ID)
    Protected Bio$
    PushListPosition(MyTable())
    ForEach MyTable()
      If MyTable()\Field$(0) = Str(ID)
        Bio$ = MyTable()\Field$(2)
        Break
      EndIf
    Next
    PopListPosition(MyTable())
    ProcedureReturn Bio$
  EndProcedure
CompilerEndIf

Procedure.s GetColValueFromID(Col, ID)
  Protected Value$
  If Col < 0 : ProcedureReturn : EndIf
  PushListPosition(MyTable())
  FirstElement(MyTable())
  If Col <= ArraySize(MyTable()\Field$())
    ForEach MyTable()
      If MyTable()\Field$(0) = Str(ID)
        Value$ = MyTable()\Field$(Col)
        Break
      EndIf
    Next
  EndIf
  PopListPosition(MyTable())
  ProcedureReturn Value$
EndProcedure

Procedure.s GetValueFromID(Field$, ID)
  Protected Value$, ID$, CountField, Col = #PB_Default, I
  PushListPosition(MyTable())
  FirstElement(MyTable())
  CountField = ArraySize(MyTable()\Field$())
  ; Check case-insensitive header fields to make life easier
  Field$ = UCase(Field$) 
  For I = 0 To CountField
    If UCase(MyTable()\Field$(I)) = Field$
      Col = I
      Break
    EndIf
  Next
  If Col <> #PB_Default
    ID$ = Str(ID)
    ForEach MyTable()
      If MyTable()\Field$(0) = ID$
        Value$ = MyTable()\Field$(Col)
        Break
      EndIf
    Next
  EndIf
  PopListPosition(MyTable())
  ProcedureReturn Value$
EndProcedure

Procedure LoadCSV(Filename$)
  Protected File, Line$, NbLine, CountHeaderField, CountField, ReturnVal = #True, I
  Protected Dim StringArray.s(0)
  
  File = ReadFile(#PB_Any, Filename$)
  If File
    NbLine + 1
    Line$ = ReadString(File)
    CountHeaderField = SplitStringArray(Line$, ",", StringArray(), #True) - 1
    AddElement(MyTable())
    ReDim MyTable()\Field$(CountHeaderField)
    For I = 0 To CountHeaderField
      MyTable()\Field$(I) = StringArray(I)
    Next
    
    While Not Eof(File)
      NbLine + 1
      Line$ = ReadString(File)
      CountField = SplitStringArray(Line$, ",", StringArray(), #True) - 1
      If CountField = CountHeaderField
        AddElement(MyTable())
        ReDim MyTable()\Field$(CountField)
        For I = 0 To CountField
          MyTable()\Field$(I) = StringArray(I)
        Next
      Else
        If MessageRequester("Warning", "Line " + Str(NbLine) +", Number of fields = " + Str(CountField+1) + ", does Not match header fields count = " + Str(CountHeaderField+1) +
                                       #CRLF$+#CRLF$+ "Continue loading the csv table, ignoring this line ?", #PB_MessageRequester_YesNo | #PB_MessageRequester_Warning) = #PB_MessageRequester_No
          ClearList(MyTable())
          ReturnVal = #False
          Break
        EndIf
      EndIf
    Wend
    
    CloseFile(File)
    ProcedureReturn ReturnVal
  EndIf
EndProcedure

CompilerIf #PB_Compiler_IsMainFile
  Define Filename$ = "E:\Temp\Test.csv"
  Define OutPut$, CountField, I
  
  If LoadCSV(Filename$)
    
    ForEach MyTable()
      If CountField = 0 : CountField = ArraySize(MyTable()\Field$()) : EndIf
      OutPut$ = ""
      For I = 0 To CountField
        If I = 0 
          OutPut$ + MyTable()\Field$(I)
        Else
          OutPut$ + " | " + MyTable()\Field$(I)
        EndIf
      Next
      Debug OutPut$
    Next
    Debug "-----"
    Debug "ArtistName From ID = 8 : " + GetArtistNameFromID(8)
    Debug "Bio From ID = 8 : " + GetBioFromID(8)
    Debug "Value ArtistName From ID = 8 : " + GetValueFromID("ArtistName", 8)
    Debug "Column 2 Value From ID = 8 : " + GetColValueFromID(2, 8)
  EndIf
  
  ; Test.csv
  ; ArtistId,ArtistName,Bio
  ; 1,"Joe Satriani",""
  ; 2,"Steve Vai",""
  ; 3,"The Tea Party",""
  ; 4,"Noiseworks",""
  ; 5,"Wayne Jury",""
  ; 6,"Mr Percival",""
  ; 7,"Iron Maiden",""
  ; 8,"Atmasphere","Australian jazz band centred around polyrhythms."
  ; 9,"Ian Moss",""
  ; 10,"Magnum",""
  ; 13,"Primus",""
  ; 14,"Pat Metheny",""
  ; 15,"Frank Gambale",""
  ; 16,"Mothers of Invention","Like its name"
CompilerEndIf

Re: Can a structure be defined dynamically by csv headers

Posted: Fri Mar 08, 2024 5:41 pm
by mk-soft
With doublequotes this is a little more complicated. There can be separators in the field. Hence the doublequotes in the field.

Link: SplitString to list and array with option double-quotes

Re: Can a structure be defined dynamically by csv headers

Posted: Fri Mar 08, 2024 6:34 pm
by ChrisR
Yes, it's much better with your SplitString CSV to Array.
I've updated my previous post.
Thanks