
Lesson learned

Posted: Mon Sep 19, 2022 8:32 am
by Cyllceaux
Hi @all,

At the moment I'm writing my own database in PureBasic, completely from scratch, without any existing code. During development I learned a lot about PB, and I want to share it here.

Maps:

Code: Select all

;Slow
ForEach *db\schema("Test")\table("Test")\datas()
   ...
Next

;Fast
Protected *schema.strSchema=FindMapElement(*db\schema(),"Test")
Protected *table.strTable=FindMapElement(*schema\table(),"Test")
ForEach *table\datas()
   ...
Next
Also very important: give every Map in a structure a slot size.

Write to File:

Code: Select all

;Slow
WriteByte(file,4)
WriteString(file,"Test")
WriteByte(file,4)
WriteString(file,"Test")

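;Fast: build the record in memory, then write it with one call
*m=AllocateMemory(10)
PokeB(*m,4)
PokeS(*m+1,"Test",-1,#PB_UTF8|#PB_String_NoZero) ; 4 bytes, no terminating zero
PokeB(*m+5,4)
PokeS(*m+6,"Test",-1,#PB_UTF8|#PB_String_NoZero) ; without these flags PokeS writes past the 10 bytes
WriteData(file,*m,10)
FreeMemory(*m)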
Encoding:

Code: Select all

;Always use Encoding for Strings and Files

#ENCODING       = #PB_UTF8
#PEEKS_ENCODING = #ENCODING | #PB_ByteLength
#POKES_ENCODING = #ENCODING | #PB_String_NoZero
#FILE_FLAGS     = #ENCODING | #PB_File_SharedRead | #PB_File_SharedWrite | #PB_File_NoBuffering
#WRITE_STRING   = #ENCODING | #PB_File_IgnoreEOL
Globals are sometimes better:

Code: Select all

  Global T_USERNAME.s=ComputerName()+"\"+UserName()
  Global T_S_BYTE=SizeOf(Byte)
  Global T_S_INTEGER=SizeOf(Integer)
  Global T_S_LONG=SizeOf(Long)
  Global T_S_QUAD=SizeOf(Quad)
  Global T_S_DOUBLE=SizeOf(Double)
  Global T_S_FLOAT=SizeOf(Float)
  Global T_S_ROOT=SizeOf(strRoot)
Always test the results with three kinds of DB:
  • In memory
  • DB backed by a file
  • DB reopened from an existing file
FindString and ReplaceString are expensive:

Code: Select all

;Faster FindString
ImportC ""
    CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
      _wcsstr_(*String1, *String2) As "wcsstr"
    CompilerElse
      _wcsstr_(*String1, *String2) As "_wcsstr"
    CompilerEndIf
EndImport

Procedure.i FindStringC(s.s, c.s)
      Protected.i *pcs = _wcsstr_(@s, @c)
      If *pcs
        ProcedureReturn (*pcs - @s) >> 1 + 1 
      EndIf
EndProcedure
; I don't have a better Replacement for ReplaceString
Work with Pointers:

Code: Select all

;Slow

result.b=PeekB(*buffer+pos):pos+SizeOf(Byte)

;Fast
Procedure.b _peek_byte(*pos.Quad,*buffer)
      Protected result.b=PeekB(*buffer+*pos\q)
      *pos\q+T_S_BYTE
      ProcedureReturn Result
EndProcedure

result.b=_peek_byte(@pos,*m)
Files:
If you work with file buffers, there is a big difference between FileSize("test.txt") and Lof(file).
FileSize() gives you the size of the file on the hard drive.
Lof() gives you the size of the file on the hard drive plus the data already written to the buffer.
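
The difference can be observed with a small experiment. This is only a sketch: the file name and the 64 KB buffer size are arbitrary choices, and the exact values printed before the flush depend on when PureBasic actually writes the buffer to disk.

```purebasic
; Sketch: compare FileSize() and Lof() while written data is still buffered.
; "test.txt" and the buffer size are assumptions chosen for illustration.
Define file = CreateFile(#PB_Any, "test.txt")
If file
  FileBuffersSize(file, 65536)          ; give this file a 64 KB buffer
  WriteString(file, "Test", #PB_UTF8)   ; 4 bytes, probably still in the buffer
  Debug FileSize("test.txt")            ; size on disk only
  Debug Lof(file)                       ; size on disk plus buffered data
  FlushFileBuffers(file)                ; force the buffered data to disk
  Debug FileSize("test.txt")            ; should now agree with Lof()
  CloseFile(file)
EndIf
```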

Date:
Use your own date format and date procedures. My DateTime can hold times between the years -999,999 and 999,999.

------------------------------------------------------------------------------------------------------------------------

There are many more things, but these were the biggest mistakes I made, and they slowed down my DB.

At the moment my DB has this interface, but I'm working on a DLL version and SQL.

Code: Select all

Enumeration _datatype
  #_DATATYPE_NONE
  #_DATATYPE_STRING
  #_DATATYPE_BYTE
  #_DATATYPE_INTEGER
  #_DATATYPE_LONG
  #_DATATYPE_QUAD
  #_DATATYPE_DOUBLE
  #_DATATYPE_FLOAT		
  #_DATATYPE_DATE
  #_DATATYPE_TIME
  #_DATATYPE_DATETIME
  #_DATATYPE_MEMORY
EndEnumeration

Enumeration _type
  #_TYPE_NONE
  #_TYPE_DB
  #_TYPE_TABLE
  #_TYPE_DATA		
  #_TYPE_VIEW
  #_TYPE_JOIN
  #_TYPE_UNION		
  #_TYPE_USER
  #_TYPE_SCHEMA
  #_TYPE_RESULT
EndEnumeration

Enumeration _join
  #_JOIN_CROSS
  #_JOIN_INNER
  #_JOIN_LEFT
  #_JOIN_RIGHT
EndEnumeration


Interface Object
  Free()
  GetLastError.s()
EndInterface

Interface Head Extends Object
  GetID.q()
  GetCreated.q()
  GetModified.q()
  GetVersion.q()
  GetType.b()
EndInterface

Interface Named Extends Head		
  GetCreator.s():GetModifier.s()
  GetName.s():SetName(name.s)
  GetSchema.s():SetSchema(schema.s)
EndInterface

#CURRENT_TIME="CURRENT_TIME"
#CURRENT_DATE="CURRENT_DATE"
#CURRENT_DATETIME="CURRENT_DATETIME"
#CURRENT_TIMESTAMP="CURRENT_TIMESTAMP"

Interface DataContainer Extends Named
  
EndInterface

Interface Table Extends DataContainer
  
  AddString(name.s,nullable.b=#True,defaultvalue.s="",index.b=#False,unique.b=#False,primarykey.b=#False)
  AddByte(name.s,nullable.b=#True,defaultvalue.s="",index.b=#False,unique.b=#False,primarykey.b=#False)
  AddInteger(name.s,nullable.b=#True,defaultvalue.s="",index.b=#False,unique.b=#False,primarykey.b=#False)
  AddLong(name.s,nullable.b=#True,defaultvalue.s="",index.b=#False,unique.b=#False,primarykey.b=#False)
  AddQuad(name.s,nullable.b=#True,defaultvalue.s="",index.b=#False,unique.b=#False,primarykey.b=#False)
  AddDouble(name.s,decimal.b=2,nullable.b=#True,defaultvalue.s="",index.b=#False,unique.b=#False,primarykey.b=#False)
  AddFloat(name.s,decimal.b=2,nullable.b=#True,defaultvalue.s="",index.b=#False,unique.b=#False,primarykey.b=#False)
  
  AddDate(name.s,format.s="%dd.%mm.%yyyy",nullable.b=#True,defaultvalue.s="",index.b=#False,unique.b=#False,primarykey.b=#False)
  AddTime(name.s,format.s="%hh:%ii:%ss",nullable.b=#True,defaultvalue.s="",index.b=#False,unique.b=#False,primarykey.b=#False)
  AddDateTime(name.s,format.s="%dd.%mm.%yyyy %hh:%ii:%ss",nullable.b=#True,defaultvalue.s="",index.b=#False,unique.b=#False,primarykey.b=#False)
  
  AddMemory(name.s,nullable.b=#True)
  
  AddIndex(field.s,gruppe.s="")
  AddUnique(field.s,gruppe.s="")
  AddPrimaryKey(field.s)
EndInterface

Interface Result Extends Head
  GetCreator.s():GetModifier.s()
  GetSchema.s()
  GetTable.s()
  GetFieldNames(List names.s())
  GetFieldType.b(name.s)
  
  GetString.s(name.s)
  GetByte.b(name.s)
  GetInteger.i(name.s)
  GetLong.l(name.s)
  GetQuad.q(name.s)
  GetDouble.d(name.s)
  GetFloat.f(name.s)
  
  GetDate.s(name.s)
  GetTime.s(name.s)
  GetDateTime.s(name.s)
  GetFormat.s(name.s)
  
  GetMemory(name.s,*buffer,len.q)
  GetMemoryLen.q(name.s)
  
  AsString.s(name.s)
  AsNumeric.q(name.s)
  AsDecimal.d(name.s)
EndInterface

Interface Data Extends Result
  SetSchema(schema.s)
  SetTable(table.s)
  
  SetString(name.s,value.s)
  SetByte(name.s,value.b)
  SetInteger(name.s,value.i)
  SetLong(name.s,value.l)
  SetQuad(name.s,value.q)
  SetDouble(name.s,value.d)
  SetFloat(name.s,value.f)
  
  SetDate(name.s,value.s,format.s="%dd.%mm.%yyyy")
  SetTime(name.s,value.s,format.s="%hh:%ii:%ss")
  SetDateTime(name.s,value.s,format.s="%dd.%mm.%yyyy %hh:%ii:%ss")
  
  SetMemory(name.s,*buffer,len.q)
  SetDecimal(name.s,value.s)
  SetNull(name.s)
EndInterface

Prototype.b ProtoWhere(*result.Result)

Interface View Extends DataContainer
  GetDatacontainer.s():SetDatacontainer(datacontainer.s)
  AddGroupBy(field.s)
  AddOrderBy(field.s)
  SetOrderOption(option.b=#PB_Sort_Ascending)
  
  AddField(field.s,name.s="")
  AddMax(field.s,name.s="")
  AddMin(field.s,name.s="")
  AddAvg(field.s,name.s="")
  AddCount(field.s,name.s="")
  AddSum(field.s,name.s="")
  AddConcat(field.s,seperator.s=", ",name.s="")
EndInterface

Interface Join Extends DataContainer
  GetJoin.b():SetJoin(join.b=#_JOIN_INNER)
  
  SetDatacontainer1(datacontainer.s)
  SetDatacontainer2(datacontainer.s)
  
  AddField1(field.s)
  AddField2(field.s)
  
  AddJoinField1(field.s)
  AddJoinField2(field.s)
EndInterface

Interface Union Extends DataContainer
  AddDatacontainer(datacontainer.s)
EndInterface

Interface Schema Extends Head
  GetName.s():SetName(name.s)
EndInterface


Interface DB Extends Head
  GetDBVersion.s()
  
  insertTable(*table.Table)
  insertData(*data.Data)
  insertJoin(*join.Join)
  insertUnion(*union.Union)
  insertView(*view.View)
  insertSchema(*schema.Schema)
  
  
  BeginBatch()
  Commit()
  Rollback()
  Vacuum()
  Reload()
  IsReadOnly.b()
  
  countTable(schema.s="")
  countData(table.s,schema.s="")
  countJoin(schema.s="")
  countUnion(schema.s="")
  countView(schema.s="")
  countSchema()
  
  
  TableNames(List names.s(),schema.s="")
  JoinNames(List names.s(),schema.s="")
  UnionNames(List names.s(),schema.s="")
  ViewNames(List names.s(),schema.s="")
  SchemaNames(List names.s())
  IndexNames(table.s,gruppe.s,List names.s(),schema.s="")
  UniqueNames(table.s,gruppe.s,List names.s(),schema.s="")
  
  
  deleteTable(name.s,schema.s="")		
  deleteJoin(name.s,schema.s="")
  deleteUnion(name.s,schema.s="")
  deleteView(name.s,schema.s="")
  deleteSchema(name.s)
  
  
  DeleteDatas.q(table.s,where.ProtoWhere=0,schema.s="")
  UpdateDatas.q(table.s,*data.Data,where.ProtoWhere=0,schema.s="")
  
  ListTableDatas.q(table.s,List datas.Result(),where.ProtoWhere=0,schema.s="",offset.q=-1,limit.q=-1)
  ListViewData.q(view.s,List result.Result(),where.ProtoWhere=0,schema.s="",offset.q=-1,limit.q=-1)
  ListUnionData.q(union.s,List result.Result(),where.ProtoWhere=0,schema.s="",offset.q=-1,limit.q=-1)
  ListJoinData.q(join.s,List result.Result(),where.ProtoWhere=0,schema.s="",offset.q=-1,limit.q=-1)
  
  ListContainerData.q(datacontainer.s,List result.Result(),where.ProtoWhere=0,schema.s="",offset.q=-1,limit.q=-1)
  
  ListTableIndexDatas.q(table.s,index.s,value.s,List datas.Result(),where.ProtoWhere=0,schema.s="",offset.q=-1,limit.q=-1)
  
  hasTable.b(name.s,schema.s="")		
  hasJoin.b(name.s,schema.s="")
  hasUnion.b(name.s,schema.s="")
  hasView.b(name.s,schema.s="")
  hasSchema.b(name.s)
  
 
EndInterface

Re: Lesson learned

Posted: Mon Sep 19, 2022 9:12 am
by idle
PB string functions are useful but very slow, and maps are only useful when they're sized appropriately; they're not very dynamic.

Re: Lesson learned

Posted: Mon Sep 19, 2022 10:36 am
by Caronte3D
Useful info, thanks! :wink:

Re: Lesson learned

Posted: Mon Sep 19, 2022 10:58 am
by Bitblazer
One of the necessary additions to PureBasic is a Date64 Module. You can find many versions, but this should be a good one.

Re: Lesson learned

Posted: Mon Sep 19, 2022 11:43 am
by NicTheQuick
I don't get why globals are faster than SizeOf(), because variables are located in memory and cache and SizeOf() is just a constant. In my opinion it can not be faster to use a variable instead of SizeOf() directly. How did you test that?

Also PeekXYZ() in general is not a good idea. Just use pointers directly. I don't get why the procedure "_peek_byte" should be faster when it uses PeekB() again. :?: I would understand it if there were no PeekB at all because the C compiler then can inline that procedure and make it very fast.
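
One way to check a claim like this is a small timing loop. The following is only a sketch: #LOOPS and the variable names are made up, T_S_BYTE mirrors the global from the first post, and an optimizing backend may simplify both loops so far that the numbers say little. It should be compiled without the debugger.

```purebasic
; Timing sketch; run without the debugger, otherwise the numbers mean little.
; #LOOPS and all variable names are arbitrary assumptions.
#LOOPS = 100000000
Global T_S_BYTE = SizeOf(Byte)

Define i, pos.q, start.q, msSizeOf.q, msGlobal.q

start = ElapsedMilliseconds()
For i = 1 To #LOOPS
  pos + SizeOf(Byte)   ; compile-time constant
Next
msSizeOf = ElapsedMilliseconds() - start

start = ElapsedMilliseconds()
For i = 1 To #LOOPS
  pos + T_S_BYTE       ; value read from a global variable
Next
msGlobal = ElapsedMilliseconds() - start

; Displaying pos makes sure the result of both loops is actually used.
MessageRequester("Result", "SizeOf(): " + Str(msSizeOf) + " ms" + #LF$ + "Global: " + Str(msGlobal) + " ms" + #LF$ + "pos = " + Str(pos))
```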

Re: Lesson learned

Posted: Mon Sep 19, 2022 11:46 am
by Cyllceaux
Bitblazer wrote: Mon Sep 19, 2022 10:58 am One of the necessary additions to PureBasic is a Date64 Module. You can find many versions, but this should be a good one.
Hey there... I know that, but it's not what I needed ;-)
idle wrote: Mon Sep 19, 2022 9:12 am PB string functions are useful but very slow, and maps are only useful when they're sized appropriately; they're not very dynamic.
I know... but wanted to remember 8) :wink:

Re: Lesson learned

Posted: Mon Sep 19, 2022 12:12 pm
by Cyllceaux
NicTheQuick wrote: Mon Sep 19, 2022 11:43 am I don't get why globals are faster than SizeOf(), because variables are located in memory and cache and SizeOf() is just a constant. In my opinion it can not be faster to use a variable instead of SizeOf() directly. How did you test that?

Also PeekXYZ() in general is not a good idea. Just use pointers directly. I don't get why the procedure "_peek_byte" should be faster when it uses PeekB() again. :?: I would understand it if there were no PeekB at all because the C compiler then can inline that procedure and make it very fast.
Hi Nic,

I can't explain it, but all the changes together brought the run time from 72s down to 11s.

I was surprised by the SizeOf and the Peek* results. Don't forget, this was called a couple of million times. I had my "huh" moment with the change from "pos+S_T_SIZE" to "pos\q+S_T_SIZE". I can't explain the performance difference, but it saved a couple of seconds for me.

So, I'm sorry I can't help :(

Re: Lesson learned

Posted: Mon Sep 19, 2022 12:58 pm
by mk-soft
The Peek and Poke commands are function calls. These can all be replaced by direct accesses (except for string memory to PB strings). Also with indexed access.

A direct access is always faster than a function call Peek/Poke.
Update

Code: Select all

;-TOP

Structure ArrayOfAscii
  a.a[0] ; <- Null is undefined size
EndStructure

Define *mem.ArrayOfAscii = Ascii("Hello World!")

Define index = 0

While *mem\a[index]
  r1.s = "Index " + index + ": ASCII: " + *mem\a[index]
  Debug r1
  index + 1
Wend

FreeMemory(*mem)

Debug "----"
; Better use Character
Structure ArrayOfCharacter
  c.c[0] ; <- Null is undefined size
EndStructure

Define *mem2.ArrayOfCharacter = @"Hello World!"

Define index = 0

While *mem2\c[index]
  r1.s = "Index " + index + ": Character: " + *mem2\c[index]
  Debug r1
  index + 1
Wend

Re: Lesson learned

Posted: Mon Sep 19, 2022 1:08 pm
by Cyllceaux
mk-soft wrote: Mon Sep 19, 2022 12:58 pm The Peek and Poke commands are function calls. These can all be replaced by direct accesses (except for string memory to PB strings). Also with indexed access.

A direct access is always faster than a function call Peek/Poke.


UUUH..... nice... I will try it.

Re: Lesson learned

Posted: Mon Sep 19, 2022 1:10 pm
by mk-soft
Better use character. See update

Re: Lesson learned

Posted: Mon Sep 19, 2022 1:22 pm
by Cyllceaux
mk-soft wrote: Mon Sep 19, 2022 1:10 pm Better use character. See update
OK... the problem is, the *buffer is a mix of everything. So I tried this:

Code: Select all

    Procedure.b _peek_byte(*pos.Quad,*buffer)
      Protected *tresult.Byte=*buffer+*pos\q
      *pos\q+T_S_BYTE
      ProcedureReturn *tresult\b
    EndProcedure
But this is slower than peek :(

Re: Lesson learned

Posted: Mon Sep 19, 2022 1:26 pm
by mk-soft
In the past (in the DOS and UNIX days), static structures were used. This meant that individual data sets could be accessed directly with FileSeek:
Pos = index * SizeOf(udtDataSet)

Code: Select all

;-TOP

Enumeration State
  #IsFree
  #IsSet
  #IsDel
EndEnumeration

Structure udtDataSet
  ID.l
  State.w
  FirstName.s{20}
  LastName.s{20}
  Age.w
EndStructure

size = SizeOf(udtDataSet)

Debug size
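
Building on the position formula above, record access by index could look roughly like this. ReadDataSet and WriteDataSet are hypothetical helper names, not part of PureBasic, and the sketch assumes the file contains nothing but fixed-size udtDataSet records.

```purebasic
Structure udtDataSet
  ID.l
  State.w
  FirstName.s{20}
  LastName.s{20}
  Age.w
EndStructure

; Hypothetical helper: read record number "index" (0-based) into *record.
; Returns #True only if a complete record was read.
Procedure.i ReadDataSet(file, index.q, *record.udtDataSet)
  FileSeek(file, index * SizeOf(udtDataSet))
  ProcedureReturn Bool(ReadData(file, *record, SizeOf(udtDataSet)) = SizeOf(udtDataSet))
EndProcedure

; Hypothetical helper: overwrite record number "index" in place.
Procedure.i WriteDataSet(file, index.q, *record.udtDataSet)
  FileSeek(file, index * SizeOf(udtDataSet))
  ProcedureReturn Bool(WriteData(file, *record, SizeOf(udtDataSet)) = SizeOf(udtDataSet))
EndProcedure
```

Because the fixed-length strings are stored inside the structure itself, the whole record can be copied between file and memory in one call.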

Re: Lesson learned

Posted: Mon Sep 19, 2022 1:29 pm
by mk-soft

Code: Select all

Structure ArrayOfAny
  StructureUnion
    a.a[0]
    b.b[0]
    c.c[0]
    w.w[0]
    u.u[0]
    l.l[0]
    q.q[0]
    f.f[0]
    d.d[0]
  EndStructureUnion
EndStructure

lVal.l = $12345678

*mem.ArrayOfAny = @lVal

Debug Hex(*mem\a[0])
Debug Hex(*mem\a[1])
Debug Hex(*mem\a[2])
Debug Hex(*mem\a[3])

Debug Hex(*mem\u[0])
Debug Hex(*mem\u[1])

Re: Lesson learned

Posted: Mon Sep 19, 2022 1:39 pm
by Cyllceaux
Does not work for me... :( The Buffer looks like
Byte|Byte|Quad|String|Byte
Or
Byte|Quad|String|Quad|Double|Byte|String
Or any other combination

Re: Lesson learned

Posted: Mon Sep 19, 2022 1:51 pm
by mk-soft
The problem is the string. With a dynamic string size, you always have to calculate forward from the first data set. To do this, the length of the string must be stored in front of the string. Alternatively, you can use static strings with a fixed length.
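
A length-prefixed string in a mixed buffer might be handled along these lines. This is a minimal sketch under assumptions: PokePrefixedString and PeekPrefixedString are hypothetical names, the string is stored as UTF-8, and a single length byte limits each string to 255 bytes.

```purebasic
; Hypothetical layout: [1 length byte][UTF-8 bytes, no terminating zero]
Procedure.i PokePrefixedString(*buffer, *pos.Quad, text.s)
  Protected length = StringByteLength(text, #PB_UTF8)
  If length > 255
    ProcedureReturn #False            ; does not fit into one length byte
  EndIf
  PokeA(*buffer + *pos\q, length)
  PokeS(*buffer + *pos\q + 1, text, -1, #PB_UTF8 | #PB_String_NoZero)
  *pos\q + 1 + length                 ; advance past prefix and string
  ProcedureReturn #True
EndProcedure

Procedure.s PeekPrefixedString(*buffer, *pos.Quad)
  Protected text.s
  Protected length = PeekA(*buffer + *pos\q)
  If length > 0
    text = PeekS(*buffer + *pos\q + 1, length, #PB_UTF8 | #PB_ByteLength)
  EndIf
  *pos\q + 1 + length                 ; next field starts right after the string
  ProcedureReturn text
EndProcedure

Define *m = AllocateMemory(64)
Define pos.q = 0
PokePrefixedString(*m, @pos, "Test")
PokePrefixedString(*m, @pos, "Hello")

pos = 0
Debug PeekPrefixedString(*m, @pos)
Debug PeekPrefixedString(*m, @pos)
FreeMemory(*m)
```

The position is passed by pointer so that each call leaves it at the start of the next field, whatever type that field has.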