Check non-word character without a regular expression

Posted: Wed Jul 26, 2023 9:43 am
by Boulcat
I would like to replace my regular expression, which strips all non-word characters, with code that uses character pointers instead (*String = @String).
It may already exist, but I haven't found it and I don't know much about working with memory.

Code: Select all

Procedure.s CheckName(String.s)
  Static CheckNameRegEx.i
  If CheckNameRegEx = 0 Or IsRegularExpression(CheckNameRegEx) = 0
    CheckNameRegEx = CreateRegularExpression(#PB_Any, "^[^a-zA-Z_]|\W+")
    ; FreeRegularExpression(CheckNameRegEx) not required: all remaining regular expressions are automatically freed when the program ends
  EndIf
  If IsRegularExpression(CheckNameRegEx)
    String = ReplaceRegularExpression(CheckNameRegEx, String, "")
  EndIf
  ProcedureReturn String
EndProcedure

Debug CheckName("abc~123-ABC\_*")
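For reference, a small sketch of what the pattern above does, assuming the usual PCRE behaviour of PureBasic's RegularExpression library: \W+ strips every run of non-word characters (digits count as word characters and are kept), while ^[^a-zA-Z_] additionally drops one leading character that is not a letter or underscore, such as a leading digit. The second and third test strings are made up for illustration.

```purebasic
EnableExplicit

; Same pattern as in CheckName() above
Define RegEx = CreateRegularExpression(#PB_Any, "^[^a-zA-Z_]|\W+")

If RegEx
  Debug ReplaceRegularExpression(RegEx, "abc~123-ABC\_*", "")  ; abc123ABC_
  Debug ReplaceRegularExpression(RegEx, "1abc", "")            ; abc  (leading digit dropped)
  Debug ReplaceRegularExpression(RegEx, "a1bc", "")            ; a1bc (digit inside is kept)
  FreeRegularExpression(RegEx)
EndIf
```

A pointer-based replacement that only filters character by character does not special-case the first character, so a leading digit would be handled differently from this regular expression.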

Re: Check non-word character without a regular expression

Posted: Wed Jul 26, 2023 12:29 pm
by SMaag
I modified my RemoveCharFast function to do so!

Code: Select all

EnableExplicit

Structure pChar   ; virtual CHAR-ARRAY, used as Pointer to overlay on strings 
  a.a[0]          ; fixed ARRAY Of CHAR Length 0
  c.c[0]          
EndStructure

   
 Procedure RemoveNonWordChars(*String)
  ; ============================================================================
  ; NAME: RemoveNonWordChars
  ; DESC: Attention! This is a Pointer-Version! Be sure to call it with a
  ; DESC: correct String-Pointer
  ; DESC: Removes all NonWord Characters from the String
  ; DESC: The String will be shorter afterwards
  ; VAR(*String) : Pointer to String
  ; RET: - 
  ; ============================================================================
    Protected I, pWrite, pRead, *pC.pChar
    
    Macro RemoveNonWordChars_KeepChar()
      If pRead > pWrite               ; if ReadPosition > WritePosition
        *pC\c[pWrite] =  *pC\c[pRead] ; Copy the Character from ReadPosition to WritePosition = compacting the String
      EndIf       
      pWrite +1  : pRead  +1          ; set new Read And Write-Position    
    EndMacro
    
    *pC = *String    ; Set the CharPointer = StartOfString
      
    If *pC        
      
      Repeat
        
        Select *pC\c[I]                   
          Case 0              ; --- EndOfString
            Break                   
            
          ; ----------------------------------------------------------------------
          ; Characters to keep
          ; ----------------------------------------------------------------------

          Case 'a' To 'z'                   ; keep  a to z
            RemoveNonWordChars_KeepChar()
            
          Case 'A' To 'Z'                   ; keep A to Z
             RemoveNonWordChars_KeepChar()
            
          Case '_'                          ; keep '_'
             RemoveNonWordChars_KeepChar()

          ; ----------------------------------------------------------------------
          ; Remove all other characters
          ; ----------------------------------------------------------------------
           
          Default             ; remove all other characters => compact the String 
            pRead +1                 ; Set the ReadPosition to Next Char

        EndSelect      
        I + 1 
      ForEver     
   
      ; I is EndOfString and NullTermination! So if pWrite is not the original EndOfString
      ; we must write a NullTermination
      If pWrite < I   ;
        *pC\c[pWrite] = 0  ; Write Null at EndOfString
      EndIf       
    EndIf    
  EndProcedure

  
  Procedure RemoveCharFast(*String, Char.c)
  ; ============================================================================
  ; NAME: RemoveCharFast
  ; DESC: Attention! This is a Pointer-Version! Be sure to call it with a
  ; DESC: correct String-Pointer
  ; DESC: Removes a Character from the String
  ; DESC: The String will be shorter afterwards
  ; VAR(*String) : Pointer to String
  ; VAR(Char.c) : The Character to remove
  ; RET: - 
  ; ============================================================================
    Protected I, pWrite, pRead, *pC.pChar
     
     *pC = *String    ; Set the CharPointer = StartOfString
      
    If *pC And Char        
      ; ----------------------------------------------------------------------
      ; compacting the String
      ; ----------------------------------------------------------------------
      Repeat
        Select *pC\c[I]                   
          Case 0              ; --- EndOfString
            Break                   
            
          Case Char           ; --- the searched Character
            pRead +1                      ; Set the ReadPosition to Next Char
            
          Default             ; --- other characters => compact the String 
            If pRead > pWrite               ; if ReadPosition > WritePosition
              *pC\c[pWrite] =  *pC\c[pRead] ; Copy the Character from ReadPosition to WritePosition = compacting the String
            EndIf       
            pWrite +1  : pRead  +1        ; set new Read And Write-Position

        EndSelect      
        I + 1 
      ForEver     
   
      ; I is EndOfString and NullTermination! So if pWrite is not the original EndOfString
      ; we must write a NullTermination
      If pWrite < I   ;
        *pC\c[pWrite] = 0  ; Write Null at EndOfString
      EndIf       
    EndIf    
  EndProcedure

  Define MyStr.s
  
  Mystr = "abc~123-ABC\_*"
  
  Debug MyStr
  RemoveNonWordChars(@MyStr)
  
  Debug MyStr
  
  DisableExplicit

Re: Check non-word character without a regular expression

Posted: Wed Jul 26, 2023 1:37 pm
by Boulcat
Thanks SMaag, it works well :)
For my needs, I just added Case '0' To '9' : RemoveNonWordChars_KeepChar()
In my attempt before posting, I had tried using *String.Character, but I didn't know how to set the read/write position.

Re: Check non-word character without a regular expression

Posted: Wed Jul 26, 2023 4:31 pm
by AZJIO

Re: Check non-word character without a regular expression

Posted: Wed Jul 26, 2023 7:05 pm
by Boulcat
@AZJIO
It's not the intended function, but thanks for the link to learn how to manipulate memory and maybe one day use IsLatin, IsDigital,...

SMaag's RemoveNonWordChars procedure suits me, but as an exercise I tried it with *String.Character, without using the *pC.pChar array structure.
I'm not really comfortable with this yet; I'm still learning. Can you confirm that it's okay?

Code: Select all

EnableExplicit

Procedure RemoveNonWordChars(*String.Character)
  ; ============================================================================
  ; NAME: RemoveNonWordChars
  ; DESC: Attention! This is a Pointer-Version! Be sure to call it with a
  ; DESC: correct String-Pointer
  ; DESC: Removes all NonWord Characters from the String
  ; DESC: The String will be shorter afterwards
  ; VAR(*String) : Pointer to String
  ; RET: -
  ; ============================================================================
  Protected *Newstring.Character
  
  Macro RemoveNonWordChars_KeepChar()
    If *Newstring < *String
      *Newstring\c = *String\c       ; Copy the Character from *String Position to *Newstring Position
    EndIf
    *Newstring + SizeOf(Character)   ; set new Write-Position
  EndMacro
  
  If *String
    *Newstring = *String
    ;ShowMemoryViewer(*String, 28)
    Repeat
      
      Select *String\c
        Case 0              ; --- EndOfString
          Break
          
          ; ----------------------------------------------------------------------
          ; Characters to keep
          ; ----------------------------------------------------------------------
        Case '0' To '9'                   ; keep  0 to 9
          RemoveNonWordChars_KeepChar()
        Case 'a' To 'z'                   ; keep  a to z
          RemoveNonWordChars_KeepChar()
        Case 'A' To 'Z'                   ; keep A to Z
          RemoveNonWordChars_KeepChar()
        Case '_'                          ; keep '_'
          RemoveNonWordChars_KeepChar()
          
      EndSelect
      
      *String + SizeOf(Character)
    ForEver     
    
    ; if *Newstring is behind *String, characters were removed and we must write a NullTermination
    If *Newstring < *String
      *Newstring\c = 0  ; Write Null at EndOfString
    EndIf
  EndIf
EndProcedure

Define MyStr.s

Mystr = "abc~123-ABC\_*"
Debug MyStr
RemoveNonWordChars(@MyStr)
Debug MyStr

Re: Check non-word character without a regular expression

Posted: Wed Jul 26, 2023 8:28 pm
by SMaag
Yes, it looks OK! You can use *String.Character.
It is a little faster than the pChar[] version because the indexed access costs one more operation in assembler.

The advantages of the universal pointer method with the pChar structure are:
- you can access characters directly by their character position.
- you do not have to deal with the pointer size, especially if you have to handle other structures.
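A small sketch of the first point, using a made-up test string: with the [0] array overlay a character is read by its index, whereas with a plain *Pointer.Character the pointer itself has to be moved by a multiple of SizeOf(Character). The variable names are only for illustration.

```purebasic
EnableExplicit

Structure pChar
  c.c[0]                      ; virtual character array overlaid on the string
EndStructure

Define Text$ = "abc_123"
Define *idx.pChar     = @Text$
Define *ptr.Character = @Text$

; Indexed access: no pointer arithmetic, the pointer stays at the string start
Debug Chr(*idx\c[4])          ; 1

; Pointer access: move the pointer by 4 characters, then read
*ptr + 4 * SizeOf(Character)
Debug Chr(*ptr\c)             ; 1
```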

More numeric checks you can find here:
https://github.com/Maagic7/PureBasicFra ... Numeric.pb

I found the trick with the universal pointer structure in the source code of the PB IDE.
Here is how it is defined there!

Code: Select all

  ; The universal pointer structure is a trick to get access to a buffer/memory
  ; as different PureBasic value types.
  ; [4] would define a static array with 4 values (0..3). With [0] we get a kind
  ; of virtual array, so we just have a pointer to each variable type we want.
 
  Structure TUPtr  ; Universal Pointer (see PureBasic IDE Common.pb, Structure PTR)
    StructureUnion
      a.a[0]    ; ASCII   : 8 Bit unsigned  [0..255] 
      b.b[0]    ; BYTE    : 8 Bit signed    [-128..127]
      c.c[0]    ; CHAR    : 2 Byte unsigned [0..65535]
      w.w[0]    ; WORD    : 2 Byte signed   [-32768..32767]
      u.u[0]    ; UNICODE : 2 Byte unsigned [0..65535]
      l.l[0]    ; LONG    : 4 Byte signed   [-2147483648..2147483647]
      f.f[0]    ; FLOAT   : 4 Byte
      q.q[0]    ; QUAD    : 8 Byte signed   [-9223372036854775808..9223372036854775807]
      d.d[0]    ; DOUBLE  : 8 Byte float    
      i.i[0]    ; INTEGER : 4 or 8 Byte INT, depending on System
      *p.TUPtr[0] ; Pointer to TUPtr (it's possible and it's done in the PB-IDE source, but why?)
    EndStructureUnion
  EndStructure
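A minimal sketch of how such a union can be used, assuming a little-endian CPU: the same four bytes of a Long are read as a long, as 16-bit values and as a single byte. TUPtrDemo is a hypothetical, trimmed-down copy of the structure above, so that the example is self-contained.

```purebasic
EnableExplicit

Structure TUPtrDemo          ; hypothetical, reduced version of TUPtr
  StructureUnion
    a.a[0]                   ; view the memory as unsigned bytes
    u.u[0]                   ; view the memory as unsigned 16-bit values
    l.l[0]                   ; view the memory as 32-bit longs
  EndStructureUnion
EndStructure

Define Value.l = $11223344
Define *p.TUPtrDemo = @Value ; overlay the union on the variable

Debug Hex(*p\l[0])           ; whole long: 11223344
Debug Hex(*p\u[0])           ; low 16 bits (little-endian): 3344
Debug Hex(*p\u[1])           ; high 16 bits (little-endian): 1122
Debug Hex(*p\a[0])           ; lowest byte (little-endian): 44
```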


Re: Check non-word character without a regular expression

Posted: Wed Jul 26, 2023 9:27 pm
by Boulcat
Thanks for checking my code, I'll use *String.Character then.
I'm not sure I fully understood all the advantages of the universal pointer method with the pChar Structure. I'm going to keep your explanations to reread and use later when I'm a little more comfortable with memory and pointer.
Thanks :)

Re: Check non-word character without a regular expression

Posted: Thu Jul 27, 2023 4:47 am
by AZJIO
Earlier I tried to help while at work, so I couldn't write any code. Now here's my version.

Code: Select all

Procedure RemoveNonWordChars(*c.Character )
	Protected flag, *c2.Character
	
	If *c = 0 Or *c\c = 0
		ProcedureReturn 0
	EndIf
	*c2 = *c
	
	Repeat
		If ((*c\c >= '0' And *c\c <= '9') Or (*c\c >= 'a' And *c\c <= 'z') Or (*c\c >= 'A' And *c\c <= 'Z') Or *c\c = '_')
			*c2\c = *c\c
			*c + SizeOf(Character)
			*c2 + SizeOf(Character)
		Else
			*c + SizeOf(Character)
		EndIf
	Until Not *c\c
	*c2\c = 0
	
	If *c <> *c2
		flag = #True
	EndIf
	
	ProcedureReturn flag
EndProcedure

Define MyStr$

MyStr$ = "abc~123-ABC\_*"
Debug RemoveNonWordChars(@MyStr$)
Debug "|" + MyStr$ + "|"

MyStr$ = "abc123ABC_"
Debug RemoveNonWordChars(@MyStr$)
Debug "|" + MyStr$ + "|"

MyStr$ = ",!wt=-()ty3456!#$%&"
Debug RemoveNonWordChars(@MyStr$)
Debug "|" + MyStr$ + "|"

MyStr$ = ""
Debug RemoveNonWordChars(@MyStr$)
Debug "|" + MyStr$ + "|"

MyStr$ = "!@#$%*(){}[]"
Debug RemoveNonWordChars(@MyStr$)
Debug "|" + MyStr$ + "|"

MyStr$ = "1234qwer"
Debug RemoveNonWordChars(@MyStr$)
Debug "|" + MyStr$ + "|"
If the text consists of word characters in most cases, here is a variant with fewer useless moves: the first loop only looks for the first mismatch without doing anything else (no copying and no shifting of the second pointer), and as soon as it finds one, the second loop takes over.

Code: Select all

Procedure RemoveNonWordChars(*c.Character )
	Protected flag, *c2.Character

	If *c = 0 Or *c\c = 0
		ProcedureReturn 0
	EndIf

	Repeat
		If Not ((*c\c >= '0' And *c\c <= '9') Or (*c\c >= 'a' And *c\c <= 'z') Or (*c\c >= 'A' And *c\c <= 'Z') Or *c\c = '_')
			flag = #True
			*c2 = *c
			*c + SizeOf(Character)
			Break
		EndIf
		*c + SizeOf(Character)
	Until Not *c\c


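	If flag = #True
		; While instead of Repeat: if the first non-word character was also the last
		; character, *c already points at the terminating null and must not move past it
		While *c\c
			If ((*c\c >= '0' And *c\c <= '9') Or (*c\c >= 'a' And *c\c <= 'z') Or (*c\c >= 'A' And *c\c <= 'Z') Or *c\c = '_')
				*c2\c = *c\c
				*c + SizeOf(Character)
				*c2 + SizeOf(Character)
			Else
				*c + SizeOf(Character)
			EndIf
		Wend
		*c2\c = 0
	EndIf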

	ProcedureReturn flag
EndProcedure

Define MyStr$

MyStr$ = "abc~123-ABC\_*"
RemoveNonWordChars(@MyStr$)
Debug "|" + MyStr$ + "|"

MyStr$ = "abc123ABC_"
RemoveNonWordChars(@MyStr$)
Debug "|" + MyStr$ + "|"

MyStr$ = ",!wt=-()ty3456!#$%&"
RemoveNonWordChars(@MyStr$)
Debug "|" + MyStr$ + "|"

MyStr$ = ""
RemoveNonWordChars(@MyStr$)
Debug "|" + MyStr$ + "|"

MyStr$ = "!@#$%*(){}[]"
RemoveNonWordChars(@MyStr$)
Debug "|" + MyStr$ + "|"

MyStr$ = "1234qwer"
RemoveNonWordChars(@MyStr$)
Debug "|" + MyStr$ + "|"
A third option would be to save a list of pointers to the valid runs in the text, fill the invalid characters with zeros, and then read the data and write it using CopyMemoryString. But let's not complicate things, since this option most likely will not speed up the processing.

Re: Check non-word character without a regular expression

Posted: Thu Jul 27, 2023 10:21 am
by Boulcat
Thanks for the effort AZJIO, it's almost the same thing and it does the job.
I'm not sure the return flag (0/1) is useful. It does its job whether the string is empty, identical or different.
Your 2nd variant code looks good for the concept but I'm not sure there's any real gain.

Anyway, thanks to both of you, I'm starting to feel a little more comfortable with pointers now :)

To nitpick, I measured the times with different options (Select vs If, While Wend vs Repeat Until)

Code: Select all

EnableExplicit

Procedure RemoveNonWordChars_WhileWend_Select_If(*c.Character )
  If *c = 0 Or *c\c = 0 : ProcedureReturn : EndIf
  
  Protected *c2.Character = *c
  
  Macro _KeepChar_()
    If *c2 <> *c
      *c2\c = *c\c
    EndIf
    *c2 + SizeOf(Character)
  EndMacro
  
  While *c\c
    Select *c\c
      Case '0' To '9'
        _KeepChar_()
      Case 'a' To 'z'
        _KeepChar_()
      Case 'A' To 'Z'
        _KeepChar_()
      Case '_'
        _KeepChar_()
    EndSelect
    *c + SizeOf(Character)
  Wend
  
  If *c <> *c2
    *c2\c = 0
  EndIf
EndProcedure

Procedure RemoveNonWordChars_WhileWend_Select(*c.Character )
  If *c = 0 Or *c\c = 0 : ProcedureReturn : EndIf
  
  Protected *c2.Character = *c
  
  Macro _KeepChars_()
    *c2\c = *c\c
    *c2 + SizeOf(Character)
  EndMacro
  
  While *c\c
    Select *c\c
      Case '0' To '9'
        _KeepChars_()
      Case 'a' To 'z'
        _KeepChars_()
      Case 'A' To 'Z'
        _KeepChars_()
      Case '_'
        _KeepChars_()
    EndSelect
    *c + SizeOf(Character)
  Wend
  *c2\c = 0
EndProcedure


Procedure RemoveNonWordChars_WhileWend_If(*c.Character )
  If *c = 0 Or *c\c = 0 : ProcedureReturn : EndIf
  
  Protected *c2.Character = *c
  While *c\c
    If ((*c\c >= '0' And *c\c <= '9') Or (*c\c >= 'a' And *c\c <= 'z') Or (*c\c >= 'A' And *c\c <= 'Z') Or *c\c = '_')
      If *c <> *c2
        *c2\c = *c\c
      EndIf
      *c2 + SizeOf(Character)
    EndIf
    *c + SizeOf(Character)
  Wend
  
  If *c <> *c2
    *c2\c = 0
  EndIf
EndProcedure

Procedure RemoveNonWordChars_RepeatUntil_If(*c.Character )
  If *c = 0 Or *c\c = 0 : ProcedureReturn : EndIf
  
  Protected *c2.Character = *c
  Repeat
    If ((*c\c >= '0' And *c\c <= '9') Or (*c\c >= 'a' And *c\c <= 'z') Or (*c\c >= 'A' And *c\c <= 'Z') Or *c\c = '_')
      If *c <> *c2
        *c2\c = *c\c
      EndIf
      *c2 + SizeOf(Character)
    EndIf
    *c + SizeOf(Character)
  Until Not *c\c
  
  If *c <> *c2
    *c2\c = 0
  EndIf
EndProcedure

Define MyStr$
CompilerIf Not #PB_Compiler_Debugger
  
  Define I, s1, s2, s3, s4, Loop = 1000000
  
  MyStr$ = "@abc~123-ABC\_*,!wt=-()ty3456!_#$%*àçéèÉÊ(){}[]XYZ"
  s1  = ElapsedMilliseconds()
  For I = 1 To Loop
    RemoveNonWordChars_WhileWend_Select_If(@MyStr$)
  Next I
  s1 = ElapsedMilliseconds() - s1
  
  MyStr$ = "@abc~123-ABC\_*,!wt=-()ty3456!_#$%*àçéèÉÊ(){}[]XYZ"
  s2  = ElapsedMilliseconds()
  For I = 1 To Loop
    RemoveNonWordChars_WhileWend_Select(@MyStr$)
  Next I
  s2 = ElapsedMilliseconds() - s2
  
  MyStr$ = "@abc~123-ABC\_*,!wt=-()ty3456!_#$%*àçéèÉÊ(){}[]XYZ"
  s3  = ElapsedMilliseconds()
  For I = 1 To Loop
    RemoveNonWordChars_WhileWend_If(@MyStr$)
  Next I
  s3 = ElapsedMilliseconds() - s3
  
  MyStr$ = "@abc~123-ABC\_*,!wt=-()ty3456!_#$%*àçéèÉÊ(){}[]XYZ"
  s4  = ElapsedMilliseconds()
  For I = 1 To Loop
    RemoveNonWordChars_RepeatUntil_If(@MyStr$)
  Next I
  s4 = ElapsedMilliseconds() - s4
  
  MessageRequester("Info", ~"RemoveNonWordChars_WhileWend_Select_If: \t" + Str(s1) + ~"ms\nRemoveNonWordChars_WhileWend_Select: \t" + Str(s2) + ~"ms\nRemoveNonWordChars_WhileWend_If: \t" + Str(s3) + ~"ms\nRemoveNonWordChars_RepeatUntil_If: \t" + Str(s4) + "ms")
  
CompilerElse
  
  MyStr$ = "@abc~123-ABC\_*,!wt=-()ty3456!_#$%*àçéèÉÊ(){}[]XYZ"
  Debug MyStr$
  RemoveNonWordChars_WhileWend_Select_If(@MyStr$)
  Debug MyStr$
  
CompilerEndIf
Result:
RemoveNonWordChars_WhileWend_Select_If: 48ms
RemoveNonWordChars_WhileWend_Select: 54ms
RemoveNonWordChars_WhileWend_If: 63ms
RemoveNonWordChars_RepeatUntil_If: 65ms

Do you get roughly the same results? Here:
- Testing If *c <> *c2 before copying *c2\c = *c\c is a little faster.
- With multiple ranges, Select Case '0' To '9', 'a' To 'z',... is a little faster than If (*c\c >= '0' And *c\c <= '9') Or (*c\c >= 'a' And *c\c <= 'z') Or...
- While *c\c : Wend is a little faster than Repeat : Until Not *c\c.
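One caveat about the measurement, offered as a possible explanation rather than a verified one: every procedure cleans MyStr$ in place, so only the first of the 1,000,000 calls sees any non-word characters. In all later calls the two pointers stay equal, and the If *c <> *c2 test then skips every write, which may be why it looks faster. The sketch below restores the dirty input before each call; the names Source$, *Work and Bytes are made up for the example, and the accented characters are left out of the test string.

```purebasic
EnableExplicit

Procedure RemoveNonWordChars(*c.Character)
  If *c = 0 Or *c\c = 0 : ProcedureReturn : EndIf
  Protected *c2.Character = *c
  While *c\c
    Select *c\c
      Case '0' To '9', 'a' To 'z', 'A' To 'Z', '_'
        If *c2 <> *c                 ; remove this test to time the other variant
          *c2\c = *c\c
        EndIf
        *c2 + SizeOf(Character)
    EndSelect
    *c + SizeOf(Character)
  Wend
  If *c <> *c2
    *c2\c = 0
  EndIf
EndProcedure

Define Source$ = "@abc~123-ABC\_*,!wt=-()ty3456!_#$%*(){}[]XYZ"
Define Bytes = StringByteLength(Source$) + SizeOf(Character)  ; include the terminating null
Define *Work = AllocateMemory(Bytes)
Define I, t

If *Work
  t = ElapsedMilliseconds()
  For I = 1 To 1000000
    CopyMemory(@Source$, *Work, Bytes)   ; restore the dirty input before every call
    RemoveNonWordChars(*Work)
  Next
  t = ElapsedMilliseconds() - t
  MessageRequester("Info", Str(t) + " ms, result: " + PeekS(*Work))
  FreeMemory(*Work)
EndIf
```

The CopyMemory call is part of the timed loop, so the absolute numbers are not comparable with the results above; only the difference between the two variants is meaningful. As with the original test, it should be run with the debugger disabled.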

Re: Check non-word character without a regular expression

Posted: Thu Jul 27, 2023 3:26 pm
by AZJIO
A shorter notation for Select.

Code: Select all

Procedure RemoveNonWordChars(*c.Character )
	If *c = 0 Or *c\c = 0 : ProcedureReturn : EndIf
	
	Protected *c2.Character = *c
	
	While *c\c
		Select *c\c
			Case '0' To '9', 'a' To 'z',  'A' To 'Z', '_'
				If *c2 <> *c
					*c2\c = *c\c
				EndIf
				*c2 + SizeOf(Character)
		EndSelect
		*c + SizeOf(Character)
	Wend
	
	If *c <> *c2
		*c2\c = 0
	EndIf
EndProcedure
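As a usage sketch, the procedure can be wrapped so that it replaces the regular-expression CheckName() from the first post. The wrapper is an assumption and not part of the thread; it relies on PureBasic passing the String.s parameter as a local copy, so the caller's string is left untouched.

```purebasic
EnableExplicit

Procedure RemoveNonWordChars(*c.Character)
  If *c = 0 Or *c\c = 0 : ProcedureReturn : EndIf
  Protected *c2.Character = *c
  While *c\c
    Select *c\c
      Case '0' To '9', 'a' To 'z', 'A' To 'Z', '_'
        If *c2 <> *c
          *c2\c = *c\c
        EndIf
        *c2 + SizeOf(Character)
    EndSelect
    *c + SizeOf(Character)
  Wend
  If *c <> *c2
    *c2\c = 0
  EndIf
EndProcedure

; Hypothetical wrapper: cleans the local copy in place and returns it
Procedure.s CheckName(String.s)
  RemoveNonWordChars(@String)
  ProcedureReturn String
EndProcedure

Debug CheckName("abc~123-ABC\_*")   ; abc123ABC_
```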
You can do it this way: turn the debugger off, then turn it back on to output the result.

Code: Select all

DisableDebugger
;... here's the test section of the code
EnableDebugger
Debug ...
viewtopic.php?t=80656
string operations

If I remove the If *c2 <> *c condition, the time increases from 83 ms to 101 ms, although there is actually no point in checking *c2 <> *c from the very first step. In theory it should run faster without the check.

Re: Check non-word character without a regular expression

Posted: Thu Jul 27, 2023 4:10 pm
by Boulcat
AZJIO wrote: Thu Jul 27, 2023 3:26 pm A shorter notation for Select.
Yes, it's better for reading and we don't need a macro with it. I assume it's the final version :)
AZJIO wrote: Thu Jul 27, 2023 3:26 pm You can do it this way: turn the debugger off, then turn it back on to output the result.

Code: Select all

DisableDebugger
;... here's the test section of the code
EnableDebugger
Debug ...
Um, I've just tried it; the DisableDebugger function seems to be done at runtime, and it's not good for measuring execution times.
It's not the same thing as creating an exe or disabling the debugger in the compiler options + F5 (3600 ms vs 56 ms).
AZJIO wrote: Thu Jul 27, 2023 3:26 pm If I remove the If *c2 <> *c condition, the time increases from 83 ms to 101 ms, although there is actually no point in checking *c2 <> *c from the very first step. In theory it should run faster without the check.
Not sure, and I tend to agree with the execution times. It seems faster to compare the two pointers and write the character to memory only if the write position has changed, rather than writing to memory in all cases.

Re: Check non-word character without a regular expression

Posted: Thu Jul 27, 2023 5:22 pm
by AZJIO
Boulcat wrote: Thu Jul 27, 2023 4:10 pm DisableDebugger function seems to be done at runtime
No. DisableDebugger prevents debug information from being embedded in the code. You must put EnableDebugger just before displaying information, and then immediately insert DisableDebugger again.
Boulcat wrote: Thu Jul 27, 2023 4:10 pm Not sure, and I tend to agree with the execution times. It seems faster to compare the two pointers and write the character to memory only if the write position has changed, rather than writing to memory in all cases.
In the string "@abc~1..." the first character is @, so your pointers differ right away. After the first shift has occurred, they will never be equal again, because the second pointer is always smaller. This applies when your first character is not a word character.
Output the pointers in the debugger, with LOOP = 1 (and remove DisableDebugger):

Code: Select all

  While *c\c
  	Debug *c
  	Debug *c2
  	Select *c\c
This is how the debugger is embedded with the ability to output using "Debug"

Code: Select all

EnableExplicit
DisableDebugger 

Procedure RemoveNonWordChars(*c.Character )
	If *c = 0 Or *c\c = 0 : ProcedureReturn : EndIf
	
	Protected *c2.Character = *c
	
	While *c\c
		Select *c\c
			Case '0' To '9', 'a' To 'z',  'A' To 'Z', '_'
				If *c2 <> *c
					*c2\c = *c\c
				EndIf
				*c2 + SizeOf(Character)
		EndSelect
		*c + SizeOf(Character)
	Wend
	
	If *c <> *c2
		*c2\c = 0
	EndIf
EndProcedure

Define MyStr$

Define I, s1, LOOP = 1000000

MyStr$ = "@abc~123-ABC\_*,!wt=-()ty3456!_#$%*aceeEE(){}[]XYZ"
s1  = ElapsedMilliseconds()
For I = 1 To LOOP
	RemoveNonWordChars(@MyStr$)
Next I
s1 = ElapsedMilliseconds() - s1
EnableDebugger
Debug s1

DisableDebugger 

MyStr$ = "@abc~123-ABC\_*,!wt=-()ty3456!_#$%*aceeEE(){}[]XYZ"
s1  = ElapsedMilliseconds()
For I = 1 To LOOP
	RemoveNonWordChars(@MyStr$)
Next I
s1 = ElapsedMilliseconds() - s1
EnableDebugger
Debug s1

Re: Check non-word character without a regular expression

Posted: Thu Jul 27, 2023 6:46 pm
by Boulcat
My mistake, I had added DisableDebugger in the main part, before the loop.
It does work better if placed at the beginning, before the RemoveNonWordChars procedure.

I agree that if the first character is a non-word, all subsequent characters are written.
And, as you say, it shouldn't be any faster in theory. But without knowing why, it is faster by adding If *c <> *c2 before writing!!!
Well, anyway, in most cases there's no change to be made to the string.