MID strings

Got an idea for enhancing PureBasic? New command(s) you'd like to see?
Andras
User
Posts: 34
Joined: Wed Oct 27, 2004 6:58 am
Location: Germany

Post by Andras »

Oh, I didn't know that you can directly read one byte with \b ... is this documented?
Ingratitude is the world's reward

Post by Andras »

Mine is then:

Code: Select all

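Procedure MidSet_Fast(*PtrStr.BYTE, DestPos.l, DestLen.l, *ReplaceStr.BYTE)
  ; Overwrites DestLen characters of the target string, starting at DestPos (1-based).
  ; Assumes 1-byte characters and that the caller keeps the range inside the string.

  If DestLen>1
    *PtrStr+DestPos-1+DestLen   ; advance to the character just after the replaced range
    LastChar.b=*PtrStr\b        ; save it, because PokeS appends a terminating zero there
    PokeS(*PtrStr-DestLen, PeekS(*ReplaceStr, DestLen))
    *PtrStr\b=LastChar          ; restore the saved character
  Else
    *PtrStr+DestPos-1           ; single character: copy the byte directly
    *PtrStr\b=*ReplaceStr\b
  EndIf

EndProcedure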
Yours is faster for replacement strings with a length from 2 to 7. Mine is faster if the replacement string is 1 character or longer than 7 characters ;)

Post by Andras »

This one should always be faster than yours:

Code: Select all

Procedure MidSet_Fast(*PtrStr.BYTE, DestPos.l, DestLen.l, *ReplaceStr.BYTE) 

  If DestLen>4
    CopyMemory(*ReplaceStr,*PtrStr+DestPos-1,DestLen)
  ElseIf DestLen=1
    *PtrStr+DestPos-1 
    *PtrStr\b=*ReplaceStr\b 
  ElseIf DestLen=2
    *PtrStr+DestPos-1 
    *PtrStr\b=*ReplaceStr\b 
    *PtrStr+1
    *ReplaceStr+1
    *PtrStr\b=*ReplaceStr\b 
  ElseIf DestLen=3
    *PtrStr+DestPos-1 
    *PtrStr\b=*ReplaceStr\b 
    *PtrStr+1
    *ReplaceStr+1
    *PtrStr\b=*ReplaceStr\b 
    *PtrStr+1
    *ReplaceStr+1
    *PtrStr\b=*ReplaceStr\b 
  ElseIf DestLen=4
    *PtrStr+DestPos-1 
    *PtrStr\b=*ReplaceStr\b 
    *PtrStr+1
    *ReplaceStr+1
    *PtrStr\b=*ReplaceStr\b 
    *PtrStr+1
    *ReplaceStr+1
    *PtrStr\b=*ReplaceStr\b 
    *PtrStr+1
    *ReplaceStr+1
    *PtrStr\b=*ReplaceStr\b 
  EndIf

EndProcedure 
I think I've got too much time on my hands ;)
Frarth
Enthusiast
Posts: 241
Joined: Tue Jul 21, 2009 11:11 am
Location: On the planet

Post by Frarth »

This may not be the fastest approach, but it is safe and flexible. Posting this here for those looking up the mid$ issue.

The flexibility comes from the 'length' parameter. If length = 0, nothing is replaced and the new string is simply inserted at the specified start position.

Code: Select all

Procedure.s SetMid(old.s, start.l, length.l, new.s)
  Protected count.l
  
  count = Len(old)
  If start < 1 Or start > count
    ProcedureReturn old
  EndIf
  
  count = (count - start) + 1
  If length < 0 Or length > count
    length = count
  EndIf
  
  ProcedureReturn Left(old, start - 1) + new + Mid(old, start + length)
EndProcedure
Example:

Code: Select all

Define s.s                      ; s must be declared as a string before use
s = SetMid("male", 3, 0, "p")   ; returns "maple"
s = SetMid(s, 1, 1, "")         ; returns "aple"
s = SetMid(s, 3, 0, "p")        ; returns "apple"
PureBasic 5.41 LTS | Xubuntu 16.04 (x32) | Windows 7 (x64)
Lunasole
Addict
Posts: 1091
Joined: Mon Oct 26, 2015 2:55 am
Location: UA

Post by Lunasole »

The Mid$ = "a" construction was very cool and handy. It was equivalent to writing bytes directly into the string's memory, so emulating it with a separate procedure that returns a string by value is NOT right: that will be extremely slow compared to a plain Mid$ = .
It's a pity that PB doesn't have it. ReplaceString with the #PB_String_InPlace flag is similar, but it also does some extra work (at least comparing one character) and is relatively expensive because of that.

I have used code like the following to do the same thing, but it is far from the Mid syntax (and usable only inside procedures, because it needs some temporary variables). It also becomes even less clear once it has to work with both Unicode and ANSI.

Code: Select all

Procedure$ SimpleXOR (pNStr$) 
	Protected sLen = Len(pNStr$)
	Protected sEn.w, tXr.c = 64
	For sEn = 1 To sLen
		PokeC(@pNStr$ + (sEn - 1)  * SizeOf(tXr), Asc(Mid(pNStr$, sEn, 1)) ! tXr) ; a kind of Mid$ =
	Next

	ProcedureReturn pNStr$
EndProcedure

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Define.s A = SimpleXOR("test")
Define.s B = SimpleXOR(A)

Debug A
Debug B
"W̷i̷s̷h̷i̷n̷g o̷n a s̷t̷a̷r"
kenmo
Addict
Posts: 2033
Joined: Tue Dec 23, 2003 3:54 am

Post by kenmo »

A trick for 1-character replacement:

Code: Select all

Structure _CMidStruct
  c.c[0]
EndStructure
Threaded *_CMid._CMidStruct
Macro CRep(StringVar, Position, NewChar)
  *_CMid = @StringVar
  *_CMid\c[(Position)-1] = (NewChar)
EndMacro

test$ = "Hello World!"
Debug test$

CRep(test$,  1, 'W')
CRep(test$,  7, 'H')
CRep(test$, 12, '?')
Debug test$
Alternative:

Code: Select all

Macro CRep(StringVar, Position, NewChar)
  PokeC(@StringVar + ((Position)-1)*SizeOf(CHARACTER), Asc(NewChar))
EndMacro

test$ = "Hello World!"
Debug test$

CRep(test$,  1, "W")
CRep(test$,  7, "H")
CRep(test$, 12, "?")
Debug test$
Last edited by kenmo on Fri Feb 05, 2016 8:32 pm, edited 1 time in total.

Post by Lunasole »

Here are some more, closer to the original Mid = .

Code: Select all

; the first macro can only "mid" a single character of the string
; an out-of-range Pos will write outside the string, so check it yourself
Macro PoorMidEmulation (pStr, Pos, Char)
	CompilerIf #PB_Compiler_Unicode
		PokeC(@pStr + (Pos - 1)  * 2, Asc(Char))
	CompilerElse
		PokeC(@pStr + (Pos - 1), Asc(Char))
	CompilerEndIf
EndMacro

; the second one is a bit improved: it can "mid" a whole part of the string
; again, an out-of-range Pos (or a Char that is too long) will write outside the string
Macro PoorMidEmulation2 (pStr, Pos, Char)
	CompilerIf #PB_Compiler_Unicode
		CopyMemory(@Char, @pStr + (Pos - 1) * 2, Len(Char) * 2)
	CompilerElse
		CopyMemory(@Char, @pStr + (Pos - 1), Len(Char))
	CompilerEndIf
EndMacro

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


Define.s TEST = "test"

Debug TEST ; clear variable

PoorMidEmulation(TEST, 1, "R")
Debug TEST	; mid = single char

PoorMidEmulation2(TEST, 2, "ock")
Debug TEST   ; mid = part of string
The second macro can be extended into a full Mid equivalent (by adding a maximum-length check), but of course it is not as beautiful as the original "Mid$ (String, 1) = Mid$ (String, 2)" construction.
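
For readers wondering what the "max length control" mentioned above might look like, here is a minimal sketch. MidSetSafe is a hypothetical name (it is neither a PureBasic built-in nor something posted in this thread), and it assumes the caller passes the address and the current length of the target string. It clamps the number of copied characters so the write never runs past the end of the target.

```purebasic
; Sketch only: in-place "Mid$ =" emulation with length clamping.
; MidSetSafe is a hypothetical helper, not part of PureBasic.
; *Target   : address of the string to modify (e.g. @MyString)
; TargetLen : current length of that string in characters
; Returns the number of characters actually written.
Procedure.i MidSetSafe(*Target, TargetLen.i, Pos.i, New$)
  Protected n.i = Len(New$)

  ; Reject a null pointer, an out-of-range position, or an empty replacement
  If *Target = 0 Or Pos < 1 Or Pos > TargetLen Or n = 0
    ProcedureReturn 0
  EndIf

  ; Clamp so that the copy stops at the last character of the target
  If n > TargetLen - Pos + 1
    n = TargetLen - Pos + 1
  EndIf

  CopyMemory(@New$, *Target + (Pos - 1) * SizeOf(Character), n * SizeOf(Character))
  ProcedureReturn n
EndProcedure

Define.s Test = "test"
MidSetSafe(@Test, Len(Test), 2, "ock and roll")
Debug Test ; "tock" (only 3 characters fit after position 1)
```

Using SizeOf(Character) instead of a CompilerIf keeps the offset arithmetic the same in both ASCII and Unicode builds. The clamping costs one comparison per call, which should be small next to the cost of building a new string.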
"W̷i̷s̷h̷i̷n̷g o̷n a s̷t̷a̷r"

Post by kenmo »

String replacement macro. Definitely NOT overflow-safe! Use carefully.

Code: Select all

Macro SRep(StringVar, Position, NewString, MaxChars = -1)
  PokeS(@StringVar + ((Position)-1)*SizeOf(CHARACTER), (NewString), MaxChars, #PB_String_NoZero)
EndMacro

test$ = "Hello World!"
Debug test$

SRep(test$,  2, "ond")
SRep(test$,  8, "eir???", 3)
Debug test$

Post by Frarth »

@Lunasole, I would not hold on to the 'beauty' of the MID$ = syntax simply because you were used to it. In the BASIC world that statement was inconsistent, illogical and, to some, even confusing. The syntax is a leftover from the days when programming languages only had to deal with 1-byte characters (ASCII). Today we have multi-byte character encodings, which make things more complicated and comparatively slower because of the unavoidable overflow checking and string space adjustment.

The easiest and best solution may not be the 'fastest', but unless you need to manipulate thousands of strings in a row, my example works great and is even more flexible, since it also allows inserting strings, which I use quite often. Also, computers these days are so much faster than in the 1-byte days that my example runs faster than MID$ did on the early computers. :wink:
Dude
Addict
Posts: 1907
Joined: Mon Feb 16, 2015 2:49 pm

Post by Dude »

Doesn't InsertString() do what you want?

http://www.purebasic.com/documentation/ ... tring.html

Post by Frarth »

Dude wrote:Doesn't InsertString() do what you want?

http://www.purebasic.com/documentation/ ... tring.html
No, because InsertString does what it says: it only inserts. AFAIK, PB does not have a procedure to insert/replace parts of a string using only position and length as the MID$ statement does in other BASIC dialects.

Post by Lunasole »

Frarth wrote:@Lunasole, I would not try to hold on to the 'beauty' of the MID$ = syntax, simply because you were used to it. In the BASIC world that statement was inconsistent, illogical and to some even confusing. The syntax is a left-over from the days when programming languages only had to deal with 1-byte characters (ASCII). Today we have multi-byte character encodings which makes things more complicated and comparatively slower because of the memory overflow checking and string space adjustment, which is inevitable.
It is no more illogical than the "Swap" keyword, for example.
And of course it does not have to be overflow-safe, because it is meant as a form of direct memory manipulation, like PokeS/PokeC.
Also, even with all the checks added to make it fully safe, it would still be several times faster than creating a new string through a procedure return or concatenation. Whether characters are 1 byte or 2 makes no difference; the offset is just multiplied by 2.

Frarth wrote: The easiest and best solution may not be the 'fastest' but unless you need to manipulate thousands of strings in a row, my example works great and is even more flexible allowing to insert strings as well, which I use quite often. Also, these days computers are so much faster than in the 1-byte days, that my example processes faster than MID$ on the early computers. :wink:
I wasn't talking about your example in particular (or the other earlier ones that return a string from a procedure). I just noted that this is different from what Mid = is meant to do. Your solution is certainly fine, since it works.
"W̷i̷s̷h̷i̷n̷g o̷n a s̷t̷a̷r"

Post by Lunasole »

As for speed: just out of interest I ran a small test and got the following results with 500k strings and 500k * 7 replacement operations:

Regular replacement (equal to procedure returning string) :: 1074ms (slowest)
Replacement with "In place" flag (close to mid logic) :: 248ms (4.3x faster)
Replacement with Mid (using the variants that @kenmo or I posted) :: 186ms (5.7x faster)


Well, the 2nd and 3rd show almost no difference. It is just awkward to use ReplaceString this way:

Code: Select all

T$ = "test"
ReplaceString(T$, "e", "o", #PB_String_NoCase | #PB_String_InPlace, 2, 1)

;comparing to mid macro:
MidS(T$, 2, "o")
Also, 500k strings is a huge amount of data, and the worst overall time is only about 1 second (and that is with the debugger turned on).
So yes @Frarth, in the typical case it will be fast enough anyway, even when allocating new strings.
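
The test code itself was not posted, so the following is only a rough sketch of how such a comparison could be timed. The MidS macro is an assumption: the thread never shows its definition, so it is modelled here on kenmo's SRep macro. The round count and the test string are also arbitrary, and the timings will vary with machine, PB version and debugger settings.

```purebasic
; Timing sketch only. MidS is hypothetical (modelled on kenmo's SRep macro).
Macro MidS(StringVar, Position, NewString)
  PokeS(@StringVar + ((Position) - 1) * SizeOf(Character), NewString, -1, #PB_String_NoZero)
EndMacro

#Rounds = 500000
Define T$, i.i, t0.q, tCopy.q, tInPlace.q, tMidS.q

; 1) Regular replacement: ReplaceString returns a new string every time
t0 = ElapsedMilliseconds()
For i = 1 To #Rounds
  T$ = "test"
  T$ = ReplaceString(T$, "e", "o")
Next
tCopy = ElapsedMilliseconds() - t0

; 2) In-place replacement, starting at position 2, one occurrence
t0 = ElapsedMilliseconds()
For i = 1 To #Rounds
  T$ = "test"
  ReplaceString(T$, "e", "o", #PB_String_InPlace, 2, 1)
Next
tInPlace = ElapsedMilliseconds() - t0

; 3) Direct write into the string buffer
t0 = ElapsedMilliseconds()
For i = 1 To #Rounds
  T$ = "test"
  MidS(T$, 2, "o")
Next
tMidS = ElapsedMilliseconds() - t0

Debug "new string: " + Str(tCopy) + " ms"
Debug "in place:   " + Str(tInPlace) + " ms"
Debug "MidS:       " + Str(tMidS) + " ms"
```

Each loop resets T$ first, so all three variants pay the same assignment cost and only the replacement step differs. Debug output is only shown when the debugger is enabled, which itself slows all three loops down.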
"W̷i̷s̷h̷i̷n̷g o̷n a s̷t̷a̷r"

Post by Frarth »

Lunasole wrote:It is not more ilogical than "swap" keyword for example.
When I say illogical and confusing, I mean that the three basic functions LEFT$, RIGHT$ and MID$ all behave the same way, so if MID$= is allowed, shouldn't LEFT$= and RIGHT$= be allowed too? That is what I find illogical. But that is just my personal opinion.

Post by Dude »

Frarth wrote:InsertString does what it says: it only inserts.
Sorry, I thought the request was to insert something with Mid, but I see now that the request is to replace a character (or characters) in place, at the specified position. My bad.
eevee wrote:I now have to use something as convoluted as
Come now, is using Left and Right so bad?
eevee wrote:set it as a Procedure (with the parameter-passing overheads that would incur)
So make it a macro instead, which doesn't suffer the overheads of procedures.
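
A minimal sketch of what such a macro could look like, built only from Left and Mid. MidReplace is a hypothetical name (not a PureBasic built-in and not something posted earlier in this thread), and it does no range checking, so Position and Length are assumed to be valid for the string.

```purebasic
; Sketch only: replace Length characters at Position with NewString.
; MidReplace is a hypothetical macro name.
Macro MidReplace(StringVar, Position, Length, NewString)
  StringVar = Left(StringVar, (Position) - 1) + NewString + Mid(StringVar, (Position) + (Length))
EndMacro

Define s$ = "Hello World!"
MidReplace(s$, 7, 5, "There")
Debug s$ ; "Hello There!"
```

Because it is a macro, the expression is expanded inline at each call site, so there is no procedure-call overhead. It still builds a new string each time, unlike the PokeC/PokeS variants above.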