Regexp Expression

AZJIO
Addict
Posts: 2227
Joined: Sun May 14, 2017 1:48 am

Re: Regexp Expression

Post by AZJIO »

Here's another option

Code: Select all

If CreateRegularExpression(1, "(?<=\d)[\h]+(?=\d{3})", #PB_RegularExpression_NoCase)

Old:
"[a-f\d]+"
New:
"[a-z\d]+"

myInput$ = "AS 18 000 AE 25 000 35 000"
"AS" is outside the range A-F. I assumed that you had a range of hexadecimal numbers, but from the next message I realized that the letters go beyond [a-f\d].

A lookbehind cannot have an unknown length, so "(?<=\d+)" is not allowed. Therefore, we use \K.
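
The difference between the two forms can be tried out with a minimal sketch like the one below. The IDs 0 and 1 are arbitrary choices, and the second pattern is only an illustration of \K, not the final pattern of this thread. Note that this simple form would also join a trailing three-digit number such as "50 000 500".

```purebasic
; Sketch only: the IDs 0 and 1 are arbitrary; Text$ is the sample from this post.
Text$ = "AS 18 000 AE 25 000 35 000"

; Variable-length lookbehind: PCRE rejects it, so creation is expected to fail
; and the Else branch prints the engine's error message.
If CreateRegularExpression(0, "(?<=\d+)\h+(?=\d{3})")
  Debug "lookbehind pattern was accepted"
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf

; \K: match the digits first, then drop them from the reported match,
; so only the whitespace run is replaced by the empty string.
If CreateRegularExpression(1, "\d+\K\h+(?=\d{3})")
  Debug ReplaceRegularExpression(1, Text$, "")
  FreeRegularExpression(1)
Else
  Debug RegularExpressionError()
EndIf
```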
Last edited by AZJIO on Sat Dec 02, 2023 6:29 pm, edited 1 time in total.
loulou2522
Enthusiast
Posts: 553
Joined: Tue Oct 14, 2014 12:09 pm

Post by loulou2522 »

If I use this string:
text$="AD 500 BC 24 000 45 000 50 000 500"
the last occurrence is concatenated with 50000 and produces 50000500.

Post by AZJIO »

Code: Select all

text$="AD 500 BC 24 000 45 000 50 000 500"

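; Remove the separators between a 1-2 digit leading group and the 3-digit group that follows it
If CreateRegularExpression(1,"\h\d{1,2}\K[\h;.]+(?=\d{3}\b)", #PB_RegularExpression_NoCase)
  text$ = ReplaceRegularExpression(1, text$, "")
  FreeRegularExpression(1)
Else
  Debug RegularExpressionError()
EndIf

; Extract every run of letters and digits (use [a-z\d]+ if the letter codes go beyond A-F)
If CreateRegularExpression(0,"[a-f\d]+", #PB_RegularExpression_NoCase)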
  Dim Result$(0)
  NbFound = ExtractRegularExpression(0, text$, Result$())
  For i = 0 To NbFound-1
    Debug Result$(i)
  Next
  FreeRegularExpression(0)
EndIf
\d{1,2} - the leading group may contain only one or two digits.
Added "\b" to require a word boundary after the three-digit group.

Post by loulou2522 »

The number could also be 158 300 or 1 158 000. Is that possible?

Post by AZJIO »

What should the result be?

??
158300
1158
000
Should I search for pairs of numbers and combine them?

Post by loulou2522 »

The result for 158 200 should be 158200, and for 1 158 000 it should be 1158000.
Thanks for your help, regular expressions are very complicated.

Post by AZJIO »

"AD 500 BC 24 000 45 000 50 000 500"
To keep the regular expression from getting too complicated, I need to know:
1. Is this a single line or a list of such lines?
2. Is the beginning of the line always the same? Does it follow the template "Word | Number | Word | several numbers"?
3. How many numbers should there be at the end? It is very difficult to determine which numbers to combine. The rules change each time, and I do not want to tune the regular expression for a special case and then change it again when new rules appear.

Post by loulou2522 »

Here's what I do:
1. I load a text file containing the standardized balance sheet data for a company in France.
2. The format is always the same:
Letter letter amount Letter letter amount amount amount amount
An amount can have a value between 1 and 1 000 000 euros; the only constraint is being able to transform these numbers into amounts. I don't do any processing on the letters; they stay as they are. I hope I've given you enough information.

Post by AZJIO »

loulou2522 wrote: Sat Dec 02, 2023 7:25 pm
Letter letter amount Letter letter amount amount amount amount
1. Is the repetition of "amount" 4 times an exact rule? In previous messages, "amount" appeared 2 times.
2. Since you have a list, there is probably a special delimiter: for example, a space Chr(32) between the thousands groups of "30 000", and a tab #TAB$ Chr(9) between the numbers "20 000" and "30 000". If there is such a delimiter, many of the problems with separating the numbers disappear.
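
If the file really does use a tab between fields, no regular expression is needed at all. The sketch below assumes such a layout; Line$ is made-up sample data, not taken from the real file.

```purebasic
; Hypothetical line: fields separated by tabs, thousands groups separated by spaces.
Line$ = "AD" + #TAB$ + "1 201 500" + #TAB$ + "AE" + #TAB$ + "1 200 000" + #TAB$ + "500"

; The number of fields is the number of tabs plus one.
Count = CountString(Line$, #TAB$) + 1

For i = 1 To Count
  Field$ = StringField(Line$, i, #TAB$)
  ; Dropping the spaces turns "1 201 500" into "1201500"; the letter codes are unchanged.
  Debug ReplaceString(Field$, " ", "")
Next
```

With this layout, amounts below 1 000 need no special handling, because the field boundaries come from the tabs and not from the digits.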

Post by loulou2522 »

Here is an example:
chaine = "AD 1 201 500 AE 1 200 000 1 500 10 500"
All the strings in the file are built the same way.

Post by AZJIO »

Code: Select all

#RegExp = 0

Text$ = "AD 1 201 500 AE 1 200 000 1 500 10 500"

If CreateRegularExpression(#RegExp, "([a-z]{2})  \h  ([\d\h]+)   ([a-z]{2})   \h   (\d{3}[\h;.]\d{3}|\d{1,2}[\h;.]\d{3}|\d[\h;.]\d{3}[\h;.]\d{3}) \h (\d{3}[\h;.]\d{3}|\d{1,2}[\h;.]\d{3}|\d[\h;.]\d{3}[\h;.]\d{3})", #PB_RegularExpression_Extended | #PB_RegularExpression_NoCase)
	Groups = CountRegularExpressionGroups(#RegExp)
	If Not Groups
		Debug 0
	EndIf
	If ExamineRegularExpression(#RegExp, Text$)
		While NextRegularExpressionMatch(#RegExp)
			For i = 1 To Groups
				Debug RegularExpressionGroup(#RegExp, i)
			Next
		Wend
	EndIf
Else
	Debug RegularExpressionError()
EndIf

Use RegExpPB

Code: Select all

#RegExp = 0

Text$ = "AD 1 201 500 AE 1 200 000 1 500 10 500"

num$ = "(\d{3}[\h;.]\d{3}    |     \d{1,2}[\h;.]\d{3}     |     \d[\h;.]\d{3}[\h;.]\d{3})"
word$ = "([a-z]{2})"

RegExp$ = ""
RegExp$ + word$
RegExp$ + "\h+"
; RegExp$ + "([\d\h]+)"
RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + word$
RegExp$ + "\h+"
RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + num$
; Debug RegExp$

If CreateRegularExpression(#RegExp, RegExp$, #PB_RegularExpression_Extended | #PB_RegularExpression_NoCase)
	Groups = CountRegularExpressionGroups(#RegExp)
	If Not Groups
		Debug 0
	EndIf
	If ExamineRegularExpression(#RegExp, Text$)
		While NextRegularExpressionMatch(#RegExp)
			For i = 1 To Groups
				Debug RegularExpressionGroup(#RegExp, i)
			Next
		Wend
	EndIf
Else
	Debug RegularExpressionError()
EndIf
Last edited by AZJIO on Sat Dec 02, 2023 10:02 pm, edited 1 time in total.

Post by loulou2522 »

Here is the result.
The original string is
[AD 1 201 500 AE 1 200 000 1 500 10 500]
When I execute the program, the result is
AD
1 201 500
AE
1 200 000
1 500
What I want is a result like this:
AD
1201500
AE
1200000
1500
10500

Post by AZJIO »

Code: Select all

#RegExp = 0

; Text$ = "AD 1 201 500 AE 1 200 000 1 500"
Text$ = "AD 1 201 500 AE 1 200 000 1 500 10 500"

num$ = "(\d{3}[\h;.]\d{3}    |     \d{1,2}[\h;.]\d{3}     |     \d[\h;.]\d{3}[\h;.]\d{3})"
word$ = "([a-z]{2})"

RegExp$ = ""
RegExp$ + word$
RegExp$ + "\h+"
; RegExp$ + "([\d\h]+)"
RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + word$
RegExp$ + "\h+"
RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + num$
RegExp$ + "\h*"
RegExp$ + num$ + "?"
; Debug RegExp$

If CreateRegularExpression(#RegExp, RegExp$, #PB_RegularExpression_Extended | #PB_RegularExpression_NoCase)
	Groups = CountRegularExpressionGroups(#RegExp)
	If Not Groups
		Debug 0
	EndIf
	If ExamineRegularExpression(#RegExp, Text$)
		While NextRegularExpressionMatch(#RegExp)
			For i = 1 To Groups
				If i = 2 Or i > 3
					Debug ReplaceString(RegularExpressionGroup(#RegExp, i), " ", "") ; a regular expression on [\h;.] could be used here instead
				Else
					Debug RegularExpressionGroup(#RegExp, i)
				EndIf
			Next
		Wend
	EndIf
Else
	Debug RegularExpressionError()
EndIf
With this pattern, the minimum number is 1 000 and the maximum is 9 999 999.
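
These limits can be checked with a small sketch that tests the three alternatives of num$ on their own. The pattern below is an anchored copy without the extra spaces, so the extended flag is not needed. Check() is a hypothetical helper and the ID 0 is arbitrary. The first and last values should be reported as "no match", the three in between as "match".

```purebasic
; Hypothetical helper: reports whether Value$ matches regular expression 0.
Procedure.s Check(Value$)
  If MatchRegularExpression(0, Value$)
    ProcedureReturn Value$ + " -> match"
  EndIf
  ProcedureReturn Value$ + " -> no match"
EndProcedure

; Anchored copy of the three alternatives used in num$.
If CreateRegularExpression(0, "^(?:\d{3}[\h;.]\d{3}|\d{1,2}[\h;.]\d{3}|\d[\h;.]\d{3}[\h;.]\d{3})$")
  Debug Check("500")        ; below 1 000
  Debug Check("1 000")      ; smallest value covered
  Debug Check("999 999")
  Debug Check("9 999 999")  ; largest value covered
  Debug Check("10 000 000") ; above 9 999 999
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
```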

Post by loulou2522 »

Thanks, AZJIO.
There is only one last problem: when a number is < 1000, the expression doesn't work.
Can you find a solution?
Example string:
Text$ = "AD 500 AE 1 200 000 1 500 10 500"

Post by AZJIO »

Code: Select all

#RegExp = 0

; Text$ = "AD 1 201 500 AE 1 200 000 1 500"
; Text$ = "AD 1 201 500 AE 1 200 000 1 500 10 500"
Text$ = "AD 500 AE 1 200 000 1 500 10 500"

num$ = "(\d{3}[\h;.]\d{3}    |     \d{1,2}[\h;.]\d{3}     |     \d[\h;.]\d{3}[\h;.]\d{3})"
word$ = "([a-z]{2})"

RegExp$ = ""
RegExp$ + word$
RegExp$ + "\h+"
RegExp$ + "([\d\h]+)"
; RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + word$
RegExp$ + "\h+"
RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + num$
RegExp$ + "\h*"
RegExp$ + num$ + "?"
; Debug RegExp$

If CreateRegularExpression(#RegExp, RegExp$, #PB_RegularExpression_Extended | #PB_RegularExpression_NoCase)
	Groups = CountRegularExpressionGroups(#RegExp)
	If Not Groups
		Debug 0
	EndIf
	If ExamineRegularExpression(#RegExp, Text$)
		While NextRegularExpressionMatch(#RegExp)
			For i = 1 To Groups
				If i = 2 Or i > 3
					Debug ReplaceString(RegularExpressionGroup(#RegExp, i), " ", "") ; a regular expression on [\h;.] could be used here instead
				Else
					Debug RegularExpressionGroup(#RegExp, i)
				EndIf
			Next
		Wend
	EndIf
Else
	Debug RegularExpressionError()
EndIf
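
The comment next to ReplaceString above mentions that a regular expression on [\h;.] could do the cleanup. A minimal sketch of that idea follows; #Strip is a hypothetical ID and Amount$ is made-up sample data. Unlike ReplaceString with " ", it also removes tabs, ";" and ".".

```purebasic
; Sketch only: #Strip is a hypothetical ID, Amount$ is made-up sample data.
#Strip = 1

Amount$ = "1.200 000"

; Remove every separator the num$ pattern accepts: horizontal whitespace, ';' and '.'.
If CreateRegularExpression(#Strip, "[\h;.]+")
  Debug ReplaceRegularExpression(#Strip, Amount$, "") ; 1200000
  FreeRegularExpression(#Strip)
Else
  Debug RegularExpressionError()
EndIf
```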