Regexp Expression

AZJIO · Post by **AZJIO** » Sat Dec 02, 2023 6:18 pm

Here's another option

If CreateRegularExpression(1,"(?<=\d)[\h]+(?=\d{3})" , #PB_RegularExpression_NoCase)

Old
"[a-f\d]+"
New
"[a-z\d]+"

myInput$="AS 18 000 AE 25 000 35 000"

"AS" is outside the range A-F.
I assumed you had hexadecimal numbers, but from your next message I realized that the letters go beyond A-F, so the character class has to be [a-z\d].

A lookbehind cannot have variable length, so "(?<=\d+)" is not allowed. Use \K instead.
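The difference between a variable-length lookbehind and \K can be sketched as follows. This is a minimal illustration, not code from the thread: the IDs 0 and 1 and the pattern "\d+\K\h+(?=\d{3})" are assumptions chosen for the demonstration, and the text of the compile error depends on the PCRE version bundled with PureBasic.

```purebasic
; Sketch: variable-length lookbehind versus \K.
; The IDs 0 and 1 and the second pattern are illustrative assumptions.
Text$ = "AS 18 000 AE 25 000 35 000"

; (?<=\d+) has no fixed length, so this pattern is expected not to compile
; and the Else branch should print the library's error message.
If CreateRegularExpression(0, "(?<=\d+)\h+(?=\d{3})")
  Debug "Compiled"
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf

; \d+ is matched first, then \K drops it from the reported match,
; so only the whitespace that follows the digits is replaced.
If CreateRegularExpression(1, "\d+\K\h+(?=\d{3})")
  Debug ReplaceRegularExpression(1, Text$, "")
  FreeRegularExpression(1)
Else
  Debug RegularExpressionError()
EndIf
```

Unlike a lookbehind, the part before \K is consumed by the match, so it cannot overlap with the end of the previous match.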

loulou2522 · Post by **loulou2522** » Sat Dec 02, 2023 6:26 pm

If I use this string:
text$="AD 500 BC 24 000 45 000 50 000 500"
the last occurrence is concatenated with 50000, which produces 50000500.

AZJIO · Post by **AZJIO** » Sat Dec 02, 2023 6:35 pm

Code:

text$="AD 500 BC 24 000 45 000 50 000 500"

; Removes spaces between numbers
If CreateRegularExpression(1,"\h\d{1,2}\K[\h;.]+(?=\d{3}\b)", #PB_RegularExpression_NoCase)
  text$ = ReplaceRegularExpression(1, text$, "")
Else
  Debug RegularExpressionError()
EndIf
If CreateRegularExpression(0,"[a-z\d]+", #PB_RegularExpression_NoCase) ; a-z instead of a-f, so letter codes such as "AS" are kept whole
  Dim Result$(0)
  NbFound = ExtractRegularExpression(0, text$, Result$())
  For i = 0 To NbFound-1
    Debug Result$(i)
  Next
  FreeRegularExpression(0)
EndIf

\d{1,2} - the preceding group of digits can be at most two digits long.
Added "\b" to require a word boundary after the three digits.

loulou2522 · Post by **loulou2522** » Sat Dec 02, 2023 6:46 pm

The number could also be 158 300 or 1 158 000. Is that possible?

AZJIO · Post by **AZJIO** » Sat Dec 02, 2023 6:52 pm

What should the result be?

??

158300
1158
000

Should I search for pairs of numbers and combine them?

loulou2522 · Post by **loulou2522** » Sat Dec 02, 2023 6:59 pm

The result for 158 200 would be 158200, and for 1 158 000 it would be 1158000.
Thanks for your help, regular expressions are very complicated.

AZJIO · Post by **AZJIO** » Sat Dec 02, 2023 7:14 pm

"AD 500 BC 24 000 45 000 50 000 500"

To keep the regular expression from getting too complicated, I need to know:
1. Is this a single line or a list of such lines?
2. Is the beginning of the line always the same, and does it follow the template "Word | Number | Word | several numbers"?
3. How many numbers should there be at the end? It is very difficult to determine which numbers to combine. Each message brings new rules, and I do not want to tune the regular expression to one special case and then change it again when the rules change.

loulou2522 · Post by **loulou2522** » Sat Dec 02, 2023 7:25 pm

Here's what I do:
1. I load a text file containing the standardized balance sheet data for a company in France.
2. The format is always the same:
Letter letter amount Letter letter amount amount amount amount
An amount can have any value between 1 and 1 000 000 euros. The only constraint is being able to turn these space-separated numbers into amounts. I don't do any processing on the letters; they stay as they are. I hope I've given you enough information.

AZJIO · Post by **AZJIO** » Sat Dec 02, 2023 7:58 pm

loulou2522 wrote: Sat Dec 02, 2023 7:25 pm Letter letter amount Letter letter amount amount amount amount

1. Is "amount" repeated exactly 4 times as a fixed rule? In previous messages, "amount" appeared only 2 times.
2. Since you have a list, there is probably a distinct separator: for example, a space Chr(32) between the thousands groups of "30 000", and a tab #TAB$ Chr(9) between the numbers "20 000" and "30 000". If such a separator exists, many of the problems with splitting the numbers disappear.
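If the file really did use a tab between fields and a space only inside a number, no regular expression would be needed. The sketch below assumes exactly that layout. The sample line is built by hand for illustration and is not taken from the actual file.

```purebasic
; Sketch under an assumption: fields are separated by #TAB$ and
; thousands groups inside a number by a single space.
Line$ = "AD" + #TAB$ + "1 201 500" + #TAB$ + "AE" + #TAB$ + "1 200 000" + #TAB$ + "1 500" + #TAB$ + "10 500"

FieldCount = CountString(Line$, #TAB$) + 1
For i = 1 To FieldCount
  Field$ = StringField(Line$, i, #TAB$)
  ; Removing the spaces turns "1 201 500" into "1201500";
  ; the letter codes contain no spaces and pass through unchanged.
  Debug ReplaceString(Field$, " ", "")
Next
; Prints: AD, 1201500, AE, 1200000, 1500, 10500 (one per line)
```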

loulou2522 · Post by **loulou2522** » Sat Dec 02, 2023 8:39 pm

Here is an example:
chaine = "AD 1 201 500 AE 1 200 000 1 500 10 500"
All the strings in the file are built the same way.

AZJIO · Post by **AZJIO** » Sat Dec 02, 2023 9:45 pm

Code:

#RegExp = 0

Text$ = "AD 1 201 500 AE 1 200 000 1 500 10 500"

; Groups: word, number, word, amount, amount. With #PB_RegularExpression_Extended the spaces inside the pattern are ignored.
If CreateRegularExpression(#RegExp, "([a-z]{2})  \h  ([\d\h]+)   ([a-z]{2})   \h   (\d{3}[\h;.]\d{3}|\d{1,2}[\h;.]\d{3}|\d[\h;.]\d{3}[\h;.]\d{3}) \h (\d{3}[\h;.]\d{3}|\d{1,2}[\h;.]\d{3}|\d[\h;.]\d{3}[\h;.]\d{3})", #PB_RegularExpression_Extended | #PB_RegularExpression_NoCase)
	Groups = CountRegularExpressionGroups(#RegExp)
	If Not Groups
		Debug 0
	EndIf
	If ExamineRegularExpression(#RegExp, Text$)
		While NextRegularExpressionMatch(#RegExp)
			For i = 1 To Groups
				Debug RegularExpressionGroup(#RegExp, i)
			Next
		Wend
	EndIf
Else
	Debug RegularExpressionError()
EndIf

Use RegExpPB

Code:

#RegExp = 0

Text$ = "AD 1 201 500 AE 1 200 000 1 500 10 500"

num$ = "(\d{3}[\h;.]\d{3}    |     \d{1,2}[\h;.]\d{3}     |     \d[\h;.]\d{3}[\h;.]\d{3})"
word$ = "([a-z]{2})"

RegExp$ = ""
RegExp$ + word$
RegExp$ + "\h+"
; RegExp + "([\d\h]+)"
RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + word$
RegExp$ + "\h+"
RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + num$
; Debug RegExp$

If CreateRegularExpression(#RegExp, RegExp$, #PB_RegularExpression_Extended | #PB_RegularExpression_NoCase)
	Groups = CountRegularExpressionGroups(#RegExp)
	If Not Groups
		Debug 0
	EndIf
	If ExamineRegularExpression(#RegExp, Text$)
		While NextRegularExpressionMatch(#RegExp)
			For i = 1 To Groups
				Debug RegularExpressionGroup(#RegExp, i)
			Next
		Wend
	EndIf
Else
	Debug RegularExpressionError()
EndIf

loulou2522 · Post by **loulou2522** » Sat Dec 02, 2023 10:02 pm

Here is the result.
The original string is:

[AD 1 201 500 AE 1 200 000 1 500 10 500]

When I execute the program, the result is:

AD
1 201 500
AE
1 200 000
1 500

What I want is a result like this:

AD
1201500
AE
1200000
1500
10500

AZJIO · Post by **AZJIO** » Sat Dec 02, 2023 10:08 pm

Code:

#RegExp = 0

; Text$ = "AD 1 201 500 AE 1 200 000 1 500"
Text$ = "AD 1 201 500 AE 1 200 000 1 500 10 500"

num$ = "(\d{3}[\h;.]\d{3}    |     \d{1,2}[\h;.]\d{3}     |     \d[\h;.]\d{3}[\h;.]\d{3})"
word$ = "([a-z]{2})"

RegExp$ = ""
RegExp$ + word$
RegExp$ + "\h+"
; RegExp + "([\d\h]+)"
RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + word$
RegExp$ + "\h+"
RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + num$
RegExp$ + "\h*"
RegExp$ + num$ + "?" ; the last amount is optional
; Debug RegExp$

If CreateRegularExpression(#RegExp, RegExp$, #PB_RegularExpression_Extended | #PB_RegularExpression_NoCase)
	Groups = CountRegularExpressionGroups(#RegExp)
	If Not Groups
		Debug 0
	EndIf
	If ExamineRegularExpression(#RegExp, Text$)
		While NextRegularExpressionMatch(#RegExp)
			For i = 1 To Groups
				If i = 2 Or i > 3
					Debug ReplaceString(RegularExpressionGroup(#RegExp, i), " ", "") ; There may be a regular expression for [\h;.]
				Else
					Debug RegularExpressionGroup(#RegExp, i)
				EndIf
			Next
		Wend
	EndIf
Else
	Debug RegularExpressionError()
EndIf

The minimum number is 1000, the maximum is 9 999 999.
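The range stated above can be checked by anchoring the three alternatives of num$ so they must cover a whole string. This is only a sketch: the sample values and the "|" delimiter used to list them are assumptions for the demonstration.

```purebasic
; Sketch: which amounts the num$ alternatives accept when anchored with ^ and $.
Pattern$ = "^(?:\d{3}[\h;.]\d{3}|\d{1,2}[\h;.]\d{3}|\d[\h;.]\d{3}[\h;.]\d{3})$"
Samples$ = "500|1 000|10 500|158 300|9 999 999|10 000 000" ; illustrative values

If CreateRegularExpression(0, Pattern$)
  For i = 1 To CountString(Samples$, "|") + 1
    Sample$ = StringField(Samples$, i, "|")
    If MatchRegularExpression(0, Sample$)
      Debug Sample$ + " -> accepted"
    Else
      Debug Sample$ + " -> rejected"
    EndIf
  Next
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
; "500" and "10 000 000" are rejected, the four values in between are accepted.
```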

loulou2522 · Post by **loulou2522** » Sat Dec 02, 2023 10:51 pm

Thanks AZJIO.
There is one last problem: when a number is < 1000, the expression doesn't work.
Can you find a solution?
Example string:

Text$ = "AD 500 AE 1 200 000 1 500 10 500"

AZJIO · Post by **AZJIO** » Sun Dec 03, 2023 5:36 am

Code:

#RegExp = 0

; Text$ = "AD 1 201 500 AE 1 200 000 1 500"
; Text$ = "AD 1 201 500 AE 1 200 000 1 500 10 500"
Text$ = "AD 500 AE 1 200 000 1 500 10 500"

num$ = "(\d{3}[\h;.]\d{3}    |     \d{1,2}[\h;.]\d{3}     |     \d[\h;.]\d{3}[\h;.]\d{3})"
word$ = "([a-z]{2})"

RegExp$ = ""
RegExp$ + word$
RegExp$ + "\h+"
RegExp$ + "([\d\h]+)" ; first amount: any run of digits and spaces, so values below 1000 also match
; RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + word$
RegExp$ + "\h+"
RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + num$
RegExp$ + "\h*"
RegExp$ + num$ + "?"
; Debug RegExp$

If CreateRegularExpression(#RegExp, RegExp$, #PB_RegularExpression_Extended | #PB_RegularExpression_NoCase)
	Groups = CountRegularExpressionGroups(#RegExp)
	If Not Groups
		Debug 0
	EndIf
	If ExamineRegularExpression(#RegExp, Text$)
		While NextRegularExpressionMatch(#RegExp)
			For i = 1 To Groups
				If i = 2 Or i > 3
					Debug ReplaceString(RegularExpressionGroup(#RegExp, i), " ", "") ; There may be a regular expression for [\h;.]
				Else
					Debug RegularExpressionGroup(#RegExp, i)
				EndIf
			Next
		Wend
	EndIf
Else
	Debug RegularExpressionError()
EndIf
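In the listing above only the first amount is matched with ([\d\h]+); the later amounts still use num$, which needs at least four digits. One possible generalisation, which does not come from the thread, is to treat every amount as one to three digits followed by any number of three-digit groups and to read the line token by token. The pattern "\d{1,3}(?:[\h;.]\d{3})*" and the variable Token$ are assumptions of this sketch.

```purebasic
; Sketch of a hypothetical generalisation: tokenize the line into
; two-letter codes and amounts of any size.
#RegExp = 0
Text$ = "AD 500 AE 1 200 000 1 500 10 500"

num$  = "\d{1,3}(?:[\h;.]\d{3})*" ; 1-3 digits, then zero or more groups of exactly 3 digits
word$ = "[a-z]{2}"

If CreateRegularExpression(#RegExp, word$ + "|" + num$, #PB_RegularExpression_NoCase)
  If ExamineRegularExpression(#RegExp, Text$)
    While NextRegularExpressionMatch(#RegExp)
      Token$ = RegularExpressionMatchString(#RegExp)
      ; Strip the separators space, ";" and "." (a tab separator would need one more ReplaceString)
      Token$ = ReplaceString(ReplaceString(ReplaceString(Token$, " ", ""), ";", ""), ".", "")
      Debug Token$
    Wend
  EndIf
  FreeRegularExpression(#RegExp)
Else
  Debug RegularExpressionError()
EndIf
```

The greedy grouping has a limit: an amount below 1000 that has exactly three digits and follows another amount, as in "50 000 500", would be merged into 50000500, which is the problem reported earlier in the thread. Without a distinct separator between amounts, that case stays ambiguous.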
