AZJIO
Addict
Posts: 2228 Joined: Sun May 14, 2017 1:48 am
Post
by AZJIO » Sat Dec 02, 2023 6:18 pm
Here's another option
Code: Select all
If CreateRegularExpression(1,"(?<=\d)[\h]+(?=\d{3})" , #PB_RegularExpression_NoCase)
Old
"[a-f\d]+"
New
"[a-z\d]+"
myInput$="AS 18 000 AE 25 000 35 000"
"AS" falls outside the A-F range.
I assumed that you had a range of hexadecimal digits, but from your next message I realized that the letters go beyond A-F.
A lookbehind cannot have an unknown length, so "(?<=\d+)" is not allowed. Therefore, we use \K.
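A minimal sketch of the \K approach, for illustration only. The pattern "\d+\K\h+(?=\d{3})" is an assumed variant, not necessarily the exact expression meant in this post. \d+ matches a run of digits of any length and \K drops it from the reported match, so only the spaces in front of a three-digit group get replaced.

```purebasic
; Sketch: \K stands in for the variable-length lookbehind (?<=\d+), which is not allowed.
myInput$ = "AS 18 000 AE 25 000 35 000"
If CreateRegularExpression(1, "\d+\K\h+(?=\d{3})")
  ; Only the text matched after \K (the spaces) is replaced by the empty string
  Debug ReplaceRegularExpression(1, myInput$, "") ; AS 18000 AE 25000 35000
  FreeRegularExpression(1)
Else
  Debug RegularExpressionError()
EndIf
```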
Last edited by AZJIO on Sat Dec 02, 2023 6:29 pm, edited 1 time in total.
loulou2522
Enthusiast
Posts: 553 Joined: Tue Oct 14, 2014 12:09 pm
Post
by loulou2522 » Sat Dec 02, 2023 6:26 pm
If I use this string
text$="AD 500 BC 24 000 45 000 50 000 500"
the last occurrence is concatenated with 50000 and produces 50000500
by AZJIO » Sat Dec 02, 2023 6:35 pm
Code: Select all
text$="AD 500 BC 24 000 45 000 50 000 500"
; Removes spaces between numbers
If CreateRegularExpression(1,"\h\d{1,2}\K[\h;.]+(?=\d{3}\b)", #PB_RegularExpression_NoCase)
  text$ = ReplaceRegularExpression(1, text$, "")
Else
  Debug RegularExpressionError()
EndIf
If CreateRegularExpression(0,"[a-z\d]+", #PB_RegularExpression_NoCase)
  Dim Result$(0)
  NbFound = ExtractRegularExpression(0, text$, Result$())
  For i = 0 To NbFound-1
    Debug Result$(i)
  Next
  FreeRegularExpression(0)
EndIf
\d{1,2} - the preceding group of digits can have at most two digits
Added "\b" to require a word boundary at the end of the number
by loulou2522 » Sat Dec 02, 2023 6:46 pm
The numbers could also look like 158 300 1 158 000. Is that possible?
by AZJIO » Sat Dec 02, 2023 6:52 pm
What should the result be?
??
158300
1158
000
Search for pairs of numbers to combine them?
by loulou2522 » Sat Dec 02, 2023 6:59 pm
The result for 158 200 would be 158200, and for 1 158 000 it would be 1158000.
Thanks for your help, regular expressions are very complicated.
by AZJIO » Sat Dec 02, 2023 7:14 pm
"AD 500 BC 24 000 45 000 50 000 500"
To keep the regular expression from getting too complicated, I need to know:
1. Is this a single line or a list of such lines?
2. Is the beginning of the line always the same, and does it match the template "Word | Number | Word | several numbers"?
3. How many numbers should there be at the end? It is very difficult to determine which numbers to combine. The rules change each time, and I do not want to tailor the regular expression to a special case and then change it again for new rules.
by loulou2522 » Sat Dec 02, 2023 7:25 pm
Here's what I do:
1. I load a text file containing the standardized balance sheet data for a company in France.
2. The format is always the same:
Letter letter amount Letter letter amount amount amount amount
The amount can have a value between 1 and 1 000 000 euros; the only constraint is being able to turn these numbers into amounts. I don't do any processing on the letters; they stay as they are. I hope I've given you enough information.
by AZJIO » Sat Dec 02, 2023 7:58 pm
loulou2522 wrote: Sat Dec 02, 2023 7:25 pm
Letter letter amount Letter letter amount amount amount amount
1. Is "amount" repeated exactly 4 times as a strict rule? In previous messages, "amount" appeared only twice.
2. Since you have a list, there is probably a distinct separator: for example, a space, Chr(32), between the thousands groups in "30 000", and a tab, #TAB$ or Chr(9), between the numbers "20 000" and "30 000". If such a separator exists, many of the problems with separating the numbers disappear.
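If the file really does use tabs between fields (an assumption, nothing in this thread confirms it), no regular expression is needed at all. A minimal sketch with a made-up input line; Row$, Count and Part$ are hypothetical names:

```purebasic
; Hypothetical input: fields separated by #TAB$, thousands groups separated by spaces.
Row$ = "AD" + #TAB$ + "1 201 500" + #TAB$ + "AE" + #TAB$ + "1 200 000" + #TAB$ + "1 500" + #TAB$ + "10 500"

Count = CountString(Row$, #TAB$) + 1 ; number of tab-separated fields
For i = 1 To Count
  Part$ = StringField(Row$, i, #TAB$)
  Debug ReplaceString(Part$, " ", "") ; drop the spaces inside each amount
Next
```

With this input the debug output would be AD, 1201500, AE, 1200000, 1500 and 10500, one per line. The letter codes contain no spaces, so they pass through unchanged.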
by loulou2522 » Sat Dec 02, 2023 8:39 pm
Here is an example:
chaine = "AD 1 201 500 AE 1 200 000 1 500 10 500"
All the strings in the file are built the same way.
by AZJIO » Sat Dec 02, 2023 9:45 pm
Code: Select all
#RegExp = 0
Text$ = "AD 1 201 500 AE 1 200 000 1 500 10 500"
If CreateRegularExpression(#RegExp, "([a-z]{2}) \h ([\d\h]+) ([a-z]{2}) \h (\d{3}[\h;.]\d{3}|\d{1,2}[\h;.]\d{3}|\d[\h;.]\d{3}[\h;.]\d{3}) \h (\d{3}[\h;.]\d{3}|\d{1,2}[\h;.]\d{3}|\d[\h;.]\d{3}[\h;.]\d{3})", #PB_RegularExpression_Extended | #PB_RegularExpression_NoCase)
  Groups = CountRegularExpressionGroups(#RegExp)
  If Not Groups
    Debug 0
  EndIf
  If ExamineRegularExpression(#RegExp, Text$)
    While NextRegularExpressionMatch(#RegExp)
      For i = 1 To Groups
        Debug RegularExpressionGroup(#RegExp, i)
      Next
    Wend
  EndIf
Else
  Debug RegularExpressionError()
EndIf
Use RegExpPB
Code: Select all
#RegExp = 0
Text$ = "AD 1 201 500 AE 1 200 000 1 500 10 500"
num$ = "(\d{3}[\h;.]\d{3} | \d{1,2}[\h;.]\d{3} | \d[\h;.]\d{3}[\h;.]\d{3})"
word$ = "([a-z]{2})"
RegExp$ = ""
RegExp$ + word$
RegExp$ + "\h+"
; RegExp$ + "([\d\h]+)"
RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + word$
RegExp$ + "\h+"
RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + num$
; Debug RegExp$
If CreateRegularExpression(#RegExp, RegExp$, #PB_RegularExpression_Extended | #PB_RegularExpression_NoCase)
  Groups = CountRegularExpressionGroups(#RegExp)
  If Not Groups
    Debug 0
  EndIf
  If ExamineRegularExpression(#RegExp, Text$)
    While NextRegularExpressionMatch(#RegExp)
      For i = 1 To Groups
        Debug RegularExpressionGroup(#RegExp, i)
      Next
    Wend
  EndIf
Else
  Debug RegularExpressionError()
EndIf
Last edited by AZJIO on Sat Dec 02, 2023 10:02 pm, edited 1 time in total.
by loulou2522 » Sat Dec 02, 2023 10:02 pm
Here is the result.
The original string is
[AD 1 201 500 AE 1 200 000 1 500 10 500 ]
When I execute the program, the result is
AD
1 201 500
AE
1 200 000
1 500
What I want is a result like this:
AD
1201500
AE
1200000
1500
10500
by AZJIO » Sat Dec 02, 2023 10:08 pm
Code: Select all
#RegExp = 0
; Text$ = "AD 1 201 500 AE 1 200 000 1 500"
Text$ = "AD 1 201 500 AE 1 200 000 1 500 10 500"
num$ = "(\d{3}[\h;.]\d{3} | \d{1,2}[\h;.]\d{3} | \d[\h;.]\d{3}[\h;.]\d{3})"
word$ = "([a-z]{2})"
RegExp$ = ""
RegExp$ + word$
RegExp$ + "\h+"
; RegExp$ + "([\d\h]+)"
RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + word$
RegExp$ + "\h+"
RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + num$
RegExp$ + "\h*"
RegExp$ + num$ + "?"
; Debug RegExp$
If CreateRegularExpression(#RegExp, RegExp$, #PB_RegularExpression_Extended | #PB_RegularExpression_NoCase)
  Groups = CountRegularExpressionGroups(#RegExp)
  If Not Groups
    Debug 0
  EndIf
  If ExamineRegularExpression(#RegExp, Text$)
    While NextRegularExpressionMatch(#RegExp)
      For i = 1 To Groups
        If i = 2 Or i > 3
          ; Groups 2, 4, 5 and 6 are amounts; a regular expression for [\h;.] could be used here instead
          Debug ReplaceString(RegularExpressionGroup(#RegExp, i), " ", "")
        Else
          Debug RegularExpressionGroup(#RegExp, i)
        EndIf
      Next
    Wend
  EndIf
Else
  Debug RegularExpressionError()
EndIf
The minimum number is 1 000 and the maximum is 9 999 999.
by loulou2522 » Sat Dec 02, 2023 10:51 pm
Thanks, AZJIO.
There is only one last problem: when a number is < 1000, the expression doesn't work.
Can you find a solution?
Example string:
Text$ = "AD 500 AE 1 200 000 1 500 10 500"
by AZJIO » Sun Dec 03, 2023 5:36 am
Code: Select all
#RegExp = 0
; Text$ = "AD 1 201 500 AE 1 200 000 1 500"
; Text$ = "AD 1 201 500 AE 1 200 000 1 500 10 500"
Text$ = "AD 500 AE 1 200 000 1 500 10 500"
num$ = "(\d{3}[\h;.]\d{3} | \d{1,2}[\h;.]\d{3} | \d[\h;.]\d{3}[\h;.]\d{3})"
word$ = "([a-z]{2})"
RegExp$ = ""
RegExp$ + word$
RegExp$ + "\h+"
RegExp$ + "([\d\h]+)" ; the first amount may now be any run of digits and spaces, including numbers below 1000
; RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + word$
RegExp$ + "\h+"
RegExp$ + num$
RegExp$ + "\h+"
RegExp$ + num$
RegExp$ + "\h*"
RegExp$ + num$ + "?"
; Debug RegExp$
If CreateRegularExpression(#RegExp, RegExp$, #PB_RegularExpression_Extended | #PB_RegularExpression_NoCase)
  Groups = CountRegularExpressionGroups(#RegExp)
  If Not Groups
    Debug 0
  EndIf
  If ExamineRegularExpression(#RegExp, Text$)
    While NextRegularExpressionMatch(#RegExp)
      For i = 1 To Groups
        If i = 2 Or i > 3
          ; Groups 2, 4, 5 and 6 are amounts; a regular expression for [\h;.] could be used here instead
          Debug ReplaceString(RegularExpressionGroup(#RegExp, i), " ", "")
        Else
          Debug RegularExpressionGroup(#RegExp, i)
        EndIf
      Next
    Wend
  EndIf
Else
  Debug RegularExpressionError()
EndIf
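The code above accepts a number below 1000 only in the first amount position; the later amounts still go through num$, which starts at 1 000. As a possible alternative (a sketch, not AZJIO's method), every token can be matched one by one with a single generic amount pattern, \d{1,3} followed by any number of three-digit groups. The constant names #RegExpToken and #RegExpSep are hypothetical.

```purebasic
; Sketch only: assumes each thousands group after the first has exactly three digits.
#RegExpToken = 0 ; matches a two-letter code or one amount
#RegExpSep = 1   ; matches a single separator inside an amount

Text$ = "AD 500 AE 1 200 000 1 500 10 500"

If CreateRegularExpression(#RegExpToken, "[a-z]{2}|\b\d{1,3}(?:[\h;.]\d{3})*\b", #PB_RegularExpression_NoCase)
  If CreateRegularExpression(#RegExpSep, "[\h;.]")
    If ExamineRegularExpression(#RegExpToken, Text$)
      While NextRegularExpressionMatch(#RegExpToken)
        ; Letter codes contain no separators, so they pass through unchanged
        Debug ReplaceRegularExpression(#RegExpSep, RegularExpressionMatchString(#RegExpToken), "")
      Wend
    EndIf
    FreeRegularExpression(#RegExpSep)
  Else
    Debug RegularExpressionError()
  EndIf
  FreeRegularExpression(#RegExpToken)
Else
  Debug RegularExpressionError()
EndIf
```

For this Text$ the debug output would be AD, 500, AE, 1200000, 1500 and 10500. The ambiguity discussed earlier in the thread remains: a three-digit amount that directly follows another amount, as in "50 000 500", is still absorbed into it, so this only helps if the data never contains that case or uses a distinct separator between amounts.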