How to replace text in files?

AZJIO
Addict
Posts: 2152
Joined: Sun May 14, 2017 1:48 am

How to replace text in files?

Post by AZJIO »

1. How to detect "UTF-8 without BOM" when the format is reported as #PB_Ascii?
2. How to find text on Linux when the #PB_Ascii content is actually Win-1251?
3. How to find text in binary files?

Code:

EnableExplicit

Define Path$, Format, Text$, SearchStr$, Pos

Path$ = OpenFileRequester("Select a file", "/home/user/", "Text (*.txt)|*.txt|All (*.*)|*.*", 0)
If Path$
	If ReadFile(0, Path$)
		Format=ReadStringFormat(0)
		Select Format
			Case #PB_UTF16BE, #PB_UTF32, #PB_UTF32BE ; ReadString() cannot read these formats
				CloseFile(0) ; release the file before quitting
				End
		EndSelect
		Text$ = ReadString(0, Format | #PB_File_IgnoreEOL) ; read the file into a string
		CloseFile(0)
		; 		Debug Text$
		SearchStr$ = " "
		Pos = FindString(Text$, SearchStr$) ; rewrite the file only if the search string occurs
		If Pos
			Text$ = ReplaceString(Text$, SearchStr$ , "_", #PB_String_CaseSensitive, Pos)
			If CreateFile(0, Path$, Format) ; CreateFile() truncates, so no stale bytes remain if the new text is shorter
				Select Format
					Case #PB_UTF8, #PB_Unicode
						WriteStringFormat(0 , Format)
				EndSelect
				WriteString(0, Text$, Format)
				CloseFile(0)
			EndIf
		EndIf
	EndIf
EndIf
Last edited by AZJIO on Sun Feb 21, 2021 6:17 pm, edited 1 time in total.
IdeasVacuum
Always Here
Posts: 6426
Joined: Fri Oct 23, 2009 2:33 am
Location: Wales, UK

Re: How to replace text in files?

Post by IdeasVacuum »

Hi

Code:

    Select Format
       Case #PB_UTF16BE, #PB_UTF32, #PB_UTF32BE
        End
    EndSelect
That is not how to use Select, and in this case Select has no role to play anyway; that part of the code is redundant.
AZJIO wrote: How to detect "UTF-8 without BOM" when the format is reported as #PB_Ascii?
Sorry, but that question does not make sense. If you are reading a file that does not have a BOM, you might assume that it is an ASCII file. Often it is, but there is no guarantee. Therefore, the file needs to be read and displayed so that the user can judge whether or not the text is good. If it is not, the file can be re-read as UTF8 and displayed and assessed again. If there are a lot of files to be processed, you could of course ask the person who supplied them.

If you are writing the file as UTF8 but wish to omit the BOM, just do that, omit it!

Code:

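#FileOut = 0
MyFullPath$ = GetTemporaryDirectory() + "utf8_no_bom.txt" ; hypothetical output path

If CreateFile(#FileOut, MyFullPath$)
         ; No WriteStringFormat() call, so no BOM is written
         WriteStringN(#FileOut, "I am an Arsenal Supporter", #PB_UTF8) ; WriteStringN() ends the line
         WriteStringN(#FileOut, "no need to groan", #PB_UTF8)
         CloseFile(#FileOut)
Else
         MessageRequester("Alert", "File Create failed")
EndIf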
AZJIO wrote: How to find text on Linux when the #PB_Ascii content is actually Win-1251?
From Wikipedia: Windows-1251
Windows-1251 is an 8-bit character encoding, designed to cover languages that use the Cyrillic script such as Russian, Bulgarian, Serbian Cyrillic and other languages. It is the most widely used encoding for the Bulgarian, Serbian and Macedonian languages.[citation needed] As of December 2020, 0.9% of all (and 0.6% of top-1000[1]) websites use Windows-1251.[2][3] However, it is used by 9.9% of Russian (.ru) websites....

So Win-1251 is effectively a special extension of standard ASCII that PB is perhaps unlikely to be able to process correctly without some form of translation. There are a few discussions in the forum if you search for "Win 1251". For example:
viewtopic.php?f=13&t=35027
AZJIO wrote: How to find text in binary files?
To find text in a binary file, you need to know the format of the text before it was saved to binary.
Some methods here: viewtopic.php?f=13&t=42561
IdeasVacuum
If it sounds simple, you have not grasped the complexity.

Re: How to replace text in files?

Post by AZJIO »

IdeasVacuum wrote: That is not how to use Select,
I find it easier to read: one value is compared against three constants. With "If ... Or" I would have to write six operands. I still need to check whether this form is optimized in the generated ASM.
IdeasVacuum wrote: If you are reading a file that does not have a BOM, you might assume that it is an ASCII file,
Many editors identify "UTF-8 without BOM" easily. Usually a single non-Latin letter is enough for reliable detection. When viewed in a hex editor, the first byte of each two-byte sequence is almost always the same, i.e. one lead byte prevails.
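The kind of check those editors perform can be sketched roughly as follows. This is only an illustration, not a ready-made solution: the procedure name LooksLikeUTF8 is hypothetical, and the check is simplified (it does not reject overlong forms or surrogates). The idea is that text in a single-byte code page such as Win-1251 almost never forms well-formed UTF-8 multi-byte sequences.

```purebasic
EnableExplicit

; Hypothetical helper: returns #True if the buffer is well-formed UTF-8 AND
; contains at least one multi-byte sequence. Pure 7-bit ASCII returns #False,
; because there is nothing to decide in that case.
Procedure.i LooksLikeUTF8(*Buffer, Length.i)
  Protected i.i, b.i, follow.i, multi.i = #False
  While i < Length
    b = PeekA(*Buffer + i)
    If b < $80
      follow = 0                 ; plain ASCII byte
    ElseIf b >= $C2 And b <= $DF
      follow = 1                 ; lead byte of a 2-byte sequence
    ElseIf b >= $E0 And b <= $EF
      follow = 2                 ; lead byte of a 3-byte sequence
    ElseIf b >= $F0 And b <= $F4
      follow = 3                 ; lead byte of a 4-byte sequence
    Else
      ProcedureReturn #False     ; stray continuation byte or invalid lead byte
    EndIf
    i + 1
    If follow
      multi = #True
      While follow
        If i >= Length : ProcedureReturn #False : EndIf          ; truncated sequence
        b = PeekA(*Buffer + i)
        If b < $80 Or b > $BF : ProcedureReturn #False : EndIf   ; not a continuation byte
        i + 1
        follow - 1
      Wend
    EndIf
  Wend
  ProcedureReturn multi
EndProcedure

; Two Cyrillic letters encoded as UTF-8 (D0 9F D1 80); UTF8() appends a null byte
Define *Utf = UTF8(Chr($41F) + Chr($440))
; Three Cyrillic letters as Win-1251 bytes (CF F0 E8)
Define *Ansi = AllocateMemory(3)
PokeA(*Ansi, $CF) : PokeA(*Ansi + 1, $F0) : PokeA(*Ansi + 2, $E8)

Debug LooksLikeUTF8(*Utf, MemorySize(*Utf) - 1) ; well-formed sequences, should be accepted
Debug LooksLikeUTF8(*Ansi, 3)                   ; $F0 is not a continuation byte, should be rejected

FreeMemory(*Utf) : FreeMemory(*Ansi)
```

For batch processing, the same check could be applied to the raw bytes of each file that ReadStringFormat() reports as #PB_Ascii, before deciding whether to read it with #PB_UTF8.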
IdeasVacuum wrote: if the text is not good, the file can be re-read as UTF8 and again displayed
I need to process files in batches. I have already written some code to find the files.

On Windows, files are interpreted correctly at the system level. This is not the case on Linux, although some editors, for example Geany, have a setting to open ANSI files as 1251.
Web pages use "UTF-8 without BOM". Notepad++ detects the encoding of HTML files from their content, by the presence of charset=UTF-8.
I can try to port this code, but I thought something ready-made might exist.
IdeasVacuum wrote: you need to know the format of the text before it was saved to binary.
Some binaries do not constrain the length of the embedded text. For example, Grub4Dos contains a text configuration file inside it. Comments in image files are another example.
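A byte-level search of this kind might look like the sketch below. It assumes the embedded text is plain ASCII and that the whole file fits in memory; FindBytes is a hypothetical helper name, and a zero-filled buffer stands in for the contents of a real binary file.

```purebasic
EnableExplicit

; Hypothetical helper: returns the 0-based offset of the needle bytes inside
; the haystack buffer, or -1 if they do not occur.
Procedure.i FindBytes(*Hay, HayLen.i, *Needle, NeedleLen.i, Start.i = 0)
  Protected i.i
  If NeedleLen <= 0 : ProcedureReturn -1 : EndIf
  For i = Start To HayLen - NeedleLen
    If CompareMemory(*Hay + i, *Needle, NeedleLen)
      ProcedureReturn i
    EndIf
  Next
  ProcedureReturn -1
EndProcedure

; Stand-in for a binary file: 16 zero bytes with the text "menu" at offset 5
Define *Haystack = AllocateMemory(16)
PokeS(*Haystack + 5, "menu", -1, #PB_Ascii | #PB_String_NoZero)

; Ascii() returns the ASCII bytes of the string plus a terminating null
Define *Needle = Ascii("menu")

Debug FindBytes(*Haystack, 16, *Needle, MemorySize(*Needle) - 1) ; offset 5 is expected

FreeMemory(*Haystack) : FreeMemory(*Needle)
```

With a real file, the buffer would be filled by ReadData() after allocating Lof() bytes. A replacement of the same byte length can then be written in place with CopyMemory(); a different length would shift everything after it, which many binary formats do not tolerate.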
RASHAD
PureBasic Expert
Posts: 4951
Joined: Sun Apr 12, 2009 6:27 am

Re: How to replace text in files?

Post by RASHAD »

- Open one file for reading and another for writing at the same time (e.g. file 0 to read, file 1 to write)
- Repeat
- Read one line from file 0, then process it for any changes
- Write it to file 1
- Until the end of file 0

- Close file 0
- Close file 1
- You can delete file 0 if you are sure that the process went OK

I did not test it, but the logic is sound.
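The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not tested production code: ReplaceInFile and the file names are hypothetical, file numbers 0 and 1 are hard-coded, and WriteStringN() writes the platform's line ending, so the line endings of the copy (including one after the last line) may differ from the source.

```purebasic
EnableExplicit

; Hypothetical helper: copies Source$ to Target$ line by line, replacing Find$
; with Replace$. Returns the number of changed lines, or -1 on failure.
Procedure.i ReplaceInFile(Source$, Target$, Find$, Replace$)
  Protected Format.i, Line$, Changed.i = 0
  If ReadFile(0, Source$) = 0
    ProcedureReturn -1
  EndIf
  Format = ReadStringFormat(0)     ; detects and skips a BOM if present
  If Format <> #PB_Ascii And Format <> #PB_UTF8 And Format <> #PB_Unicode
    CloseFile(0)
    ProcedureReturn -1             ; UTF-16 BE / UTF-32 cannot be read with ReadString()
  EndIf
  If CreateFile(1, Target$) = 0
    CloseFile(0)
    ProcedureReturn -1
  EndIf
  If Format <> #PB_Ascii
    WriteStringFormat(1, Format)   ; reproduce the BOM of the source
  EndIf
  While Eof(0) = 0
    Line$ = ReadString(0, Format)
    If FindString(Line$, Find$)
      Line$ = ReplaceString(Line$, Find$, Replace$)
      Changed + 1
    EndIf
    WriteStringN(1, Line$, Format)
  Wend
  CloseFile(0)
  CloseFile(1)
  ProcedureReturn Changed
EndProcedure

; Hypothetical file names; the result is -1 if "notes.txt" cannot be opened
Debug ReplaceInFile("notes.txt", "notes.txt.new", " ", "_")
```

Once the result has been checked, DeleteFile() on the source followed by RenameFile() on the copy would complete the swap; if the result is 0, the copy can simply be deleted and the original stays untouched.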
Egypt my love

Re: How to replace text in files?

Post by AZJIO »

RASHAD,
Understood, that makes the operation safe in case of failure.

I also wanted to add a search so that the replacement starts from the found position. This avoids rewriting the file when there is nothing to replace. So the code is not perfect, and the main task is to detect encodings.

Re: How to replace text in files?

Post by IdeasVacuum »

AZJIO wrote: It's easier for me to read. One element is compared to 3 elements.
You are not going to return the actual format with that code; it only confirms whether or not the format is one of the three.

Re: How to replace text in files?

Post by AZJIO »

IdeasVacuum wrote:You are not going to return the actual format with that code, it just confirms that it is one of the 3 or not.
From the PureBasic help: ReadStringFormat
The other results represent string formats that cannot be directly read with PureBasic string functions. They are included for completeness so that an application can display a proper error-message.
______________
Added FindString

I can detect Win-1251 using the code referenced earlier, but how do I do the replacement? How do I translate the search text from UTF-8 to Win-1251, i.e. convert it to binary data using a mapping table? The same applies to most languages, even those that mix Latin letters with their own.
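One possible approach is a small hand-written mapping, sketched below. It is an assumption-laden illustration rather than a complete converter: ToWin1251 is a hypothetical name, and only ASCII, the letters U+0410..U+044F (which Win-1251 places at $C0..$FF) and Ё/ё ($A8/$B8) are mapped; every other character becomes "?". Other code pages would need their own tables.

```purebasic
EnableExplicit

; Hypothetical helper: encodes Text$ as Windows-1251 into a newly allocated
; buffer. Returns the buffer (free it with FreeMemory()) and stores the number
; of bytes in *ByteCount. Returns 0 for an empty string or on allocation failure.
Procedure.i ToWin1251(Text$, *ByteCount.Integer)
  Protected i.i, c.i, n.i = Len(Text$), *Out
  If n = 0 : ProcedureReturn 0 : EndIf
  *Out = AllocateMemory(n)
  If *Out = 0 : ProcedureReturn 0 : EndIf
  For i = 1 To n
    c = Asc(Mid(Text$, i, 1))
    If c < $80
      ; ASCII is identical in Win-1251
    ElseIf c >= $410 And c <= $44F
      c = c - $410 + $C0         ; basic Cyrillic letters
    ElseIf c = $401
      c = $A8                    ; capital Io
    ElseIf c = $451
      c = $B8                    ; small io
    Else
      c = '?'                    ; not covered by this simplified table
    EndIf
    PokeA(*Out + i - 1, c)
  Next
  *ByteCount\i = n
  ProcedureReturn *Out
EndProcedure

Define ByteCount.i, i.i, Dump$
; Three Cyrillic letters built from code points, so the source file encoding does not matter
Define *Bytes = ToWin1251(Chr($41F) + Chr($440) + Chr($438), @ByteCount)
If *Bytes
  For i = 0 To ByteCount - 1
    Dump$ + RSet(Hex(PeekA(*Bytes + i)), 2, "0") + " "
  Next
  Debug Dump$                    ; expected bytes: CF F0 E8
  FreeMemory(*Bytes)
EndIf
```

The resulting bytes can then be searched for in the raw file contents with CompareMemory(), and the replacement text encoded the same way before the file is rebuilt.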