Check Binaryfile

oryaaaaa
Addict
Posts: 825
Joined: Mon Jan 12, 2004 11:40 pm
Location: Okazaki, JAPAN

Post by oryaaaaa »

Code updated For 5.20+

Code: Select all

Procedure.b CheckBinaryfile(FileName.s)
  Protected file.i, buf.b
  If FileSize(FileName) > 0
    file = ReadFile(#PB_Any, FileName)
    If file ; ReadFile() returns 0 if the file could not be opened
      While Eof(file) = 0
        buf = ReadByte(file)
        Select buf
          ; control characters that are treated as a sign of binary data
          Case $01, $04, $05, $07, $08, $10, $12 To $15, $1A, $1C To $1F
            Debug "BINARY"
            CloseFile(file)
            ProcedureReturn #True
        EndSelect
      Wend
      CloseFile(file)
    EndIf
  EndIf
  ProcedureReturn #False
EndProcedure
Rescator
Addict
Posts: 1769
Joined: Sat Feb 19, 2005 5:05 pm
Location: Norway

Post by Rescator »

This slightly changed version will detect any non-text file as binary.

http://en.wikipedia.org/wiki/Ascii

Linefeed, carriage return, and tab are allowed.
Anything else in the range $00 - $1F is flagged as binary,
and $7F is considered binary as well.

Code: Select all

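Procedure.b CheckBinaryfile(FileName.s)
 Protected file.i, buf.b
 If FileSize(FileName) > 0
  file = ReadFile(#PB_Any, FileName)
  If file ; ReadFile() returns 0 if the file could not be opened
   While Eof(file) = 0
    buf = ReadByte(file)
    Select buf
     ; every control character except tab ($09), linefeed ($0A) and carriage return ($0D), plus DEL ($7F)
     Case $00 To $08, $0B, $0C, $0E To $1F, $7F
      Debug "BINARY"
      CloseFile(file)
      ProcedureReturn #True
    EndSelect
   Wend
   CloseFile(file)
  EndIf
 EndIf
 ProcedureReturn #False
EndProcedure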
Anything else in the 0-31 ASCII range, or 127, is flagged as binary.

This should be about as good as it gets for telling normal (Latin-1) text from binary.
I'm not sure about UTF-8 text, but I believe it will be OK in most cases.

However, the only way to truly detect whether something is binary is simple (but CPU/disk intensive):
if you find $0 (binary zero) anywhere in the file, it's binary; if not, there's a 99.99% chance it's text.

I have yet to see a text file with a binary zero in it.
(Microsoft Word documents do not count as "text"; they are actually a binary document file format.)
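
The binary-zero test described above can be sketched roughly as follows. This is only a minimal sketch: the procedure name ContainsNullByte and the file name "example.txt" are made up for illustration, and a file that cannot be opened is simply reported as #False.

```purebasic
; Hypothetical helper (name is an assumption, not from the posts above).
; Returns #True if the file contains at least one $00 byte,
; #False otherwise or if the file could not be opened.
Procedure.i ContainsNullByte(FileName.s)
  Protected file.i, found.i = #False
  file = ReadFile(#PB_Any, FileName)
  If file
    While Eof(file) = 0
      If ReadByte(file) = 0
        found = #True   ; stop at the first binary zero
        Break
      EndIf
    Wend
    CloseFile(file)
  EndIf
  ProcedureReturn found
EndProcedure

; "example.txt" is a placeholder path
Debug ContainsNullByte("example.txt")
```

In the worst case (a clean text file) this reads every byte, which is the CPU/disk cost mentioned above.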

Post by oryaaaaa »

Some of my documents are not recognized as text by your method.
$00 is the end of the file, so that method reports every file as binary.

Code: Select all

Procedure.b CheckBinaryfile(FileName.s)
  Protected file.i, buf.b
  If FileSize(FileName) > 0
    file = ReadFile(#PB_Any, FileName)
    If file ; ReadFile() returns 0 if the file could not be opened
      While Eof(file) = 0
        buf = ReadByte(file)
        Select buf
          ; same list as before, but $00 is no longer treated as binary
          Case $01 To $08, $0B, $0C, $0E To $1F, $7F
            Debug "BINARY"
            CloseFile(file)
            ProcedureReturn #True
        EndSelect
      Wend
      CloseFile(file)
    EndIf
  EndIf
  ProcedureReturn #False
EndProcedure
Kale
PureBasic Expert
Posts: 3000
Joined: Fri Apr 25, 2003 6:03 pm
Location: Lincoln, UK

Post by Kale »

Here's another one I came up with a few years ago:

Code: Select all

;Check to see if the file is binary
;Returns 1 if binary, 0 if text, -1 if the file could not be opened
Procedure IsBinary(File.s)
    Protected CurrentByte.b
    If ReadFile(0, File)
        ;Test Eof() before reading, so an empty file is never read from
        While Eof(0) = 0
            CurrentByte = ReadByte(0)
            ;.b is signed, so bytes $80-$FF read as negative values and also match "<= 9"
            If CurrentByte <= 9 Or CurrentByte = 127
                CloseFile(0)
                ProcedureReturn 1
            ElseIf CurrentByte > 10 And CurrentByte < 13
                CloseFile(0)
                ProcedureReturn 1
            ElseIf CurrentByte > 13 And CurrentByte < 32
                CloseFile(0)
                ProcedureReturn 1
            EndIf
        Wend
        CloseFile(0) ;close the file on the "text" path as well
        ProcedureReturn 0
    Else
        MessageRequester("Error", "File does not exist or can not be opened:" + Chr(10) + File, #PB_MessageRequester_Ok)
        ProcedureReturn -1
    EndIf
EndProcedure
--Kale


Post by Rescator »

oryaaaaa wrote: Some of my documents are not recognized as text by your method.
$00 is the end of the file, so that method reports every file as binary.

*scratches head* What kind of "text" document has a binary 0 in it?
Wait... silly me. Asian systems store text as 16-bit, don't they?

That's a small headache then.
What you need to do is check for valid 16-bit character ranges/values first,
and if that fails, check for valid 8-bit ASCII similar to what I did, and only then flag it as binary.

No wonder my code failed: it would see $0020 (a 16-bit space), for example, as the two bytes $20 and $00. (Intel/little-endian byte order stores the least significant byte first, so the $20 comes first and the $00 right after it.)

I'm not sure what other advice I can give you, since your system most likely has a mixture of 8-bit and 16-bit text.
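
One narrow way to start on the 16-bit check is to look for a byte order mark (BOM) before running the 8-bit test. Note that this is a different, simpler technique than validating character ranges as suggested above, and it only helps for files that actually begin with a BOM. A minimal sketch, assuming a made-up procedure name HasWideTextBOM and using PureBasic's ReadStringFormat():

```purebasic
; Hypothetical helper (name is an assumption, not from the posts above).
; Returns #True if the file starts with a UTF-16 or UTF-32 byte order mark.
Procedure.i HasWideTextBOM(FileName.s)
  Protected file.i, format.i, result.i = #False
  file = ReadFile(#PB_Any, FileName)
  If file
    ; ReadStringFormat() looks for a BOM at the current file position
    ; and returns #PB_Ascii when none is found
    format = ReadStringFormat(file)
    Select format
      Case #PB_Unicode, #PB_UTF16BE, #PB_UTF32, #PB_UTF32BE
        result = #True
    EndSelect
    CloseFile(file)
  EndIf
  ProcedureReturn result
EndProcedure
```

A caller could skip the $00 rule when this returns #True and fall back to the 8-bit check otherwise. 16-bit text saved without a BOM would still be misdetected.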
Last edited by Rescator on Sun Jul 09, 2006 11:58 am, edited 1 time in total.
Flype
Addict
Posts: 1542
Joined: Tue Jul 22, 2003 5:02 pm
Location: In a long distant galaxy

Post by Flype »

Thanks to your 'binary' rules, I made this useful macro:

Code: Select all

REMOVED
[EDIT]
The forum chokes on the macro I wanted to post. :?
No programming language is perfect. There is not even a single best language.
There are only languages well suited or perhaps poorly suited for particular purposes. Herbert Mayer

Post by oryaaaaa »

I'm sorry.

The bug was in my own software's code.

Code: Select all

PokeB(@Edit_Buf+Len(Edit_Buf)-1,0)

; and

file = CreateFile(#PB_Any, ExtensionArchive()\FileName)
WriteData(file, @Edit_Buf, Edit_Len) ; bug: writes one byte too many
CloseFile(file)

; Fix:
WriteData(file, @Edit_Buf, Edit_Len-1)

Buffer handling with Scintilla.DLL is difficult, and I made a mistake.

Post by Flype »

Another way, probably not the fastest but interesting:

Code: Select all

#BINARY$ = #SOH$+#STX$+#ETX$+#EOT$+#ENQ$+#ACK$+#BEL$+#BS$+#VT$+#FF$+#SO$+#SI$+#DLE$+#DC1$+#DC2$+#DC3$+#DC4$+#NAK$+#SYN$+#ETB$+#CAN$+#EM$+#SUB$+#ESC$+#FS$+#GS$+#RS$+#US$+#DEL$+#NUL$

Procedure.l CheckBinaryFile(FileName.s) 
  Protected byte.b, result.l, file.i = ReadFile(#PB_Any, FileName)
  If file
    While Not Eof(file)
      byte = ReadByte(file)
      If byte
        If FindString(#BINARY$, Chr(byte), 1)
          result = #True
          Break
        EndIf
      EndIf
    Wend 
    CloseFile(file) 
  EndIf
  ProcedureReturn result
EndProcedure

If ExamineDirectory(0, #PB_Compiler_Home, "*.*")
  While NextDirectoryEntry(0)
    If DirectoryEntryType(0) = #PB_DirectoryEntry_File
      Debug DirectoryEntryName(0) + ": " + Str(CheckBinaryFile(#PB_Compiler_Home + DirectoryEntryName(0)))
    EndIf
  Wend
EndIf