Best method to preallocate a big file?

Posted: Fri Nov 11, 2011 1:36 am
by Dummy
I'm looking for the best way to preallocate a big file before starting to write into it in order to make code simpler in the long run.

Let's assume this as basic app code:

Code:

Declare CreateFilePreallocated(fileId, filename.s, fileSize.q)

Define big_size.q

big_size = 1024 * 1024 ; 1mb for testing
; big_size = 1024 * 1024 * 1024 * 4 ; 4gb

CreateFilePreallocated(1, "blah.bin", big_size)
; write big_size of data nonsequentially (meaning that I might write the last byte just before the first)
CloseFile(1)

Sadly, the following code is not officially supported because of "security concerns" and might therefore break with any future PB update, so it can't be the best method...

Code:

Procedure CreateFilePreallocated(fileId, filename.s, fileSize.q)
  CreateFile(fileId, filename)
  FileSeek(fileId, fileSize)
  TruncateFile(fileId)
  FileSeek(fileId, 0)
EndProcedure

On many *nix-filesystems and NTFS this method might produce a sparse file. A sparse file does not consume disk space for file-blocks that were skipped and never written. This also greatly benefits Virtual Machines with dynamically growing volumes.
Both the WinAPI and the POSIX C API offer seek functions that can move the file position beyond the end of the file.
The C API definition requires the skipped gap to read back as zero. The WinAPI states that the contents of the skipped block are undefined. The WinAPI mentions security concerns only for SetFileValidData, which therefore requires special user privileges, but not for SetEndOfFile.
http://pubs.opengroup.org/onlinepubs/00 ... fseek.html
http://msdn.microsoft.com/en-us/library ... s.85).aspx
http://msdn.microsoft.com/en-us/library ... s.85).aspx
But that's not what this Thread is about.

This seems to eat a lot of CPU power and breaks sparsity:

Code:

Procedure CreateFilePreallocated(fileId, filename.s, fileSize.q)
  Protected i.q, quads.q
  CreateFile(fileId, filename)
  quads = fileSize / 8 ; number of whole 8-byte blocks
  For i = 1 To quads
    WriteQuad(fileId, 0)
  Next
  ; write the remaining 0..7 bytes one at a time
  For i = quads * 8 + 1 To fileSize
    WriteByte(fileId, 0)
  Next
  FileSeek(fileId, 0)
EndProcedure

This seems to be a pretty stupid solution (allocating memory just to have a bigger buffer to write?!) and it breaks sparsity as well:

Code:

#FALLOC_STEP = 1024 * 1024 ; 1mb
Procedure CreateFilePreallocated(fileId, filename.s, fileSize.q)
  Protected *buf, len.q
  CreateFile(fileId, filename)
  *buf = AllocateMemory(#FALLOC_STEP) ; memory is zero-filled by default
  While fileSize > 0
    ; write at most #FALLOC_STEP bytes per call
    If fileSize > #FALLOC_STEP
      len = #FALLOC_STEP
    Else
      len = fileSize
    EndIf

    WriteData(fileId, *buf, len)
    fileSize - len
  Wend

  FreeMemory(*buf)

  FileSeek(fileId, 0)
EndProcedure

There are other variants with a DataSection, a string or a structure of a given size. I could also preallocate that junk buffer on application startup and keep it until the end, but that just sounds stupid as well.

Yes I know, we have plenty of resources on modern machines. And no, I don't like applications that are written with that thought in mind. The best example for such an app is the official ICQ client: Eats memory like there's no tomorrow and still is slow as hell.

Re: Best method to preallocate a big file?

Posted: Fri Nov 11, 2011 7:26 am
by dige
I'm not really sure what you mean, but let Windows do the job:

Code:

; by Dige 11/2011
Procedure.i OpenFileAsMemory(file.s, maxSize.q)
  Protected Handle.i, MapHandle.i, ViewHandle.i, HighOrder.l, LowOrder.l

  ; #OPEN_ALWAYS opens the file or creates it if it does not exist yet
  Handle = CreateFile_(file, #GENERIC_READ | #GENERIC_WRITE, #FILE_SHARE_READ | #FILE_SHARE_WRITE, 0, #OPEN_ALWAYS, #FILE_ATTRIBUTE_NORMAL, 0)

  If Handle <> #INVALID_HANDLE_VALUE
    ;HighOrder = (maxSize & $FFFFFFFF00000000) >> 32
    HighOrder = 0 ; Max. 32 Bit = 2.1GB!!
    LowOrder  = maxSize & $00000000FFFFFFFF

    MapHandle = CreateFileMapping_(Handle, 0, #PAGE_READWRITE, HighOrder, LowOrder, 0)

    If MapHandle
      ViewHandle = MapViewOfFile_(MapHandle, #FILE_MAP_WRITE, 0, 0, 0)

      If ViewHandle
        ; Handle and MapHandle stay open while the view is in use
        ProcedureReturn ViewHandle
      EndIf
      CloseHandle_(MapHandle)
    EndIf
    CloseHandle_(Handle)
  EndIf

  ProcedureReturn #False
EndProcedure

; Example, how to use
*mem = OpenFileAsMemory ( "MyNewBigFile.bin", $100000 )

!! If you want to use it that way, please note that the following variables:

- ViewHandle
- MapHandle
- Handle

should be stored in a list/structure so that the resources can be released later !!

Re: Best method to preallocate a big file?

Posted: Fri Nov 11, 2011 8:37 am
by Danilo
Dummy wrote:On many *nix-filesystems and NTFS this method might produce a sparse file. A sparse file does not consume disk space for file-blocks that were skipped and never written. This also greatly benefits Virtual Machines with dynamically growing volumes.
A little code to create a sparse file on an NTFS filesystem:

Code:

EnableExplicit

Macro CTL_CODE( DeviceType, Function, Method, Access )
 ( ((DeviceType) << 16) | ((Access) << 14) | ((Function) << 2) | (Method) )
EndMacro

Import "Kernel32.lib"
    SetFilePointerEx.i(*hFile,liDistanceToMove.q,*lpNewFilePointer,dwMoveMethod.l) ;As "_SetFilePointerEx@24"
EndImport

#FILE_DEVICE_FILE_SYSTEM = $00000009
#METHOD_BUFFERED         = 0
#FILE_SPECIAL_ACCESS     = 0
#FILE_WRITE_DATA         = 2
#FSCTL_SET_SPARSE        = CTL_CODE(#FILE_DEVICE_FILE_SYSTEM, 49, #METHOD_BUFFERED, #FILE_SPECIAL_ACCESS)
#FSCTL_SET_ZERO_DATA     = CTL_CODE(#FILE_DEVICE_FILE_SYSTEM, 50, #METHOD_BUFFERED, #FILE_WRITE_DATA)

Structure FILE_ZERO_DATA_INFORMATION
  FileOffset.q
  BeyondFinalZero.q
EndStructure

Structure FileSizeLowHigh
    fileSizeLow.l
    fileSizeHigh.l
EndStructure

Structure FileSizeQuad
    StructureUnion
       fileSizeAsLongs.FileSizeLowHigh
       fileSize.q
    EndStructureUnion
EndStructure

Procedure.i ZeroSparseFile(fileId,FileOffset.q,Size.q)
    Protected fzdi.FILE_ZERO_DATA_INFORMATION, bytesReturned.l
    fzdi\FileOffset      = FileOffset
    fzdi\BeyondFinalZero = FileOffset+Size ; offset of the first byte beyond the zeroed range
    ProcedureReturn DeviceIoControl_(FileID(fileId),#FSCTL_SET_ZERO_DATA,@fzdi,SizeOf(FILE_ZERO_DATA_INFORMATION),#Null,0,@bytesReturned,#Null)
EndProcedure

Procedure.q GetSparseFileSize(filename.s)
    Protected fileSize.FileSizeQuad
    fileSize\fileSizeAsLongs\fileSizeLow = GetCompressedFileSize_(filename,@fileSize\fileSizeAsLongs\fileSizeHigh)
    If fileSize\fileSizeAsLongs\fileSizeLow = -1
        If GetLastError_() <> #NO_ERROR
            ProcedureReturn -1 ; ERROR, file does not exist?
        EndIf
    EndIf
    ProcedureReturn fileSize\fileSize
EndProcedure

Procedure CreateSparseFile(fileId, filename.s, fileSize.q)
    Protected file, bytesReturned.l
    file = CreateFile(fileId,filename)
    If file
        If fileId = #PB_Any : fileId = file : EndIf
        If Not DeviceIoControl_(FileID(fileId),#FSCTL_SET_SPARSE,#Null,0,#Null,0,@bytesReturned,#Null)
            ; failed to set file to sparse file
            CloseFile(fileId)
            ;DeleteFile(filename) ; ?? remove created file on failure
            ProcedureReturn 0
        EndIf
        If Not ZeroSparseFile(fileId,0,fileSize)
            ; failed to set sparse file content to 0
            CloseFile(fileId)
            ;DeleteFile(filename) ; ?? remove created file on failure
            ProcedureReturn 0
        EndIf
        ; success
        SetFilePointerEx(FileID(fileId),fileSize,#Null,#FILE_BEGIN)
        SetEndOfFile_(FileID(fileId))
        SetFilePointerEx(FileID(fileId),0,#Null,#FILE_BEGIN)
    EndIf
    ProcedureReturn file
EndProcedure

Define file, big_size.q

;big_size = 1024 * 1024 ; 1mb for testing
big_size = 1024 * 1024 * 1024 * 4 ; 4gb
Debug big_size

OpenConsole()

file = CreateSparseFile(1, "blah.bin", big_size)
If file
    ; write big_size of data nonsequentially (meaning that I might write the last byte just before the first)
    PrintN("Lof(): "+Str(Lof(1)))
    WriteByte(1,$64)
    FileSeek(1, big_size - 100)
    WriteQuad(1,$AABBCCDDEEFF1122)
    FileSeek(1, big_size - 1)
    WriteAsciiCharacter(1,$80)

    CloseFile(1)
    PrintN("GetSparseFileSize(): "+Str(GetSparseFileSize("blah.bin")))

    If ReadFile(2,"blah.bin")
        PrintN("Lof(): "+Str(Lof(2)))

        PrintN("reading Byte: $"+Hex(ReadByte(2)))
        FileSeek(2, big_size - 100)
        PrintN("reading Quad: $"+Hex(ReadQuad(2)))
        FileSeek(2, big_size - 1)
        PrintN("reading Byte: $"+Hex(ReadAsciiCharacter(2)))

        CloseFile(2)
        PrintN("GetSparseFileSize(): "+Str(GetSparseFileSize("blah.bin")))

    EndIf
Else
    PrintN("Error: can not create sparse file")
EndIf

PrintN("press <ENTER>")
Input()

It creates a 4 GB sparse file. Size on disk: 128.0 KB

- CreateSparseFile(fileId, filename.s, fileSize.q) creates the file
- with ZeroSparseFile(fileId,FileOffset.q,Size.q) you can set a region in the file to all zeros (these zeros are stored sparsely and use little space on disk)
- GetSparseFileSize(filename.s) gets the "Size on disk", the real size of the sparse file

GetSparseFileSize() does not work after using the PureBasic file functions; I don't know why. It always returns 0.
It shows the correct value before the file functions are used, so you will see the size on the second run. On the first run it shows -1 because the file does not exist yet.

Re: Best method to preallocate a big file?

Posted: Fri Nov 11, 2011 12:52 pm
by Dummy
Hmm dige, your code won't work for big files in 32 Bit PureBasic as it maps the whole file into virtual memory space... still it might come in handy in other projects...

Thanks Danilo for the neat piece of code. Two questions:
1. Your code fails for devices that don't support sparse files (e.g. most variants of FAT), which is not what I intend. On devices that do not support sparse files, file fragmentation is in most cases also a major issue, so it would make sense to actually preallocate all the space on those devices instead.
2. Do you know any similar code for Linux/Mac?

I may have forgotten to mention that I originally didn't intend to use platform-dependent code, as the resulting app should compile on Windows, Linux and Mac. I'd also be happy to write clean code that doesn't need to disable the debugger. But it seems that PureBasic currently can't meet both conditions at the same time and still guarantee that the code won't break without a changelog notice in future PB updates...

Re: Best method to preallocate a big file?

Posted: Fri Nov 11, 2011 8:25 pm
by Danilo
Dummy wrote:1. Your code fails for devices that don't support sparse files (e.g. most variants of FAT), which is not what I intend. On devices that do not support sparse files, file fragmentation is in most cases also a major issue, so it would make sense to actually preallocate all the space on those devices instead.
You need to test the drive for sparse file support:
To determine whether a file system supports sparse files, call the GetVolumeInformation function and examine the FILE_SUPPORTS_SPARSE_FILES bit flag returned through the lpFileSystemFlags parameter.
Search MSDN Library for Sparse Files

Dummy wrote:2. Do you know any similar code for Linux/Mac?
No.
Dummy wrote:I may have forgotten to mention that I originally didn't intend to use platform-dependent code, as the resulting app should compile on Windows, Linux and Mac. I'd also be happy to write clean code that doesn't need to disable the debugger. But it seems that PureBasic currently can't meet both conditions at the same time and still guarantee that the code won't break without a changelog notice in future PB updates...
The Debugger does not know about these manipulations when checking PB file functions.
Best thing would be to write your own functions/libs for everything you need (FileSeek, Peek/Poke, ...).

I don't know how to do this cross-platform for many different file systems. For large files,
memory mapping could speed things up if you write some sequential blocks. When writing
to totally random positions, another block has to be mapped into view every time and it becomes
slower again.
It's not easy; maybe you can find a library for this task that covers all the platforms you need? It must have
been done before for tools that work with large .iso/.bin files.

Re: Best method to preallocate a big file?

Posted: Sat Nov 12, 2011 7:30 am
by Danilo
Changed the code above. Works now with Debugger.