
Need help with some coding in Memory

Posted: Sat Oct 06, 2007 2:14 pm
by raska
Hi, I just made a program to manage some ASCII files. Since the files are very large (up to 100 MB or so), I need to optimize the code that writes them.
The code I currently have is below. I would like to convert it to do the same work in memory and then save the result to disk when it finishes.

Any help will be highly appreciated. Thanks!! :)

Code: Select all

OpenFile(6,file1$)                            ;open files
OpenFile(7,file2$)

Repeat
 c$=ReadString(6)                             ;read every line of file1$
 WriteString(7,c$+Chr(10))                    ;write the line to file2$
Until Left (c$,7)=Chr(9)+Chr(9)+"twist"       ;until the twist line is found
 
pos=Loc(7)                                    ;file pointer
FileSeek (7,pos-Len(c$)-1)                    ;move the pointer back over the twist line and its Chr(10)
WriteString (7,tg$+c$+Chr(10))                ;insert tg$ in front of the twist line. The length of tg$ is unknown
                                              ;when the memory is allocated at the beginning. It can be anywhere from 10 KB to 2 or 3 MB.
                                              ;I am not sure whether the allocated memory can be resized once the value of this
                                              ;variable is known. The value is different every time the program calculates it,
                                              ;and it is inserted into the second file many times during the program
                                              ;run. If the original file1$ is 100 KB, file2$ would be about 50 or 100 MB
                                              ;by the end of the program.
Repeat
 c$=ReadString(6)                             ;read every remaining line of file1$
 WriteString(7,c$+Chr(10))                    ;write the line to file2$
Until Eof(6)                                  ;until end of file

CloseFile (6)                                 ;close files
CloseFile (7)

Posted: Sat Oct 06, 2007 7:52 pm
by Demivec
raska wrote:Hi, I just made a program to manage some ASCII files. Since the files are very large (up to 100 MB or so), I need to optimize the code that writes them.
The code I currently have is below. I would like to convert it to do the same work in memory and then save the result to disk when it finishes.

Any help will be highly appreciated. Thanks!! :)
I think this could accomplish what you need.

Code: Select all

If ReadFile(6,file1$)=0                                       ;open source file (read-only)
  MessageRequester("Error","Could not open the source file!")
  End
EndIf

bufferSize=Lof(6)+Len(tg$)+2                                  ;source size + inserted text + a possible final Chr(10) + the null PokeS appends
*buffer=AllocateMemory(bufferSize)                            ;assumes tg$ is already known here and contains only ASCII characters

If *buffer=0                                                  ;only proceed if memory was allocated
  MessageRequester("Error","Not enough memory for disk buffer!")
Else

  *Ptr=*buffer                                                ;initialize moving pointer
  inserted=#False                                             ;becomes #True once tg$ has been written
  While Eof(6)=0
    c$=ReadString(6,#PB_Ascii)                                ;read every line of file1$
    If inserted=#False And Left(c$,7)=Chr(9)+Chr(9)+"twist"   ;first twist line found:
      c$=tg$+c$                                               ;put tg$ directly in front of it
      inserted=#True
    EndIf
    PokeS(*Ptr,c$+Chr(10),-1,#PB_Ascii)                       ;write the line to the buffer (third parameter is the length, -1 = whole string)
    *Ptr+Len(c$)+1                                            ;advance past the text and the Chr(10) only, so the next PokeS overwrites the null
  Wend

  CloseFile(6)                                                ;close source file

  If CreateFile(7,file2$)                                     ;create (or truncate) destination file
    WriteData(7,*buffer,*Ptr-*buffer)                         ;write out contents of buffer
    CloseFile(7)                                              ;close destination file
  EndIf
  FreeMemory(*buffer)                                         ;free buffer memory
EndIf
Note how the buffer size is worked out so that neither too little nor too much memory is allocated. I did not include the pre-reading of the source, but I'm sure you can figure it out from the parts already present. :wink:
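
The pre-reading pass mentioned above could look roughly like the sketch below. This is only an illustration: EstimateBufferSize() is a made-up name, file number 0 is assumed to be free, and the text is assumed to be plain ASCII. The estimate (file size, plus one byte per line, plus Len(tg$)) is deliberately generous so the buffer should never come out too small.

```purebasic
; Sketch only: EstimateBufferSize() is a hypothetical helper, not a PureBasic built-in.
; Assumes ASCII text, so one character of tg$ takes one byte.
Procedure EstimateBufferSize(file$, tg$)
  Protected lines = 0, line$, size
  size = FileSize(file$)                     ; size in bytes, negative if the file is missing
  If size < 0
    ProcedureReturn 0
  EndIf
  If ReadFile(0, file$)                      ; pre-read pass: count the lines
    While Eof(0) = 0
      line$ = ReadString(0)
      lines + 1
    Wend
    CloseFile(0)
  EndIf
  ProcedureReturn size + lines + Len(tg$)    ; one spare byte per line as a margin
EndProcedure
```

It would be used as `*buffer=AllocateMemory(EstimateBufferSize(file1$,tg$))`, with the usual check that the result is not 0.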

Posted: Sat Oct 06, 2007 10:05 pm
by raska
Hi DEmivex:

Thanks for the tip.

I think I understand what your code does, but I see some gaps, and one question (the main one) was not answered.

Let me explain exactly what I want to do:



1 - Allocate memory for the first file, e.g. FileID1=AllocateMemory(FileSize(file1$))

2 - Copy the whole file into that memory block

3 - Allocate memory after the first block for the second file, with the same number of bytes, e.g. FileID2=AllocateMemory(FileSize(file1$))

4 - Read every single line of the first file in memory and write it to the second buffer.

(In this pass, the program calculates the value of the variable tg$ every time.)

5 - When tg$ has to be injected, INCREASE THE AMOUNT OF MEMORY ALLOCATED for the second file by the length of tg$. (Could ReAllocateMemory work?)

6 - Continue that way until the end of the file. tg$ will vary in length several times from beginning to end, and every time the second buffer must be reallocated, growing by the length of tg$.

7 - When the whole file has been read, take the address of the last byte written to the second allocation and subtract the start pointer to get the total length of the new second file.

8 - Move that whole block (the second file) to the first byte of the first allocation (the second file now becomes the first).

9 - Starting after the last byte of the moved block, allocate the same amount again for the second file.

For example, say the first buffer runs from 1 to 500 and the second one, which started as 501 to 1000, runs from 501 to 50000 after all the tg$ insertions. Then we need to move 501-50000 down to 1 and allocate 50001 to 100000.

10 - Go back to step 1 and repeat. This would be done about 30 times, giving a file of up to 100 MB at the end.

So, my main question here is whether I can reallocate the memory of the second block every time tg$ is calculated, so that I can continue writing without damaging what was written before.

Posted: Sat Oct 06, 2007 10:31 pm
by Demivec
Hi rAskv:


raska wrote:I think I understand what your code does, but I see some gaps, and one question (the main one) was not answered.
raska wrote:The code I currently have is below. I would like to convert it to do the same work in memory and then save the result to disk when it finishes.
The reason it doesn't answer it is that your original code was incomplete and could not perform the function you described, contrary to what you said in your post.
raska wrote:So, my main question here is whether I can reallocate the memory of the second block every time tg$ is calculated, so that I can continue writing without damaging what was written before.
The answer is "yes," though it would seem that there would be a more efficient way. What that way would be depends on whether you will share enough code to describe the process more completely, since you say that you have already written code that performs the needed task.
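
For the "yes" part, here is a minimal sketch of a buffer that grows with ReAllocateMemory() as text is appended. AppendText(), *buf, capacity and used are names made up for this example, the 5000-character string is only a stand-in for tg$, and the text is assumed to be ASCII. Doubling the capacity, rather than growing by exactly Len(tg$) each time, is a choice made here to keep the number of reallocations small.

```purebasic
; Sketch only: an append-only ASCII buffer. All names here are hypothetical.
Global *buf, capacity, used

Procedure AppendText(text$)
  Protected needed, newCapacity, *new
  needed = used + Len(text$) + 1               ; +1 for the null that PokeS appends
  If needed > capacity
    newCapacity = capacity
    While newCapacity < needed
      newCapacity = newCapacity * 2            ; doubling keeps reallocations rare
    Wend
    *new = ReAllocateMemory(*buf, newCapacity) ; existing content is preserved
    If *new = 0
      ProcedureReturn #False                   ; resize failed, the old block is still valid
    EndIf
    *buf = *new                                ; the block may have moved
    capacity = newCapacity
  EndIf
  PokeS(*buf + used, text$, -1, #PB_Ascii)
  used + Len(text$)                            ; the null is not counted, the next call overwrites it
  ProcedureReturn #True
EndProcedure

capacity = 1024                                ; start small on purpose
used = 0
*buf = AllocateMemory(capacity)
If *buf
  AppendText("first line" + Chr(10))
  AppendText(LSet("", 5000, "x") + Chr(10))    ; stand-in for a large tg$, forces the buffer to grow
  AppendText("last line" + Chr(10))
  out$ = GetTemporaryDirectory() + "append_demo.txt"   ; hypothetical output path
  If CreateFile(0, out$)
    WriteData(0, *buf, used)                   ; one write at the end
    CloseFile(0)
  EndIf
  FreeMemory(*buf)
EndIf
```

Because the block can move when it is resized, any pointer kept into the old buffer has to be recalculated from *buf afterwards, which is why the sketch tracks an offset (used) instead of a moving pointer.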

Posted: Sun Oct 07, 2007 12:01 am
by raska
Demivec wrote:
The answer is "yes," though it would seem that there would be a more efficient way. What that way would be depends on whether you will share enough code to describe the process more completely, since you say that you have already written code that performs the needed task.
The program is finished and it works great. I would just like to optimize it so that, instead of taking a minute or so to get the whole task done, it takes a few seconds.
It's late today and my eyes are almost closed. :) Tomorrow I will try to give you some more extensive code where you can see exactly what I mean. I will have to write it out by hand, since there is no way to extract it from the program, which is a mess of procedures and subroutines. :)
Please don't feel bad about my statement that the question wasn't answered. Sometimes writing in a language that is not your native one is a pain, and the words don't come out right. :(
I would like to give you a big thanks for all your time and your offer.
Have a nice day or night, or whatever it is in your country :)

Posted: Tue Oct 09, 2007 12:05 am
by raska
Well, here is more or less what the program does. As you can see, the variable tg$ grows several times for every file loaded, whenever differences between the two files are found. This happens many times for each file, out of a total of 20 in this pseudo code. In reality, the total is however many files are in a directory. I hope this makes clearer what I want to do :)

Code: Select all

;pseudo code to make the thing more short :)

main$=main.tst
end$=main$+".tmp"                     ;temp file to increase it with the tg$ variable  
file1$=c:\test.dta                    ;initial file to test


For a= 1 to 20                        ;this would be the equivalent to read 20 files in a directory
  file2$=next file in directory       ;take the files one by one

  openfile(1,file1$)                  ;open both files
  openfile(2,file2$)

  repeat
     a$=readstring(1)                 ;read line by line file 1
     b$=readstring(2)                 ;read line by line file 2
      if left (a$,3)="kkk"            ;if 3 first chars are kkk
       if mid(a$,4,10)<>mid(b$,4,10)  ;compare 2 parts of the line if they are diferent
       tg$=tg$+mid(b$,4,10)           ;increase tg$ with the second file part (here the memory must be reallocated with the length of tg$)
       endif
      endif 
  until eof(1)                        ; until end of file

     gosub inject                     ;go to inject the variable tg$ to the temp file

  closefile(1)
  closefile(2)

copy end$, main$                      ; end$ (the temp file) is now the main$ file
delete end$                           ;delete temp file

Next                                  ; and go to the next file in directory until reach the last

;---------------------------------------------------------------------------

inject:
OpenFile(6,main$)                            ;open files (main and temp)
OpenFile(7,end$)

Repeat
 c$=ReadString(6)                             ;read every line of main$
 WriteString(7,c$+Chr(10))                    ;write the line to end$
Until Left (c$,7)=Chr(9)+Chr(9)+"twist"       ;until the twist line is found
 
pos=Loc(7)                                    ;file pointer
FileSeek (7,pos-Len(c$)-1)                    ;move the pointer back over the twist line and its Chr(10)
WriteString (7,tg$+c$+Chr(10))                ;insert tg$ in front of the twist line. The length of tg$ is unknown
                                              ;when the memory is allocated at the beginning. It can be anywhere from 10 KB to 2 or 3 MB.
                                              ;I am not sure whether the allocated memory can be resized once the value of this
                                              ;variable is known. The value is different every time the program calculates it,
                                              ;and it is inserted into the second file many times during the program
                                              ;run. If the original file is 100 KB, the result would be about 50 or 100 MB
                                              ;by the end of the program.
Repeat
 c$=ReadString(6)                             ;read every remaining line of main$
 WriteString(7,c$+Chr(10))                    ;write the line to end$
Until Eof(6)                                  ;until end of file

CloseFile (6)                                 ;close files
CloseFile (7)

return 
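
The "next file in directory" step in the pseudo code above can be written with PureBasic's directory functions. A minimal sketch, assuming directory number 0 is free and that dir$ ends with a path separator. ProcessDirectory() is a made-up name, and the compare step is left as a comment.

```purebasic
; Sketch only: ProcessDirectory() is a hypothetical name.
Procedure ProcessDirectory(dir$)               ; dir$ must end with a path separator
  Protected count = 0, file2$
  If ExamineDirectory(0, dir$, "*.*")
    While NextDirectoryEntry(0)
      If DirectoryEntryType(0) = #PB_DirectoryEntry_File   ; skip sub-directories, "." and ".."
        file2$ = dir$ + DirectoryEntryName(0)  ; full path of the next file
        ; ... compare file1$ with file2$ here and build tg$ ...
        count + 1
      EndIf
    Wend
    FinishDirectory(0)
  EndIf
  ProcedureReturn count                        ; number of files visited
EndProcedure
```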

Posted: Tue Oct 09, 2007 6:58 pm
by Demivec
raska wrote:

Code: Select all

;pseudo code to make the thing more short :) 
Pseudo-code does make it short, but if you had posted actual code then I would be able to give you an actual answer and not a pseudo-answer. :wink: In addition, the answers have to be given over several postings instead of one, because you thought it more expedient to give an incomplete version of your code.

Here is a suggestion, with pseudo-code:

Code: Select all

main$="main.tst"
temp$=main$+".tmp"                     ;temp file to increase it with the tg$ variable 
file1$="c:\test.dta"                    ;initial file to test

  
Gosub beforeInject                                        ;initiate injection process

For all the files in directory
  file1 = Open(#PB_Any,file1$)                                      ;open files to compare
  currentFile = open(#PB_Any,currentFile$)
  
  While Not Eof(file1)
    Read the next line from both files (=a$,b$)
    If Left(a$,3)="kkk" And Mid(a$,4,10) <> Mid(b$,4,10)   ;what are examples of the content of tg$?
      tg$ = tg$ + Mid(b$,4,10)
    Endif
  Wend

  Gosub inject     ;Put tg$ into temp file

  Closefile(file1)
  Closefile(currentFile)

Next

Gosub afterInject                                         ;complete injection process

End

;----------------------------------------------------------------------------------------------------

beforeInject:

tempFile = open(#PB_Any,temp$)                                       ;This creates tempFile, it should not exist before this
mainFile = open(#PB_Any,main$)

Repeat 
  c$ = ReadString(mainFile)
  If Left(c$,7) <> Chr(9) + Chr(9) + "twist"              
    WriteString(tempFile,c$ + Chr(10))                    
  Else                                                    
    lastPos=Loc(mainFile) - Len(c$) - 1                    ;this points to start of line containing Chr(9) + Chr(9) + "twist" in mainFile (assumes a 1-byte line ending)
    Break   
  Endif
Forever

Return

;---------------------

inject:

FileSeek(mainFile,lastPos)
c$ = ReadString(mainFile)                                 ;re-read the Chr(9) + Chr(9) + "twist" line from mainFile
WriteString(tempFile,tg$ + c$ + Chr(10))                  ;write tg$ followed by the twist line to tempFile
 
Repeat 
  c$ = ReadString(mainFile)
  If Left(c$,7) <> Chr(9) + Chr(9) + "twist"              
    WriteString(tempFile,c$ + Chr(10))                    
  Else                                                    
    lastPos=Loc(mainFile) - Len(c$) - 1                   ;This points to start of line containing Chr(9) + Chr(9) + "twist" in mainFile (assumes a 1-byte line ending)
    Break                                                 ;lastPos is where the next update will take place. Will wait until then for further action
  Endif
Forever

Return

;----------------------  

afterInject:

Repeat 
  c$ = ReadString(mainFile)                               ;copy remaining portion of file
  WriteString(tempFile,c$ + Chr(10))                    
Until Eof(mainFile)

Closefile(tempFile)
Closefile(mainFile)

Copy tempFile to mainFile
Delete tempFile

Return
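
The last two pseudo-code steps ("Copy tempFile to mainFile" and "Delete tempFile") map onto CopyFile() and DeleteFile(), which take file names rather than file numbers. A minimal sketch, with SwapInTempFile() as a made-up name. Both files are assumed to be closed before it is called.

```purebasic
; Sketch only: SwapInTempFile() is a hypothetical name.
Procedure SwapInTempFile(temp$, main$)
  If CopyFile(temp$, main$)        ; overwrites main$ if it already exists
    DeleteFile(temp$)              ; remove the temp file only after a successful copy
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False           ; copy failed, temp$ is left in place
EndProcedure
```

In the suggestion above it would be called as `SwapInTempFile(temp$,main$)` right after the two Closefile() lines.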