How do I get the real size of a file on disk?
Posted: Thu Jul 24, 2003 6:52 pm
by Kale
Hi, does anybody know a way I could get the real size on disk of a file (or files), when all I can get is the plain size in bytes? I.e.:
Code: Select all
Size on Disk: 71312896b (68MB) <-- How do I get this,
Size: 71059561b <-- From this?
Thanks
Posted: Thu Jul 24, 2003 6:57 pm
by Tranquil
FileSize() reports the correct file size here. Or open the file and use len(); it reports the same size.
I don't see the problem!? Where do you get the other values from?
Mike
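[Editor's note: for illustration, a minimal sketch of both approaches, assuming a reasonably current PureBasic where FileSize() takes a path and Lof() (presumably what was meant by len()) takes the number of an open file; the path is just a placeholder.]
Code: Select all
; Two ways to get the logical size of a file in bytes.
Filename$ = "C:\test.txt" ; placeholder path

; 1) Ask the filesystem directly; -1 means the file was not found.
Debug FileSize(Filename$)

; 2) Open the file and query its length.
If ReadFile(0, Filename$)
  Debug Lof(0) ; length of the open file in bytes
  CloseFile(0)
EndIf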
Posted: Thu Jul 24, 2003 7:13 pm
by Kale
I'm using this procedure:
Code: Select all
; Return the size of a directory in bytes (recursive)
Procedure.l DirectorySize(DirectoryID.l, DirectoryName.s)
  If ExamineDirectory(DirectoryID, DirectoryName, "*.*")
    Repeat
      Entry.l = NextDirectoryEntry()
      Name.s = DirectoryEntryName()
      If Entry = 1 ; a file: add its logical size
        TotalFileSize + DirectoryEntrySize()
      ElseIf Entry = 2 ; a subdirectory: recurse into it
        If Name <> "." And Name <> ".."
          TotalFileSize + DirectorySize(DirectoryID + 1, DirectoryName + Name + "\")
          UseDirectory(DirectoryID) ; restore the enumeration after the recursive call
        EndIf
      EndIf
    Until Entry = 0
  EndIf
  ProcedureReturn TotalFileSize
EndProcedure

Debug DirectorySize(1, "C:\My Documents\")
And when I right-click on this directory and select 'Properties' in WinXP, I get two different sizes: the normal size in bytes and the 'real' size on disk. This procedure returns the normal size in bytes, not how much disk space the files actually take up.

See what I mean?

Posted: Thu Jul 24, 2003 7:37 pm
by GPI
The difference between the real file size and the file size on disk is not so easy to explain.
(Nearly) all filesystems with random access use a little trick to do their work faster: clusters (OK, FAT knows them as clusters; the C64 called them blocks, same as the Xbox and GC).
Think of a cluster as a page in a book. Every cluster has a fixed size (for example 4 KB). A file can only start at a cluster boundary, and even when a file is only 1 byte big, it uses a complete cluster. So 1024 files with a size of 1 byte each need 4 MB in our example!
Why is this?
OK, here's an example.
Our HDD:
Code: Select all
Cluster | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
--------+---+---+---+---+---+---+---+---
FILE    | - | - | - | - | - | - | - | -
Now save 3 files: A (3 clusters), B (2 clusters), C (2 clusters).
Code: Select all
Cluster | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
--------+---+---+---+---+---+---+---+---
FILE    | A | A | A | B | B | C | C | -
Now we delete B:
Code: Select all
Cluster | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
--------+---+---+---+---+---+---+---+---
FILE    | A | A | A | - | - | C | C | -
And add a file D (3 clusters):
Code: Select all
Cluster | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
--------+---+---+---+---+---+---+---+---
FILE    | A | A | A | D | D | C | C | D
What do you see? It is easier to erase and reuse the free space. With just 8 bytes you can manage 8 * 4 KB (example cluster size: 4 KB) = 32 KB of disk space.
Without these clusters/blocks/pages (or whatever you call them), you would need a very big table. You would have to know which byte is used, which bytes are free, which byte belongs to which file, etc. (or you would have to move the complete HDD contents whenever you delete 10 bytes at the beginning, so that all the free space stays at the end of the HDD). With clusters everything is faster and easier, but it needs more space.
I don't know NTFS, but I don't think it handles this differently.
(BTW: because of this problem it can happen that you can't copy the contents of a 50 MB HDD onto a 200 MB HDD. And to keep the cluster size down, don't make your drives too big; use partitions.)
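[Editor's note: to make the rounding concrete, here is a minimal sketch of the calculation GPI describes, with his example cluster size of 4 KB hard-coded purely for illustration.]
Code: Select all
; Round a logical file size up to the next full cluster.
ClusterSize.l = 4096 ; GPI's example value; real drives vary (see later posts)

Size.l = 1 ; a 1-byte file...
Debug ((Size + ClusterSize - 1) / ClusterSize) * ClusterSize ; ...occupies 4096 bytes on disk

Debug 1024 * ClusterSize ; 1024 such files occupy 4194304 bytes (4 MB)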
Posted: Thu Jul 24, 2003 8:16 pm
by Kale
Yeah, thanks GPI, I understood that the size on disk can be larger because of the filesystem.

But here's a little problem for you: take a look at this screenshot (of my 'C:\Windows\'). The size on disk is smaller. Explain that!
WinXP SP1

Posted: Thu Jul 24, 2003 8:20 pm
by Pupil
Kale wrote: ...the size on disk is smaller. Explain that!
The file is on a compressed volume, perhaps...
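[Editor's note: for what it's worth, Windows exposes this through the Win32 call GetCompressedFileSize(); a hedged sketch using PureBasic's underscore syntax for API calls. Files over 4 GB would also need the high DWORD, which is ignored here, and the path is a placeholder.]
Code: Select all
; Ask Windows how many bytes of storage a file really uses.
; For compressed or sparse files this is the compressed/sparse size;
; for ordinary files it is the plain file size, which you would
; still round up to whole clusters yourself.
Filename$ = "C:\test.txt" ; placeholder path
SizeHigh.l = 0
SizeLow.l = GetCompressedFileSize_(Filename$, @SizeHigh)
Debug SizeLow ; low 32 bits of the result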
Posted: Thu Jul 24, 2003 9:08 pm
by GPI
Pupil wrote: Kale wrote: ...the size on disk is smaller. Explain that!
The file is on a compressed volume, perhaps...
Or somebody should run ScanDisk and hope that the complete HDD isn't crashed...
Posted: Thu Jul 24, 2003 10:54 pm
by ricardo
GPI wrote: Think of a cluster as a page in a book. Every cluster has a fixed size (for example 4 KB). A file can only start at a cluster boundary, and even when a file is only 1 byte big, it uses a complete cluster. So 1024 files with a size of 1 byte each need 4 MB in our example!
Yep.
One 69-byte file uses 4,096 bytes here.

Posted: Tue Jul 29, 2003 2:33 pm
by Johan_Haegg
If it's smaller, maybe you're running some kind of compression? Look in Properties - General and check if compression is activated.
Code: Select all
#ClusterSize = 4096

; Round a size up to the next multiple of #ClusterSize.
Procedure.l SizeOnDisk(Size.l)
  ProcedureReturn ((Size + #ClusterSize - 1) / #ClusterSize) * #ClusterSize
EndProcedure
Might work, untested...
[Off Topic]
Wouldn't it be nice to have a filesystem with flexible cluster sizes?
[256byte name][1byte flags][6byte physical location][LOADS of bytes length] [some bytes CRC] :>
The main problem with this is that fragmentation is impossible (yes, fragmentation is good in some respects) and the system would probably be slow.
Posted: Tue Jul 29, 2003 4:55 pm
by GPI
>#ClusterSize = 4096
The cluster size is not constant! It changes with the filesystem (FAT16 vs. FAT32) and the partition size. Don't use constants in your own routines!
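[Editor's note: along the same lines, a sketch that asks Windows for the actual cluster size of a drive via the Win32 GetDiskFreeSpace() call (again using PureBasic's underscore API syntax) instead of hard-coding 4096.]
Code: Select all
; Round a size up using the real cluster size of the given drive.
Procedure.l SizeOnDisk(Size.l, Root.s)
  Protected SectorsPerCluster.l, BytesPerSector.l
  Protected FreeClusters.l, TotalClusters.l, ClusterSize.l
  If GetDiskFreeSpace_(Root, @SectorsPerCluster, @BytesPerSector, @FreeClusters, @TotalClusters)
    ClusterSize = SectorsPerCluster * BytesPerSector
    ProcedureReturn ((Size + ClusterSize - 1) / ClusterSize) * ClusterSize
  EndIf
  ProcedureReturn Size ; fall back to the plain size if the call fails
EndProcedure

Debug SizeOnDisk(71059561, "C:\")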
Posted: Tue Jul 29, 2003 5:28 pm
by Kale
BTW, I run no compression.

Posted: Tue Jul 29, 2003 9:35 pm
by GPI
Kale wrote:BTW i run no compression

Folders are a little bit special. Maybe there is also a bug there, I don't know, but try this with single files.
GPI
Posted: Tue Jul 29, 2003 10:56 pm
by Kale
Yeah, it works fine with files. I just ScanDisk'd my HDD to be on the safe side. It's maybe an undocumented feature of the Windows folder.

Cluster Sizes
Posted: Thu Aug 07, 2003 5:55 am
by oldefoxx
Clusters came into vogue as hard drives got bigger. The sector size on hard drives has stayed at 512 bytes, so clusters are groups of sectors. The number of sectors per cluster is determined by the size of the hard drive, and the cluster size has to be stretched so that there are no more clusters on a drive than can be addressed by a 32-bit word, which is where FAT32 got its name (FAT stands for File Allocation Table).
If you divide a hard drive into several logical drive partitions, then each has its own FAT32 structure, and since only a portion of the total disk surface is in each partition, the cluster sizes can be smaller. Of course, each physical and logical drive can have its own structure, so you can have different partitions for DOS, Windows, Linux, and so on. DOS and Windows can only "see" partition structures that they recognize, so DOS cannot "see" NTFS (used by Windows) or any of the filesystems that are native to Linux. Linux, on the other hand, usually has drivers that allow it to see and access both DOS and Windows partitions.
Within a partition, any part of a file that extends into a cluster means that the file "owns" that cluster, even if it only needed one byte out of the thousands that might be involved. But the idea of variable-sized clusters is ludicrous, because the system can only address the hardware down to that level. It can also compute where any given cluster is on the hard drive because they all have the same size, so the system can drive the disk mechanism to a given cluster as soon as it determines where it should be. If clusters were variable-sized, it could only find a given cluster by following each one leading up to that particular one and ticking them off as they went by. Then instead of having seek times in milliseconds, we would see disk access slowed to megaminutes.
Having several partitions instead of just one means better use of disk space, since the average number of bytes wasted in partially filled clusters goes down. With small clusters, you do not have to read and write such large amounts of information at once, nor do you have to accumulate as much information in a buffer before being able to write it to disk. On the other hand, large clusters mean fewer seeks to find the next chained cluster in cases where the data is fragmented on the drive.
Some of this is supposition, reasoning after the fact. I haven't had the opportunity to actually measure these things firsthand (few have), but my opinions in this regard have been expressed by others. I just brought them out to clarify some issues here about sectors, clusters, and all that.
The directory not only keeps tabs on files, it keeps tabs on their size. If it didn't, it could not report each and every file's size without scanning every file from beginning to end. Without having that data on hand, it could only give you the number of clusters involved, and the size of the clusters would then give a size at least as large as the file. However, the system can also go the other way, which is to correctly calculate how many clusters at their given size are needed to store a file of some specific size. Directories and paging structures on the drive also take up some additional space, which deducts from the total available.
To determine the size of a file, open it and check the length of the file. Whatever agrees with that is the correct report; if something else gives a different size, then it must be wrong, or there is a flaw in the directory structure on that drive. Using ChkDsk or ScanDisk will let you determine whether this is true, although for some errors the only "cure" is to delete the files that are somehow in error. Corrupted directories, files, and drives are often the result of a failure to close an application or to shut the system down in a proper manner. Power loss or equipment shutdown while Windows or DOS is still running is the most frequent cause of drive corruption, so if a warning comes up that one or more of your hard drives need to be checked before restarting the system, you had better let it happen, and not get impatient to get back up immediately.