How do I get the real size of a file on disk?

Kale
PureBasic Expert
Posts: 3000
Joined: Fri Apr 25, 2003 6:03 pm
Location: Lincoln, UK
Contact:

How do I get the real size of a file on disk?

Post by Kale »

Hi, does anybody know how I could get the real size on disk of a file (or files), when all I can get is the size in bytes? I.e.:

Code: Select all

Size on Disk: 71312896b (68MB)   <-- How do i get this,
Size: 71059561b                  <-- From this?
Thanks
--Kale

Tranquil
Addict
Posts: 952
Joined: Mon Apr 28, 2003 2:22 pm
Location: Europe

Post by Tranquil »

FileSize() reports the correct file size here. Or open the file and use Lof(); it reports the same size.

Don't see the problem!? Where do you get the other values from?

Mike
Tranquil
Kale
PureBasic Expert
Posts: 3000
Joined: Fri Apr 25, 2003 6:03 pm
Location: Lincoln, UK
Contact:

Post by Kale »

I'm using this procedure:

Code: Select all

;Return the size of a directory in bytes
Procedure.l DirectorySize(DirectoryID.l, DirectoryName.s)
    If ExamineDirectory(DirectoryID, DirectoryName, "*.*")
        Repeat
            Entry.l = NextDirectoryEntry()
            Name.s = DirectoryEntryName()
            If Entry = 1
                TotalFileSize + DirectoryEntrySize()
            ElseIf Entry = 2
                If Name <> "." And Name <> ".."
                    TotalFileSize + DirectorySize(DirectoryID + 1, DirectoryName + Name + "\")
                    UseDirectory(DirectoryID)
                EndIf
            EndIf
        Until Entry = 0
    EndIf
    ProcedureReturn TotalFileSize
EndProcedure

Debug DirectorySize(1, "C:\My Documents\")
And when I right-click on this directory and select 'Properties' in WinXP, I get two different sizes: the normal one in bytes, and the 'real' size on disk. This procedure returns the normal size in bytes, not how much disk space the files actually take up. :? See what I mean? :?:
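The gap between the two numbers comes from each file being rounded up to a whole number of allocation clusters. As a minimal sketch, here is the same directory walk in Python, assuming a fixed 4096-byte cluster size (the real value depends on the filesystem and partition, so treat the constant as a placeholder):

```python
import os

CLUSTER_SIZE = 4096  # assumption; the real cluster size varies per volume

def size_on_disk(size, cluster=CLUSTER_SIZE):
    # A file occupies whole clusters, so round its logical size up
    return ((size + cluster - 1) // cluster) * cluster

def directory_size_on_disk(path, cluster=CLUSTER_SIZE):
    # Recursively sum the rounded-up size of every file below path
    total = 0
    for root, dirs, files in os.walk(path):
        for name in files:
            total += size_on_disk(os.path.getsize(os.path.join(root, name)), cluster)
    return total
```

With 4 KB clusters, the two directory sizes quoted above differ because every file contributes up to one cluster of slack.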
--Kale

GPI
PureBasic Expert
PureBasic Expert
Posts: 1394
Joined: Fri Apr 25, 2003 6:41 pm

Post by GPI »

The difference between the real file size and the file size on disk is not so easy to explain.

(Nearly) all filesystems with random access use a little trick to do their work faster: clusters. (OK, FAT knows them as clusters; the C64 called them blocks, the same as the Xbox and GC.)

See a cluster as a page in a book. Every cluster has a fixed size (for example 4 KB). A file can only start on a cluster boundary, and even when a file is just 1 byte big, it uses a complete cluster. So 1024 files with a size of 1 byte each need 4 MB in our example!
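The rounding rule behind this example is easy to sketch; here it is in Python with the 4 KB cluster size from the example (real cluster sizes differ per filesystem and partition):

```python
def size_on_disk(size, cluster_size=4 * 1024):
    # A file always occupies a whole number of clusters,
    # so round the logical size up to the next cluster multiple
    clusters = (size + cluster_size - 1) // cluster_size
    return clusters * cluster_size

# 1024 one-byte files, as in the example above
print(1024 * size_on_disk(1))  # 4194304 bytes, i.e. 4 MB
```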

Why this?

OK, here's an example. Our HDD:

Code: Select all

Cluster 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
       ---+---+---+---+---+---+---+---
FILE    - | - | - | - | - | - | - | -
Now save 3 files: A (3 clusters), B (2 clusters), C (2 clusters):

Code: Select all

Cluster 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
       ---+---+---+---+---+---+---+---
FILE    A | A | A | B | B | C | C | -
Now we delete B:

Code: Select all

Cluster 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 
       ---+---+---+---+---+---+---+---
FILE    A | A | A | - | - | C | C | -  
And add a file D (3 clusters):

Code: Select all

Cluster 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 
       ---+---+---+---+---+---+---+---
FILE    A | A | A | D | D | C | C | D  
What do you see? It is easy to erase files and reuse the free space, and the bookkeeping is cheap: with just 8 entries you can keep track of 8 * 4 KB (example cluster size: 4 KB) = 32 KB of space.

Without these clusters/blocks/pages (or whatever you call them), you would need a very big table. You would have to know which bytes are used, which are free, which byte belongs to which file, etc. (Or you would have to shuffle the complete HDD around whenever you delete 10 bytes at the beginning, so that all the free space ends up at the end of the HDD.) With clusters everything is faster and easier, but it needs more space.

I don't know NTFS, but I don't think it handles this differently.

(BTW: because of this problem, it can happen that you can't copy the contents of a 50 MB HDD onto a 200 MB HDD. And to reduce the cluster size, don't make your drives too big; use partitions.)
Kale
PureBasic Expert
Posts: 3000
Joined: Fri Apr 25, 2003 6:03 pm
Location: Lincoln, UK
Contact:

Post by Kale »

Yeah, thanks GPI, I understood the size on disk could be larger due to the filesystem :) but here's a little problem for you: take a look at this screenshot (of my 'C:\Windows\'). The size on disk is smaller! Explain that! :twisted:

WinXP SP1
Image
--Kale

Pupil
Enthusiast
Posts: 715
Joined: Fri Apr 25, 2003 3:56 pm

Post by Pupil »

Kale wrote:...the size on disk is smaller! Explain that! :twisted:
File is on a compressed volume perhaps...
GPI
PureBasic Expert
Posts: 1394
Joined: Fri Apr 25, 2003 6:41 pm

Post by GPI »

Pupil wrote:
Kale wrote:...the size on disk is smaller! Explain that! :twisted:
File is on a compressed volume perhaps...
Or somebody should run ScanDisk and hope that the complete HDD hasn't crashed...
ricardo
Addict
Posts: 2438
Joined: Fri Apr 25, 2003 7:06 pm
Location: Argentina

Post by ricardo »

GPI wrote:See a cluster as a page in a book. Every cluster has a fixed size (for example 4 KB). A file can only start on a cluster boundary, and even when a file is just 1 byte big, it uses a complete cluster. So 1024 files with a size of 1 byte each need 4 MB in our example!
Yep.

One 69-byte file uses 4,096 bytes here. :?
Johan_Haegg
User
Posts: 60
Joined: Wed Apr 30, 2003 2:25 pm
Location: Västerås
Contact:

Post by Johan_Haegg »

If it's smaller, maybe you're running some kind of compression? Look in Properties - General and check if compression is activated.

#ClusterSize = 4096

;Round a file size up to the next whole cluster
Procedure.l SizeOnDisk(Size.l)
    ;Adding #ClusterSize - 1 before dividing rounds up, without
    ;charging an extra cluster when the size is already an exact multiple
    ProcedureReturn ((Size + #ClusterSize - 1) / #ClusterSize) * #ClusterSize
EndProcedure

Might work, untested...

[Off Topic]
Wouldn't it be nice to have a filesystem with flexible clusters?
[256-byte name][1-byte flags][6-byte physical location][LOADS of bytes for the length][some bytes of CRC] :>

The main problem with this is that fragmentation is impossible (yes, fragmentation is good in some respects) and the system would probably be slow.
GPI
PureBasic Expert
Posts: 1394
Joined: Fri Apr 25, 2003 6:41 pm

Post by GPI »

>#ClusterSize = 4096

The cluster size is not constant! It changes with the filesystem (FAT16 vs. FAT32) and with the partition size. Don't hardcode it as a constant in your own routines!
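Since the cluster size varies, it should be queried from the system at run time. On Windows the GetDiskFreeSpace API returns sectors-per-cluster and bytes-per-sector (PureBasic programs usually reach it as GetDiskFreeSpace_(), if I recall the API-call syntax correctly). As a cross-platform illustration only, here is a Python sketch that asks a POSIX system for the filesystem's allocation block size and for a file's actually allocated bytes; st_blocks is defined in 512-byte units:

```python
import os

def fs_block_size(path):
    # Allocation unit ("cluster") size of the filesystem holding path
    return os.statvfs(path).f_frsize

def allocated_bytes(path):
    # st_blocks counts 512-byte units actually allocated to the file,
    # independent of the filesystem's own block size
    return os.stat(path).st_blocks * 512
```

(On a compressed or sparse file the allocated bytes can even be smaller than the logical size, which would explain a 'size on disk' below 'size'.)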
Kale
PureBasic Expert
Posts: 3000
Joined: Fri Apr 25, 2003 6:03 pm
Location: Lincoln, UK
Contact:

Post by Kale »

BTW, I run no compression :)
--Kale

GPI
PureBasic Expert
Posts: 1394
Joined: Fri Apr 25, 2003 6:41 pm

Post by GPI »

Kale wrote:BTW, I run no compression :)
Folders are a little bit special. Maybe there is also a bug, I don't know, but try this with single files.

GPI
Kale
PureBasic Expert
Posts: 3000
Joined: Fri Apr 25, 2003 6:03 pm
Location: Lincoln, UK
Contact:

Post by Kale »

Yeah, it works fine with files; I just ScanDisk'd my HDD to be on the safe side. It's maybe an undocumented feature of the Windows folder. ;)
--Kale

oldefoxx
Enthusiast
Posts: 532
Joined: Fri Jul 25, 2003 11:24 pm

Cluster Sizes

Post by oldefoxx »

Clusters came into vogue as hard drives got bigger. The sector size on hard drives has traditionally stayed at 512 bytes, so clusters are groups of sectors. The number of sectors per cluster is determined by the size of the partition: the cluster size has to be stretched so that there are no more clusters on the drive than the allocation table's entries can address. FAT32 got its name from its 32-bit table entries (FAT stands for File Allocation Table).

If you divide a hard drive into several logical drive partitions, then each has its own FAT32 structure, and since only a portion of the total disk surface is in each partition, the cluster sizes can be smaller. Each physical and logical drive can have its own structure, so you can have different partitions for DOS, Windows, Linux, and so on. DOS and Windows can only "see" partition structures that they recognize, so DOS cannot "see" NTFS (used by Windows) or any of the filesystems that are native to Linux. Linux, on the other hand, usually has drivers that allow it to see and access both DOS and Windows partitions.

Within a partition, any part of a file that extends into a cluster means that the file "owns" that cluster, even if it only needed one byte out of the thousands that might be involved. But the idea of variable-sized clusters is ludicrous, because the system only addresses the hardware down to that level. It can also compute where any given cluster sits on the hard drive, because they all have the same size, so the system can drive the disk mechanism straight to a given cluster as soon as it determines where it should be. If clusters were variable-sized, the system could only find a given cluster by following each one leading up to that particular one and ticking them off as they went by. Then, instead of seek times in milliseconds, disk access would slow to a crawl.

Having several partitions instead of just one means better use of disk space since the number of bytes in partially filled clusters would go down on the average. With small clusters, you do not have to read and write such large amounts of information at once, nor do you have to accumulate as much information in a buffer before being able to write it to disk. On the other hand, large clusters mean fewer seeks to find the next chained cluster in cases where the data is fragmented on the drive.

Some of this is supposition, a reasoning after the fact. I haven't had the opportunity to actually measure these things firsthand (few have), but my opinions in this regard have been expressed by others. I just brought them out to clarify some issues here about sectors, clusters, and all that.

The directory not only keeps tabs on files, it keeps tabs on their size. If it didn't, it could not report each one's size without scanning every file from beginning to end. Without that data on hand, it could only give you the number of clusters involved, and the cluster size would then yield a size at least as large as the file. The system can also go the other way: it can correctly calculate how many clusters of a given size are needed to store a file of some specific size. Directories and paging structures on the drive also take up some additional space, which is deducted from the total available.

To determine the size of a file, open it and check the length of the file. Whatever agrees with that is the correct report. If something else reports a different size, then it must be wrong, or there is a flaw in the directory structure on that drive. Running ChkDsk or ScanDisk will let you determine whether this is the case, although for some errors the only "cure" is to delete the files that are somehow in error. Corrupted directories, files, and drives are often the result of an application not being closed, or the system not being shut down, in a proper manner. Power loss or equipment shutdown while Windows or DOS is still running is the most frequent cause of drive corruption, so if a warning comes up that one or more of your hard drives need to be checked before restarting the system, you had better let it happen, and not get impatient to get back up immediately.
has-been wanna-be (You may not agree with what I say, but it will make you think).