Text document size


Zeppelin
27th Jun 2011, 05:06
Can someone explain this to me.

I have a text document that is 568kb, and I'm updating it with a new one that is considerably larger but shows its size as only 68kb.

Both files contain the same sort of info, i.e. a series of coordinates, so I don't quite understand why the larger one shows fewer kb.

Spitoon
27th Jun 2011, 05:53
Is it really a text file (i.e. .TXT)? If not, and it's perhaps an MS Word doc, the extra size is probably down to the metadata stored with the file.
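The type Explorer shows is based on the file extension, so a ".txt" name alone doesn't prove the contents are plain text. A minimal sketch, assuming a POSIX shell with `dd` and `od` (e.g. the Mac Terminal used later in the thread): read the first four bytes. Legacy Word .doc files begin with D0 CF 11 E0, and .docx files are ZIP archives beginning with 50 4B 03 04 ("PK"). The `sniff` function and the demo files are hypothetical.

```shell
#!/bin/sh
# Identify a file by its first four bytes ("magic number"), ignoring
# the .txt extension that Explorer's "File type" is based on.
sniff() {
  magic=$(dd if="$1" bs=4 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n')
  case "$magic" in
    d0cf11e0) echo "OLE2 compound file (legacy Word .doc)" ;;
    504b0304) echo "ZIP container (e.g. Word .docx)" ;;
    *)        echo "no Word signature (plausibly plain text)" ;;
  esac
}

# Demo on three hypothetical files, all named .txt.
d=$(mktemp -d)
printf '\320\317\021\340\241\261\032\341' > "$d/a.txt"  # D0 CF 11 E0 A1 B1 1A E1
printf 'PK\003\004' > "$d/b.txt"                        # 50 4B 03 04
printf 'N51 28.6 W000 27.1\r\n' > "$d/c.txt"            # plain text line
for f in a b c; do
  echo "$f.txt: $(sniff "$d/$f.txt")"
done
r_doc=$(sniff "$d/a.txt"); r_zip=$(sniff "$d/b.txt"); r_txt=$(sniff "$d/c.txt")
rm -rf "$d"
```

On a real file you would just run `sniff File1.txt`.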

Zeppelin
27th Jun 2011, 07:38
I've checked in Properties > File type and they are both called Text Document (.txt).

So I'm not sure, if I update, whether the new larger file (but smaller kb) is actually transferring all the new info... if you understand!

green granite
27th Jun 2011, 08:10
Why not save it as xxx1.txt? Then you can compare before you overwrite the original.

Zeppelin
27th Jun 2011, 09:24
I'm really puzzled, 'cos I opened both files in Notepad. File 1 contains about 60 lines of coordinates. File 2 has about 70 lines of similar coordinates, yet file 2 says 68kb and file 1 568kb.
Tried renaming and saving, still the same, so I just don't understand how it can show so few kb.

mixture
27th Jun 2011, 09:49
Zeppelin,

If it is genuinely a plain text file, then you should be able to work out the expected size yourself.

1 character = 1 byte
1 kilobyte = 1024 bytes (or 1000 for approximation purposes !)

Basic mathematics will then tell you how many bytes one line of your coordinates takes, and consequently what 60 lines will come to.
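The arithmetic above, worked through as a sketch in a POSIX shell. The coordinate line is a made-up example (the real file format isn't known here), and it assumes Notepad-style CR+LF line endings, i.e. 2 extra bytes per line.

```shell
#!/bin/sh
# Estimate a plain-text file's size: (characters per line + line ending) * lines.
# The coordinate line below is hypothetical, not the real file format.
line='N51 28.6 W000 27.1'
chars=${#line}   # 18 characters; 1 byte each in plain ASCII
lines=60
eol=2            # CR+LF line break = 2 bytes
expected=$(( (chars + eol) * lines ))
echo "expected: $expected bytes (about $(( expected / 1024 )) KB)"

# Cross-check by writing that file and counting its bytes with wc -c.
tmp=$(mktemp)
i=0
while [ "$i" -lt "$lines" ]; do
  printf '%s\r\n' "$line" >> "$tmp"
  i=$(( i + 1 ))
done
actual=$(wc -c < "$tmp" | tr -d ' ')
echo "actual:   $actual bytes"
rm -f "$tmp"
```

At 20 bytes a line, 60 lines come to 1200 bytes (about 1 KB); reaching 568 KB would take roughly 29,000 lines of that length.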

Mike-Bracknell
27th Jun 2011, 12:00
568kb is pretty huge for a text file, so it's possible that it has some extraneous info in it somewhere.

Saab Dastard
27th Jun 2011, 12:08
Line breaks (paragraph marks / carriage returns) and spaces all count!

Edit / select all - see if there's any "white space" in the large file.

SD
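To see how much of a file is white space, you can count each kind of byte: `tr -cd SET` deletes everything except SET, and `wc -c` counts what's left. A sketch assuming a POSIX shell; the demo file (two lines plus blank padding) and the `count` helper are hypothetical.

```shell
#!/bin/sh
# Count how many bytes of a file are white space of each kind.
f=$(mktemp)
printf 'N51 28.6\r\nW000 27.1\r\n      \r\n\r\n' > "$f"   # hypothetical demo data

count() {  # number of bytes in $f that belong to the tr set given in $1
  LC_ALL=C tr -cd "$1" < "$f" | wc -c | tr -d ' '
}
total=$(wc -c < "$f" | tr -d ' ')
spaces=$(count ' ')
crs=$(count '\r')
lfs=$(count '\n')
tabs=$(count '\t')
echo "total=$total spaces=$spaces CR=$crs LF=$lfs tabs=$tabs"
rm -f "$f"
```

Every space, CR and LF is a full byte, so a file padded with blank lines or trailing spaces grows without showing any extra "content".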

Spurlash2
27th Jun 2011, 12:12
Not an answer, but the different file sizes you mention are classic when comparing .doc and .docx file sizes (the .docx being far smaller).

...or...

Track changes or version numbers of the document (but that doesn't make sense, as you're using Notepad) :(

You haven't snuck a picture in there, have you?

BOAC
27th Jun 2011, 12:22
Check for a Macro?

FullOppositeRudder
27th Jun 2011, 13:19
Hmmm - that's a very large text file :confused:

Along the lines of what SD has suggested, check that the larger file doesn't have some "hidden" characters or data hiding way past the end of your primary information. It's a very remote chance but with a simple processor like Notepad, that can sometimes happen.

I've encountered this when stripping satellite Keplerian elements of spurious text information prior to uploading them to a prediction program.

FWIW

F-O-R
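One way to check for "hidden" bytes is to delete everything that is printable ASCII (octal 40 to 176, i.e. space to '~') or CR/LF/tab and count what survives, then look at the tail of the file byte by byte with `od -c`. A sketch assuming a POSIX shell; the NUL-padded demo file is hypothetical.

```shell
#!/bin/sh
# Count bytes that are neither printable ASCII nor CR/LF/tab,
# then show the last bytes of the file.
f=$(mktemp)
printf 'N51 28.6 W000 27.1\r\n' > "$f"   # hypothetical coordinate line
printf '\000\000\000\000' >> "$f"       # followed by four NUL bytes

hidden=$(LC_ALL=C tr -d '\40-\176\r\n\t' < "$f" | wc -c | tr -d ' ')
echo "non-printable bytes: $hidden"
tail -c 8 "$f" | od -c                  # NULs show up as \0
rm -f "$f"
```

A result of 0 means the file holds nothing beyond ordinary text and line breaks.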

BOAC
27th Jun 2011, 15:15
Hmm - I guess someone has to do it.............:)

mixture
27th Jun 2011, 15:56
very remote chance but with a simple processor like Notepad, that can sometimes happen.

Erm, hidden characters in Notepad ???

You'd have to be inputting some pretty funky ASCII for that to happen !

I reckon he's probably created the doc in Word or something else that generates formatting data.

FullOppositeRudder
28th Jun 2011, 00:08
Erm, hidden characters in Notepad ???

You'd have to be inputting some pretty funky ASCII for that to happen !

OK - point taken. :ok:

Suggestion withdrawn! :suspect:

It could be a bug. Similar (if isolated) reporting anomalies seem to have been observed elsewhere - see here (http://answers.microsoft.com/en-us/windows/forum/windows_vista-performance/windows-explorer-reporting-incorrect-sizes-in/792dcd0e-12ce-4008-be01-0c60f202959a)

f-o-r

Spitoon
28th Jun 2011, 05:49
If the lists really are just co-ordinates and you wouldn't have to kill us afterwards, why not post the files up on the web somewhere for us to look at? I'm sure someone could give you an answer in seconds then!

mixture
28th Jun 2011, 08:00
It could be a bug

Suggestion accepted ! :ok:

When in doubt, blame a bug (or, as per tradition, the user !).

Spurlash2
28th Jun 2011, 08:39
Using the old =RAND(1,5) text-insertion trick in Word, I got 1 paragraph, 98 words, 7 lines and 581 characters with spaces. The .txt file was 581 bytes, the .docx was 12.8 KB and the .doc was 22 KB. (This is just to illustrate the point about the different file formats - I'm not crazy or anything...)

Are you sure you're not using .doc???

Zeppelin
28th Jun 2011, 09:36
2shared - download File1.txt (http://www.2shared.com/document/jbN00z5C/File1.html)

2shared - download File2.txt (http://www.2shared.com/document/Q92XZTOC/File2.html)

OK, these are the 2 files - nothing very exciting, but I'm interested to know the difference.
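A quick way to compare two text files is line count against byte count: `wc -l` counts LF characters and `wc -c` counts bytes, so their ratio gives the average line length. A sketch assuming a POSIX shell; the `stats` helper and both demo files are hypothetical stand-ins for File1.txt and File2.txt.

```shell
#!/bin/sh
# Compare files by line count and average bytes per line.
stats() {
  lines=$(wc -l < "$1" | tr -d ' ')   # number of LF characters
  bytes=$(wc -c < "$1" | tr -d ' ')   # number of bytes
  if [ "$lines" -eq 0 ]; then
    echo "0 LF line breaks, $bytes bytes"
  else
    echo "$lines lines, $bytes bytes, ~$(( bytes / lines )) bytes/line"
  fi
}

# Two hypothetical files: CRLF endings vs LF endings and longer lines.
d=$(mktemp -d)
printf '1.5,2.5\r\n3.5,4.5\r\n' > "$d/one.txt"
printf '1.5,2.5,100.0\n3.5,4.5,200.0\n5.5,6.5,300.0\n' > "$d/two.txt"
s1=$(stats "$d/one.txt"); s2=$(stats "$d/two.txt")
echo "one.txt: $s1"
echo "two.txt: $s2"
rm -rf "$d"
```

On the real files, `stats File1.txt` and `stats File2.txt` give byte counts that don't depend on how a file manager reports sizes.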

mixture
28th Jun 2011, 10:06
but I'm interested to know the difference

At first glance, about 12,159 lines.....

$ wc -l File1.txt
8714 File1.txt
$ wc -l File2.txt
20873 File2.txt

Removing everything apart from basic printable characters.....
$ tr -cd '\40-\126' < File2.txt > filex.txt
$ ls -ltrh File2.txt filex.txt
-rw-r--r--@ 1 * * 565K 28 Jun 11:04 File2.txt
-rw-r--r-- 1 * * 524K 28 Jun 11:11 filex.txt

So you've just got a bigger text file. Nothing sinister hidden.
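`tr` reads `\NNN` escapes as octal, so printable ASCII from space (decimal 32) to '~' (decimal 126) is written `'\40-\176'`. A minimal sketch, assuming a POSIX shell, that keeps that range plus CR and LF and reports how many bytes were dropped; the demo input is hypothetical.

```shell
#!/bin/sh
# Keep only printable ASCII (octal 40-176 = decimal 32-126, space to '~')
# plus CR and LF, and report how many bytes were removed.
f=$(mktemp); out=$(mktemp)
printf 'N51 28.6 W000 27.1\r\n\001\002Lat/Long\r\n' > "$f"   # hypothetical data
LC_ALL=C tr -cd '\40-\176\r\n' < "$f" > "$out"
before=$(wc -c < "$f" | tr -d ' ')
after=$(wc -c < "$out" | tr -d ' ')
echo "before=$before after=$after dropped=$(( before - after ))"
rm -f "$f" "$out"
```

Keeping CR and LF in the set preserves the line structure, so the filtered file can still be opened and compared line by line.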

BOAC
28th Jun 2011, 10:07
A quick look shows that at some stage in their lives, both files had 3 macros attached. They do not appear to be there now.

A hex look at each file shows nought.

File 1 has a carriage return after each line; File 2 has a % after each line. I suspect File 2 was written by some data extraction programme and the mode of writing has added 'length' to each line by not closing it?

Sorry I cannot be more help, but I'm sure someone will know! It might help to tell us how you 'came by' each file.
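Line endings can also be checked from the shell by counting CR and LF bytes, alongside an `od -c` dump for the byte-level look. A heuristic sketch assuming a POSIX shell: equal non-zero counts suggest CRLF (Windows/Notepad), LF only suggests Unix, CR only suggests classic Mac OS. The `endings` helper and demo files are hypothetical.

```shell
#!/bin/sh
# Heuristic line-ending check based on CR and LF byte counts.
endings() {
  cr=$(LC_ALL=C tr -cd '\r' < "$1" | wc -c | tr -d ' ')
  lf=$(LC_ALL=C tr -cd '\n' < "$1" | wc -c | tr -d ' ')
  if [ "$cr" -gt 0 ] && [ "$cr" -eq "$lf" ]; then echo "CRLF ($cr)"
  elif [ "$lf" -gt 0 ] && [ "$cr" -eq 0 ]; then echo "LF ($lf)"
  elif [ "$cr" -gt 0 ] && [ "$lf" -eq 0 ]; then echo "CR ($cr)"
  else echo "mixed or none (CR=$cr LF=$lf)"
  fi
}

d=$(mktemp -d)
printf 'a\r\nb\r\n' > "$d/crlf.txt"
printf 'a\nb\nc\n' > "$d/lf.txt"
printf 'a\rb\r' > "$d/cr.txt"
e1=$(endings "$d/crlf.txt"); e2=$(endings "$d/lf.txt"); e3=$(endings "$d/cr.txt")
echo "crlf.txt: $e1"; echo "lf.txt: $e2"; echo "cr.txt: $e3"
od -c "$d/crlf.txt"   # byte-level view: a \r \n b \r \n
rm -rf "$d"
```

Equal counts don't strictly prove every CR is paired with an LF, which is why this is only a heuristic; the `od -c` dump settles it.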

mixture
28th Jun 2011, 10:23
Building on my previous post.

If we keep only . , - a-z A-Z 0-9 plus CR, LF and space (i.e. a similar character set to what's in File1):

$ tr -cd '\32\r\n\45\46\44\48-\57\65-\90\97-\122' < File2.txt > filex.txt
-rw-r--r-- 1 * * 224K 28 Jun 11:21 filex.txt
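The character set just described (space, CR, LF, comma, period, hyphen, digits and letters) has decimal ASCII codes 32, 44-46, 48-57, 65-90 and 97-122. Since `tr` reads `\NNN` escapes as octal, a version of that filter written in octal might look like this. A sketch assuming a POSIX `tr`; the demo input is hypothetical.

```shell
#!/bin/sh
# Keep only space, CR, LF, ',', '.', '-', 0-9, A-Z and a-z.
# tr escapes are octal, so the decimal codes convert as:
#   32->\40  44->\54  45->\55  46->\56
#   48-57->\60-\71  65-90->\101-\132  97-122->\141-\172
keep='\40\r\n\54\55\56\60-\71\101-\132\141-\172'

f=$(mktemp); out=$(mktemp)
printf 'a-b, 1.5\r\n%%$&\r\n' > "$f"          # hypothetical input
LC_ALL=C tr -cd "$keep" < "$f" > "$out"
before=$(wc -c < "$f" | tr -d ' ')
after=$(wc -c < "$out" | tr -d ' ')
echo "before=$before after=$after"
rm -f "$f" "$out"
```

Writing the hyphen as `\55` rather than a bare `-` keeps `tr` from reading it as a range operator.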

Mike-Bracknell
28th Jun 2011, 15:01
Beard
Sandals with socks
Red woollen jumper

:}

mixture
28th Jun 2011, 16:01
Nah, just giving a good demo of one of the many reasons why it's great to have a Mac on your desk and not a PC. :cool:

(Before someone with a high IQ called Alex replies, yes, you could do the same on a nix/nux desktop, but they're not really that good at doing desktop stuff compared to Macs.)


Beard
Sandals with socks
Red woollen jumper

Nope. But in times past I have had meetings with fairly senior people at Sun who sported such an outfit.

You're just jealous because you didn't think up such a smart analysis.... :E