Yesterday we've been talking about data compression to fit as many value as possible inside banks.
Banks capacity are very limited therefore there is a need for compacting data as much as possible. Strings can be used for this purpose since they have a dynamic arbitrary length. However there are no native way to access each character as an integer.
(Introduction stolen from vjeux :P)
So here I present ...
STARCODE v1.4
STARCODE is a library designed to compress integers into strings.
You can give as many integers to STARCODE as you want to and it will turn them into a string as small as possible. These strings can then be stored in banks and later retrieved, decompressed and the integers can be read again.
STARCODE offers the mathematically best compression rate possible (when disregarding compression using probability distribution) and is very easy to use.
Current Features:
Packaging STARCODE reduces the size of information through packaging to it's smallest possible outcome.
Base Conversion STARCODE can convert decimal strings into any other base and back.
This way STARCODE can utilize almost the full range of string characters for best possible compression.
En-/Decryption STARCODE can encrypt any string with a special key which makes it impossible to read correct data
from the string without entering this key.
Hashing STARCODE uses checksums in order to be able to spot arbitrary code modification or an incorrect encryption key. More info here
Integration
All of STARCODE's functions can be easily accessed in the trigger editor.
Documentation
Please download the example map to see the full documentation and an example of how to use STARCODE.
Credits
rrowland (who repeatedly taught us that 18 can't fit into 16)
RodOfNOD (who didn't let up searching for a solution)
Sholdak (who supported RodOfNOD in his eternal struggle)
Interesting... right now I'm doing my hero-saving by going through and taking the hero's level, converting it to a num/letter/symbol (I think around 95 characters I'm using from 0-10, a-z, A-Z, and all symbols, so base-95). I do this with abilities, items, and so on.
Would this tool here assist me in anyway? Everything works for me atm with converting my hero's data into a string and putting anti-cheat and saving it in the bank... but what concerns me of course is that I'm pretty sure my anti-cheat sucks. I haven't fully looked at this sexy tool, but could it be used for my case to store stuff like hero level, class index, rank of abilities, indexes of items carried, etc, into a string with GOOD encryption?
This tool would indeed help you. It is made for saving integers, so exactly what you want to.
In terms of encryption it is a bit lacking. The key encryption itself itself cannot be cracked (and even the cleartext would be undecypherable without the used alphabet and the exact order and size of all stored values).
But there is something like a CRC check amiss, which checks whether the code is wrong or modified. My plan is to add this with the next version.
But just to throw it in: Even the best encryption won't help you if somebody is able to open your map and look at the map script.
You might have the option to protect the map, but that doesn't mean that you can't crack open that protection eventually.
Tremendous work. Beat me to publishing by a day lol. Its good though yours works better than mine and is more efficient not to mention overall more elegannt.
Hashing: STARCODE reduces the size of information through hashing to it's smallest possible outcome.
A hash is a small amount of bits that "represents" a data. But in a one-way only, you cannot get your data back through that value. For example md5 or crc are hashing methods.
You may have been confused by the name of my lib SmallHash. In fact, Hash in this case is the part of the url after the # :) You probably want to name it Packing.
What's the highest base you can go in the compression? As far as i know, the game strings are unicode, so you should be able to use all sorts of funky characters.
Unfortunately not. Strings basically have the possibility to store 255 different characters within them, but all apart from the alphanumeric (including special characters) need to be entered using a code like \x5F.
This works well within SC2, but when you want to store such a string in a bank, it'll store "\x5F" and not it's unicode equivalent. It'd need 4 characters to store a single one. This is obviously bad.
Right now it converts into base 87. I know of 92 different characters, but needed to exclude 2 and I forgot 3 others, which will be included in the next version.
This is it:
"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!$%&/()=?,.-;:_^#+*' @{[]}|>"
I'd be grateful if anyone knows if I'm missing any.
Yea, you got me there. I'll add some hashing, too, though. Right now there's no way of identifying whether a code can be decompressed correctly or not (after encryption).
Also, vjeux, I'd be very happy if you could write down your thoughts about adding Huffman as compression.
I still don't get where you could achieve good compression with it. Where and how should it be used, in your opinion? (maybe I'm just not overlooking all possibilities. It's been a while since I used it).
I wouldn't think that Huffman coding would work too well unless you could figure out a suitable tree beforehand. If you had to store the tree in the bank, it'd most likely cancel out most of the benefit of the compression…
Yea, well. The question is whether the gain is greater than the loss.
In my opinion it isn't, unless someone stores.. like... 20 times the same value with the same max value. Which can happen. But the values will change and the usefulness of Huffman will decrease.
But vjeux knows his math better than I do, so if he says Huffman can be of use, I'll not discard it lightly :P
Oh yeah about huffman, I remembered there were variable-length values but I thought thas was from the input and not the output ... It reduces the usefulness of what I thought.
Anyway, I am conviced that a compression will give some gain anyway. Because the values won't be completly random, ie most player will only play the map one or two time, so the values will more likely be near 0. If you use a binary vector to tell if you have an item or not, again there will be many 0. In the general case, values are not going to be uniformly distributed, so there come the compression!
Now, we are working on a really limited space, so the compression overhead may be important. It needs to be tested :)
do i get it right that tis is in fact nearly the same as the old systems used in wc3 to generate savecodes? (e.g. with 64 chars)
its very useful and i used smth similar for my rpg, the problem is however, that most of the space needed in my savecodes were the quest states, witch are saved by an integer with values from 0 to 3. so to compress the quest info you whould have to add 5 quests or so together to one integer value and then convert it to the new system.
so my suggestion: maybe you could implement a function that allows you to combine 5 or 6 boolean values or integer values with only one digit (0-9) to one big number witch is then converted to the XX char sys. cause booleans and integers with only one digit might be used very often, too.
for example 6 quests with different states would be:
023001 and this integer would be converted later.
correct me if i get smth wrong^^
i think this would make sense because the system is only useful for big numbers, as less digits as less effective is the "compression".
You could just add a boolean for compression (if true compress, if false then not...), experienced mappers will know beforehand if it is usefull (in case you comment accordingly xD).
STARCODE compresses large values as well as small values. The system already combines all your small numbers into one large number.
For example:
I store 1, 1, 0, 1, 0, 1 (6 boolean values).
After STARCODE did it's job you get: "R".
6 booleans = 2^6 = 64
7 booleans = 2^7 = 128
Since we only have 90 different characters at our disposal we've already reached the maximum number of booleans possible.
If we want to add more, the system needs to add a second character.
The remaining space of 26 (90-64) will be used in this case, too. This way, when you expand the code to 2 characters of length, you can even store 13 booleans (2*6 + the remaining space which is enough for one more), instead of only 12 (2*6).
Long story short, there is no way of further improving the compression by doing something with the numbers. It already achieves maximum compression.
Yesterday we've been talking about data compression to fit as many value as possible inside banks.
Banks capacity are very limited therefore there is a need for compacting data as much as possible. Strings can be used for this purpose since they have a dynamic arbitrary length. However there are no native way to access each character as an integer. (Introduction stolen from vjeux :P)
So here I present ...
STARCODE v1.4
STARCODE is a library designed to compress integers into strings. You can give as many integers to STARCODE as you want to and it will turn them into a string as small as possible. These strings can then be stored in banks and later retrieved, decompressed and the integers can be read again.
STARCODE offers the mathematically best compression rate possible (when disregarding compression using probability distribution) and is very easy to use.
Current Features:
STARCODE reduces the size of information through packaging to it's smallest possible outcome.
STARCODE can convert decimal strings into any other base and back.
This way STARCODE can utilize almost the full range of string characters for best possible compression.
STARCODE can encrypt any string with a special key which makes it impossible to read correct data from the string without entering this key.
STARCODE uses checksums in order to be able to spot arbitrary code modification or an incorrect
encryption key. More info here
Integration
Documentation
Credits
Thank you guys for your dedication to the community. Great work.
@s3rius: Go
Interesting... right now I'm doing my hero-saving by going through and taking the hero's level, converting it to a num/letter/symbol (I think around 95 characters I'm using from 0-10, a-z, A-Z, and all symbols, so base-95). I do this with abilities, items, and so on.
Would this tool here assist me in anyway? Everything works for me atm with converting my hero's data into a string and putting anti-cheat and saving it in the bank... but what concerns me of course is that I'm pretty sure my anti-cheat sucks. I haven't fully looked at this sexy tool, but could it be used for my case to store stuff like hero level, class index, rank of abilities, indexes of items carried, etc, into a string with GOOD encryption?
@OneTwoSC: Go
This tool would indeed help you. It is made for saving integers, so exactly what you want to. In terms of encryption it is a bit lacking. The key encryption itself itself cannot be cracked (and even the cleartext would be undecypherable without the used alphabet and the exact order and size of all stored values). But there is something like a CRC check amiss, which checks whether the code is wrong or modified. My plan is to add this with the next version.
But just to throw it in: Even the best encryption won't help you if somebody is able to open your map and look at the map script. You might have the option to protect the map, but that doesn't mean that you can't crack open that protection eventually.
@s3rius: Go
Such is true, thanks for the response.
Tremendous work. Beat me to publishing by a day lol. Its good though yours works better than mine and is more efficient not to mention overall more elegannt.
Thanks!
Now just need to implement some kind of compression for strings (Huffman would do, as per vjeux).
Nice :)
A hash is a small amount of bits that "represents" a data. But in a one-way only, you cannot get your data back through that value. For example md5 or crc are hashing methods.
You may have been confused by the name of my lib SmallHash. In fact, Hash in this case is the part of the url after the # :) You probably want to name it Packing.
What's the highest base you can go in the compression? As far as i know, the game strings are unicode, so you should be able to use all sorts of funky characters.
@MindWorX: Go
Unfortunately not. Strings basically have the possibility to store 255 different characters within them, but all apart from the alphanumeric (including special characters) need to be entered using a code like \x5F.
This works well within SC2, but when you want to store such a string in a bank, it'll store "\x5F" and not it's unicode equivalent. It'd need 4 characters to store a single one. This is obviously bad. Right now it converts into base 87. I know of 92 different characters, but needed to exclude 2 and I forgot 3 others, which will be included in the next version.
This is it:
"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!$%&/()=?,.-;:_^#+*' @{[]}|>"
I'd be grateful if anyone knows if I'm missing any.
@vjeux: Go
Yea, you got me there. I'll add some hashing, too, though. Right now there's no way of identifying whether a code can be decompressed correctly or not (after encryption).
Also, vjeux, I'd be very happy if you could write down your thoughts about adding Huffman as compression. I still don't get where you could achieve good compression with it. Where and how should it be used, in your opinion? (maybe I'm just not overlooking all possibilities. It's been a while since I used it).
I wouldn't think that Huffman coding would work too well unless you could figure out a suitable tree beforehand. If you had to store the tree in the bank, it'd most likely cancel out most of the benefit of the compression…
Yea, well. The question is whether the gain is greater than the loss.
In my opinion it isn't, unless someone stores.. like... 20 times the same value with the same max value. Which can happen. But the values will change and the usefulness of Huffman will decrease.
But vjeux knows his math better than I do, so if he says Huffman can be of use, I'll not discard it lightly :P
@s3rius: Go
So you can't like copy these chars and use them? ÆØÅæøåóÓ
@MindWorX: Go
Correct.
Oh yeah about huffman, I remembered there were variable-length values but I thought thas was from the input and not the output ... It reduces the usefulness of what I thought.
Anyway, I am conviced that a compression will give some gain anyway. Because the values won't be completly random, ie most player will only play the map one or two time, so the values will more likely be near 0. If you use a binary vector to tell if you have an item or not, again there will be many 0. In the general case, values are not going to be uniformly distributed, so there come the compression!
Now, we are working on a really limited space, so the compression overhead may be important. It needs to be tested :)
do i get it right that tis is in fact nearly the same as the old systems used in wc3 to generate savecodes? (e.g. with 64 chars)
its very useful and i used smth similar for my rpg, the problem is however, that most of the space needed in my savecodes were the quest states, witch are saved by an integer with values from 0 to 3. so to compress the quest info you whould have to add 5 quests or so together to one integer value and then convert it to the new system.
so my suggestion: maybe you could implement a function that allows you to combine 5 or 6 boolean values or integer values with only one digit (0-9) to one big number witch is then converted to the XX char sys. cause booleans and integers with only one digit might be used very often, too.
for example 6 quests with different states would be:
023001 and this integer would be converted later.
correct me if i get smth wrong^^
i think this would make sense because the system is only useful for big numbers, as less digits as less effective is the "compression".
You could just add a boolean for compression (if true compress, if false then not...), experienced mappers will know beforehand if it is usefull (in case you comment accordingly xD).
@Mille25: Go
STARCODE compresses large values as well as small values. The system already combines all your small numbers into one large number.
For example:
I store 1, 1, 0, 1, 0, 1 (6 boolean values).
After STARCODE did it's job you get: "R".
6 booleans = 2^6 = 64
7 booleans = 2^7 = 128
Since we only have 90 different characters at our disposal we've already reached the maximum number of booleans possible.
If we want to add more, the system needs to add a second character.
The remaining space of 26 (90-64) will be used in this case, too. This way, when you expand the code to 2 characters of length, you can even store 13 booleans (2*6 + the remaining space which is enough for one more), instead of only 12 (2*6).
Long story short, there is no way of further improving the compression by doing something with the numbers. It already achieves maximum compression.
Can this code convert text to string?
@RodrigoAlves: Go
Nothing can convert text to string. Not even God.