Yesterday we've been talking about data compression to fit as many value as possible inside banks. I want to make a little write up of it.
Banks capacity are very limited therefore there is a need for compacting data as much as possible. In order to compact data it is convenient to work within a binary stream. Strings can be used for this purpose since they have a dynamic arbitrary length. However there are no native way to access each character as an integer.
Ord and Chr
We have found a trick to do the two basic operations: get the integer code of a character (ord) and write a character represented by the integer (chr). We need a string that will hold the alphabet. The position of the character in the alphabet determines its value. In order to create a character from the value, we just take the character at the offset "value" in the alphabet. To get the value we use StringFind to give us the position of the character in the alphabet.
There are some limitations here: not all characters are valid in a string literal. s3rius said there was about 96 characters out of the 256 that did not make a script error.
We tried another technique which is using the hexadecimal value: "\x01" to "\xff" (\x00 is terminating the string like in C). It works great but when being written down in the xml file, it gets written as \x01 so it takes up 4 characters. We didn't test but we believe that it takes 4 characters in the bank limit. So this is not a viable solution.
Packing Data
Now that we have a solution to work on a binary stream, we can start packing data. What we want to store ultimately are bounded values. For example if you want to save the unit level, it's a number between 1 and 10. We need to store 10 different values, it means that it requires at least ceil(log2(10)) = 4 bits (2^4 = 16). And we want to store if he is dead or not so it takes 1 other bit.
Each character can hold 96 different values so it is 5 bits for us to work with (2^5 = 64). We will not use the range 64-96 as it does not fix a power of 2. We want to store a 4 bits data and a 1 bit data inside.
So you probably get the algorithm, for each value we shift by the number of slots we need and then we add the value. If you don't want to use a shift operation, you can multiply by 2^n. (x << n == x * (2 ^ n)).
In order to get the data back, we need to use modulo and division (this time we need to use powers of 2).
Now that you have an example it is quite easy to write the algorithm. However we are using power of two here but this is not needed at all in fact. Instead of multiplying and dividing by 2 ^ n you can do the operation with the maximum of values you want to store. For example if you want to store 10 different levels, just multiply and divide by 10!
The following is a javascript pseudo code of the algorithm:
What you have to know is that each character gives you 96 possibilities. In order to know how many space you need, you have to multiply all the number of possibilities of each value. In our example 10 (Level) * 2 (Dead) = 20. We could insert one more value from 1 to 4. It would take 10 * 2 * 40 = 80. We cannot insert more in the character.
Beyond one character
We can now insert as many values as possible in one character, but as you can see, in the last example there are still 16 unused values. We want to use them along with the next character. It is possible to do so! But quite complicated in fact.
Instead of considering our character as a whole value, we want to view it as a "digit". One character holds 96 values. Therefore two characters hold 96 * 96 values, and so on. If we have 10 characters at our disposition, it will hold 96^10 values. You can multiply all the ranges to know the space you need, and you can get the characters needed.
But I didn't said how to accumulate characters. You have to consider the characters as a big number. You have to recode the basic operations + * / % for these numbers. Addition and subtraction aren't that complicated, however division and multiplication using elementary school algorithm are a bit tougher to code.
Epilogue
The last method gives you the best (yes the best!) possible packing. The downside is that it requires a big integer library which is not trivial to implement. I have coded it in Javascript: the SmallHash library.
I am pretty sure that applying the Huffman algorithm on top of it will gives an even better compression as your data are not likely to have an uniform distribution.
Sadly I don't have the time to code it. However the whole algorithm is described here and is working perfectly. If you want to work on it and don't understand some details, just ask me :)
Well, back to topic:
I am working on a the library for that.
I have already finished the SmallHash functions and they work fine.
My BigNum seems to work good too.
I'm just working on a base conversion, however there's still a bug somewhere in there (possibly in connection with BigNum).
And I have also added a basic key en-/decryption.
vjeux, the idea of adding a Huffman algorithm is not a bad one, but it could potentially make the code longer. It's hard to predict the final outcome.
Have you any links or could you say how exactly banks are limited plz ?
I've found one tutorial but it's a video, and i'm not found of this support, especially because English is not my native language ...
- If you put too much stuff into a bank it will crash the map during loading.
- Right now you can save strings with at max 817 letters in a bank before it crashes your map.
- However, the maximum size for a string is 781. So you can have one string of size 781 and one of size 36 (781+36=817).
- If you try to save things like Units in a bank, it will use up A LOT of space.
- I think banks don't have limits when you're not playing online. But over battle.net these limits apply.
- In phase 1 the maximum limit of 817 was for ALL PLAYERS TOGETHER. We couldn't test whether it is still like this.
vjeux, the idea of adding a Huffman algorithm is not a bad one, but it could potentially make the code longer. It's hard to predict the final outcome.
Sorry I did this on the train, my internet connection was too sloppy for me to check :( Anyway it's fixed!
The packing is already optimal given that you use all the values from the range equally. However this is not happening in real life applications. Therefore using the huffman algorithm will yield the best possible output since it's hard to know the repartition (LZW is great because we know that the recurring values are stored on 7 bits).
I just don't understand blizzard's thinking sometime, they've delt with crap like this before, you'd think they'd figure some shit out or listen to us?
Blizzard:
"We've created a fantastic new method for saving data between maps. By data we of course mean like 8 alphanumeric digits. That should be plenty right?"
For my RPG I still have a bit of room so I'm okay... but I just stored each players level/abils/items/etc. in a 95-character string code. Of course, it has some anti-cheat stuff inserted into it... with 6 players in my rpg thats 6x95-char strings... I think I'm safe :D
It's funny I should say that and like a couple hours later S3rius and friends release a tool that literally converts my 8 digit stored data example into like 700 digits of useable data haha.
Hey guys do you know the official amount that will load to the map?
It seems like if I try to load too many variables the map wont load, but if I load too many characters, the map loads fine but the string gets cut off.
Are the 817 characters shared between all 16 players?
We should be able to test this out pretty easily if any of you want to give it a go, I have a map that does nothing but -save and -load stuff.
I'm Char name: Hoki Code: 118
I've discovered some things that I noticed when I was not getting the 36 character limit on the second string.
My first attempt to maximize characters in the file was with a file named "tacoSaveLoad.SC2Bank", section name "tacos", and the key names "taco" and "taco2".
I was able to cram 827 characters total into the bank.
Curious that my total did not match your 817, I wondered if it may be a difference in our section and key names. (I was planning to see if we could exploit those names for data purposes anyways)
I renamed my section name to "a" and key names to "b" and "c". I was then able to cram in 838 characters.
Still curious about whether or not the game was counting characters, or examining the size of the file, I renamed my bank to "a.SC2Bank", and was able to fit in an additional 11 characters for a total of 849 characters.
The size of my file when maxxed out is 1.07KB(1,102bytes) and 4.00KB on disk (4,096bytes on disk).
I can't make any sense of it. If I add another key value which adds 38 characters in itself, it only reduces the amount of characters I can fit in by 9, and the size of the file grows well beyond the size of the file if I were to only have 2 keys and only add 1 more character, which breaks the map.
However
2 keys = smaller file size, fewer characters, breaks the map at a lower threshold. More usable characters.
3 keys = larger file size, more characters, breaks the map at a higher threshold. Fewer usable characters.
If I add as many whitespace, tabs, comments, return characters and the game doesn't blink an eye, it just loads fine.
Pushed the size of the file up to 17.9KB(18,363bytes) and 20.0KB on disk (20,480bytes on disk) with comments and the file loads.
It seems as if it may be measuring the size of the file in memory perhaps? Maybe its measuring the size of the data after serialization?
4kB is limit for bank files of all player all together.
If you want to use starcode to its full potential:
First of all check size of each created string before compresion. You shouldn't try to create strings longer than 100 chars. (to avoid trigger execution time limit during decompresion) Check it by storing maximum value of value you are storing to get maximum string possible.
Create as many of uncompresed strings as you need and compres them one after another.
Remove "space" from characters used by starcode.
Create one big string from all strings you have created adding "space" between them.
Save it as a one string in bank file. (remember to name bank section with only one char becouse it takes space too)
When you load just load this one big string and extract each of substing using "get word" function.
Decompres each sting separatly and here you go.
Remmember to decompres each string in separate trigger to avoid trigger time execution limit.
When you create Bank file check it's size. Multiply it by amount of players. And you can check if you will fit into 4kB limit.
Yesterday we've been talking about data compression to fit as many value as possible inside banks. I want to make a little write up of it.
Banks capacity are very limited therefore there is a need for compacting data as much as possible. In order to compact data it is convenient to work within a binary stream. Strings can be used for this purpose since they have a dynamic arbitrary length. However there are no native way to access each character as an integer.
Ord and Chr
We have found a trick to do the two basic operations: get the integer code of a character (ord) and write a character represented by the integer (chr). We need a string that will hold the alphabet. The position of the character in the alphabet determines its value. In order to create a character from the value, we just take the character at the offset "value" in the alphabet. To get the value we use StringFind to give us the position of the character in the alphabet.
There are some limitations here: not all characters are valid in a string literal. s3rius said there was about 96 characters out of the 256 that did not make a script error.
We tried another technique which is using the hexadecimal value: "\x01" to "\xff" (\x00 is terminating the string like in C). It works great but when being written down in the xml file, it gets written as \x01 so it takes up 4 characters. We didn't test but we believe that it takes 4 characters in the bank limit. So this is not a viable solution.
Packing Data
Now that we have a solution to work on a binary stream, we can start packing data. What we want to store ultimately are bounded values. For example if you want to save the unit level, it's a number between 1 and 10. We need to store 10 different values, it means that it requires at least ceil(log2(10)) = 4 bits (2^4 = 16). And we want to store if he is dead or not so it takes 1 other bit.
Each character can hold 96 different values so it is 5 bits for us to work with (2^5 = 64). We will not use the range 64-96 as it does not fix a power of 2. We want to store a 4 bits data and a 1 bit data inside.
Insertion
So you probably get the algorithm, for each value we shift by the number of slots we need and then we add the value. If you don't want to use a shift operation, you can multiply by 2^n. (x << n == x * (2 ^ n)).
Extraction
In order to decode here is how it works:
In order to get the data back, we need to use modulo and division (this time we need to use powers of 2).
Now that you have an example it is quite easy to write the algorithm. However we are using power of two here but this is not needed at all in fact. Instead of multiplying and dividing by 2 ^ n you can do the operation with the maximum of values you want to store. For example if you want to store 10 different levels, just multiply and divide by 10!
The following is a javascript pseudo code of the algorithm:
What you have to know is that each character gives you 96 possibilities. In order to know how many space you need, you have to multiply all the number of possibilities of each value. In our example 10 (Level) * 2 (Dead) = 20. We could insert one more value from 1 to 4. It would take 10 * 2 * 40 = 80. We cannot insert more in the character.
Beyond one character
We can now insert as many values as possible in one character, but as you can see, in the last example there are still 16 unused values. We want to use them along with the next character. It is possible to do so! But quite complicated in fact.
Instead of considering our character as a whole value, we want to view it as a "digit". One character holds 96 values. Therefore two characters hold 96 * 96 values, and so on. If we have 10 characters at our disposition, it will hold 96^10 values. You can multiply all the ranges to know the space you need, and you can get the characters needed.
But I didn't said how to accumulate characters. You have to consider the characters as a big number. You have to recode the basic operations + * / % for these numbers. Addition and subtraction aren't that complicated, however division and multiplication using elementary school algorithm are a bit tougher to code.
Epilogue
The last method gives you the best (yes the best!) possible packing. The downside is that it requires a big integer library which is not trivial to implement. I have coded it in Javascript: the SmallHash library.
I am pretty sure that applying the Huffman algorithm on top of it will gives an even better compression as your data are not likely to have an uniform distribution.
Sadly I don't have the time to code it. However the whole algorithm is described here and is working perfectly. If you want to work on it and don't understand some details, just ask me :)
THANK YOU, been waiting for someone to explain how to do this....even though i still dont get it at all
I know it's hard to remember.. but it's s3rius ;P
Well, back to topic:
I am working on a the library for that.
I have already finished the SmallHash functions and they work fine.
My BigNum seems to work good too.
I'm just working on a base conversion, however there's still a bug somewhere in there (possibly in connection with BigNum).
And I have also added a basic key en-/decryption.
vjeux, the idea of adding a Huffman algorithm is not a bad one, but it could potentially make the code longer. It's hard to predict the final outcome.
Have you any links or could you say how exactly banks are limited plz ? I've found one tutorial but it's a video, and i'm not found of this support, especially because English is not my native language ...
Basically what we know is:
- If you put too much stuff into a bank it will crash the map during loading.
- Right now you can save strings with at max 817 letters in a bank before it crashes your map.
- However, the maximum size for a string is 781. So you can have one string of size 781 and one of size 36 (781+36=817).
- If you try to save things like Units in a bank, it will use up A LOT of space.
- I think banks don't have limits when you're not playing online. But over battle.net these limits apply.
- In phase 1 the maximum limit of 817 was for ALL PLAYERS TOGETHER. We couldn't test whether it is still like this.
Sorry I did this on the train, my internet connection was too sloppy for me to check :( Anyway it's fixed!
The packing is already optimal given that you use all the values from the range equally. However this is not happening in real life applications. Therefore using the huffman algorithm will yield the best possible output since it's hard to know the repartition (LZW is great because we know that the recurring values are stored on 7 bits).
I just don't understand blizzard's thinking sometime, they've delt with crap like this before, you'd think they'd figure some shit out or listen to us?
Cause this is a pain in the fucking ass.
Blizzard: "We've created a fantastic new method for saving data between maps. By data we of course mean like 8 alphanumeric digits. That should be plenty right?"
@Aenigma: Go
Now it is: http://forums.sc2mapster.com/development/galaxy-scripting-and-trigger-lib/5091-library-starcode-v1-0/ ;D
@s3rius: Go
Awesome vjeux.
For my RPG I still have a bit of room so I'm okay... but I just stored each players level/abils/items/etc. in a 95-character string code. Of course, it has some anti-cheat stuff inserted into it... with 6 players in my rpg thats 6x95-char strings... I think I'm safe :D
It's funny I should say that and like a couple hours later S3rius and friends release a tool that literally converts my 8 digit stored data example into like 700 digits of useable data haha.
Hey guys do you know the official amount that will load to the map? It seems like if I try to load too many variables the map wont load, but if I load too many characters, the map loads fine but the string gets cut off.
Are the 817 characters shared between all 16 players? We should be able to test this out pretty easily if any of you want to give it a go, I have a map that does nothing but -save and -load stuff. I'm Char name: Hoki Code: 118
I've discovered some things that I noticed when I was not getting the 36 character limit on the second string. My first attempt to maximize characters in the file was with a file named "tacoSaveLoad.SC2Bank", section name "tacos", and the key names "taco" and "taco2". I was able to cram 827 characters total into the bank.
Curious that my total did not match your 817, I wondered if it may be a difference in our section and key names. (I was planning to see if we could exploit those names for data purposes anyways) I renamed my section name to "a" and key names to "b" and "c". I was then able to cram in 838 characters.
Still curious about whether or not the game was counting characters, or examining the size of the file, I renamed my bank to "a.SC2Bank", and was able to fit in an additional 11 characters for a total of 849 characters. The size of my file when maxxed out is 1.07KB(1,102bytes) and 4.00KB on disk (4,096bytes on disk).
What region are you on? If you're in Europe I'd gladly help figuring out whether the limit goes for one or both players.
Also, the banks are probably counted in size, not the actual chars in it. Exactly 4kb sounds like the magic limit.
@s3rius: Go
I can't make any sense of it. If I add another key value which adds 38 characters in itself, it only reduces the amount of characters I can fit in by 9, and the size of the file grows well beyond the size of the file if I were to only have 2 keys and only add 1 more character, which breaks the map. However
2 keys = smaller file size, fewer characters, breaks the map at a lower threshold. More usable characters. 3 keys = larger file size, more characters, breaks the map at a higher threshold. Fewer usable characters.
If I add as many whitespace, tabs, comments, return characters and the game doesn't blink an eye, it just loads fine. Pushed the size of the file up to 17.9KB(18,363bytes) and 20.0KB on disk (20,480bytes on disk) with comments and the file loads. It seems as if it may be measuring the size of the file in memory perhaps? Maybe its measuring the size of the data after serialization?
I'm in EST USA btw, can we play together?
Bank Files:
4kB is limit for bank files of all player all together.
If you want to use starcode to its full potential:
First of all check size of each created string before compresion. You shouldn't try to create strings longer than 100 chars. (to avoid trigger execution time limit during decompresion) Check it by storing maximum value of value you are storing to get maximum string possible.
Create as many of uncompresed strings as you need and compres them one after another.
Remove "space" from characters used by starcode.
Create one big string from all strings you have created adding "space" between them.
Save it as a one string in bank file. (remember to name bank section with only one char becouse it takes space too)
When you load just load this one big string and extract each of substing using "get word" function.
Decompres each sting separatly and here you go.
Remmember to decompres each string in separate trigger to avoid trigger time execution limit.
When you create Bank file check it's size. Multiply it by amount of players. And you can check if you will fit into 4kB limit.
Do the length of the Key and Section Names count towards the limit of the bank file that can be loaded?
edit: Sorry, reading above it appears that it is.
Sadly we can't. Different regions.
Maybe someone else can help you test it.
I want to make a minor correction, that is that we have 6 bits per character to work with, not 5. (2^5 = 32, not 64)
Also I wanted to offer this exhaustive list of usable characters (to my knowledge), 90 in total. Mind the space near the end:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz!@#$%^*()_+-=[]\;,./{}|: `?
@TheZizz: Go U can use all ascii char...