Pointer magic for efficient dynamic value representations
I think it’s worth the time to look at those neat little tricks for two reasons: Firstly, some of them are also applicable in other situations (though most of them admittedly are not). And secondly, because they remind you of how various data is represented at the low level (stuff you may have learned in your CS courses at some point - or maybe not).
So, let’s start!
The trivial approach: Tagged unions
As you can see the idea is very simple. You have a type tag, which specifies which type the value currently has. And a union that’s used to store the various types. Examples to store integers and objects would look like that:
The problem with this straightforward approach is that the size of the above object would be 16 bytes = 128 bits (on a 64 bit machine), even though the actual information content is much lower (any individual value can be stored in 64 bits or less). Especially if this value structure is stored in an array, the overhead quickly adds up. Even outside of arrays, simply passing around this structure will already take up two 64 bit registers, and so on.
So what we want is to make values smaller, ideally reducing them to just the size of the payload. Before we get to that for the general case, let’s first have a look at a technique called pointer tagging:
An interlude: Pointer tagging
Have a look at the above structure again. It consists of a union, which has a size of 8 bytes, and an integer which could fit into a single byte (a
char). So theoretically the total size should be 9 bytes. But it’s not - it’s 16 bytes. This is because the compiler adds padding to the data structure in order to align it in memory.
This is done for performance reasons: The CPU can access the memory only in word sized chunks. So if our data always starts at a word it can be fetched efficiently. If it were to start somewhere in the middle of a word the CPU would have to apply masks and shifts in order to get the data - which would be too slow.
A word on a 64 bit machine is 64 bit = 8 bytes (obviously ^^). So if all pointers to our object will be aligned to 8 bytes it means that they will be multiples of 8. A pointer thus can be 8, 16, 24, 109144, etc. But it can not be 7 or 13. Or speaking in binary: 0b1000 (=8), 0b10000 (=16), 0b11000 (=24), 0b11010101001011000 (=109144).
As you can see the lower three bits are always zero. So those three bits are basically free to use. We can use them to store a “tag”, which is an integer between 0 (0b000) and 7 (0b111).
pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppTTT ^- actual pointer three tag bits -^
Here is a sample implementation of how to do so:
The code is fairly straightforward I think. You could then use it like this:
The technique described above is useful in many contexts. Basically it’s applicable to any situation where you want to add a small bit of info to an aligned pointer. And this is also the situation we are in :)
But before we get to that let’s first have a look at another, very similar trick:
Storing integers in the pointer
Value class above I have a distinct int32 type and not just a double type.
Especially for integers the above
Value approach seems like overkill: The 16 byte
Value structure only stores 4 bytes of actual data, one byte of type information, and the remaining 11 bytes being padding.
The solution obviously is to use something similar to the tagged pointers introduced above. You can say: “If the last bit of the pointer is 1 then it actually isn’t a pointer at all, but it’s an integer.”
Here is how a pointer would look like:
pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppTT0 ^- actual pointer two tag bits -^ ^- last bit 0
And this is how the integer would look like (note that as we need one bit for distinguishing the type the integer has only 63 bits):
iiiiiiii|iiiiiiii|iiiiiiii|iiiiiiii|iiiiiiii|iiiiiiii|iiiiiiii|iiiiiii1 ^- actual integer ^- last bit 1
To get the actual integer value we just need to shift off the 1 bit using
integer >> 1.
Here again a sample implementation:
Now using the above technique we have a powerful way to optimize the original Value class. Integers will be stored directly in the pointer; doubles, strings and objects will be stored as a pointer with a tag identifying their type (e.g 0 = 0b00 = object, 1 = 0b01 = string, 2 = 0b10 = double). Booleans could be stored similarly to ints (and fetched using a simple
bool >> 3):
00000000|00000000|00000000|00000000|00000000|00000000|00000000|0000b110 actual bool value -^ ^- last bit 0 (as it isn't an int) tag = 3 = 0b11 to identify that it's a bool -^^
So what have we gained overall? Quite a bit: The size of the value structure is back to 64 bits, making it half as large. However, there is also a cost: Doubles now need to be stored as a pointer, which also implies that they have to be allocated on the heap. As such, we have improved the situation for most types, but introduced a large regression for doubles.
Wouldn’t it be nice if we could avoid that heap allocation for doubles as well?
Pointers in the NaN-Space
Recap: How do doubles look like?
Doubles in JS are following the IEEE 754 Standard for Floating-Point Arithmetic. In C++ doubles aren’t specified to follow any standard, but we assume that they use IEEE 754, too (which, practically speaking is a pretty good assumption. In general this whole article is nothing more than a large set of evil assumptions that any self-respecting C++ developer would murder you for.)
As you probably know IEEE 754 doubles are internally represented using the following bit sequence:
seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm ^- exponent ^- mantissa ^- sign bit
The resulting float is basically (-1)^s * m * 2^e (speaking oversimplifyingly).
What is much more important to us are some special values that doubles can have:
Positive zero (0.0): seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm 00000000|00000000|00000000|00000000|00000000|00000000|00000000|00000000 ^- sign bit 0 (= +), all other bits also 0 Negative zero (-0.0): seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm 10000000|00000000|00000000|00000000|00000000|00000000|00000000|00000000 ^- sign bit 1 (= -), all other 0
IEEE defines two zeros: +0 and -0. Both have exponent and mantissa zeroed out and are distinguished by the value of the sign bit. This may seem pointless at first - zero is zero after all - but it allows for some subtle distinctions. For example 1.0 / 0.0 is +INF whereas 1.0 / -0.0 is -INF. There are also other operations where the sign of the zero makes a difference.
One interesting behavior of the zeros is that they are considered equal in comparison. I.e. 0.0 == -0.0. The only way to find out whether a number is negative zero is to check whether the sign bit is set:
Positive infinity (+INF): seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm 01111111|11110000|00000000|00000000|00000000|00000000|00000000|00000000 ^- exponent bits all 1 mantissa bits all 0 -^ ^- sign bit 0 (= +) Negative infinity (-INF): seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm 11111111|11110000|00000000|00000000|00000000|00000000|00000000|00000000 ^- exponent bits all 1 mantissa bits all 0 -^ ^- sign bit 1 (= -)
Infinity has all exponent bits set and a zero mantissa. Again there is +INF and -INF, distinguished by the sign bit.
Signaling NaN: seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm s1111111|11110ppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp ^- first mantissa bit 0 everything else is "payload" -^ ^- exponent bits all 1 and mustn't be all-zero (as it ^- any sign bit would be INF then) Quiet NaN: seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm s1111111|11111ppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp ^- first mantissa bit 1 everything else is "payload" -^ ^- exponent bits all 1 ^- any sign bit
NaNs represent values that are Not a Number. E.g. 0.0/0.0 = NaN. NaNs also have interesting comparison semantics, as any comparison with NaN will be false (including NaN == NaN).
The representations of NaNs also have all exponent bits set (like infinity) but have a non-zero mantissa. The sign bit is irrelevant for NaNs.
There are two types of NaNs: Signaling NaNs (sNaN) and quiet NaNs (qNaN). They are distinguished by the first bit after the exponent: If it is 1 then the NaN is quiet; if it is 0 it is signaling. Signaling NaNs - in theory - should throw an invalid operation exception (EM_INVALID) when they are operated upon, whereas quiet NaNs should just be left alone. Practically this doesn’t seem to be used much, at least in MSVC this is disabled by default and needs to be enabled by compiler option +
Where it starts to get interesting for us is that NaNs additionally encode a 51 bit “payload” in the mantissa. This payload was originally designed to contain error information.
But deceitful as we are, we will misuse that NaN payload to stuff other things in it - like integers and pointers.
64 bit is a lie
On 64 bit architectures pointers have a size of 64 bits, obviously. But think about that again: How much is 64 bit actually? That’s 2^64 = 1.84467441 * 10^19 addressable memory bytes. That’s 16 EiB (Exabytes). Or talking in more convenient terms that’s 17179869184 Gigabytes. I don’t know about you, but I only have 16GB of memory. Some high-end server setups may have 4 TiB of memory, but even those only need a 42 bit address space (2^42 = 4 TiB).
Unsurprisingly we aren’t the first to notice this: the x86-64 architecture utilizes only the lower 48 bits (which still allows 256 TiB) of a pointer. Additionally bits 63 through 48 must be copies of bit 47. Pointers that follow this pattern are called canonical.
This leads to a strange looking address space (you can find a nicer version of this image on Wikipedia):
0xFFFFFFFF.FFFFFFFF ... <- Canonical upper "half" 0xFFFF8000.00000000 ... ... <- Noncanonical ... 0x00007FFF.FFFFFFFF ... <- Canonical lower "half" 0x00000000.00000000
The operating system normally assigns only pointers from the lower half to applications, reserving the upper half for itself. As always there are outliers - e.g. Solaris assigns addresses from the upper half under some circumstances (mmap). But we’ll just ignore that and assume that only the lower half from 0x00000000.00000000 to 0x00007FFF.FFFFFFFF is used. (Again, all these evil assumptions…)
At this point you should see what I am driving at: If pointers are really only 48 bits large they fit perfectly into our 51 bit payload space.
Here is an implementation stub for doing so (just implements doubles, ints and void pointers; a real implementation would look similar, just with more types):
Again, the code should be easy enough to understand. As an aid, here are the three consts from the top in binary:
Maximum double (qNaN with sign bit set without payload): seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm 0xfff8000000000000: 11111111|11111000|00000000|00000000|00000000|00000000|00000000|00000000 32 bit integer: seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm 0xfff9000000000000: 11111111|11111001|00000000|00000000|iiiiiiii|iiiiiiii|iiiiiiii|iiiiiiii ^- integer value Pointer: seeeeeee|eeeemmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm|mmmmmmmm 0xfffa000000000000: 11111111|11111010|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp|pppppppp ^- 48 bit pointer value
A few more considerations
The above code makes doubles accessible directly, without further operations and accesses pointers using a bit mask (that’s what Mozilla does). An alternative implementation could do it the other way around and make pointers accessible directly and doubles using an offset (i.e. you would add
0x0001000000000000 on inserting a double and subtract it when retrieving it). The latter is what WebKit does.
On 32 bit machines one can access the lower 32 bits of the 64 bit value directly. So both pointers and integers can be accessed directly in any case. On the other hand, whereas the tagged pointer with embedded integer approach would need only a 32 bit value on 32 bit machines, the NaN approach will need a 64 bit value in both 32 bit and 64 bit environments. So on 32 bit two words need to be passed around, instead of just one (that’s why Mozilla calls this approach fatvals).
Still, considering the overall wins it is worth it: Both integers and doubles (and for that matter also bools and all other “small” values) can be stored directly in the value, without accessing the heap.