If you write in C, C++ or Objective-C, you might already know that unions are a good way to store data efficiently. You can also use them to access parts of that data more easily. Here’s a Tip o’the Day to show you how.
A Refresher
If you haven’t used unions before, the description you’ll find online will be something like this: unions are a C/C++ data structure that can store items of different types, but can store only one of them at a time.
For me, this is both misleading and a little inaccurate. Calling it a ‘data structure’ is confusing because ‘structures’ (that is to say, structs) already exist. To me, unions are quite different than structs, and as we’ll see in a minute, saying that “only one of them can be stored at a time” is misleading.
The description is true when you have ‘normal’ union use. ‘Normal’ union use is something like this:
typedef union
{
    char   ch;
    int    i;
    long   l;
    float  f;
    double d;
} my_union_type;

my_union_type my_variable;

my_variable.ch = 1;
my_variable.i = 2;
printf("ch = %d\n", my_variable.ch);
When you declare a variable of this type, all of the members occupy the same place in memory, so writing to i also overwrites the byte that ch lives in. On a little-endian machine (such as x86), the output of the above code is this:
ch = 2
Not 1 like you might expect.
When you do a sizeof() on a union’d type, the result is at least the size of the largest member of that union (the compiler may add padding to satisfy alignment). For example, with my_union_type, the size is typically 8 because sizeof(double) is 8.
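If you want to check the size and the shared-memory behavior on your own compiler, here is a minimal sketch. The helper names members_share_address and print_union_sizes are made up for this illustration, and static_assert assumes a C11 (or newer) compiler.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdio.h>

typedef union {
    char   ch;
    int    i;
    long   l;
    float  f;
    double d;
} my_union_type;

/* A union has to be able to hold its largest member. */
static_assert(sizeof(my_union_type) >= sizeof(double),
              "union is smaller than its double member");

/* Hypothetical helper: returns 1 if ch and d start at the same address. */
int members_share_address(void)
{
    my_union_type u;
    return (void *)&u.ch == (void *)&u.d;
}

/* Hypothetical helper: prints sizes so you can compare them on your platform. */
void print_union_sizes(void)
{
    printf("sizeof(long)          = %zu\n", sizeof(long));
    printf("sizeof(double)        = %zu\n", sizeof(double));
    printf("sizeof(my_union_type) = %zu\n", sizeof(my_union_type));
}
```

Call print_union_sizes() from your own main() to see the numbers for your platform. The exact values can differ between compilers and targets, which is why no output is listed here.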
Exploitation
You can take advantage of this behavior by creating a special ‘merged’ data type that is nothing but unions. Have a look at the code below to see what I mean:
typedef union
{
    u32 m_u32;
    s32 m_s32;
    void * m_ptr;

    struct
    {
        u16 m_DC;
        u16 m_BA;
    } m_u16;

    struct
    {
        s16 m_DC;
        s16 m_BA;
    } m_s16;

    struct
    {
        u8 m_D;
        u8 m_C;
        u8 m_B;
        u8 m_A;
    } m_u8;

    struct
    {
        s8 m_D;
        s8 m_C;
        s8 m_B;
        s8 m_A;
    } m_s8;
} merged32;
This is a special 32-bit data type that I created to make it very easy for me to access certain parts of those 32 bits without resorting to bitmasking and shifting (which can be quite error-prone). Here’s an example to show you how easy this becomes:
merged32 my_merged_data;

my_merged_data.m_u32 = 0xC0DEF00D;

// The casts keep %X correct even if u32 is not the same type as unsigned int.
printf("m_u32 = 0x%08X\n\n", (unsigned int)my_merged_data.m_u32);

printf("m_u8.m_A = 0x%02X\n", my_merged_data.m_u8.m_A);
printf("m_u8.m_B = 0x%02X\n", my_merged_data.m_u8.m_B);
printf("m_u8.m_C = 0x%02X\n", my_merged_data.m_u8.m_C);
printf("m_u8.m_D = 0x%02X\n\n", my_merged_data.m_u8.m_D);

// Overwrite a single byte through the union.
my_merged_data.m_u8.m_C = 0xD0;

printf("m_u32 = 0x%08X\n", (unsigned int)my_merged_data.m_u32);
Gives the following output:
m_u32 = 0xC0DEF00D

m_u8.m_A = 0xC0
m_u8.m_B = 0xDE
m_u8.m_C = 0xF0
m_u8.m_D = 0x0D

m_u32 = 0xC0DED00D
Because I have defined a data member at every position of the 32 bits, I can access any 8 bits very easily.
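For comparison, here is a rough sketch of the same byte update done both ways: once with shifting and masking, and once through a union. The names merged32_min, set_byte_by_shift and set_m_C_by_union are made up for this illustration, and the typedefs use the fixed-width types from stdint.h as stand-ins for the article's u8 and u32.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: fixed-width stand-ins for the article's typedefs. */
typedef uint8_t  u8;
typedef uint32_t u32;

/* Hypothetical cut-down version of merged32 with only the members used here. */
typedef union {
    u32 m_u32;
    struct { u8 m_D; u8 m_C; u8 m_B; u8 m_A; } m_u8;
} merged32_min;

/* Shift-and-mask version: replace byte `index` (0 = least significant byte). */
u32 set_byte_by_shift(u32 value, unsigned index, u8 byte)
{
    assert(index < 4);
    unsigned shift = index * 8u;
    u32 cleared = value & ~((u32)0xFFu << shift);  /* zero out the old byte */
    return cleared | ((u32)byte << shift);         /* drop in the new byte  */
}

/* Union version: replace the byte named m_C.
   Which bits of m_u32 change depends on the byte order of the machine. */
u32 set_m_C_by_union(u32 value, u8 byte)
{
    merged32_min m;
    m.m_u32 = value;
    m.m_u8.m_C = byte;
    return m.m_u32;
}
```

The shift version gives the same answer on every machine: set_byte_by_shift(0xC0DEF00D, 1, 0xD0) returns 0xC0DED00D. The union version is shorter to read, but its result depends on the byte order of the CPU. Also note that reading a union member other than the one last written is allowed in C but is undefined behavior in C++.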
A Caveat
One thing you have to keep in mind when using unions is the endianness of the hardware the code is running on. My example ran on a little-endian CPU, where the least significant byte is stored first, which is why I have to change m_C in order to get D00D. If the code were running on a big-endian CPU (such as a Motorola 68000), I would have to change m_B instead, because the order of the bytes is reversed.
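If you are not sure which byte order your target uses, a union can tell you. This is only a sketch: endian_probe and is_little_endian are made-up names, and it assumes the machine is either plain little-endian or plain big-endian (the assert fires on anything more exotic).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical probe type: a 32-bit word overlaid with its four bytes. */
typedef union {
    uint32_t      word;
    unsigned char bytes[4];
} endian_probe;

/* Returns 1 on a little-endian machine and 0 on a big-endian one. */
int is_little_endian(void)
{
    endian_probe p;
    p.word = 0x01020304u;

    /* Assumption: only plain little- or big-endian layouts are handled. */
    assert(p.bytes[0] == 0x04 || p.bytes[0] == 0x01);

    /* On little-endian hardware the least significant byte comes first. */
    return p.bytes[0] == 0x04;
}
```

You could call this once at startup and pick the right member (m_C or m_B in the example above) based on the result.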
The Code
Here is the full code for you to use as you wish. I have made a few other typedefs as well as other merged data types.
#if !defined(__MERGEDTYPE_H__)
#define __MERGEDTYPE_H__

#include <stdint.h>  // fixed-width types, so every typedef has the same size on all platforms

// These checks only detect macros: #define s32 (etc.) beforehand to supply your own type.
#if !defined(s32)
typedef int32_t s32;   // 'long' is 64 bits on many 64-bit targets, so it is not used here
#endif

#if !defined(u32)
typedef uint32_t u32;
#endif

#if !defined(s16)
typedef int16_t s16;
#endif

#if !defined(u16)
typedef uint16_t u16;
#endif

#if !defined(s8)
typedef int8_t s8;
#endif

#if !defined(u8)
typedef uint8_t u8;
#endif

typedef union
{
    u32 m_u32;
    s32 m_s32;
    void * m_ptr;   // wider than 32 bits on 64-bit targets, where it enlarges the union

    struct
    {
        u16 m_DC;
        u16 m_BA;
    } m_u16;

    struct
    {
        s16 m_DC;
        s16 m_BA;
    } m_s16;

    struct
    {
        u8 m_D;
        u8 m_C;
        u8 m_B;
        u8 m_A;
    } m_u8;

    struct
    {
        s8 m_D;
        s8 m_C;
        s8 m_B;
        s8 m_A;
    } m_s8;
} merged32;

typedef union
{
    u16 m_u16;
    s16 m_s16;

    struct
    {
        u8 m_B;
        u8 m_A;
    } m_u8;

    struct
    {
        s8 m_B;
        s8 m_A;
    } m_s8;
} merged16;

typedef union
{
    u8 m_u8;
    s8 m_s8;
} merged8;

#endif // __MERGEDTYPE_H__
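As a small usage example for merged16, here is a sketch of a 16-bit byte swap. The function name swap_bytes16 is made up for this illustration, and the block repeats the typedefs (using stdint.h types as an assumption) so that it compiles on its own. Swapping the two halves gives the same result on either byte order, so this one is not affected by the caveat above.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stdint.h>

/* Assumption: fixed-width stand-ins for the header's typedefs. */
typedef uint8_t  u8;
typedef int8_t   s8;
typedef uint16_t u16;
typedef int16_t  s16;

typedef union {
    u16 m_u16;
    s16 m_s16;
    struct { u8 m_B; u8 m_A; } m_u8;
    struct { s8 m_B; s8 m_A; } m_s8;
} merged16;

/* Catch any platform where the union is not exactly 16 bits wide. */
static_assert(sizeof(merged16) == 2, "merged16 is not 2 bytes");

/* Hypothetical helper: exchanges the high and low bytes of a 16-bit value. */
u16 swap_bytes16(u16 value)
{
    merged16 m;
    m.m_u16 = value;

    u8 tmp     = m.m_u8.m_A;
    m.m_u8.m_A = m.m_u8.m_B;
    m.m_u8.m_B = tmp;

    return m.m_u16;
}
```

For example, swap_bytes16(0xF00D) returns 0x0DF0.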