Tip o’the Day™ 4: Take Advantage of unions

If you write in C, C++ or Objective-C, you might already know that unions are a good way to store data efficiently. You can also use them to get at individual parts of a value more easily. Here’s a Tip o’the Day to show you how.

A Refresher

If you haven’t used unions before, the description you’ll find online will be something like this: a union is a C/C++ data structure that can store items of different types, but can store only one of them at a time.

For me, this is both misleading and a little inaccurate. Calling it a ‘data structure’ is confusing because ‘structures’ (that is to say, structs) already exist. To me, unions are quite different from structs, and as we’ll see in a minute, saying that “only one of them can be stored at a time” is misleading.

The description is true when you have ‘normal’ union use. ‘Normal’ union use is something like this:

typedef union
{
    char   ch;
    int    i;
    long   l;
    float  f;
    double d;
} my_union_type;

my_union_type    my_variable;

my_variable.ch = 1;
my_variable.i  = 2;

printf("ch = %d\n", my_variable.ch);

When you declare a variable of this type, all of the members occupy the same place in memory. So the output of the above code (on a little-endian machine such as an x86) is this:

ch = 2

Not 1 like you might expect.

When you do a sizeof() on a union’d type, the result is at least the size of the largest member of that union (the compiler may add padding for alignment). For example, with my_union_type, the size is typically 8 because sizeof(double) is 8.
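If you want to check this on your own compiler, here is a minimal sketch. The two helper functions (members_share_address and union_size) are names I made up for illustration; they only confirm that every member starts at the same address and report the size of the union.

```c
#include <stddef.h>

typedef union
{
    char   ch;
    int    i;
    long   l;
    float  f;
    double d;
} my_union_type;

/* Hypothetical helper: returns 1 if every member starts at the same
   address as the union itself, 0 otherwise. */
int members_share_address(void)
{
    my_union_type u;
    void *base = (void *)&u;

    return (void *)&u.ch == base
        && (void *)&u.i  == base
        && (void *)&u.l  == base
        && (void *)&u.f  == base
        && (void *)&u.d  == base;
}

/* Hypothetical helper: size of the union, which is at least the size
   of its largest member. */
size_t union_size(void)
{
    return sizeof(my_union_type);
}
```

Calling members_share_address() should give 1 on any conforming compiler, and union_size() should never be smaller than sizeof(double). The exact number depends on your platform.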

Exploitation

You can take advantage of this behavior by creating a special ‘merged’ data type: a union whose members are all different views of the same bits. Have a look at the code below to see what I mean:

typedef union
{
    u32     m_u32;
    s32     m_s32;

    void *  m_ptr;  /* assumes 32-bit pointers; on a 64-bit target this makes the union 8 bytes */

    struct
    {
        u16 m_DC;
        u16 m_BA;
    } m_u16;

    struct
    {
        s16 m_DC;
        s16 m_BA;
    } m_s16;

    struct
    {
        u8  m_D;
        u8  m_C;
        u8  m_B;
        u8  m_A;
    } m_u8;

    struct
    {
        s8  m_D;
        s8  m_C;
        s8  m_B;
        s8  m_A;
    } m_s8;
} merged32;

This is a special 32-bit data type that I created to make it very easy for me to access certain parts of those 32 bits without resorting to bitmasking and shifting (which can be quite error-prone). Here’s an example to show you how easy this becomes:

merged32 my_merged_data;

my_merged_data.m_u32 = 0xC0DEF00D;
printf("m_u32    = 0x%08lX\n\n", (unsigned long)my_merged_data.m_u32);

printf("m_u8.m_A = 0x%02X\n", my_merged_data.m_u8.m_A);
printf("m_u8.m_B = 0x%02X\n", my_merged_data.m_u8.m_B);
printf("m_u8.m_C = 0x%02X\n", my_merged_data.m_u8.m_C);
printf("m_u8.m_D = 0x%02X\n\n", my_merged_data.m_u8.m_D);

my_merged_data.m_u8.m_C = 0xD0;
printf("m_u32    = 0x%08lX\n", (unsigned long)my_merged_data.m_u32);

On a little-endian machine, this gives the following output:

m_u32    = 0xC0DEF00D

m_u8.m_A = 0xC0
m_u8.m_B = 0xDE
m_u8.m_C = 0xF0
m_u8.m_D = 0x0D

m_u32    = 0xC0DED00D

Because I have defined a data member in every position of the entire 32 bits, I can access any 8 bits very easily.
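For comparison, here is roughly what the bitmasking and shifting version looks like. This is only a sketch: get_byte and set_byte are hypothetical helper names, and I am assuming byte index 0 means the least significant byte and that the index is always 0 to 3.

```c
#include <stdint.h>

/* Hypothetical helper: extract byte number `index` (0 = least
   significant, 3 = most significant) from a 32-bit value. */
uint8_t get_byte(uint32_t value, unsigned index)
{
    return (uint8_t)((value >> (index * 8u)) & 0xFFu);
}

/* Hypothetical helper: return `value` with byte number `index`
   replaced by `byte`. The index must be in the range 0..3. */
uint32_t set_byte(uint32_t value, unsigned index, uint8_t byte)
{
    uint32_t shift = index * 8u;
    uint32_t mask  = (uint32_t)0xFFu << shift;

    return (value & ~mask) | ((uint32_t)byte << shift);
}
```

With these, set_byte(0xC0DEF00D, 1, 0xD0) gives 0xC0DED00D, the same result as the union example. The shifting version does not care about byte order, but you have to get the shift and the mask right every time, which is exactly the part the union hides.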

A Caveat

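One thing you have to keep in mind when using unions this way is the endianness of the hardware the code is running on. My example ran on a little-endian CPU, so notice that I have to change m_C in order to get D00D. If the code were running on a big-endian CPU (a Motorola 68000, for example), I would have to change m_B because the order of the bytes in memory is reversed. Also note that reading a member other than the one last written is allowed in C, but in C++ it is technically undefined behavior, even though most compilers support it.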
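If you need the same code to work on both kinds of hardware, one option is to check the byte order at run time. The sketch below is only one way to do it and assumes plain C, where reading a different union member is allowed. The names is_little_endian, merged32_sketch and set_second_lowest_byte are made up for this example, and I am ignoring exotic mixed-endian machines.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 on a little-endian machine, 0 otherwise.
   It stores 1 in a 32-bit word and checks which byte ended up first. */
int is_little_endian(void)
{
    union
    {
        uint32_t word;
        uint8_t  bytes[4];
    } probe;

    probe.word = 1u;
    return probe.bytes[0] == 1u;
}

/* Cut-down version of merged32 with only the members needed here. */
typedef union
{
    uint32_t m_u32;

    struct
    {
        uint8_t m_D;
        uint8_t m_C;
        uint8_t m_B;
        uint8_t m_A;
    } m_u8;
} merged32_sketch;

/* Hypothetical helper: replace bits 8..15 of `value` with `byte`,
   choosing m_C on little-endian hardware and m_B on big-endian. */
uint32_t set_second_lowest_byte(uint32_t value, uint8_t byte)
{
    merged32_sketch m;

    m.m_u32 = value;
    if (is_little_endian())
        m.m_u8.m_C = byte;
    else
        m.m_u8.m_B = byte;

    return m.m_u32;
}
```

On either kind of CPU, set_second_lowest_byte(0xC0DEF00D, 0xD0) should come back as 0xC0DED00D. In real code you would more likely pick the member order at compile time, but the run-time check makes the difference easy to see.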

The Code

Here is the full code for you to use as you wish. I have made a few other typedefs as well as other merged data types.

#if !defined(__MERGEDTYPE_H__)
#define __MERGEDTYPE_H__

#include <stdint.h>

/* Fixed-width types from <stdint.h> are used because a plain long is
   64 bits on many 64-bit platforms. Note that the !defined() checks
   below only detect macros, not existing typedefs. */
#if !defined(s32)
    typedef int32_t     s32;
#endif
#if !defined(u32)
    typedef uint32_t    u32;
#endif
#if !defined(s16)
    typedef int16_t     s16;
#endif
#if !defined(u16)
    typedef uint16_t    u16;
#endif
#if !defined(s8)
    typedef int8_t      s8;
#endif
#if !defined(u8)
    typedef uint8_t     u8;
#endif


typedef union
{
    u32     m_u32;
    s32     m_s32;

    void *  m_ptr;  /* assumes 32-bit pointers; on a 64-bit target this makes the union 8 bytes */

    struct
    {
        u16 m_DC;
        u16 m_BA;
    } m_u16;

    struct
    {
        s16 m_DC;
        s16 m_BA;
    } m_s16;

    struct
    {
        u8  m_D;
        u8  m_C;
        u8  m_B;
        u8  m_A;
    } m_u8;

    struct
    {
        s8  m_D;
        s8  m_C;
        s8  m_B;
        s8  m_A;
    } m_s8;
} merged32;

typedef union
{
    u16 m_u16;
    s16 m_s16;

    struct
    {
        u8  m_B;
        u8  m_A;
    } m_u8;

    struct
    {
        s8  m_B;
        s8  m_A;
    } m_s8;
} merged16;

typedef union
{
    u8  m_u8;
    s8  m_s8;
} merged8;

#endif  // __MERGEDTYPE_H__