How to generate intermediate files with clang/gcc to read Ruby's MRI implementation

I am currently diving into the MRI implementation of Ruby. It is really interesting to see how your beloved language concepts are implemented in plain C. The MRI code is pretty easy to read, and the implementations of Array or Hash are a good place to start if you want to get an idea.

If you have already read a bit of C code, you have probably noticed tons of #define directives, mostly at the beginning of header files. They mostly configure platform-specific things. I ran into this when I dug through the array implementation and found this line:

# array.c https://github.com/ruby/ruby/blob/439224a5904411b288e441096e21a41244ddd1d6/array.c#L25
VALUE rb_cArray;

This line declares a global variable whose type, VALUE, is a custom type. To find out how VALUE is defined, I had to read the ruby.h file:

# include/ruby/ruby.h https://github.com/ruby/ruby/blob/439224a5904411b288e441096e21a41244ddd1d6/include/ruby/ruby.h#L103-L124

#if defined HAVE_UINTPTR_T && 0
typedef uintptr_t VALUE;
typedef uintptr_t ID;
# define SIGNED_VALUE intptr_t
# define SIZEOF_VALUE SIZEOF_UINTPTR_T
# undef PRI_VALUE_PREFIX
#elif SIZEOF_LONG == SIZEOF_VOIDP
typedef unsigned long VALUE;
typedef unsigned long ID;
# define SIGNED_VALUE long
# define SIZEOF_VALUE SIZEOF_LONG
# define PRI_VALUE_PREFIX "l"
#elif SIZEOF_LONG_LONG == SIZEOF_VOIDP
typedef unsigned LONG_LONG VALUE;
typedef unsigned LONG_LONG ID;
# define SIGNED_VALUE LONG_LONG
# define LONG_LONG_VALUE 1
# define SIZEOF_VALUE SIZEOF_LONG_LONG
# define PRI_VALUE_PREFIX PRI_LL_PREFIX
#else
# error ---->> ruby requires sizeof(void*) == sizeof(long) or sizeof(LONG_LONG) to be compiled. <<----
#endif

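At first glance, VALUE could be uintptr_t, unsigned long or unsigned LONG_LONG. (The first branch can never be taken, because its condition ends in && 0.) To find out what VALUE actually is on my platform, I could dig deeper into the code and work out how every #if resolves. To make it easier for myself, I can instead let the compiler create a so-called intermediate file: the preprocessor output, in which all conditionals are already resolved and all macros are expanded.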
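To make the pattern easier to see in isolation, here is a minimal sketch that mirrors the conditional typedef from ruby.h. It is not MRI code: the names demo_value_t, DEMO_VALUE_KIND, DEMO_HAVE_UINTPTR_T and demo_value_kind are made up for illustration, and instead of Ruby's SIZEOF_LONG and SIZEOF_VOIDP (which come from its configure step) it assumes the __SIZEOF_LONG__, __SIZEOF_LONG_LONG__ and __SIZEOF_POINTER__ macros that gcc and clang predefine.

```c
#include <assert.h>
#include <stdint.h>

/* Branch 1 mirrors the "&& 0" from ruby.h: the condition is always false,
   so this branch is disabled no matter which macros are defined.
   DEMO_HAVE_UINTPTR_T is a hypothetical macro used only for illustration. */
#if defined DEMO_HAVE_UINTPTR_T && 0
typedef uintptr_t demo_value_t;
# define DEMO_VALUE_KIND "uintptr_t"
/* Assumption: __SIZEOF_*__ are predefined by the compiler (gcc and clang). */
#elif defined(__SIZEOF_LONG__) && defined(__SIZEOF_POINTER__) \
      && __SIZEOF_LONG__ == __SIZEOF_POINTER__
typedef unsigned long demo_value_t;
# define DEMO_VALUE_KIND "unsigned long"
#elif defined(__SIZEOF_LONG_LONG__) && defined(__SIZEOF_POINTER__) \
      && __SIZEOF_LONG_LONG__ == __SIZEOF_POINTER__
typedef unsigned long long demo_value_t;
# define DEMO_VALUE_KIND "unsigned long long"
#else
# error "no integer type with the size of a pointer was found"
#endif

/* Returns the name of the type the preprocessor picked for demo_value_t. */
const char *demo_value_kind(void)
{
    /* Whatever branch was taken, the type must be as wide as a pointer. */
    assert(sizeof(demo_value_t) == sizeof(void *));
    return DEMO_VALUE_KIND;
}
```

Calling demo_value_kind() from a small main and printing the result tells you which branch your platform took, but only because the sketch defines an extra macro per branch. ruby.h has no such helper, which is why looking at the preprocessor output is the more practical route there.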

I managed to generate a ruby.i file on OSX with the following command:

clang -I .ext/include/x86_64-darwin15/ -I include/ -x c -S -save-temps include/ruby/ruby.h

The -x c option, which tells clang to treat the header as a regular C source file, was necessary for me (probably not with gcc). Without it, clang failed with the following error:

error: invalid value 'c-header-cpp-output' in '-x c-header-cpp-output'

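After adding this option, clang generated the .i file, which contains the resolved definitions. The line starting with # is a line marker that records where the following line came from in the original header: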

# 110 "include/ruby/ruby.h"
typedef unsigned long VALUE;
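The # 110 "include/ruby/ruby.h" line in that output tells later compiler stages which file and line the following code originally came from. Standard C has a directive with the same effect, #line, so the idea can be tried in a tiny sketch. The file name demo/ruby.h and the identifiers below are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* From here on the compiler pretends that the next line is line 110
   of a file called "demo/ruby.h" (a made-up name for this sketch). */
#line 110 "demo/ruby.h"
static const int demo_marked_line = __LINE__;

/* Returns 1 if __LINE__ and __FILE__ were remapped by the #line directive. */
int demo_line_marker_works(void)
{
    int line_ok = (demo_marked_line == 110);
    int file_ok = (strcmp(__FILE__, "demo/ruby.h") == 0);
    assert(line_ok && file_ok);
    return line_ok && file_ok;
}
```

This is also why compiler errors still point at the right place in ruby.h even though the compiler proper only ever sees the one big preprocessed file.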

The full ruby.i file is in this gist.

Yay! Now I can continue reading the source code of MRI.

Note: Since this is a general feature of the compiler, you can do this with any C code you want to understand better.
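If you want to try this on something smaller than MRI first, a file like the following sketch is enough. Everything in it (the file name demo.c, BUFFER_SIZE, SQUARE, counter_t, squared_buffer_size) is hypothetical and exists only for this example. Running clang -E demo.c or gcc -E demo.c should print the preprocessed source to standard output.

```c
/* demo.c: a hypothetical file for experimenting with preprocessor output. */
#include <assert.h>

#define BUFFER_SIZE 64
#define SQUARE(x) ((x) * (x))

/* Only one of these typedefs survives preprocessing. */
#if BUFFER_SIZE > 32
typedef long counter_t;
#else
typedef int counter_t;
#endif

counter_t squared_buffer_size(void)
{
    /* In the preprocessed output this should read ((64) * (64)). */
    counter_t result = SQUARE(BUFFER_SIZE);
    assert(result == 4096);
    return result;
}
```

In the output, the #define lines are gone, only the typedef long counter_t; branch remains, and the contents of assert.h are pasted in at the top. That last part is a small taste of why the real ruby.i gets so long.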

Happy reading!