Where the printf() Rubber Meets the Road
After ignoring StackOverflow for a while, I decided to check up on it again recently. Someone asked one of those fundamental-curiosity questions that I enjoy explaining. He said:
I always thought that functions like printf() are, at the last step, defined using inline assembly. That deep in stdio.h is buried some asm code that actually tells the CPU what to do. Something like in DOS: first mov the beginning of the string to some memory location or register and then call some int. But since the x64 version of Visual Studio doesn’t support inline assembler at all, it made me think that there are really no assembler-defined functions in C/C++. So, please, how is, for example, printf() defined in C/C++ without using assembler code? What actually executes the right software interrupt?
Obviously the answer is going to depend on the implementation. Yet I thought that with the open-sourced GNU C Library, it would be pretty straightforward to show how most of it is in C but it bottoms out at syscall. But it really was quite a maze to connect all the dots without doing any hand-waving! So I found that my explanation just kept growing until it was so long that a blog entry was a more fitting format.
So read on, fearless explorers, as we dig into the complicated answer to a seemingly simple question…
First Steps
We’ll of course start with the prototype for printf, which is declared in the file libc/libio/stdio.h:
extern int printf (__const char *__restrict __format, ...);
You won’t find the source code for a function called printf, however. Instead, in the file libc/stdio-common/printf.c you’ll find a little bit of code associated with a function called __printf:
int
__printf (const char *format, ...)
{
  va_list arg;
  int done;
  va_start (arg, format);
  done = vfprintf (stdout, format, arg);
  va_end (arg);
  return done;
}
A macro in the same file sets up an association so that this function is defined as an alias for the non-underscored printf:
ldbl_strong_alias (__printf, printf);
It makes sense that printf would be a thin layer that calls vfprintf with stdout. Indeed, the meat of the formatting work is done in vfprintf, which you’ll find in libc/stdio-common/vfprintf.c. It’s quite a lengthy function, but you can see that it’s still all in C!
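To show how little the wrapper actually does, here is a minimal sketch of the same pattern outside glibc. The name my_printf is made up. It forwards its variable arguments to the standard vfprintf with stdout, exactly as __printf does above.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical printf look-alike: collect the variable arguments into a
   va_list and let vfprintf do all of the formatting work on stdout. */
int my_printf(const char *format, ...)
{
    va_list arg;
    int done;

    va_start(arg, format);
    done = vfprintf(stdout, format, arg);  /* returns characters written, or negative on error */
    va_end(arg);
    return done;
}
```

Calling my_printf("%d-%s\n", 42, "ok") prints "42-ok" and returns 6, the same count printf would return.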
Deeper Down the Rabbit Hole…
vfprintf mysteriously calls outchar and outstring, which are weird macros defined in the same file:
#define outchar(Ch) \
  do \
    { \
      register const INT_T outc = (Ch); \
      if (PUTC (outc, s) == EOF || done == INT_MAX) \
        { \
          done = -1; \
          goto all_done; \
        } \
      ++done; \
    } \
  while (0)
Sidestepping the question of why it’s so weird, we see that it’s dependent on the enigmatic PUTC, also in the same file:
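#define PUTC(C, F) _IO_putwc_unlocked (C, F)
(That is the wide-character flavor. vfprintf.c is compiled a second time for plain char output, where PUTC maps to _IO_putc_unlocked and the slow path is __overflow rather than __woverflow. Both routes end up at the same _IO_OVERFLOW call, so following the wide one is fine.) When you get to the definition of _IO_putwc_unlocked in libc/libio/libio.h, you might start thinking that you no longer care how printf works: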
#define _IO_putwc_unlocked(_wch, _fp) \
  (_IO_BE ((_fp)->_wide_data->_IO_write_ptr \
           >= (_fp)->_wide_data->_IO_write_end, 0) \
   ? __woverflow (_fp, _wch) \
   : (_IO_wint_t) (*(_fp)->_wide_data->_IO_write_ptr++ = (_wch)))
But despite being a little hard to read, it’s just doing buffered output. If there’s enough room in the file pointer’s buffer, then it will just stick the character into it…but if not, it calls __woverflow. Since the only option when you’ve run out of buffer is to flush to the screen (or whatever device your file pointer represents), we can hope to find the magic incantation there.
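Here is a rough, simplified sketch of that fast-path/slow-path shape. The toy_stream struct, TOY_PUTC and toy_overflow are hypothetical stand-ins for the FILE internals, _IO_putwc_unlocked and __woverflow. The “flush” in this sketch just discards the buffer and counts how often it happened, because the real device write only shows up further down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stream: a small fixed buffer plus the pointers
   that the real macro compares (write_ptr against write_end). */
struct toy_stream {
    char buf[8];
    char *write_base;   /* start of pending (unflushed) data */
    char *write_ptr;    /* next free slot */
    char *write_end;    /* one past the last usable slot */
    int flushes;        /* how many times the slow path ran */
};

static void toy_init(struct toy_stream *s)
{
    s->write_base = s->write_ptr = s->buf;
    s->write_end = s->buf + sizeof s->buf;
    s->flushes = 0;
}

/* Slow path, standing in for __woverflow: "flush" by discarding the
   buffer and counting, then store the character that did not fit. */
static int toy_overflow(struct toy_stream *s, int ch)
{
    s->flushes++;
    s->write_base = s->write_ptr = s->buf;
    *s->write_ptr++ = (char) ch;
    return (unsigned char) ch;
}

/* Fast path, same shape as the glibc macro: if the buffer is full, take the
   slow path. Otherwise store the character and bump the pointer. */
#define TOY_PUTC(ch, s) \
    ((s)->write_ptr >= (s)->write_end \
         ? toy_overflow((s), (ch)) \
         : (unsigned char) (*(s)->write_ptr++ = (char) (ch)))
```

The first eight characters go into the buffer without any function call. The ninth is the one that triggers toy_overflow, which is why the slow path can afford to be expensive.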
Vtables in C?
If you guessed that we’re going to hop through another frustrating level of indirection, you’d be right. Look in libc/libio/wgenops.c and you’ll find the definition of __woverflow:
wint_t
__woverflow (f, wch)
     _IO_FILE *f;
     wint_t wch;
{
  if (f->_mode == 0)
    _IO_fwide (f, 1);
  return _IO_OVERFLOW (f, wch);
}
Basically, file pointers are implemented in the GNU standard library as objects. They have data members but also function members which you can call with variations of the JUMP macro. In the file libc/libio/libioP.h you’ll find a little documentation of this technique:
/* THE JUMPTABLE FUNCTIONS.
* The _IO_FILE type is used to implement the FILE type in GNU libc,
* as well as the streambuf class in GNU iostreams for C++.
* These are all the same, just used differently.
* An _IO_FILE (or FILE) object is allows followed by a pointer to
* a jump table (of pointers to functions). The pointer is accessed
* with the _IO_JUMPS macro. The jump table has a eccentric format,
* so as to be compatible with the layout of a C++ virtual function table.
* (as implemented by g++). When a pointer to a streambuf object is
* coerced to an (_IO_FILE*), then _IO_JUMPS on the result just
* happens to point to the virtual function table of the streambuf.
* Thus the _IO_JUMPS function table used for C stdio/libio does
* double duty as the virtual function table for C++ streambuf.
*
* The entries in the _IO_JUMPS function table (and hence also the
* virtual functions of a streambuf) are described below.
* The first parameter of each function entry is the _IO_FILE/streambuf
* object being acted on (i.e. the 'this' parameter).
*/
So when we look up _IO_OVERFLOW (it’s defined in that same libioP.h header), we find it’s a macro which calls a “1-parameter” __overflow method on the file pointer:
#define _IO_OVERFLOW(FP, CH) JUMP1 (__overflow, FP, CH)
The jump tables for the various file pointer types are in libc/libio/fileops.c:
const struct _IO_jump_t _IO_file_jumps =
{
  JUMP_INIT_DUMMY,
  JUMP_INIT(finish, INTUSE(_IO_file_finish)),
  JUMP_INIT(overflow, INTUSE(_IO_file_overflow)),
  JUMP_INIT(underflow, INTUSE(_IO_file_underflow)),
  JUMP_INIT(uflow, INTUSE(_IO_default_uflow)),
  JUMP_INIT(pbackfail, INTUSE(_IO_default_pbackfail)),
  JUMP_INIT(xsputn, INTUSE(_IO_file_xsputn)),
  JUMP_INIT(xsgetn, INTUSE(_IO_file_xsgetn)),
  JUMP_INIT(seekoff, _IO_new_file_seekoff),
  JUMP_INIT(seekpos, _IO_default_seekpos),
  JUMP_INIT(setbuf, _IO_new_file_setbuf),
  JUMP_INIT(sync, _IO_new_file_sync),
  JUMP_INIT(doallocate, INTUSE(_IO_file_doallocate)),
  JUMP_INIT(read, INTUSE(_IO_file_read)),
  JUMP_INIT(write, _IO_new_file_write),
  JUMP_INIT(seek, INTUSE(_IO_file_seek)),
  JUMP_INIT(close, INTUSE(_IO_file_close)),
  JUMP_INIT(stat, INTUSE(_IO_file_stat)),
  JUMP_INIT(showmanyc, _IO_default_showmanyc),
  JUMP_INIT(imbue, _IO_default_imbue)
};
libc_hidden_data_def (_IO_file_jumps)
There’s also a #define which equates _IO_new_file_overflow with _IO_file_overflow, and the former is defined in the same source file. (Note: INTUSE is just a macro which marks functions that are for internal use; it doesn’t mean anything like “this function uses an interrupt”.)
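If the jump-table idea feels abstract, here is a stripped-down sketch of the same technique: an object that carries a pointer to a table of function pointers, and macros that dispatch through that table. Everything in it (toy_file, toy_jumps, TOY_JUMP1/TOY_JUMP2, the “counting” file type) is invented for illustration and is much simpler than the real struct _IO_jump_t.

```c
#include <assert.h>
#include <stddef.h>

struct toy_file;  /* forward declaration of the "object" type */

/* Hypothetical jump table, loosely analogous to struct _IO_jump_t: one
   function pointer per "virtual method". Each one takes the object as its
   first ("this") parameter. */
struct toy_jumps {
    int (*overflow)(struct toy_file *fp, int ch);
    long (*write)(struct toy_file *fp, const void *data, long n);
};

/* The object carries a pointer to its jump table, as _IO_FILE does. */
struct toy_file {
    const struct toy_jumps *vtable;
    long bytes_written;
};

/* Dispatch macros in the spirit of JUMP1/JUMP2 (the names are made up). */
#define TOY_JUMP1(func, fp, a)    ((fp)->vtable->func((fp), (a)))
#define TOY_JUMP2(func, fp, a, b) ((fp)->vtable->func((fp), (a), (b)))

/* One concrete "class": a file type that only counts bytes. */
static long counting_write(struct toy_file *fp, const void *data, long n)
{
    (void) data;              /* a real implementation would send the bytes somewhere */
    fp->bytes_written += n;
    return n;
}

/* overflow is written in terms of another virtual method, write. */
static int counting_overflow(struct toy_file *fp, int ch)
{
    unsigned char c = (unsigned char) ch;
    return TOY_JUMP2(write, fp, &c, 1) == 1 ? c : -1;
}

static const struct toy_jumps counting_jumps = {
    .overflow = counting_overflow,
    .write = counting_write,
};
```

Giving an object a different toy_jumps table produces a different “file type” without changing any caller, and that is the point of the design.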
Are we there yet?!
The source code for _IO_new_file_overflow does a bunch more buffer manipulation, but it does call _IO_do_flush:
#define _IO_do_flush(_f) \
  INTUSE(_IO_do_write)(_f, (_f)->_IO_write_base, \
                       (_f)->_IO_write_ptr-(_f)->_IO_write_base)
We’re now at a point where _IO_do_write is probably where the rubber actually meets the road: an unbuffered, actual, direct write to an I/O device. At least we can hope! A macro maps it to _IO_new_do_write, which hands the real work to the static helper new_do_write:
static
_IO_size_t
new_do_write (fp, data, to_do)
     _IO_FILE *fp;
     const char *data;
     _IO_size_t to_do;
{
  _IO_size_t count;
  if (fp->_flags & _IO_IS_APPENDING)
    /* On a system without a proper O_APPEND implementation,
       you would need to sys_seek(0, SEEK_END) here, but is
       is not needed nor desirable for Unix- or Posix-like systems.
       Instead, just indicate that offset (before and after) is
       unpredictable. */
    fp->_offset = _IO_pos_BAD;
  else if (fp->_IO_read_end != fp->_IO_write_base)
    {
      _IO_off64_t new_pos
        = _IO_SYSSEEK (fp, fp->_IO_write_base - fp->_IO_read_end, 1);
      if (new_pos == _IO_pos_BAD)
        return 0;
      fp->_offset = new_pos;
    }
  count = _IO_SYSWRITE (fp, data, to_do);
  if (fp->_cur_column && count)
    fp->_cur_column = INTUSE(_IO_adjust_column) (fp->_cur_column - 1, data,
                                                 count) + 1;
  _IO_setg (fp, fp->_IO_buf_base, fp->_IO_buf_base, fp->_IO_buf_base);
  fp->_IO_write_base = fp->_IO_write_ptr = fp->_IO_buf_base;
  fp->_IO_write_end = (fp->_mode <= 0
                       && (fp->_flags & (_IO_LINE_BUF+_IO_UNBUFFERED))
                       ? fp->_IO_buf_base : fp->_IO_buf_end);
  return count;
}
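Strip away the seeking and the column bookkeeping, and the core of the flush is simple: write out the span between write_base and write_ptr, then rewind the pointers to the start of the buffer. Here is a minimal sketch of that over a plain POSIX file descriptor. toy_fdstream and its functions are hypothetical. Unlike glibc, this sketch treats a short write as an error to stay short.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical buffered stream sitting on top of a file descriptor. */
struct toy_fdstream {
    int fd;
    char buf[64];
    char *write_base, *write_ptr, *write_end;
};

static void toy_fdinit(struct toy_fdstream *s, int fd)
{
    s->fd = fd;
    s->write_base = s->write_ptr = s->buf;
    s->write_end = s->buf + sizeof s->buf;
}

/* Like _IO_do_flush followed by the pointer resets in new_do_write: hand
   [write_base, write_ptr) to write(2), then rewind to the buffer start.
   Returns 0 on success, -1 on error (including a short write). */
static int toy_flush(struct toy_fdstream *s)
{
    size_t to_do = (size_t) (s->write_ptr - s->write_base);
    if (to_do > 0) {
        ssize_t count = write(s->fd, s->write_base, to_do);
        if (count < 0 || (size_t) count != to_do)
            return -1;
    }
    s->write_base = s->write_ptr = s->buf;
    return 0;
}

/* Append n bytes to the buffer, flushing first if they would not fit. */
static int toy_write_bytes(struct toy_fdstream *s, const char *data, size_t n)
{
    if (n > sizeof s->buf)
        return -1;                      /* too big for this sketch's buffer */
    if ((size_t) (s->write_end - s->write_ptr) < n && toy_flush(s) != 0)
        return -1;                      /* no room, and the flush failed */
    memcpy(s->write_ptr, data, n);
    s->write_ptr += n;
    return 0;
}
```

Nothing reaches the file descriptor until toy_flush runs. That is why output sometimes seems “stuck” until a newline or an explicit fflush in the real library.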
Sadly we’re stuck again… _IO_SYSWRITE is doing the work:
/* The 'syswrite' hook is used to write data from an existing buffer
   to an external file.  It generalizes the Unix write(2) function.
   It matches the streambuf::sys_write virtual function, which is
   specific to this implementation. */
typedef _IO_ssize_t (*_IO_write_t) (_IO_FILE *, const void *, _IO_ssize_t);
#define _IO_SYSWRITE(FP, DATA, LEN) JUMP2 (__write, FP, DATA, LEN)
#define _IO_WSYSWRITE(FP, DATA, LEN) WJUMP2 (__write, FP, DATA, LEN)
So inside new_do_write we call the write method on the file pointer. We know from our jump table above that it is mapped to _IO_new_file_write, so what does that do?
_IO_ssize_t
_IO_new_file_write (f, data, n)
     _IO_FILE *f;
     const void *data;
     _IO_ssize_t n;
{
  _IO_ssize_t to_do = n;
  while (to_do > 0)
    {
      _IO_ssize_t count = (__builtin_expect (f->_flags2
                                             & _IO_FLAGS2_NOTCANCEL, 0)
                           ? write_not_cancel (f->_fileno, data, to_do)
                           : write (f->_fileno, data, to_do));
      if (count < 0)
        {
          f->_flags |= _IO_ERR_SEEN;
          break;
        }
      to_do -= count;
      data = (void *) ((char *) data + count);
    }
  n -= to_do;
  if (f->_offset >= 0)
    f->_offset += n;
  return n;
}
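That loop exists because write(2) may write fewer bytes than you asked for. A minimal sketch of the same idea as a standalone helper could look like this. The name write_all is my own and not part of glibc. The sketch also retries on EINTR, which the glibc loop above does not do.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep calling write(2) with whatever is left until everything is written
   or a real error occurs. Returns the number of bytes actually written,
   as _IO_new_file_write does. */
static ssize_t write_all(int fd, const void *data, size_t n)
{
    const char *p = data;
    size_t to_do = n;

    while (to_do > 0) {
        ssize_t count = write(fd, p, to_do);
        if (count < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal before writing: retry */
            break;          /* real error: stop and report what got through */
        }
        to_do -= (size_t) count;
        p += count;
    }
    return (ssize_t) (n - to_do);
}
```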
Now it just calls write! Well, where is the implementation for that? You’ll find the declaration of write in libc/posix/unistd.h:
/* Write N bytes of BUF to FD.  Return the number written, or -1.
   This function is a cancellation point and therefore not marked with
   __THROW.  */
extern ssize_t write (int __fd, __const void *__buf, size_t __n) __wur;
(Note: __wur is a macro for __attribute__ ((__warn_unused_result__)))
Functions Generated From a Table
That’s only a prototype for write. You won’t find a write.c file for Linux in the GNU standard library. Instead, you’ll find platform-specific ways of connecting to the operating system’s write call, all in the libc/sysdeps/ directory.
We’ll keep following along with how Linux does it. There is a file called sysdeps/unix/syscalls.list which is used to generate the write function automatically. The relevant data from the table is:
- File name: write
- Caller: “-” (i.e. Not Applicable)
- Syscall name: write
- Args: Ci:ibn
- Strong name: __libc_write
- Weak names: __write, write
Not all that mysterious, except for the Ci:ibn. The C means “cancellable”. The colon separates the return type from the argument types, and if you want a deeper explanation of what they mean then you can see the comment in the shell script which generates the code, libc/sysdeps/unix/make-syscalls.sh.
So now we’re expecting to be able to link against a function called __libc_write, which is generated by this shell script. But what’s being generated? A small stub built out of preprocessor macros, and the key one for us is SYS_ify, which you’ll find in sysdeps/unix/sysdep.h:
#define SYS_ify(syscall_name) __NR_##syscall_name
Ah, good old token-pasting :P. So basically, the implementation of this __libc_write becomes little more than a system call made with the number __NR_write, with the other arguments passed straight through.
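Token pasting is easy to try yourself. The sketch below imitates SYS_ify using made-up TOY_ names and defines the numbers by hand (4 for write and 3 for read are the 32-bit x86 values), so it gives the same answer on any machine it is compiled on.

```c
#include <assert.h>

/* Hand-written stand-ins for the kernel header. On 32-bit x86,
   unistd_32.h defines __NR_read as 3 and __NR_write as 4. */
#define TOY_NR_read  3
#define TOY_NR_write 4

/* Same trick as SYS_ify: glue a prefix onto the bare syscall name.
   TOY_SYS_ify(write) becomes the token TOY_NR_write, which is 4. */
#define TOY_SYS_ify(syscall_name) TOY_NR_##syscall_name
```

Because ## operates on the raw tokens, the argument write is never looked up as a function. It is just text that becomes part of a new macro name, and that macro then expands to the number.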
Where The Sidewalk Ends…
I know this has been a fascinating journey, but now we’re at the end of GNU libc. That number __NR_write is defined by Linux. For 32-bit x86 architectures it will get you to linux/arch/x86/include/asm/unistd_32.h:
#define __NR_write 4
The only thing left to look at, then, is the implementation of syscall, which I may do at some point. For now I’ll just point you over to some references for how to add a system call to Linux.
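Incidentally, you don’t have to go through write() at all. glibc exposes a generic syscall(2) function that takes a system call number plus its arguments, and <sys/syscall.h> provides SYS_write with the right number for the current architecture. Here is a minimal, Linux-specific sketch. raw_write is a name I made up.

```c
#define _GNU_SOURCE      /* make sure glibc declares syscall() */
#include <assert.h>
#include <sys/syscall.h> /* SYS_write: this architecture's write number */
#include <sys/types.h>
#include <unistd.h>      /* syscall(), read(), pipe() */

/* Bypass glibc's write() wrapper and hand the number and arguments to
   the kernel through the generic syscall(2) entry point. Like write(),
   it returns the byte count, or -1 with errno set. */
static long raw_write(int fd, const void *buf, unsigned long n)
{
    return syscall(SYS_write, fd, buf, n);
}
```

syscall() itself is the last stop before the kernel. It is where the architecture-specific instruction (int $0x80 on older 32-bit x86, for example) finally appears.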

March 21st, 2010 at 4:00 pm
Hi, well, I originally asked this question on stackoverflow. And this is exactly the answer I was looking for. Thank you.
April 1st, 2010 at 4:00 pm
That’s quite an interesting journey. I’ve been wondering about this too. Thanks!
May 14th, 2010 at 4:09 am
Excellent answer!
However, I think you made the wrong decision to publish it only on your blog instead of on StackOverflow as well. The whole point of StackOverflow is to collect questions and answers in one place, not have a link collection to answers that will eventually disappear in the next blog redesign/not be properly cached/archived.
There is no real limit to the size of a posted answer; not everything can be explained in short stubs, so feel free to use as many characters as you need. (The vim question on SO is a great example of a highly appreciated [some might even say epic] answer spanning multiple screen pages.) StackOverflow users are not daunted by the size of a reply; they are there to learn.
Please edit your answer on StackOverflow to include the full blog posting.
Again, excellent blog post!
May 14th, 2010 at 4:50 am
I think this answer is a) misleading, b) needlessly complicated.
a) The very next step, which you omitted, is the implementation of syscall, which on x86 is most commonly:
mov , %eax
mov <second argument, %ebx
….
int $0x80
So, at the depth you reached, which is still in libc and hasn’t “met the road” yet, it’s all C, but the very next step is to interrupt the CPU in order to transfer control to the kernel, which must obviously be done with assembly.
b) GNU libc is maybe the *most* complicated libc out there. You could find much simpler libc implementations to describe how one would implement printf on a modern operating system. And a toy example implementation could be 2-3 lines of code plus vsprintf.
May 14th, 2010 at 6:40 am
I wonder why anyone would be interested in maintaining that source; it seems like a nightmare project. I couldn’t read your whole explanation, as it got way too deep too quickly (not suggesting it’s your fault, the code is just too snakey).
Reminds me of Visual Studio 2010 which is supposed to have 1.5 million source files (not lines of code, files) and takes 16 hours to build unsigned (or 61 for a release build).
Never having worked on source of this scale I can’t comprehend how people manage to stay motivated and keep it bug free.
May 14th, 2010 at 7:13 am
IIRC, there’s some magic that happens in GCC to do some optimizations. e.g. printing a constant string vs. one with format chars, etc.
I learned this when trying to target an embedded platform without specifying some build flag to gcc in the pre-3.0 days.
May 14th, 2010 at 7:25 am
Very Interesting. Thanks.
May 14th, 2010 at 8:24 am
uClibc is a better version to look at; glibc is insanely complicated. The uClibc version is somewhat crazy as well, but that’s because printf is a fairly complicated function with all the conversions built into it, and it is basically contained in a single file.
You can see this elsewhere too, in how glibc does crazy stuff even for simple things such as wrapping system calls.
May 14th, 2010 at 8:30 am
This is crazy. I am sticking with managed development to stay out of this hell you call system programming…
May 14th, 2010 at 8:52 am
I really did like assembler programming back in the mid-1990s…
May 14th, 2010 at 9:55 am
Ahh, Nuclear…
If only it were that simple. Your computer doesn’t run assembly: assembly gets translated to machine code. Some C compilers (very rare, though) may even skip over assembly and go straight to machine code.
The thing is, most modern processors don’t even run machine code: it is translated into processor microcode, which (usually) actually executes on the processor. Granted, the microcode translation is usually done ON the processor itself.
So the answer really depends on what level of the system you’re looking at, and how far down the rabbit hole you want to go. He stopped at the C libraries: printf is not inline assembly in the C libraries: the assembly is in the Kernel (at least under linux). As a library implementer, you don’t have to know assembly to program printf(). You go one step further, and say that to implement the functionality at the kernel level, you need to know assembly. I’m going one step further and saying in order to get your assembly to run on the processor you have to know machine code, and to implement the assembly on the processor you have to understand microcode.
In the end, it gets converted into electrons running around little electrical paths… you could say that, in the end, to implement printf() you have to know physics.