Protocol Independent Programming

Author: Kazu Yamamoto
Email: kazu mew.org
Created: 26 Dec 2003
Modified: 9 Apr 2009
Keyword: IPv4, IPv6, socket API, UNIX, gethostbyname(), getaddrinfo()

Introduction

Since IPv6 is now supported in many UNIX(-like) operating systems, including MacOS/Linux/BSD variants, you may want to modify your IPv4-only program to support both IPv4 and IPv6.

The new socket API is defined in RFC 3493, "Basic Socket Interface Extensions for IPv6". If we use getaddrinfo() and getnameinfo(), both defined in RFC 3493, we can implement "protocol independent" programs. This network programming style makes code simple and flexible.

Unfortunately, it seems quite difficult to understand the heart of "protocol independent" programming and how to use getaddrinfo()/getnameinfo(). Thus I wrote this page so that you can understand the heart of "protocol independent" programming quickly.

Why Protocol Independent Programming?

The reasons why you should adopt the protocol independent programming style are as follows:

Simplicity: In many cases, your code becomes simpler than before. Yes, this claim is subjective, so please look at the example code below and judge for yourself whether it is simple. Complex code is hard to maintain, while simple code is not. The simpler, the better.
Flexibility: It is quite hard to predict the future. Suppose that another new protocol, say IPv7, arrives. If your program is protocol dependent, you have to modify the code to support IPv7. On the other hand, you need not modify your code if it is written in the protocol independent programming style. When underlying functions such as getaddrinfo() come to support IPv7, your program will automatically be IPv7-ready.

Protocol Independent Client

To keep the story simple, this document focuses on clients only and does not talk about servers.

The strategy of protocol independent programming is simple:

Use getaddrinfo() instead of gethostbyname() to translate a host name to its addresses.
In environments where getaddrinfo() is not provided, supply your own getaddrinfo() and use it.

Please look at an example of a protocol dependent client. You can use the command like this:

% command <server> <port>

This client resolves the list of addresses of <server> with gethostbyname(). It then tries to make a connection to each address in turn, from the first to the last on the list. Once a connection to a server address is established, it reads a byte stream from the server. This is typical client code. Of course, it supports IPv4 only. Don't you feel this is complex?
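The connection part of such a client can be sketched roughly as follows. This is an illustration under stated assumptions, not the original example code: the function name connect_ipv4_only() is hypothetical, and reading from the server is left to the caller.

```c
#define _POSIX_C_SOURCE 200112L  /* request the POSIX.1-2001 socket API */
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: connect to host:port over IPv4 only.
 * Returns a connected socket, or -1 if every address fails. */
int connect_ipv4_only(const char *host, unsigned short port)
{
    struct hostent *hp;
    struct sockaddr_in sin;
    char **ap;
    int s;

    hp = gethostbyname(host);            /* knows about IPv4 only */
    if (hp == NULL || hp->h_addrtype != AF_INET)
        return -1;

    for (ap = hp->h_addr_list; *ap != NULL; ap++) {
        /* The address structure must be filled in by hand,
         * and every field is specific to IPv4. */
        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = htons(port);
        memcpy(&sin.sin_addr, *ap, sizeof(sin.sin_addr));

        s = socket(AF_INET, SOCK_STREAM, 0);
        if (s < 0)
            continue;
        if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) == 0)
            return s;                    /* connected */
        close(s);                        /* try the next address */
    }
    return -1;
}
```

Note how AF_INET and struct sockaddr_in appear throughout: supporting IPv6 in this style would mean duplicating most of the function.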

We can modify this code to use getaddrinfo() and make it "protocol independent".

Please look at an example of a protocol independent client. Basically, the behavior of this client is the same as that of the protocol dependent client.

Don't you think the code has become simpler than before? You should understand that this code can handle both IPv4 and IPv6.

The magic is getaddrinfo(). If "hints.ai_family" is "AF_UNSPEC", getaddrinfo() returns a list of addresses of any kind, including IPv4 and IPv6.
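As a small illustration of the effect of AF_UNSPEC, the sketch below counts how many IPv4 and IPv6 entries getaddrinfo() returns for a host. The helper name count_families() is hypothetical. It inspects "ai_family" only to demonstrate the mixed list; a protocol independent client would not need to.

```c
#define _POSIX_C_SOURCE 200112L  /* request the POSIX.1-2001 socket API */
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: count the IPv4 and IPv6 entries returned for host
 * when no address family is requested.
 * Returns 0 on success, or the getaddrinfo() error code. */
int count_families(const char *host, int *n4, int *n6)
{
    struct addrinfo hints, *res0, *res;
    int error;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept any address family */
    hints.ai_socktype = SOCK_STREAM;  /* one entry per address (TCP) */

    error = getaddrinfo(host, NULL, &hints, &res0);
    if (error != 0)
        return error;

    *n4 = *n6 = 0;
    for (res = res0; res != NULL; res = res->ai_next) {
        if (res->ai_family == AF_INET)
            (*n4)++;
        else if (res->ai_family == AF_INET6)
            (*n6)++;
    }
    freeaddrinfo(res0);
    return 0;
}
```

For a host name with both kinds of DNS records, both counters would typically be non-zero. A numeric literal such as "::1" yields a single entry of the matching family.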

You should pay attention to the socket() system call. It is now located inside the "for" loop, because a socket belongs to a single address family and so a new one must be created for each candidate address. Suppose getaddrinfo() returns a list of IPv4, IPv6, IPv4, ... addresses. First, in the "for" loop, an IPv4 socket is opened and the code tries to make an IPv4 connection. If this fails, the IPv4 socket is closed. The next candidate is IPv6: an IPv6 socket is opened and the code tries to make an IPv6 connection. And so on.
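The loop described above can be sketched as follows. This is a minimal sketch rather than the original example code: the function name connect_any() is hypothetical, and error reporting is omitted.

```c
#define _POSIX_C_SOURCE 200112L  /* request the POSIX.1-2001 socket API */
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: connect to host/service using whichever address
 * family works. Returns a connected socket, or -1 on failure. */
int connect_any(const char *host, const char *service)
{
    struct addrinfo hints, *res0, *res;
    int s = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* IPv4, IPv6, or anything else */
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, service, &hints, &res0) != 0)
        return -1;

    for (res = res0; res != NULL; res = res->ai_next) {
        /* socket() is inside the loop: each candidate may belong to a
         * different address family, so it needs its own socket. */
        s = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (s < 0)
            continue;
        if (connect(s, res->ai_addr, res->ai_addrlen) == 0)
            break;                    /* connected */
        close(s);                     /* failed: try the next candidate */
        s = -1;
    }
    freeaddrinfo(res0);
    return s;
}
```

No address family constant other than AF_UNSPEC appears, and no address structure is filled in by hand. Everything socket() and connect() need comes from the list entry itself.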

Note also that getnameinfo() is used to display each address. Since getnameinfo() is a protocol independent function, it can translate both IPv4 and IPv6 addresses to a human-readable representation.
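To illustrate, the sketch below parses a numeric address literal with getaddrinfo() and turns it back into text with getnameinfo(), without ever naming an address family. The helper name normalize_literal() is hypothetical.

```c
#define _POSIX_C_SOURCE 200112L  /* request the POSIX.1-2001 socket API */
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: parse a numeric IPv4 or IPv6 literal and write its
 * text form into buf.
 * Returns 0 on success, or a getaddrinfo()/getnameinfo() error code. */
int normalize_literal(const char *literal, char *buf, size_t buflen)
{
    struct addrinfo hints, *res;
    int error;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;  /* literals only, no name lookup */

    error = getaddrinfo(literal, NULL, &hints, &res);
    if (error != 0)
        return error;

    /* NI_NUMERICHOST asks for the numeric form rather than a host name. */
    error = getnameinfo(res->ai_addr, res->ai_addrlen,
                        buf, (socklen_t)buflen, NULL, 0, NI_NUMERICHOST);
    freeaddrinfo(res);
    return error;
}
```

The same call handles "127.0.0.1" and "::1" alike. In a client, the getnameinfo() call would be applied to each entry of the list just before the connection attempt.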

Packaging

To support environments where getaddrinfo() is not provided, you should include your own getaddrinfo() in your package. Note that your own getaddrinfo() uses gethostbyname() internally. You should understand that getaddrinfo() itself is protocol dependent, while the code calling getaddrinfo() is protocol independent.

But implementing getaddrinfo() is a boring job.

You can re-use a free implementation of getaddrinfo() found in Portable OpenSSH version 3. Its file name is "openbsd-compat/fake-rfc2553.c".

"configure" should check whether or not getaddrinfo() and getnameinfo() exist. For example, add the following lines to "configure.in" and "config.h.in", then run "autoconf":

"configure.in"::

AC_CHECK_FUNCS(getaddrinfo)
AC_CHECK_FUNCS(getnameinfo)

"config.h.in"::

/* Define if you have the getaddrinfo function.  */
#undef HAVE_GETADDRINFO
/* Define if you have the `getnameinfo' function. */
#undef HAVE_GETNAMEINFO

Of course, if HAVE_GETADDRINFO is not defined, your own getaddrinfo() should be defined. Likewise, if HAVE_GETNAMEINFO is not defined, your own getnameinfo() should be defined. Note that "openbsd-compat/fake-rfc2553.c" is written this way.

Protocol Dependent Functions

Here is a summary of protocol dependent functions, which you should not use anymore. Read each entry as, for instance, "getaddrinfo() should be used instead of inet_pton()".