Some minor thoughts about Voice over IP, SIP, RTP, etc.


The situation is almost ideal: we have several distributed, physically separated locations or offices. They are connected to each other by Gigabit Ethernet or even 10 GigE. It does not matter whether this interconnection is implemented as a star network, a ring network, or a full mesh. There is just one bridgehead, which connects our internal network to the internet and to the POTS. A telephone company provides us with one to N T1/E1 trunks with direct inward dialing (DID), so that we are free to assign (internal) phone numbers. An internet provider or even an internet registry provides us with some official IP addresses, which we will also use for our internal network to avoid painful NAT/PAT in our VoIP setup.




Using static IP addresses for each VoIP device is not recommended because it leads to massive problems during rollout. Since mobile clients and guests already use dynamic IP addresses, we should use DHCP throughout the VoIP network. VoIP devices are therefore located by dynamic IP address, and there has to be a mapping between these addresses and the corresponding usernames. During the registration of a VoIP device, the username/password combination is first checked against a user database, and then the IP address is stored along with the username in a dynamic lookup database, which we will call the location server.

In a distributed environment you can use either a central location server or one per site. In the latter case, finding VoIP users can be seen as a classical routing problem, which requires something like OSPF for SIP, where each site announces the list of usernames and IP addresses it currently has registered. A network failure or the crash of a location server then affects only the corresponding site.

Using a central location server is obviously simpler, as we do not have to worry about how to reach a converged state. But high availability and failsafe operation then have to be provided by the database system, which is not available in open-source database systems, nor in most commercial ones: we need either a master-master system or a master-slave system with automatic failover and synchronisation, both with multiple IP addresses. So we have to live with this drawback, or we must implement our own new routing protocol for SIP, which would be a nice diploma thesis of its own (or give me two spare weekends ;-)
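The registration step described above (check the credentials against the static user database, then store the current IP address under the username) can be sketched in a few lines of Python. This is only a toy in-memory model under several assumptions: the class name LocationServer, the expiry time, and the plain password comparison are made up for illustration. A real SIP registrar would use digest authentication and a proper database backend.

```python
import time


class LocationServer:
    """Toy in-memory location server: username -> (ip, expiry time).

    Hypothetical sketch, not an OpenSER API. Passwords are compared in
    plain text only to keep the example short.
    """

    def __init__(self, user_db, ttl=3600):
        self.user_db = user_db    # static user database: username -> password
        self.ttl = ttl            # seconds a registration stays valid (assumed)
        self.bindings = {}        # dynamic part: username -> (ip, expires_at)

    def register(self, username, password, ip, now=None):
        """Check credentials first, then store the binding. Returns True on success."""
        now = time.time() if now is None else now
        if username not in self.user_db or self.user_db[username] != password:
            return False
        self.bindings[username] = (ip, now + self.ttl)
        return True

    def lookup(self, username, now=None):
        """Return the currently registered IP address, or None if unknown or expired."""
        now = time.time() if now is None else now
        entry = self.bindings.get(username)
        if entry is None:
            return None
        ip, expires_at = entry
        if expires_at <= now:
            # Stale registration: the device did not refresh in time.
            del self.bindings[username]
            return None
        return ip
```

With one such object per site, the "OSPF for SIP" idea would amount to the sites exchanging the contents of their bindings tables. With a central server there is only one table, and its availability becomes the problem.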


Transition from and to the plain old telephone system is handled by a couple of simple DSPs without any routing logic. They just have to forward SIP/RTP streams from the internal network to the POTS and vice versa. These DSPs, together with another SIP proxy, form the so-called bridgehead.

Server location

VoIP clients locate SIP servers (I do not distinguish between registrar, proxy, or redirect roles, as this is quite braindead) by multiple methods. First of all, we set up a DNS SRV RR for our domain, e.g. 86400 IN SRV 1 1 5060. Secondly, the target hostname of that record resolves not to a unicast address but to an anycast address, which is almost the same. But that IP address is configured on each SIP server (along with a unique unicast address, of course) and announced as a host route by each OSPF router near that SIP server. Furthermore, we can use ENUM in a very elegant way by setting up a catchall NAPTR RR for our DID prefix that resolves to the (anycast) host record.
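To make the ENUM part more concrete: ENUM maps an E.164 phone number to a DNS name by reversing its digits, separating them with dots, and appending e164.arpa. A catchall NAPTR RR for a DID prefix is then a wildcard record below the name of that prefix. The following Python sketch only computes those names. The function names and the phone numbers are made up for illustration, and no DNS query is performed.

```python
def enum_domain(e164_number, suffix="e164.arpa"):
    """Turn an E.164 number such as '+49 30 1234-567' into its ENUM domain name."""
    # Keep only the ASCII digits; '+', spaces and dashes are dropped.
    digits = [c for c in e164_number if c in "0123456789"]
    if not digits:
        raise ValueError("no digits in number: %r" % (e164_number,))
    # ENUM uses the digits in reverse order, one label per digit.
    return ".".join(reversed(digits)) + "." + suffix


def catchall_owner(did_prefix, suffix="e164.arpa"):
    """Owner name of a wildcard (catchall) NAPTR RR covering a whole DID prefix."""
    return "*." + enum_domain(did_prefix, suffix)


if __name__ == "__main__":
    # Hypothetical extension 567 behind the hypothetical DID prefix +49 30 1234
    print(enum_domain("+49 30 1234-567"))
    print(catchall_owner("+49 30 1234"))
```

Every extension below the prefix then matches the single wildcard record, so no per-user ENUM entries are needed.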

Music on Hold

MoH could be implemented either as a SIP redirect or as an RTP proxy that stays in the middle of the voice stream. The latter is not an option, since it puts a high load on each proxy and because we are going to use OpenSER, which is "just" a SIP proxy. A MoH server may be reached through a multicast address, which keeps the network load at a minimum, or through several anycast servers, which enables us to use different music streams per user. Note that a MoH server should stand close to the client that is put on hold, whereas a SIP proxy should stand near the client initiating the phone call.


As long as we want plain voice-to-mail services, we could implement the voicemail servers as an anycast service, too. Since each anycast server has access to the user database, an incoming call will be recorded and sent to the user by mail, regardless of which server accepted the call. If we want to have real voicemail boxes which users can check by phone, then we have to set up several voicemail servers and add the address of each server to the corresponding user profile. Generic anycast voicemail servers may run on the same computer as anycast MoH servers. We might use Asterisk for both scenarios.


We have to ensure that our well-known VoIP devices (phones and servers) trust each other and no one else. Phones have to accept incoming calls from our SIP servers only. This could be achieved if we set up a public key infrastructure in which every server gets a key pair and a certificate. Phones then accept calls only from servers that present such a certificate. Additionally, we could use 802.1X for assigning QoS parameters or VLAN membership on the switch the telephone is connected to. A confidential SIP/RTP stream is established by SSL/TLS afterwards. Although we have a quite strict policy, we are also able to use softphones or mobile clients. We leave it up to them how they react to unsigned calls.
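The rule "accept calls only from servers that present a certificate signed by our own CA" corresponds to a TLS endpoint that requires and verifies a peer certificate. As a rough sketch, and assuming a phone that is implemented in Python (which real hardphones are not), the standard ssl module could be configured as follows. The function name and the file arguments are hypothetical. Only the ssl calls themselves are real.

```python
import ssl


def make_incoming_call_context(ca_file=None, cert_file=None, key_file=None):
    """TLS context for the side that accepts incoming SIP connections.

    ca_file:   certificate of our own (private) CA; only peers with a
               certificate signed by this CA pass the handshake.
    cert_file, key_file: the device's own certificate and private key.
    All three paths are placeholders and may be left out for a dry run.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Demand a certificate from the connecting SIP server and verify it.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if ca_file is not None:
        # Trust our own CA only, not the system-wide CA bundle.
        ctx.load_verify_locations(cafile=ca_file)
    if cert_file is not None:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

A softphone or mobile client that wants to be more lenient would simply choose a weaker verify_mode, which is the "we leave it up to them" case.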


Accounting is done on the bridgehead as we assume that this is the only point where traffic (to the POTS) is billed.


Unfortunately there are a couple of things that have to be reconsidered:


We need a static database for usernames, passwords, telephone numbers, and possibly voicemail servers. This could be a replicated database with one master and N slaves, as we only read that data (LDAP, SQL, RADIUS, anything OpenSER can access). But we also need a dynamic user location database, either a central server or one on each site. This might be an SQL server, but could also be a DBM-style database, as long as we have a suitable API to access it.
VoIP devices may range from simple softphones to hardphones with support for TLS and certificates. We need the latter for establishing a secure and trustworthy environment.


Pictures are said to say more than 1000 words sometimes:

concept 1

concept 2

concept 3


a not so complete literature list