While the topic of secure voice communication on mobile is hardly new, it has been getting a lot of media attention following the the official release of the Blackphone, Consequently, this is a good time to go back to basics and look into how secure voice communication is typically implemented. While this post focuses on Android, most of the discussion applies to other platforms too, with only the mobile clients presented being Android specific.
Voice over IP
Session Initiation Protocol
REGISTERrequests from a set of clients in the domain(s) it is responsible for, and offers a location services to interested parties, much like DNS. Registration is dynamic and temporary: each client registers its SIP URI and IP address with the registrar, thus making it possible for other peers to discover it for the duration of the registration period. The SIP URI can contain arbitrary alphanumeric characters (much like an email address), but the username part is typically limited to numbers for backward compatibility with existing networks and devices (e.g.,
A SIP call is initiated by a UA sending an
INVITE message specifying the target peer, which might be mediated by multiple SIP ‘servers’ (registrars and/or proxies). Once a media path has been negotiated, the two endpoints (Phone A and Phone B in the figure below) might communicate directly (as shown in the figure) or via a one or more media proxies which help bridge SIP clients that don’t have a publicly routable IP address (such as those behind NAT), implement conferencing, etc.
SIP on mobiles
Because SIP calls are ultimately routed using the registered IP address of the target peer, arguably SIP is not very well suited for mobile clients. In order to receive calls, clients need to remain online even when not actively used and keep a constant IP address for fairly long periods of time. Additionally, because public IP addresses are rarely assigned to mobile clients, establishing a direct media channel between two mobile peers can be challenging. The online presence problem is typically solved by using a complementary, low-overhead signalling mechanism such as Google Cloud Messaging (GCM) for Android in order to “wake up” the phone before it can receive a call. The requirement for a stable IP address is typically handled by shorter registration times and triggering registration each time the connectivity of the device changes (e.g., from going from LTE to WiFi). The lack of a public IP address is usually overcome by using various supporting methods, ranging from querying STUN servers to discover the external public IP address of a peer, to media proxy servers which bridge connections between heavily NAT-ed clients. By combining these and other techniques, a well-implemented SIP client can offer an alternative voice communication channel on a mobile phone, while integrating with the OS and keeping resource usage fairly low.
Most Android devices have included a built-in SIP client as part of the framework since version 2.3 in the
android.net.sip package. However, the interface offered by this package is very high level, offers few options and does not really support extension or customization. Additionally, it hasn’t received any new features since the initial release, and, most importantly, is optional and therefore unavailable on some devices. For this reason, most popular SIP clients for Android are implemented using third party libraries such as PJSIP, which support advanced SIP features and offer a more flexible interface.
SIP is a transport-independent text-based protocol, similar to HTTP, which is typically transmitted over UDP. When transmitted over an unencrypted channel, it can easily be intercepted using standard packet capture software or dumped to a log file at any of the intermediate nodes a SIP message traverses before reaching its destination. Multiple tools that can automatically correlate SIP messages with the associated media streams are readily available. This lack of inherent security features requires that SIP be secured by protecting the underlying transport channel.
A straightforward method to secure SIP is to use a VPN to connect peers. Because most VPNs support encryption, signalling, as well as media streams tunneled through the VPN are automatically protected. As an added benefit, using a VPN can solve the NAT problem by offering directly routable private addresses to peers. Using a VPN works well for securing VoIP trunks between SIP servers which are linked using a persistent, low-latency and high-bandwidth connection. However, the overhead of a VPN connection on mobile devices can be too great to sustain a voice channel of even average quality. Additionally, using a VPNs can result in highly variable latency (jitter), which can deteriorate voice quality even if jitter buffers are used. That said, many Android SIP clients can be setup to automatically use a VPN if available. The underlying VPN used can be anything supported on Android, for example the built-in IPSec VPN or a third-party VPN such as OpenVPN. However, even if a VPN provides tolerable voice quality, typically it only ensures an encrypted tunnel to a SIP proxy, and there are no guarantees that any SIP messages or voice streams that leave the proxy are encrypted. That said, a VPN can be a usable solution, if all calls are terminated within a trusted private network (such as a corporate network).
Because SIP is transport-independent it can be transmitted over any supported protocol, including a connection-oriented one such as TCP. When using TCP, a secure channel between SIP peers can be established with the help of the standard TLS protocol. Peer authentication is handled in the usual manner — using PKI certificates, which allow for mutual authentication. However, because a SIP message typically traverses multiple servers until it reaches its final destination, there is no guarantee that the message will be always encrypted. In other words, SIP-over-TLS, or secure SIP, does not provide end-to-end security but only hop-to-hop security.
SIP-over-TLS is relatively well supported by all major SIP servers, including open source once like Asterisk and FreeSWITCH. For example, enabling SIP-over-TLS in Asterisk requires generating a key and certificate, configuring a few global tls options, and finally requiring peers to use TLS when connecting to the server as described here. However, Asterisk does not currently support client authentication for SIP clients (although there is some limited support for client authentication on trunk lines).
Most popular Android clients support using the TLS transport for SIP, with some limitations. For example the popular open source CSipSimple client supports TLS, but only version 1.0 (as well as SSL v2/v3). Additionally, it does not use Android’s built-in certificate and key stores, but requires certificates to be saved on external storage in PEM format. Both limitations are due to the underlying PJSIP library, which is built using OpenSSL and requires keys and certificates to be stored as files in OpenSSL’s native format. Additionally, server identity is not checked by default and the check needs to be explicitly enabled in order for server identity to be verified, as shown in the screenshot below.
When a secure SIP connection to a peer is established, VoIP clients indicate this on the call setup and call screens as shown in the CSipSimple screenshot below.
Securing the media channel
When a voice channel is encrypted using SRTP the transmitted data looks like random noise (as any encrypted data should), as shown below.
SRTP defines a pseudo-random function (PRF) which is used to derive the session keys (used for encryption and authentication) from a master key and master salt. What SRTP does not specify is how the master key and salt should be obtained or exchanged between peers.
cryptoand can contain a crypto suite, key parameters, and, optionally, session parameters. A
cryptoattribute which includes a crypto suite and key parameters might look like this:
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:VozD8O2kcDFeclWMjBOwvVxN0Bbobh3I6/oxWYye
AES_CM_128_HMAC_SHA1_80 is a crypto suite which uses AES in counter mode with an 128-bit key for encryption and produces an 80-bit SRTP authentication tag using HMAC-SHA1. The Base64-encoded value that follows the crypto suite string contains the master key (128 bits) concatenated with the master salt (112 bits) which are used to derive SRTP session keys.
SDES does not provide any protection or authentication of the cryptographic parameters it includes, and is therefore only secure when used in combination with SIP-over-TLS (or another secure signalling transport). SDES is widely supported by both SIP servers, hardware SIP phones and software clients. For example, in Asterisk enabling SDES and SRTP is as simple as adding
encryption=yes to the peer definition. Most Android SIP clients support SDES and can automatically enable SRTP for the media channel when the
INVITE SIP message includes the
crypto attribute. For example, in the CSipSimple screenshot above the master key for SRTP was received via SDES.
The main advantage of SDES is its simplicity. However it requires that all intermediate servers are trusted, because they have access to the SDP data that includes the master key. Even though the SRTP media stream might be transmitted directly between two peers, SRTP effectively provides only hop-to-hop security, because compromising any of the intermediate SIP servers can result in recovering the master key and eventually session keys. For example, if the private key of a SIP server involved in SDES key exchange is compromised, and the TLS session that carried SIP messages session did not use forward secrecy, the master key can easily be extracted from a packet capture using Wireshark, as shown below.
ZRTP aims to provide end-to-end security for SRTP media streams by using the media channel to negotiate encryption keys directly between peers. It is essentially a key agreement protocol based on a Diffie-Hellman with added Man-in-the-Middle (MiTM) protections. MiTM protection relies on the so called “short authentication strings” (SAS), which are derived from the session key and are displayed to each calling party. The parties need to confirm that they see the same SAS by reading it to each other over the phone. As an additional MiTM protection, ZRTP uses a form of key continuity, which mixes in previously negotiated key material into the shared secret obtained using Diffie-Hellman when deriving session keys. Thus ZRTP does not require a secure signalling channel or a PKI in order to establish a SRTP session key or protect against MiTM attacks.
On Android, ZRTP is supported both by VoIP clients for dedicated services such as RedPhone and Silent Phone, and by general-purpose SIP clients like CSipSimple. On the server side, ZRTP is supported by both FreeSWITCH and Kamailio (but not by Asterisk), so it its fairly easy to set up a test server and test ZRTP support on Android.
ZRTP support in CSipSimple can be configured on a per account basis by setting the ZRTP mode option to “Create ZRTP”. It must be noted however, that ZRTP encryption is opportunistic and will fall back to cleartext communication if the remote peer does not support ZRTP. When the remote party does support ZRTP, CSipSimple shows an SAS confirmation dialog only the first time you connect to a particular peer and then displays the SAS and encryption scheme in the call dialog as shown below.
In this case, the voice channel is direct and ZRTP/SRTP provide end-to-end security. However, the SIP proxy server can also establish a separate ZRTP/SRTP channel with each party and proxy the media streams. In this case, the intermediate server has access to unencrypted media streams and the provided security is only hop-to-hop, as when using SDES. For example, when FreeSWITCH establishes a separate media channel with two parties that use ZRTP, CSipSimple will display the following dialog, and the SAS values at both clients won’t match because each client uses a separate session key. Unfortunately, this is not immediately apparent to end users which may not be familiar with the meaning of the “EndAtMitM” string that signifies this.
The ZRTP protocol supports a “trusted MiTM” mode which allows clients to verify the intermediate server after completing a key enrollment procedure which establishes a shared key between the client and a particular server. This features is supported by FreeSWITCH, but not by common Android clients, including CSipSimple.