Why won’t anybody build far-end echo cancellation into their VoIP phones and ATAs?

No VoIP phone or ATA on the market offers talker far-end echo cancellation. They should.

Natural Rock Formation that amplifies echo.


Some background: echo is when you hear yourself talking. It’s usually caused by the device on the other end of the call, but it’s exacerbated in VoIP networks because they have long delays.

Suppose you have a phone call that includes a VoIP device (such as a Polycom SoundPoint IP 650 SIP phone, or a Cisco 7260 SIP phone, or a Cisco/Linksys/Sipura ATA, or an Aastra 57i).

There’s a VoIP phone plugged into some sort of IP network. In many cases, it’s an Ethernet network with a DS1 (T1) connection to the VoIP service provider / ISP. There’s a VoIP-PSTN gateway connected there, such as a MetaSwitch VP3510, or a Lucent Compact Switch (LCS), or an AudioCodes, or a Cisco AS5400. It has an Ethernet interface and TDM interfaces, such as DS3, DS1. Maybe it does ISUP and has SS7 A-Links, or ISDN PRI. Then there’s the PSTN, which includes traditional telephone switches — and actually a lot of VoIP hidden in there too. Finally, there’s a PSTN phone.

When the VoIP phone user says something, he may hear an echo of his own voice coming back.

Cisco has a nice WAV file demonstrating what echo sounds like. (This page is known to some insiders as the Cisco Duck Quack page.)

Why does C3P0 hear his own voice coming back?

The PSTN Phone isn’t perfect. Some of the electrical voice signal that enters it may be “reflected” back. This is sometimes called the “Two-wire-to-four-wire” conversion: a normal phone line is a two-wire electrical circuit with both sides of the conversation on it, but the handset itself has two wires for the speaker, and two wires for the microphone. If this isn’t built just perfectly, some of the sound signal that’s sent to the speaker will be reflected back down the wire. Many phones are not perfect.

The PSTN Phone may be on speakerphone, or pick up other acoustic echo. Normal room walls echo sound back to us. It’s there, but we don’t normally notice it.

The VoIP Network is long. We won’t normally notice echo in a room, unless we’re in a big room. Our brain is pretty good at filtering out sound that’s echoed back very quickly; if an echo arrives less than 100 [ms] or so after we speak, we normally don’t even notice it. So if the room is big enough that sound takes a long time to echo back, then we might notice it.

(How big would a room need to be? Around 55 feet / 17 meters across would work nicely to reflect off the far wall. Sound travels 340 meters / second, and we’ll hear an echo if our voice takes around 100 [ms] to get back to us. That means it needs to be 0.1 * 340 meters from my mouth to my ear, or half that from one side of the room to the other because the sound travels that distance twice.)
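The room-size arithmetic above can be checked with a few lines of Python. This is only a sketch of the same calculation; the constant and function names are made up for illustration, and the 340 m/s and 100 ms figures are the ones used in the text.

```python
SPEED_OF_SOUND_M_PER_S = 340.0   # figure used in the text
ECHO_THRESHOLD_S = 0.100         # roughly when an echo becomes noticeable

def room_width_for_echo(delay_s, speed=SPEED_OF_SOUND_M_PER_S):
    """Distance to the far wall at which a reflection returns after delay_s."""
    round_trip_m = delay_s * speed   # mouth -> wall -> ear
    return round_trip_m / 2.0        # the sound covers the room twice

width_m = room_width_for_echo(ECHO_THRESHOLD_S)
width_ft = width_m / 0.3048          # metres to feet
print(f"room width: about {width_m:.0f} m, or roughly {width_ft:.1f} ft")
```

The result lands at about 17 meters, in line with the rough "55 feet" figure quoted above.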

It can take a long time for a sound signal carried via VoIP to get from the talker C3P0 back to his ear. A round-trip time of 100 milliseconds is easily reached. Why? Because packets sit in buffers along the way. In traditional non-VoIP TDM networks, digital voice data is rarely “buffered” to any extent. But in VoIP networks, buffering always happens. This “buffering” means that packets are sitting inside devices (phones, routers, switches), effectively adding to the delay between the talker and his echo.
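To see how quickly buffering adds up, here is a rough delay budget in Python. Every number in it is an illustrative assumption rather than a measurement from any particular phone, gateway, or network, and it assumes the same delays apply in both directions.

```python
# All figures below are illustrative assumptions, not measurements.
ONE_WAY_DELAYS_MS = {
    "packetization (20 ms of audio per RTP packet)": 20,
    "codec and DSP processing": 5,
    "IP network transit and queuing": 15,
    "jitter buffer at the receiving device": 40,
}

def round_trip_ms(one_way_delays):
    """Echo path delay: the voice goes out, the echo comes back the same way."""
    return 2 * sum(one_way_delays.values())

rtt = round_trip_ms(ONE_WAY_DELAYS_MS)
for stage, ms in ONE_WAY_DELAYS_MS.items():
    print(f"{ms:3d} ms  {stage}")
print(f"estimated echo round trip: {rtt} ms")
```

Even with these fairly modest assumed figures, the round trip comes out well past the 100 [ms] mark where an echo becomes noticeable.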

Put echo cancellation in the VoIP Phone.

This is all very annoying for the talker who hears himself. VoIP users suffer from this much more than PSTN users, because of this delay. Technically, the PSTN phone is creating the echo, but it seems silly to just point blame. Technically, VoIP networks should be free of jitter (variation in packet transit delay) too, but networks aren’t perfect, so VoIP phones have jitter buffers built in. It’s standard equipment, like seat belts in cars.

But echo cancellation of the sort that’s really needed, the kind that would cancel out the echo received back across the VoIP network, just is not normally put into VoIP phones. Some phones, such as Polycom’s, have limited acoustic echo cancellation capability to prevent the VoIP phone itself from echoing back. That’s nice, but that’s not the main problem.

It seems VoIP phone vendors are spending their time building in G.722 wideband codec support (“HD Voice”), so conversations within your office building will sound nice. Whoop-ee.

VoIP Echo Cancellation is possible. Vendors are routinely building limited echo-cancellation ability into the PSTN-VoIP gateway device (e.g., General Bandwidth G6, MetaSwitch). But it can’t always be used, and it’s not always effective. For example, if you connect to the PSTN through Level(3), then you don’t get a gateway with echo cancellation. And on some gateways, using echo cancellation limits the number of calls you can make through the box.

But is echo cancellation in the VoIP gateway sufficient? Apparently not. If it were, I wouldn’t hear so many complaints from my client base. And Ditech would have no reason to make a VoIP-only echo cancellation box. But as it is, VoIP carriers have lots of trouble with echo.

And, in general, centralizing the echo-cancellation capability doesn’t seem optimal either. Putting lots of work and intelligence in one place tends to make one really-expensive place.

VoIP Phone and ATA Vendors should add echo cancellation to their devices. Polycom, Cisco, Aastra, Linksys, Snom, Adtran, etc. listen up: your device should cancel out the echo received back in the RTP stream from the VoIP network. It can’t be that hard! I know there are DSPs that can help with this. Add this feature to your top-of-the-line phone! Charge more for it! Make it a premium add-on license if you must!
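For a sense of what such a feature involves, here is a minimal sketch of an adaptive echo canceller using the normalized LMS (NLMS) algorithm, a common textbook approach. It is not any vendor's implementation. The function name, tap count, step size, and the simulated echo path (10 samples of delay at half amplitude) are all assumptions chosen to keep the demo small. A real phone would need a much longer echo tail to cover VoIP round-trip delays, plus double-talk detection so it doesn't adapt while the far-end party is speaking.

```python
import random

def nlms_echo_cancel(sent, received, taps=32, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the echo of `sent` from `received`.

    sent     : samples the phone transmitted (the talker's own voice)
    received : samples decoded from the incoming RTP stream
    Returns the residual signal after the estimated echo is removed.
    """
    weights = [0.0] * taps
    history = [0.0] * taps            # recent sent samples, newest first
    residual = []
    for x, d in zip(sent, received):
        history.insert(0, x)
        history.pop()
        estimate = sum(w * h for w, h in zip(weights, history))
        err = d - estimate            # what is left after cancellation
        norm = sum(h * h for h in history) + eps
        step = mu * err / norm        # step size normalized by input power
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(err)
    return residual

def power(samples):
    return sum(s * s for s in samples) / len(samples)

# Simulated call: random noise stands in for speech, and the far end
# reflects it back delayed and attenuated (assumed echo path).
random.seed(1)
n = 4000
voice = [random.uniform(-1.0, 1.0) for _ in range(n)]
delay, gain = 10, 0.5
echo = [gain * voice[i - delay] if i >= delay else 0.0 for i in range(n)]

cleaned = nlms_echo_cancel(voice, echo)
before = power(echo[-500:])
after = power(cleaned[-500:])
print(f"echo power before: {before:.4f}, after: {after:.2e}")
```

Once the filter has adapted, the residual echo power in this toy setup should be a tiny fraction of the original. The hard parts in a real device are the long tail, the double-talk handling, and doing it all within the DSP budget.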

Some people just want a super-cheap VoIP phone. But others are trying to replicate the services of traditional phone systems using VoIP. For them, the echo problem is real, and serious. It can’t always be solved with a gateway. And an extra 25 percent in the cost of the CPE might be hard to swallow, but not as hard as having no solution at all.