VoIP drove down the cost of making phone calls. We love that about VoIP: free long distance! In the telecom industry now, the idea that calls within a country would cost a retail user more than local calls seems quaint.

But the low cost of VoIP calling has brought a new adversary: “Robocalling,” so called because “Robots” (computers with pre-recorded messages) place phone calls. Many of these calls are placed in violation of existing laws. According to US Federal Trade Commission official Lois Greisman, lawbreakers “can place robocalls for less than one cent per minute,” and I suspect that only counts the calls that are actually answered. And though the call may be dialed by an automaton, it may be connected to a live person when it’s answered.

Elderly Victimized by VoIP Robocalling

Those who suffer most are non-VoIP users, disproportionately the elderly. Older people see no reason to switch from the same telephone service they’ve had most of their lives, and they retain legacy, POTS-based phone service. In rural areas, this is often served by feature-poor TDM equipment with only “Custom Local Area Signaling Services” (CLASS) features, developed in the 1980s in the rotary phone era. The best available Robocall blocking uses Simultaneous Ring, which is not available in the CLASS feature set in classic telephone service.


Because the pain of these automated calls is felt so acutely by elderly citizens in America, the US Senate Special Committee on Aging is pushing for laws to force Service Providers to provide blocking. That the Aging Committee is pushing to regulate telephone services is evidence of the gravity of the problem.

By building efficient networks, VoIP technologists unwittingly created a special problem for users of legacy technology. (Hint: if you have aging relatives, be sure to upgrade their phone service to a modern option.)

“We now have this onslaught—it’s terrible,” said Indiana Attorney General Greg Zoeller, who added that the scams are  “primarily directed at seniors.” Scammers attempt to trick seniors into paying for taxes they don’t owe, or providing banking information for lotteries they haven’t won.

The Wall Street Journal reports on one 88-year-old Arkansan who lost $110,932. And often, blocking or changing the Caller ID of the Robocaller is a key part of the scam. Another person, aged 57, lost nearly $250,000 in a scam relying on a “device that conceals the identity and location of the caller.”

The Nature of the Problem

Scammers and other illegal robocallers send calls through multiple networks, sometimes using Least Cost Routing, or stolen phone service.

Attackers set up computers to launch outbound attacks, undoubtedly through VoIP protocols, and send them through grey-market PSTN Termination Providers. These are firms that offer low rates to accept a SIP telephone call and deliver it to a PSTN phone. But they may do nothing to ensure that the caller IDs used are accurate, and may respond slowly to complaints against their customers. Further, they may choose not to participate in CDR-based Call Traces, the de facto industry standard method used to determine the source of a call. These firms can be anywhere in the world, and they need not exist for long.

The PSTN Termination Providers typically don’t actually operate PSTN Gateways themselves. That is, they have no ability to directly send the traffic into the legacy TDM SS7/C7 telephone network to reach any phone in the world. So the PSTN Termination Providers buy services from more legitimate PSTN Gateway (PSTN GW) Providers. These PSTN Gateway Providers then have direct connections to the major telephone carriers of the world, and paths to reach all the smaller ones as well.

AT&T is an example of a PSTN Gateway (GW) Provider. In an FCC Hearing on Robocalling in September 2015, Adam Panagia of AT&T reported that they often receive a single coordinated Robocalling campaign through 20 or more wholesale carriers. Even if a legitimate call center in India is routing calls to the US, AT&T may receive calls from that single call center through 30 to 50 wholesale customers.

VoIP Fraud Plays a Role

In addition to the use of PSTN Termination Providers to perpetrate Robocalling fraud, some attackers use exploited VoIP Telephone Providers. In these cases, a VoIP provider has a security vulnerability, such as a SIP account with a simple password, or a customer device that is not protected by a firewall.

When attackers get access to this service provider, they use it as one of the paths to send calls to the victims. The defrauded Service Providers whose service is stolen are victims along with those who are called and then scammed. The poor security of many VoIP Service Providers contributes to the Robocalling problem, but Service Providers have the power to improve their security.

Solving the Problem Technically

Blocking Historically Disallowed. Historically, Telephone Service Providers in the US were prevented from blocking calls sent to a subscriber. If you bought service from one, and had a telephone number, then the Service Provider was obligated to deliver every call. The FCC changed that in June 2015, in a Declaratory Ruling, and by September of that year they were running workshops to encourage the proper kind of blocking.

The FCC made a welcome change, and though some service providers were already providing some types of blocking before the June 2015 ruling, it opens options for blocking calls even among the scrupulous. So the question is now: which calls should be blocked?

Caller ID Trusted. Currently, Caller ID is fundamentally trusted from the original caller. Unlike modern email, there is no technical mechanism to confirm that the caller ID provided on a call is actually genuine. VoIP technologies, such as SIP From, “Trust Domains” P-Asserted-Identity, and SIP “Identity”, have not helped because they only apply to limited areas of the network, and don’t provide any certainty as a call travels from a call center in India to a retiree in Iowa.

Verifiable Caller ID is a key requirement to block scammers, according to Scott Mullen, VP Technology of Bandwidth.com, a major PSTN termination and origination firm in the USA with extensive SS7/TDM interconnects. Mullen noted in the FCC workshop that toll-free fraud is a major problem they deal with. Even international calls come through in a way that tricks the US recipient into believing the call originated in the US.

Problem Calls Distributed. Further, calls from scammers are distributed across many networks. One PSTN GW Provider has some data, while another has different data. It’s not easy to coalesce the information into a unified view to make smart decisions about the call.

Can’t scan a phone call before delivery. Another difficulty for scanning telephone calls lies in the real-time nature: unlike an email that can be analyzed in its entirety before it is deposited into your mailbox, only a few bits of information are available to a call-blocking system: (a) Time of call, (b) Claimed Calling party number, (c) Called party number, (d) Input source (such as a particular wholesale customer link).

I should note that we can learn about calls after they are delivered: Robocalls that are answered are usually disconnected very quickly. Short call durations provide some clues after some of the calls are delivered, which may be used to improve the go/no-go decision for future calls.
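To make the idea concrete, here is a minimal Python sketch of a go/no-go decision that sees only the four fields listed above, and that learns from the durations of calls already delivered. The class names and the three thresholds (`short_seconds`, `min_calls`, `block_ratio`) are my own illustrative assumptions, not values from any real blocking product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallAttempt:
    """The only facts known before the call is delivered."""
    time_of_call: datetime
    calling_number: str   # claimed by the caller, not verified
    called_number: str
    input_source: str     # e.g. a particular wholesale customer link


class DurationFeedback:
    """Remembers how often answered calls from a number ended quickly."""

    def __init__(self, short_seconds=6.0, min_calls=5, block_ratio=0.8):
        # Hypothetical thresholds, chosen only for illustration.
        self.short_seconds = short_seconds
        self.min_calls = min_calls
        self.block_ratio = block_ratio
        self._answered = {}   # calling number -> answered call count
        self._short = {}      # calling number -> quickly-ended call count

    def record_answered_call(self, attempt, duration_seconds):
        """Feed back the duration of a call after it was delivered."""
        number = attempt.calling_number
        self._answered[number] = self._answered.get(number, 0) + 1
        if duration_seconds < self.short_seconds:
            self._short[number] = self._short.get(number, 0) + 1

    def should_deliver(self, attempt):
        """Go/no-go for a new call, based on past calls from the same number."""
        answered = self._answered.get(attempt.calling_number, 0)
        if answered < self.min_calls:
            return True  # not enough history to judge
        short = self._short.get(attempt.calling_number, 0)
        return short / answered < self.block_ratio
```

A number with no history is always delivered, so this kind of feedback can only help after some subscribers have already been bothered. It also inherits the weakness of everything keyed on the claimed calling number.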

Technology Types. Calls do flow across both SIP and TDM networks, but scam calls almost always originate as SIP. The cost of maintaining TDM infrastructure appears to be too high for scammers, as it probably invalidates the business model.

Give me a call, never

For the moment, Call Blocking based on Caller ID reputation is providing some relief. These services work by keeping a database of known scammer caller IDs, then blocking the calls whose caller IDs are in that database.

The industry de facto standard for Personal Blocking Services is NOMOROBO, released in 2013 after winning a US Federal Trade Commission competition for technical solutions to Robocalling. At the September 2015 FCC workshop, NOMOROBO was that annoying better sibling, with the FCC mom asking all the other participants, in effect, “Why can’t you be more like NOMOROBO?”

NOMOROBO is free, and blocks calls from Caller IDs that are repeatedly used for annoying calls. It’s available to anyone who has Simultaneous Ring, a feature found on Broadworks, Metaswitch, and most other modern telecom platforms.

How Grandma Gets Attacked

Calls normally route through one or more PSTN Termination Providers before they reach an ordinary telephone company and their victim.

Let’s first look at the normal call path from an Attacker to the Victim. In this case, we’ll say that the Victim is a subscriber of “Adams Rural Telephone Company.” The Attacker launches outbound calls, and some of those calls route to “PSTN Termination Provider #1”. This company determines where the calls should route and, in those cases where Adams’ subscribers are the victims, routes the calls to Adams. This supposes that both Adams and the Attacker do business with the same PSTN Termination Provider, but in many cases there will be several intermediate PSTN Providers.

How Grandma Blocks the Attack

NOMOROBO depends on the Simultaneous Ring feature. To use it, Grandma would set up her calls to ring both her phone and the Personal Blocking Service at the same time.

A Personal Blocking Service, like NOMOROBO, receives the call through Simultaneous Ring / Call Forking set up on the Victim’s phone. If the Caller ID is in the blacklist, NOMOROBO answers the call, which prevents the Victim from answering.

In this way, NOMOROBO receives an inbound call with the Caller ID provided by the Attacker. If NOMOROBO finds this Caller ID in its blacklist database, it can answer the call – thus ending the call for the Victim. The victim experiences only a single telephone ring.
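The forked-call behavior described above can be sketched in a few lines of Python. This is a toy model of the idea only, not NOMOROBO’s actual implementation. The `ForkOutcome` type, the function name, and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForkOutcome:
    blocked: bool                   # did the blocking service answer the call?
    subscriber_keeps_ringing: bool  # is the subscriber's phone still ringing?


def simultaneous_ring(caller_id, blacklist):
    """Model one inbound call forked to the subscriber and the blocking service.

    Both legs start ringing at once. If the claimed caller ID is in the
    blacklist, the blocking service answers its leg, which cancels the
    leg to the subscriber's phone.
    """
    if caller_id in blacklist:
        return ForkOutcome(blocked=True, subscriber_keeps_ringing=False)
    # Unknown caller: the service stays silent and the phone rings normally.
    return ForkOutcome(blocked=False, subscriber_keeps_ringing=True)


# Hypothetical blacklist entry for illustration.
BLACKLIST = {"+12025550100"}
print(simultaneous_ring("+12025550100", BLACKLIST))
print(simultaneous_ring("+12025550111", BLACKLIST))
```

Note that the only input to the decision is the caller ID the Attacker chose to send.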

NOMOROBO is clearly a market leader. They are benefiting from the “Network Effect,” where their popularity makes the service work better by aggregating data. NOMOROBO certainly receives many simultaneous calls during Robocall campaigns, and can exploit the concurrency to make smarter decisions.

But NOMOROBO must depend on the caller ID and called ID only, and cannot do anything to improve the quality of the data. NOMOROBO does not know when calls end (i.e., the duration of answered calls), so it cannot improve its decisions based on calls that turn out to be undesirable and are quickly ended.

NOMOROBO also cannot analyze the audio of the phone call. In the US, many Robocalls begin with dead silence, which is unnatural for ordinary person-to-person calls. Normal humans use this to decide which calls to hang up quickly. This mismatch between robocalls and human-dialed calls should be useful to an automated blocking system, if it were available.

Service-Provider Based Blocking

What happens when the victim does not have the Simultaneous Ring service, or lacks the skills to set it up? Or what if a Service Provider would like to block Robocalls for all of its subscribers? Telephone Service Providers can also block calls by using a network-based service.

Consider the network path shown. Normally calls flow from the Attacker, to an intermediate, and finally to the victim’s service provider.

A Service-Provider Based Blocking Service can provide protection for every customer of a service provider. Using SIP, the calls can be routed through an intermediate device that checks the caller ID. Or, potentially, it could analyze the audio or look for other signatures of fraud.

Service Providers (SPs) can provide protection by using SIP call routing to route calls through an intermediate service. Instead of immediately sending every call to the victim, the SP can route the call to an intermediate service (or device) that checks the blacklist database. To borrow the term from email, only the calls that are not spam are ultimately delivered to the end user.
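As a rough illustration of that routing step, the Python sketch below makes the blacklist check once, in the network, for every subscriber of the Service Provider. The function names and the `("reject", None)` / `("deliver", number)` return convention are assumptions made for this example, not the interface of any real product.

```python
def route_inbound_call(calling_number, called_number, blacklist):
    """Decide the next hop for one inbound call at the intermediate service."""
    if calling_number in blacklist:
        return ("reject", None)         # never reaches the subscriber
    return ("deliver", called_number)   # pass on to the subscriber's line


def delivered_calls(calls, blacklist):
    """Filter a batch of (calling, called) pairs down to those delivered."""
    return [
        (calling, called)
        for calling, called in calls
        if route_inbound_call(calling, called, blacklist)[0] == "deliver"
    ]


# Hypothetical traffic: one blacklisted caller dialing two subscribers.
blacklist = {"+12025550100"}
calls = [
    ("+12025550100", "+12025550181"),
    ("+12025550100", "+12025550182"),
    ("+12025550111", "+12025550181"),
]
print(delivered_calls(calls, blacklist))
```

Unlike the personal service, no subscriber has to configure anything: the check applies to every called number that the SP serves.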

The Metaswitch SIP Robocall Blocking Service is a current market example of a service using SIP to address this problem. Service Providers can route their calls to the service, which blocks calls coming from caller ID numbers in the blacklist database.

This network model can provide robocall-blocking for an entire Service Provider, and not just for a single user. It’s also far more efficient than Personal Call Blocking, because it doesn’t require all the setup for a media path through the legacy PSTN to check the database for each call.

It also has opportunities for future development: with a stateful SIP proxy, an SP robocall blocking service could know when calls start and end. And once privacy concerns are handled, this approach could analyze the audio for hints, such as dead silence at the start of a call.
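Here is a minimal Python sketch of the bookkeeping such a stateful service might do. It assumes the proxy can report an “answered” event and a “hangup” event for each call, keyed by some call identifier. The class and method names are hypothetical.

```python
class CallTracker:
    """Tracks answered calls so their durations are known when they end."""

    def __init__(self):
        self._answered_at = {}  # call id -> time the call was answered (seconds)
        self.durations = {}     # call id -> duration of the finished call (seconds)

    def on_answer(self, call_id, timestamp):
        """The called party picked up."""
        self._answered_at[call_id] = timestamp

    def on_hangup(self, call_id, timestamp):
        """Either side hung up. Returns the duration, or None if never answered."""
        answered = self._answered_at.pop(call_id, None)
        if answered is None:
            return None
        self.durations[call_id] = timestamp - answered
        return self.durations[call_id]


tracker = CallTracker()
tracker.on_answer("call-1", 100.0)
print(tracker.on_hangup("call-1", 103.5))  # a call that ended after 3.5 seconds
```

A stream of very short durations tied to one calling number or one input source is the kind of clue a personal blocking service never gets to see.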

But like Personal Call Blocking, this approach still relies on the caller ID, which can be faked for each call.

Nothing is any good if other people like it

Blacklist-based blocking services work today precisely because they are not popular. Today’s blocking services rely on calling party ID, as if that’s trustworthy. Scammers do have some incentive to place calls from the same caller ID repeatedly: once they find a telephone number with a matching CNAM caller name that people will answer – with names like “Internal Rev Service,” or “EDUC LOTTERY” – they seem to stick with that same number.

But Robocallers are already adapting to robocall blocking services. Some are calling from randomly-selected working, legal telephone numbers. This method completely defeats simplistic blacklist databases.

This means we really need a trustworthy caller ID. And some in the industry are working to provide it.

Engineers in the Internet Engineering Task Force (IETF) and the Alliance for Telecommunications Industry Solutions (ATIS) have developed a standard called STIR, “Secure Telephone Identity Revisited”. When used as designed, each call using STIR will include a signature as evidence that the calling party has the right to call from the telephone number they’re using.

This would be implemented at the entry to the SIP-PSTN network, ideally at the customer’s PBX or at their first Service Provider Interface. For example, a BroadWorks service provider may use SIP authentication to confirm the identity of a caller, and then create the STIR cryptographic signature to confirm the legitimacy of the caller ID.
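The sign-then-verify idea can be sketched with Python’s standard library. This is emphatically not the STIR wire format: STIR relies on public-key signatures backed by certificates, while this sketch substitutes an HMAC with a shared secret purely so that it runs without third-party packages. The function names and the claim field names are assumptions for illustration.

```python
import hashlib
import hmac
import json


def sign_call(secret, calling_number, called_number, issued_at):
    """Originating side: bind the caller ID to this call and sign the claims.

    HMAC stands in for the public-key signature a real deployment would use.
    """
    claims = json.dumps(
        {"orig": calling_number, "dest": called_number, "iat": issued_at},
        sort_keys=True,
    )
    tag = hmac.new(secret, claims.encode("utf-8"), hashlib.sha256).hexdigest()
    return claims, tag


def verify_call(secret, claims, tag, now, max_age_seconds=60):
    """Terminating side: check the signature and that the claims are fresh."""
    expected = hmac.new(secret, claims.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, tag):
        return False  # caller ID or destination was altered, or wrong key
    age = now - json.loads(claims)["iat"]
    return 0 <= age <= max_age_seconds  # reject replays of old signatures


claims, tag = sign_call(b"demo-secret", "+12025550100", "+12025550199", 1000)
print(verify_call(b"demo-secret", claims, tag, now=1010))
```

Because the called number and a timestamp are signed along with the calling number, a scammer cannot simply copy a valid signature onto a different call much later.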

Don’t strip that header

Currently, there are no SIP headers that must be retained end-to-end through the VoIP networks. All headers can be reconstructed at each step, though a few elements are reused (such as the calling and called party numbers). STIR assumes that VoIP carriers will be able to pass a SIP header through their network from the origin to the terminating carrier. This is certainly technically feasible, but will require substantial coordination – and likely a few SBC software updates for some carriers.
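A toy Python model of one hop shows why this matters. Here a carrier rebuilds the outgoing headers from scratch, reusing only the calling and called party, plus whatever it has been configured to pass through. The function name and the dictionary representation of headers are simplifications of my own. Real SIP header handling (including case-insensitive names) is more involved.

```python
def rebuild_headers(incoming, passthrough=("Identity",)):
    """Build the outgoing headers for the next hop.

    Only the calling and called party are reused by default. Any other
    header survives only if it is named in `passthrough`.
    """
    outgoing = {"From": incoming["From"], "To": incoming["To"]}
    for name in passthrough:
        if name in incoming:
            outgoing[name] = incoming[name]
    return outgoing


# Hypothetical incoming call carrying a signature in the Identity header.
incoming = {
    "From": "+12025550100",
    "To": "+12025550199",
    "Identity": "signature-goes-here",
    "X-Internal-Route": "trunk-42",
}

print(rebuild_headers(incoming))                  # Identity preserved
print(rebuild_headers(incoming, passthrough=()))  # Identity stripped
```

A single hop configured like the second call is enough to discard the signature, which is why every carrier in the path has to cooperate.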

STIR promises a world where you can be certain of the calling party while your phone is ringing. But will it happen? STIR requires substantial technical work on VoIP network infrastructure. Practically every SIP carrier peering/trunk on every SBC deployed will have to be updated.

STIR will require establishment of a Certificate Authority (CA) that can provide the certificates verifying the right to use telephone numbers. We already have Certificate Authorities servicing the Web industry, so this should not be a major hurdle. You can expect big carriers to desire to be CAs for themselves – likely a smart solution for many cases. For example, AT&T has been, in effect, the “owner” of millions of telephone numbers for years, though they were permanently assigned to their subscribers. It makes sense for AT&T to be the CA for the numbers it already “owns”.

Who Goes There?

To succeed, STIR will have to engage the business models of the modern VoIP PSTN. Private companies and government entities alike use the flexibility of the PSTN to route their calls through any carrier that is convenient. If STIR requires evidence that the telephone number is being used legitimately, then credentials to use the telephone numbers must be distributed to all of the owners of telephone numbers.

Video Relay Service. For example, at the SIP Forum SIPNOC meeting in June 2016, one major Video Relay Service (VRS) for the Deaf and Hard-of-Hearing community commented that they effectively place calls on behalf of their users. A full STIR implementation will require the VRS providers to place the calls outbound for these users, even though the audio portion is actually connected to a Sign-Language Interpreter.

Government Agencies. Government agencies using COTS platforms like BroadWorks sometimes use a variety of paths for routing their calls outbound. They, too, will need the tools and technology to prove their right to use the caller ID, because preventing spoofing of calls from public institutions is one of the hopes for STIR.

Call Center Services. Today it’s also common for a firm to hire a call center service to place outbound calls on its behalf. STIR will require the Call Center firms to be capable of providing a signature showing the right to place calls from that entity. For example, if the Call Center for Delta Airlines needs to call you, then the Call Center service will need credentials (like a password) to allow them to place that outbound call from Delta’s telephone number. The Call Center will need to be upgraded to be capable of generating the STIR Identities.

Ten Years of Work

Once Caller IDs are verifiable, telephone users can make smart decisions about which calls they want to allow.

Call Control from Kedlin Company operates on the phone endpoint.

So how long will it take to block fraudulent caller ID, and get true caller ID? It took about seven years to prevent spoofing of domain names, like “gmail.com”, from the start of the IESG SPF experiment in 2005 to the implementation of DMARC in 2012. The PSTN moves even more slowly.

But solving the spoofed-caller-ID problem – and the Robocalling it enables – is worth doing. Service Providers should monitor the progress of STIR, and Vendors (like BroadSoft, Oracle Communications, Metaswitch, Genband, Sonus) should plan to support STIR.

  • Push for STIR Identity Support from your SIP Software Vendor, especially SBC, Feature Servers, and SIP Trunking Devices
  • Configure your equipment to allow the new Identity header to flow through
  • Find out how you can confirm and display the identity of calls you receive using STIR
  • Plan to enable users to legitimately delegate authority to place outbound calls

Committing time to answer questions is the crucial first step

This is Part 3 in my Series on Supporting/Managing Engineers

Configuring bridging, building bridges

Unlike software, systems, network, and voice engineering, regulated engineering disciplines require licensing. According to the National Society of Professional Engineers, a college engineering graduate can “begin to accumulate qualifying engineering experience”. What happens during the four years of post-graduate experience required for licensure? “The experience must be supervised. That is, it must take place under the ultimate responsibility of one or more qualified engineers.” Further, the experience is expected to be “high quality, requiring the candidate to develop technical skill and initiative in the application of engineering principles.”

These are the engineers building the bridges you drive over, the flammable electrical parts you install in your walls, and the explosive power plants you live near. Society expects high standards because of the risks to health and safety.

So to advance in their field, those engineers seek out supervision. And because of the licensing, senior engineers must participate in the mentoring process. They have a valuable structure of mentoring.

But what about the “engineers” who run email servers, build voice networks, and write software? Generally, these information technology (IT) and computing disciplines are unregulated. Anybody can do IT as badly as they like, and we have no mentoring structures in place.

ASCII question, but got no ANSI

Computing has grown without formal, legally-mandated mentoring requirements because computers serve as self-checking devices: if the new programmer tries to do something crazy or foolish, it won’t compile or it won’t work.  Unlike faulty bridge designs, computing systems are relatively easy to test. If you try to build a network but don’t know what you’re doing, that network won’t function.

But in IT/computing fields, mentoring is still needed:

  1. IT Engineers often have questions that cannot quickly be googled or tested by experimentation
  2. IT Engineers need to learn from the mistakes & experience of others
  3. IT Engineers need review from other brains to help check their own ideas

It can be hard for a junior engineer to get a solid answer to a question. The best engineers are always busy, and they’d rather spend their time with computers, not people. For example, the Myers-Briggs personality type INTJ is used to link introverted personality types to Computer Programmers and Engineers. For these introverts, answering your question is draining. “If you’re a true introvert, networking is excruciating,” writes Susan Adams in Forbes (2014).

So if you’re learning something new,  how do you get your questions answered if you’re among people who’d rather avoid people?

A Mentor commits to answer questions

With so many demands on a skilled professional’s life, the key scarcity is the willingness to provide answers when questioned. So if you’re going to be a mentor engineer,  be sure you’re available to hear questions, and provide good answers.

That is, mentoring in IT Engineering is first about the mentor’s commitment. The mentor has to be willing to take questions from junior engineers, and commit to answer them.

As a professional programmer, Jud McCranie answered hundreds of this author’s written questions about programming from 1989 to 1993. He’s a great example of a mentor who invests in answering the questions of a curious mentee.

I had some great mentors as I was learning. Jud McCranie is a professional programmer I met through a university-operated Bulletin Board System in the early 1990s. He answered a thousand questions from me on programming, and even decrypted my amateurish encryption algorithms. He was a mentor because he took my questions and didn’t ignore them. He challenged my thinking, often asking me questions I couldn’t answer myself. I’m always thankful for his patient consideration of numerous questions from a teenager. (McCranie is cited in one of Knuth’s new volumes of The Art of Computer Programming.)

Another superb question-answering mentor was the late Jon Hamlin, an ex-CIA intern and graduate of Valdosta State University and University of Georgia. Jon was far more interested in system administration, networking and UNIX, and in his role as computer science lab manager at Valdosta State University, Jon had access to extensive SunOS/Solaris resources and the time to think about my questions. He set up a Linux computer and gave me access, and helped me clean up a few messes.

Both of these men put in hours of their lives to answer questions I wrote through email, and they’re part of my motivation to answer questions for others.

Why it’s so hard to get somebody to really answer a question

I just claimed that there’s a scarcity of willingness to answer questions, so the first role of a mentor is to commit to be available to answer questions. Why is that?

  1. There are lots of ignorant people willing to give bad answers. But you don’t want them to be your mentor.
  2. It takes real time and expertise to answer questions. As Erica Friedman writes, sometimes you’re expecting a simple answer about a complex system. “Some answers attempt extremely top-level analysis, but few people will have time or expertise to answer a truly complex simple question.”

    Competent Computing/IT Engineers are very busy. Getting a senior engineer to commit the time to mentor is a big deal. Photo: Tim Regan.

  3. Competent engineers usually prefer to stay busy engineering, not doing chit-chat. According to  John Hales of Global Knowledge, communication and explanation are not key character attributes for successful IT Professionals.
  4. IT Engineers are often expected to make progress quickly, so they can’t wait long for answers. As one contributor said on StackExchange, there’s no time pressure to answer questions in online forums, and so it’s easier to get questions answered there.

Because engineers prefer results over talking, and because of the demands on competent engineers, it’s hard to get a timely answer from a competent engineer.

The curious case of the unasked question

“I would rather have questions that can’t be answered than answers that can’t be questioned.” ~ Richard Feynman

Once a mentor is available to answer questions in a timely way, the mentee must be willing and able to ask questions. Often they are not.

Michael Adams of Quizbean identifies several common reasons junior staff don’t ask questions. For engineers, some key reasons they don’t ask questions are:

  1. Nervousness. “They don’t want to be embarrassed in front of their boss or co-workers,” Adams writes. This can be caused by simple anxiety, or by excessive pride if they don’t want to admit there’s something they don’t know. Antidote: The mentor must make it easy and unthreatening to ask questions by readily enjoying the curious discovery of new facts. Computing fields move famously fast, so there are always new things to learn.
  2. Avoiding annoyance. They don’t want to ask questions because they perceive it to be inconvenient for the mentor. Antidote: The mentor must actively invite the questions.
  3. Bewilderment. The mentee doesn’t know where to start because everything feels so unfamiliar. Antidote: The mentor should set milestones of capability to encourage steady improvements in comprehension: crawl, then walk, then run. For example, log in to the server first. Then locate the logs. Then read the logs. Then understand the logs.
  4. Previous trauma. The world is full of jerks, and many people have been criticized by those jerks for asking good questions. The mentee may need to recover from an unhealthy work environment where legitimate questions were met with caustic attitudes.
  5. Lack of curiosity. One of the most dangerous problems is a lack of curiosity in the subject matter. Mauricio Porfiri, a robotics researcher at New York University, says that “Being creative and being curious is more important than being the smartest or the best at equations if you want to be a great engineer.” Albert Einstein said that “The important thing is not to stop questioning. Curiosity has its own reason for existing.” The lack of curiosity leads to a dangerous complacency, causing the mentee not to care enough to bother to formulate questions or challenge their own assumptions, but to muddle through with the current level of ignorance. Computing is great because it encourages curiosity. Have a question? Try it out. A chemist or physicist or mechanical engineer needs equipment for a lab, but the Computer Scientist needs only a computer and the means to program it. Ayodeji Awosika writes about the dangers of suppressed curiosity in “The Theory of Nothing: Why Lack of Curiosity Leads to Mediocrity.”

Beyond Q&A: The Weekly Mentoring Meeting

To support the growth of a mentee, mentors can schedule regular time with the mentee. Just like the commitment to answer questions, the mentor must make these meetings a commitment. Commonly, this happens as a weekly meeting to ask questions and plan progress.

Jim Anderson’s approach to mentoring includes a weekly meeting. The advisee’s next steps were documented on his office whiteboard for clarity and easy reference.

Jim Anderson, a Computer Scientist at UNC-Chapel Hill, followed a simple model of tracking progress for his advisees: he wrote the next steps for each of his mentees on the whiteboard in his office. Then they could easily see what was expected. And in each weekly meeting, he could easily recall their responsibilities.

Plan the route out of ignorance. Even in non-supervisory mentoring, it’s helpful for the mentor to plan and track progress so the mentor ensures that the incomprehensible is coming into focus for the mentee. Without this guidance, the mentee may be trapped in a complicated area without a path forward, and without the ability to ask questions to get out of it.

Review recent accomplishments. The weekly meeting is also a good time to review samples of the mentee’s work. The mentor can praise progress and identify the most important improvements the mentee can work on next. But even when identifying the next growth area, the mentor should recognize the mentee’s achievements.

For example, for Computing/IT professionals, reviewing work can mean:

  • Discussing interesting troubleshooting problems and the approach to troubleshooting.
  • Reviewing system configurations to see how a mentee’s task was accomplished.
  • Reviewing source code.


Establishing Healthy Mentorship

To begin healthy IT/Computing mentoring,

  • Get experienced engineers genuinely willing to answer questions: call them Mentors.
  • Get other engineers with curiosity, who are willing to ask questions.

Photo: Merrimack College Mentoring Program.

Sometimes junior technical staff are starved for interesting work while senior staff are overworked

This is Part 2 in my Series on Supporting/Managing Engineers

If a team has lots of technical work to do, and only a few brilliant engineers available, how do you get work to the right people?

In this article, I discuss methods for managing work in IT and technology teams, such as those doing Network Operations or Engineering, DevOps, Security Management, or System Administration, when you have a mixture of skill levels, including “star engineers” with 10+ years of experience. I would give different advice to a team focused on developing software.

Call the Brain Surgeon!

A hospital is full of medical staff. But when a life-threatening case rolls in, you want the most experienced physician available to handle it. And if you’re doing important, high-profile, mission-critical technical work, you might want the most experienced technician to do the work.

So if the problem at hand is to restore the telephone service for the hospital, or to restore a failed 911 Emergency calling service, then it makes sense to put the best available people on it.

…to suture this wound

Yet often, the risks are not so high, and the need is not actually so urgent. You really don’t want your most-senior staff doing relatively low-risk simple work. Escalating all work to the most-skilled person robs the lesser-skilled staff of experience and deprives the organization of good ideas.

How do you prevent all the hard work from being done by the best and brightest?  How do you prevent all the work from flowing up to the highest-skilled people?

  • Make junior staff work at the highest level possible
  • Make senior staff support and include junior staff in strategy and planning
  • Define the barriers between junior and senior staff
  • Define the workflow for projects

Excelling as a Junior Engineer

Suppose you’re a junior engineer in a team, and all of the challenging work goes to the senior engineers. You know you could do that work, but it’s always going to the senior guy. How do you get the chance to work on hard problems?

It’s possible your management is fumbling, but be sure you’re doing a great job on the problems you have. It’s easy to dream about solving challenging problems, but you have to be sure you’re doing fabulous work in your current tasks:

  • Technically:
    • Are you doing as well as anyone in the world?
    • Are you aware of how other people solve this problem?
    • Are you documenting your methods?
    • Are you listening to the ideas of your colleagues?
    • Are you clearly explaining your results?
    • Are you recording your questions and other unknowns, then working to get answers to those questions?
  • Organizationally:
    • Are you confounded by how difficult it is to make progress?
    • Are you losing track of details, dates, times, or meetings?
    • Are you recording results of the project so others can benefit from it later?
    • Are you making progress before the deadlines, so that you have substantial progress to demonstrate, and questions to ask?
  • Inter-personally:
    • Are you communicating clearly, with the requester of the work (the “customer”) and others with whom you collaborate?
    • When you have a chance to talk about the project, are you engaged? Are you contributing ideas?
    • Are you doing what you can to be pleasant, even fun, to work with?

Make Yourself Available to Hard Problems. Find out how the interesting projects come in, and look for ways to be involved.

  • Try to make sure your work environment isn’t too hidden and isolated.
  • Ask to join conference calls on interesting projects
  • Take notes in meetings, and offer to provide those notes to other people.
  • Ask questions about everything you don’t understand.

Senior Engineers Plan Projects

What is the role of the Senior Engineers?

  • Make design decisions.
    (Anything that can be done either better or worse is a design decision.)
  • Analyze the problems and projects
  • Write plans for the projects
  • Guide the solutions toward doing the best thing (“what should be done”), while minimizing involvement in the method chosen.
  • Solve problems when no one else will handle them
  • Mentor the Junior Engineers

The distinction of a star technician is perspective and context: they know how things are done and why. They can consider pros and cons of various options for change. They know potential for risks. They should know the organization’s strategy and mission.

With this context, a senior engineer should be planning projects:

  • What are the objective and the goals?
  • What are the phases of work along the way?
    Tip: keep each phase under 4 hours of work
  • What kind of data needs to be collected before proceeding?
  • How will we know when we’re done?
  • How long should it take to complete each phase?

A star technician is as mature as the plans they can provide to others to execute.

Define the Junior/Senior Boundary

What kind of work should be done by senior engineers, and what kind by junior engineers?

Senior Engineers are obligated to:

  • Maintain or improve architectural unity of the system.
  • Provide opinions based on mature technical aesthetic
  • Research the available options for solving problems (i.e., not just their favorite options)
  • Write plans for accomplishing the work
  • Determine which things to do (E.g.: Install Apache on a Linux VM, or buy an F5 Local Traffic Manager?)
  • Answer questions raised by junior staff
  • Review and approve the work done by junior staff
  • Be aware of the skills and capabilities of the junior staff
  • Do the work that no one else is able or willing to do

Junior Engineers should:

  • Follow the plan to accomplish the work
  • Debate and question the plan if they see better ways
  • Ask good questions to fully understand the rationale behind the plan
  • Discuss the problems with Senior Engineers when they find something interesting or unexpected.

Define the Workflow

Send all work to the Junior Staff?

A popular remedy for Work-Flows-Up Syndrome is to send all problems to the junior staff. For example, all tickets come in and start at Tier 1, then the hardest 80% escalate to Tier 2, then only the worst get escalated to Tier 3.

Hacks.  My experience with this method for actual problem solving is poor: Junior staff are prone to develop Workarounds, Tricks, and Hacks (collectively known as “WTH Solutions”) without adequate context or insight into the best way to solve problems. Without knowing the long-term risks, WTH Solutions make the system more fragile.

Blockages. Just as a star engineer can be a bottleneck, junior staff can block the successful completion of work. To know when to ask for help is a learned skill, and junior staff are often prone to keep poking around the edges of a problem. They can tend to sit on interesting problems too long.

Lack of Senior Insight. Senior staff need the insight on the real-world problems with their system. When every problem starts at the junior staff,  the senior staff may be unaware of the problems.

For example: in one case, the network had an issue that was serious but infrequent, and that could be worked around through an easy procedure. The senior staff gave the junior staff instructions for resolving it, expecting them to need the procedure no more than weekly.

But the problem grew more frequent. The junior staff were using the workaround dozens of times per day, faithfully following their instructions. But the senior staff were unaware of the scope of the problem because the junior staff were dutifully executing the procedure.

In this case, the senior staff needed to be aware of the scale to develop a more permanent solution. Because the problem was mostly delegated to the junior staff, the senior staff couldn’t see what was happening and rethink their proposed “solution.”

Plan Before Proceeding

Rather than “fill all the work from the bottom” skill levels, projects should be planned from the beginning. The first step in processing every new problem should be writing a project plan.

What’s a project?

Anything that has several steps is a “project” as I’m using it here.

What should be designed?

A design decision is anything that can be done incorrectly but still appear to work. For example, you could build a network using one large subnet in one big ethernet, or you could set up the network with routing and subnets. Both may appear to work, but one will be better for the situation.

Writing the project plan may require 10% to 20% of the total work time on the task, and should be done by Senior Engineers. The Senior staff is thus responsible for considering options and maintaining architectural and conceptual integrity of the system.

After the plan is developed, the project can be assigned to somebody at the right skill level.

  • Could the execution of the project affect architectural integrity of the solution? Assign it to Senior Engineers.
  • Does the junior staff have the skills to do it, or can they learn? Delegate to Junior Engineers.

Senior / Junior Followup

Senior engineers need to routinely follow up with junior staff to talk about their projects.

Most projects cannot be completed start-to-finish without interruptions. Sometimes we need a new license file; or we need to open a ticket with the vendor; or we need access to a protected site; or information from the customer. Due to these interactions with other vendors and customers, I find many technicians need to have 3-5 projects in order to stay busy.

Senior Engineers should have routine conversations with junior engineers to discuss the projects, and how they are going. The Project Plans will never be perfect, so discussing the project as it proceeds is a healthy part of mentoring.

Work Flows Where You Pump It

With the proper distribution of tasks, even a team of two engineers or technicians can be more effective. Junior staff have responsibilities to grow technically and professionally, and Senior staff have an opportunity to increase their team’s effective throughput.

Photo Credit: Ewan Cross


Most experienced professionals value the opportunity to mentor others; not so for some elite technologists

To enable more people in a technical team to do work, more people have to know how to do it. But for many engineers, training doesn’t come naturally.

This is Part 1 in my Series on Managing Engineers

Is one-on-one training a rational activity, or just a feel-good strategy from the HR department? Many engineers don’t believe in the value of training and mentoring junior people because they perceive no personal benefit in training junior people. But there are benefits for everyone involved.

The reasons high-tech engineers have for not mentoring others are distinctive to the field.

  • Time Calculus: The time and effort to do a task they’ve done is less than the time and effort to explain it to somebody else then to ask them to do it. For example: suppose a 2-hour routine-cognitive task requires three hours of training. It’s easier just to do it than to train someone else in how to do it.
  • Mathematical, Non-Verbal People: For some people, it’s much easier to control a computer and get the problem done than clearly express ideas verbally.
  • Hero Syndrome: The senior person likes to be the one to solve problems because they get their self-worth from it. And the Customers (those who need the help) would rather just get the problem solved the fastest way possible.
  • Too Much Variety in tasks. Some organizations rarely solve the same problem twice. In these cases, the trainable skills are relatively few.
  • Weak Junior Staff. Sometimes the people available to train aren’t trainable (brains) or coachable (pride). It is the responsibility of the organization to find appropriate staff.
  • Job Protection. There’s a myth that some people won’t train others because the would-be trainer would thus be subject to losing their job. I’m not sure I’ve seen real evidence of this.
  • Technology changes too fast. Technical skills have a very short lifetime: what you learn today may be good for about two years, according to IT World.
  • My other trainee is a compiler. Many senior technicians argue that automating the task is better than training someone else to do it; often that is true. But our desire to write code must be tempered by reality: code doesn’t last very long, and will need humans to maintain it. As Alan Perlis famously said, “It is easier to write an incorrect program than understand a correct one.” Without another human’s review of the problem, you may be solving the wrong problem.

Good Reasons to Train and Mentor Junior Engineers

“In learning you will teach, and in teaching you will learn.”
― Phil Collins

Despite the objections listed above, there are ample good reasons to train and mentor junior engineers.

  • Explaining a practical art to another human is good for humanity, by enabling more people to do more useful work.
  • The Teacher Learns.  Explaining and answering questions increases the capabilities of the trainer substantially. The effort to verbally explain a system or technology forces the explainer to crystallize a model of the system they may not have had earlier. (My knowledge of SIP, VoIP Call Control, and RTP grew vastly as I was forced by teaching to verbalize my knowledge, and confront areas I didn’t understand.)
  • Senior Staff should focus on the Hardest Problems. After the training is done, the time/effort for the senior staff is substantially reduced. Suppose that two-hour task required three hours of training; now when the task needs to be repeated, the junior staff can complete it. This makes the senior staff more available for the challenging problems for which they are distinctively qualified. When senior staff are doing the easy stuff, they’re usually leaving the harder problems unconsidered.
  • Technology is fundamentally about improving efficiency through use of effective tools, and having more processors capable of completing a task makes the clients more efficient than having to wait on a single-threaded processor. By having only one person capable of providing the work, you’re creating a bottleneck and failure mode for the rest of the world.
  • Redundant Array of Imperfect Humans (RAIH): Everybody gets sick or needs a holiday. By mentoring someone else, you’re making it easier for yourself to get a break. You may not want to take a break, but eventually you’re going to get sick, die, take vacation, or get a better job.
  • Pair Programming Benefits. Even when automating a task, you can get benefit by bringing in somebody else. There are many documented benefits of pair programming; the Mentoring benefit is key. (Yes, pair programming takes more person-hours — but only about 15%. And for that 15% you get better results.)
  • Fundamental principles change little. While technology moves fast, key fundamentals don’t: the Buffer (1950s); Interrupt (1950s); Relational Database (1960s); Packet-Switched Networking (1960s); Structured Programming (1960s); multi-user Operating Systems and Security (1970s); Digital Audio and Video (1960s – 1980s).

How to Accomplish Mentoring & Training

“You cannot help people permanently by doing for them, what they could and should do for themselves.”
― Abraham Lincoln
When you decide to start training another technician, start with the right assumptions.
  1. I cannot learn for you.
  2. I cannot replace your own curiosity.
  3. If you won’t ask questions, you’re probably not teachable.
  4. You can’t learn by watching or listening: you have to learn by doing.

More on this in a future post.

Photo Credit: Guido Gloor Modjib

If you use BroadWorks with CSV CDRs, you’re probably accustomed to reading these:

00255943365CF3FC1CF15820160404194616.3061-040000,ECG,Normal,+12296543428,+19125293400,Originating,+12296543428,Public,+12296543409,20160404194616.306,1-040000,Yes,20160404194624.106,20160404194650.684,016,VoIP,,3409,private,,,,local,Group,,PCMU/8000,,fd390cb4-eb6d3d15-f8fb7fd2@,,,,Hermes Communications,,,,,,,,,,n,,,17552886061:0,17552886091:0,Ordinary,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2296543428@vwave.net,Chris Brice,Public,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,33.292,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Group,,,,,,,,,,,,,,,,+19125293400,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2296543428@vwave.net,Primary Device,33.292,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,PolycomSoundPointIP-SPIP_650-UA/,,,,,,,,,,,,,,,,,,,,,,,,,,10626401:2,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

Should you tire of counting commas, here’s a CDR decoder meant to run on Mac or Linux systems with gawk: bwcdrdecoder script

With it, you can get output like this:


bwcdrdecoder was made to understand BroadWorks Release 21 CDRs; and generally older CDRs as well.
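For a quick look when the decoder is not at hand, a few lines of Python can list only the populated fields of a record along with their positions. This is a minimal sketch, not a substitute for bwcdrdecoder: it assigns no field names, and the function name `populated_fields` and the shortened sample record are illustrative assumptions.

```python
import csv


def populated_fields(cdr_line):
    """Return (position, value) pairs for the non-empty fields of one CSV CDR line.

    Positions are 1-based, so they match what you get by counting commas by hand.
    """
    row = next(csv.reader([cdr_line]))
    return [(pos, value) for pos, value in enumerate(row, start=1) if value]


# Shortened, made-up record in the same comma-separated style as a real CDR.
sample = "0025594336,ECG,Normal,+12296543428,,Originating,,,Yes"
for pos, value in populated_fields(sample):
    print(f"{pos:4d}  {value}")
```

Feeding it one line from a real CDR file shows at a glance which of the several hundred positions actually carry data.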

How To Make bwcdrdecoder Your Own

  1. Download bwcdrdecoder script
  2. Save it as “bwcdrdecoder” someplace in your path, or in your home directory, ~
  3. Change the permission on the file to allow execution, e.g.,
    chmod u+x ~/bwcdrdecoder
  4. Execute it to read in some BroadWorks CSV CDRs, e.g.,
    ~/bwcdrdecoder BW-CDR-20160501000000-2-5CF3FC1CF158-001455.csv


If you don’t have GNU awk (“gawk”) installed, you’re probably using Mac OS X, and you should install it. I recommend using “brew”.

If gawk is installed somewhere other than /usr/local/bin/gawk, then you’ll need to edit the first line of bwcdrdecoder. For example, if gawk is in /usr/bin, you can change the first line to reflect that.


The Polycom SoundPoint IP SIP Phones and Adtran IADs are used as Hosted IP PBX Access Devices, managed by the BroadWorks platform. In a network without geographic redundancy, the devices use SIP to register to a single SIP SBC IP address.

To support geographic redundancy of SBCs, the devices must support registration to multiple IP addresses. Each device must select the proper IP address in each case to maintain its service and operation on the platform.

In a conventional non-redundant design, each SIP Access device registers with only a single SBC. In a geo-redundant environment the SIP Access Device has to decide properly when and if to use each of the two sites.

The behavior of a SIP Access Device controls the effectiveness of geo-redundant failover. There are no hard and fast standards on the proper behavior. For example:

  • How does the access device determine that the primary site is unavailable?
  • After determining the primary site is unavailable, what should happen to calls that had already been started through that site?
  • How long should the access device wait before attempting to register with the secondary site?
  • After successfully registering with the secondary site, when should the access device check the status of the primary site?
  • Should the access device check the status of the primary site with a SIP registration, or some other SIP method?
  • What happens to SIP subscriptions setup on the access device during a failover to a secondary site?

This TR documents the best results determined for supporting this service on the Polycom SIP phones and geo-redundant SBCs, supported by BroadWorks.

Failover Retransmission Timing

Polycom –

Failover is based on a lack of response for both Polycom 3.x and 4.x software. It uses a 2-exponential backoff starting at 500 ms with a maximum delay of 2000 ms.

Assuming a SIP REGISTER 200 OK Contact expires=30 value causing the phone to re-register every 30 seconds, the worst-case timeline for failover is as follows, for both Adtran TA900 and Polycom phones.

  • Time <0: Device Under Test (DUT) receives 200 OK for SIP REGISTER
  • Time 0: DUT is successfully registered.
  • Time 15: DUT transmits REGISTER to lab-sd1
  • Time 15.5: DUT retransmits REGISTER to lab-sd1
  • Time 16.5: DUT retransmits REGISTER to lab-sd1
  • Time 18.5: DUT transmits REGISTER to lab-sd2
  • Time >18.5: DUT successfully registers via lab-sd2

Times are given in seconds.
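The timeline above follows from the backoff parameters, and a short Python sketch can reproduce it. The function name `failover_timeline` is hypothetical, and the rule that the device moves to the secondary SBC once the wait at the maximum delay has elapsed is an assumption inferred from the observed timeline, not documented vendor behavior.

```python
def failover_timeline(first_attempt, initial_delay=0.5, max_delay=2.0):
    """Return (transmit_times, failover_time), all in seconds.

    The wait doubles after each unanswered REGISTER. Assumption: once the
    wait has reached max_delay and expired, the device tries the next SBC.
    """
    times = [first_attempt]
    delay = initial_delay
    while delay < max_delay:
        times.append(times[-1] + delay)  # retransmit to the same SBC
        delay *= 2
    failover_time = times[-1] + delay    # final wait equals max_delay
    return times, failover_time


# Registration expires in 30 s; the first REGISTER goes out at the halfway point.
attempts, failover = failover_timeline(first_attempt=15)
print("REGISTER to lab-sd1 at:", attempts)
print("REGISTER to lab-sd2 at:", failover)
```

With the 500 ms start and 2000 ms maximum, this gives attempts at 15, 15.5, and 16.5 seconds and a failover at 18.5 seconds, matching the list above.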

Adtran –

The IAD uses a 2-exponential backoff starting at 500 ms with a default maximum delay of 2000 ms.

Assuming default settings, the worst-case timeline for failover is as follows.

  • Time <0: Device Under Test (DUT) receives 200 OK for SIP REGISTER
  • Time 0: DUT is successfully registered.
  • Time 15: DUT transmits REGISTER to lab-sd1
  • Time 15.5: DUT retransmits REGISTER to lab-sd1
  • Time 16.5: DUT retransmits REGISTER to lab-sd1
  • Time 18.5: DUT transmits REGISTER to lab-sd2
  • Time >18.5: DUT successfully registers via lab-sd2

Times are given in seconds.


SIP Subscriptions

Polycom –

SIP Subscriptions are lost when the DUT re-registers with the secondary SBC. They are re-established after an hour. DUT does not re-SUBSCRIBE when it switches to a new SBC. BroadWorks does not keep the subscriptions coupled to the registration Contact which would route through the user’s current SBC.

Because subscriptions are not maintained between SBCs, features such as these will not be fully functional during that hour:

  • Busy Lamp Field
  • Shared Call Appearance
  • Message Waiting Indicator

Adtran –

SIP Subscriptions are not applicable when dealing with Adtran IADs.

Polycom SIP Version 3

The Polycom 3.x software is the newest software available for many popular phones, including the SoundPoint IP 330 and 500. Therefore, for networks that include these and related generations of phones, the geo-redundancy behavior of these devices affects the core network operation significantly.

In the tests below, the DUT is a Polycom SoundPoint IP 330 running 3.1.2c software.

Configuration and DNS Lookups

The DUT was configured to perform DNS lookups for “lab2.e-c-group.com”, and configured for transport=”DNSnaptr”.

$ORIGIN lab2.e-c-group.com.
_sip._udp 600        IN        SRV 20 10 5060 lab-sd2
_sip._udp 600        IN        SRV 10 10 5060 lab-sd1
lab-sd1   600        IN        A    
lab-sd2   600        IN        A    
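In SRV records, the lower priority value is preferred (RFC 2782), so lab-sd1 (priority 10) is the primary SBC and lab-sd2 (priority 20) is the secondary. The following Python sketch shows that ordering. The `SrvRecord` type and `order_by_priority` function are illustrative names, and weight is ignored because each priority here has only one target.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def order_by_priority(records):
    """Sort SRV targets so the lowest priority value is tried first."""
    return sorted(records, key=lambda r: r.priority)


# The two SRV records from the zone data above, in the order they are listed.
records = [
    SrvRecord(20, 10, 5060, "lab-sd2"),
    SrvRecord(10, 10, 5060, "lab-sd1"),
]
for record in order_by_priority(records):
    print(record.priority, record.target)
```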

Fault Detection

DUT detects the fault only when it fails to receive a reply to a SIP REGISTER.

If the SBC returns a SIP 400 or SIP 503 response to DUT, DUT does not attempt lab-sd2.

Affinity for Active SBC

Every new registration request restarts on the primary SBC.

And, even after registering on the secondary SBC, lab-sd2, every new INVITE is attempted on the primary SBC.


Because every new request is attempted on the primary SBC, each new REGISTER or INVITE will cause an attempt to revert to the primary SBC.

Key Findings

Geographic Failover Should Work

The Polycom 3.x software should provide a functional failover option.

Overload-After-Recovery Risk

Polycoms running 3.x will attempt to register with a recovered SBC during the registration expiration interval. For example, if all devices are configured to re-register every 30 seconds, and the Polycoms re-attempt registration at half of the registration expiration time, then every Polycom 3.x device will attempt to register with the primary SBC, after its recovery, in the space of 15 seconds. This is likely to cause an overload on the newly-recovered SBC.
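To put a number on that burst, the arithmetic can be sketched in a few lines of Python. The device count is a made-up figure for illustration, and the function name `reregistration_rate` is hypothetical. The only inputs taken from the text are the 30-second expiry and the retry at half of that interval.

```python
def reregistration_rate(device_count, expires_seconds=30):
    """Average REGISTERs per second arriving at a recovered primary SBC.

    Assumes every device retries at half of the registration expiry, so all
    devices arrive within expires_seconds / 2 of the recovery.
    """
    window = expires_seconds / 2
    return device_count / window


# Hypothetical deployment size; substitute your own device count.
print(reregistration_rate(30000))
```

For 30,000 devices and a 15-second window, that is an average of 2,000 REGISTER requests per second returning to the primary SBC.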

Polycom SIP Version 4

The Polycom 4.x software is available for many of Polycom’s newer SoundPoint IP Phones, such as the 550 and 670. This software provides several additional configuration options for proper support of failover.

Parameters, with their default and recommended values:

  • authOptimizedInFailover (default: 0; recommended: 1)
    • If set to 1, when failover occurs, the first new SIP request is sent to the server that sent the proxy authentication request.
    • If set to 0, when failover occurs, the first new SIP request is sent to the server with the highest priority in the server list.
  • onlySignalWithRegistered (default: 1; recommended: 1)
    • If 1, the phone determines if the user is registered (voIpProt.SIP.outboundProxy.failOver.RegisterOn must be enabled).
  • failRegistrationOn (default: 1; recommended: 1)
    • If 1, the phone will silently invalidate an existing registration at the point of failing over.
  • failOver.failBack.mode (default: newRequests; recommended: DNSTTL)
    • The mode for failover failback.
    • newRequests – all new requests are forwarded first to the primary server regardless of the last used server.
    • DNSTTL – the phone tries the primary server again after a timeout equal to the DNS TTL configured for the server that the phone is registered to.
    • registration – the phone tries the primary server again when the registration renewal signaling begins.
    • duration – the phone tries the primary server again after the time specified by …failOver.failBack.timeout expires.
  • reRegisterOn (default: 0; recommended: 1)
    • If 1, the phone will first attempt to register with (or via) the server to which the signaling is to be diverted, and only if the registration succeeds (200 OK with valid expires) will the signaling diversion proceed with that server.
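The four failback modes differ only in what triggers the next attempt on the primary server. The Python sketch below restates those descriptions as a decision function. It is a reading aid, not Polycom's implementation: the function name, argument names, and the placeholder failback timeout are assumptions, as is measuring the DNS TTL from the moment of failover.

```python
def should_retry_primary(mode, *, is_new_request=False, seconds_since_failover=0.0,
                         dns_ttl=600.0, renewal_starting=False,
                         failback_timeout=3600.0):
    """Decide whether the next request should be sent to the primary server again.

    dns_ttl defaults to the 600-second TTL used in the lab zone data;
    failback_timeout is a placeholder value, not a documented default.
    """
    if mode == "newRequests":
        return is_new_request
    if mode == "DNSTTL":
        return seconds_since_failover >= dns_ttl
    if mode == "registration":
        return renewal_starting
    if mode == "duration":
        return seconds_since_failover >= failback_timeout
    raise ValueError(f"unknown failback mode: {mode!r}")


# With the recommended DNSTTL mode, a new INVITE two minutes after failover
# stays on the secondary server.
print(should_retry_primary("DNSTTL", is_new_request=True, seconds_since_failover=120))
```

Under the default newRequests mode the same call returns True, which is why that mode reverts to the primary so aggressively.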

Configuration and DNS Lookups

The DUT was configured to perform DNS lookups for “lab2.e-c-group.com”, and configured for transport=”DNSnaptr”.

We tested with both TCP and UDP on the Polycom 4.x software.

TCP Configured

$ORIGIN lab2.e-c-group.com.
@         600        IN        NAPTR 50 10 "S" "SIP+D2T" "" _sip._tcp
_sip._udp 600        IN        SRV   20 10 5060             lab-sd2
_sip._udp 600        IN        SRV   10 10 5060             lab-sd1
lab-sd1   600        IN        A                  
lab-sd2   600        IN        A                  

UDP Configured

$ORIGIN lab2.e-c-group.com.
_sip._udp 600        IN        SRV 20 10 5060 lab-sd2
_sip._udp 600        IN        SRV 10 10 5060 lab-sd1
lab-sd1   600        IN        A    
lab-sd2   600        IN        A    

Fault Detection

DUT detects the fault only when it fails to receive a reply to a SIP REGISTER.

If the SBC returns a SIP 400 response to DUT, DUT does not attempt lab-sd2.

If the SBC returns a SIP 503 response, the DUT attempts to re-register with lab-sd2. But it does not stay registered properly with lab-sd2; it allows the registration to expire. In effect, when the primary SBC returns a SIP 503, the DUT stays registered only part of the time.

Affinity for Active SBC

Using the settings that we recommend in this document, DUT continues to consistently use a specific SBC until it fails, or the DNS TTL timer expires.


Using the recommendations of this document, DUT will continue to use the secondary SBC for the duration specified by DNS TTL value. After this expires, DUT will re-attempt the primary SBC.

Key Findings

Geographic Failover Should Work

The Polycom 4.x software should provide a functional failover option. Because the affinity/revert function is less aggressive than the 3.x software, the 4.x software should provide better functionality for a geographically-redundant system.

Adtran TA900E IAD

Testing began with firmware version A4.07.00E. However, due to an issue in this version, the SIP REGISTER sent on failover is malformed, so a firmware upgrade is required for the Adtran DUT to support failover. The firmware was upgraded to the most recent version, R10.5.0E.

Configuration and DNS Lookups

The DUT was configured to perform DNS lookups for “lab2.e-c-group.com”. However, the Adtran IADs only support “SRV” lookup and have no support for “DNSnaptr”. The DUT does correctly prioritize the SBCs based on the DNS lookup.

Fault Detection

DUT detects the fault primarily when it fails to receive a reply to a SIP REGISTER. If the SIP Trunk setting “sip-server rollover service-unavailable-or-timeout” is set then failover on a SIP 503 response can occur for requests other than SIP REGISTER.

If the SBC returns a SIP 400 response to DUT, DUT does not attempt lab-sd2.

Affinity for Active SBC

Every new registration request restarts on the primary SBC.

And, even after registering on the secondary SBC, lab-sd2, every new INVITE is attempted on the primary SBC.


Because every new request is attempted on the primary SBC, each new REGISTER or INVITE will cause an attempt to revert to the primary SBC.

Key Findings

Geographic Failover Should Work with updated firmware

The A4.07.00E firmware sends malformed failover SIP messages. However, the R10.5.0E firmware for the Adtran IAD will provide a functional failover option.

Overload-After-Recovery Risk

Default Adtran IAD configurations will attempt to register with a recovered SBC during the registration expiration interval. For example, if all devices are configured to re-register every 30 seconds, and the IAD re-attempts registration at half of the registration expiration time, then every IAD will attempt to register with the primary SBC, after its recovery, in the space of 15 seconds. This is likely to cause an overload on the newly-recovered SBC.

Lab Testing

Test-By-Test Lab Testing records are available here. In these tests, we used the following equipment:

  • Polycom SoundPoint IP 330 0a1e
  • Polycom SoundPoint IP 601 35b2
  • Polycom SoundPoint IP 550 28ec8e
  • Adtran TA 908e
  • Acme Packet NN4250, 6.2 software, lab-sd1
  • Acme Packet NN4250, 6.2 software, lab-sd2
  • BroadWorks R18 Lab, lab2.e-c-group.com
  • Cisco Small Business SG300-10P PoE Ethernet Switch


This article is based on ECG Tech Report TR-ECG15273.
Lab Testing: Matt Keathley



How do you make a display filter that filters out most RTP frames, but leaves a representative sample? Sometimes it’s convenient to see a sampling of RTP frames in Wireshark, without having to see 50 per second.


Rather than see 50 frames per second for every RTP flow, how about one frame every 5 seconds?

Wireshark display filter:

rtp[3:1]==0 or rtp.marker==1

Shows an RTP packet for each RTP stream
— about every 5 seconds
— or when the stream starts afresh

How does it work?

  • The 3rd and 4th bytes of the RTP frame are the sequence number
  • The sequence number increases monotonically (40704, 40705, 40706, etc.)
  • rtp[x:y] gives the Y-number of bytes that appear at X-offset in the RTP frame, where the first byte in the packet is at 0 offset
  • rtp[3:1] gives the single byte at offset 3, the 4th byte of the frame (see the “00” in attached screenshot). This is the least-significant byte of the sequence number.
  • Normal VoIP RTP sends 1 frame every 20 milliseconds
  • Since the sequence number is a 2-byte value that increases by one per frame, 1 out of every 256 frames will have a least-significant-byte value of 0
  • 256 [sequence numbers] * 20 ms = 5.12 seconds
  • I’m glossing over some details in the previous two points
  • Each time a new RTP flow starts, the sender should send an RTP frame with rtp.marker==1
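The same test can be written out in Python to show which frames survive the filter. This is a sketch on synthetic 12-byte RTP headers, not on a real capture. The helper names (`passes_filter`, `fake_rtp_header`) are illustrative, and the header layout used is the standard one: the marker bit is the top bit of byte 1, and the sequence number is in bytes 2 and 3.

```python
import struct


def rtp_sequence_number(rtp_packet):
    """Return the 16-bit sequence number stored at offsets 2-3 of the RTP header."""
    (seq,) = struct.unpack_from("!H", rtp_packet, 2)
    return seq


def marker_bit(rtp_packet):
    """Return True if the marker bit (top bit of the byte at offset 1) is set."""
    return bool(rtp_packet[1] & 0x80)


def passes_filter(rtp_packet):
    """Mirror of the display filter: rtp[3:1]==0 or rtp.marker==1."""
    return rtp_packet[3] == 0 or marker_bit(rtp_packet)


def fake_rtp_header(seq, marker=False):
    """Build a minimal 12-byte RTP header (version 2, payload type 0)."""
    byte0 = 0x80                     # version 2, no padding, extension, or CSRCs
    byte1 = 0x80 if marker else 0x00  # marker bit plus payload type 0
    return struct.pack("!BBHII", byte0, byte1, seq, 0, 0)


# 512 consecutive frames (about 10 seconds at 20 ms each) starting at 40704.
shown = [seq for seq in range(40704, 40704 + 512)
         if passes_filter(fake_rtp_header(seq))]
print(shown)
```

Only sequence numbers 40704 and 40960 pass, which are 256 frames (5.12 seconds) apart. A frame with the marker bit set passes regardless of its sequence number.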

When you’re connecting to the rest of the world to make and receive phone calls, you have several design options available. Or, more precisely, your Voice Service Providers have many options available.

VoIP via Layer-3 VPN

In this case, a Layer-3 VPN, such as MPLS over the Voice Provider’s own equipment, is used to connect a Voice customer to the Voice service provider. Shared infrastructure is used, but the traffic from the Internet cannot route to the Voice equipment. The same physical links might carry Internet traffic as well, as shown by the black hairline from the Internet Service Providers to the Customer Data Network.

MPLS is the way we see this implemented most commonly. In this case, the customer has a location and an MPLS or Ethernet VPN path back to the Voice Service Provider.


+ Protection against bad days on the Internet. E.g., if global BGP is working poorly. Or if “Cyber Warriors” in a despotic regime decide to launch a Denial of Service (DoS) attack against voice networks.

+ Usually it’s easier to prioritize traffic in a VPN.

+ The Voice Service Provider can ensure end-to-end quality of service if they want to because they own or manage all the queues (i.e., routers and switches) along the path from their equipment to the enterprise.


– You have to depend on the reliability of the MPLS path; it’s usually harder to have redundant links to the Voice service provider.

VoIP via Internet Infrastructure

The Voice Service Provider and the Customer both connect to the Internet and  exchange the IP addresses of their equipment. SIP and RTP are unencrypted. Devices accept or reject SIP calls based on the incoming IP address.

No special MPLS is used here, but we depend on the same shared routers.


+ The Voice Service Provider is also an ISP, and they manage all of the queues. So if they want to, they can provide high Quality of Service.

+ Usually there is no congestion inside Service provider networks. They can upgrade their links easily and inexpensively to get adequate capacity.

+ Potential for backup options via the public Internet.


– Bad things on the Internet might affect this; e.g., DoS attacks, or BGP malfunctions. However, within a service provider’s own network, the effects can often be mitigated.

– Sometimes harder to do QoS; not for technical reasons, but because ISPs are sometimes no good at prioritizing packets or guaranteeing bandwidth.

VoIP via VPN Over the Internet

In this model, the Voice Service Provider and the customer both connect to the Internet. A VPN, such as an IPsec VPN or a permanent TLS connection, is set up across the Internet between the two parties. At least the SIP Signaling traverses the VPN.


+ Get to choose from among any ISP

+ Assuming the Customer has Internet access, there’s no construction time required to set up

+ Protection against IP address spoofing in either direction; so if you receive a SIP packet you can trust it was genuinely sent from the service provider


– No protection against unreliability on the Internet

– No quality of service can be guaranteed

– The links between the Voice service provider and the ISP may be questionable. For example, if a streaming video service, like Netflix, goes into business, certain Internet links that worked in the past may become saturated.

– IPsec tunnels add extra complexity to the system.

VoIP via Internet

The Voice Service Provider and the Customer both connect to the Internet and exchange the IP addresses of their equipment. SIP and RTP are unencrypted. Devices accept or reject SIP calls based on the incoming IP address. The Voice Service Provider may not have a “Provider-Edge” router at all.


+ Get to choose from among any ISP

+ Assuming the Customer has Internet access, there’s no construction time required to set up


– No protection against unreliability on the Internet

– No quality of service can be guaranteed

– Risks of poor quality due to congestion on the Internet.

The BroadSoft BroadWorks DBS is a different animal from other BroadWorks servers, and it requires a special set of commands to keep it alive and well. The level of care and feeding required for the database reminds me of BroadWorks App Server releases R12 and R13; those were not happy days.

Check status of the FRA Disk Group

  • dbsctl diskinfo
  • /etc/init.d/oracleasm listdisks
    • On a healthy, normal system, this should list two entries: DATA and FRA
  • bwBackup.pl
  • srvctl status diskgroup -g FRA
  • As root: /sbin/blkid | grep asm
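If you want to script the "two entries: DATA and FRA" check, here is a rough sketch. It assumes `oracleasm listdisks` prints one disk label per line, and it assumes a Python 3.7+ interpreter, which a stock RHEL 5 box will not have, so you may prefer to capture the output on the DBS and parse it elsewhere. The function names are my own.

```python
import subprocess

EXPECTED_DISKS = {"DATA", "FRA"}


def missing_asm_disks(listdisks_output, expected=EXPECTED_DISKS):
    """Return the expected ASM disk labels that do not appear in the output."""
    found = {line.strip() for line in listdisks_output.splitlines() if line.strip()}
    return sorted(set(expected) - found)


def check_asm_disks():
    """Run the listdisks command from the list above and report missing labels."""
    # Typically needs to run as root on the DBS itself.
    result = subprocess.run(
        ["/etc/init.d/oracleasm", "listdisks"],
        capture_output=True, text=True, check=True,
    )
    return missing_asm_disks(result.stdout)
```

An empty list from `check_asm_disks()` means both labels were found; anything else names what is missing.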

Installation Logs

These show a lot of what happens when installing the Oracle parts of the DBS:

  • /var/broadworks/logs/installation/oracle*
  • /u01/app/oraInventory/logs/installActions*

Convert a standalone DBS to be a secondary DBS

  1. Clear ASM DATA and FRA disk groups
  2. Install DBS on DBS2 as a standalone primary
  3. On the primary/active DBS: use config-ssh to mesh both bwadmin and oracle accounts
  4. On the other DBS which will become the standby: sitectl convert bwCentralizedDbX
  5. On the primary/active DBS: peerctl add dbs2

Fix the kernel 2.6.18-400.1.1.el5 name mismatch for OracleASM

Upgrading to the BroadWorks DBS R20sp1 requires a kernel update. As of 2014 December 19, updating the RHEL 5 kernel installs version 2.6.18-400.1.1.el5.

However, the only package available from Oracle is oracleasm-2.6.18-400.el5-2.0.5-1.el5, which targets kernel 2.6.18-400.el5.

Fortunately, the kernel module appears to be cross-compatible. Here’s how I moved it:

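# Create the module directory under the new kernel's tree
mkdir -p /lib/modules/2.6.18-400.1.1.el5/kernel/drivers/addon/oracleasm
# Copy the module that was built for 2.6.18-400.el5
cp /lib/modules/2.6.18-400.el5/kernel/drivers/addon/oracleasm/oracleasm.ko /lib/modules/2.6.18-400.1.1.el5/kernel/drivers/addon/oracleasm
# Append the old kernel's dependency entries so the module can be resolved
cat /lib/modules/2.6.18-400.el5/modules.dep >> /lib/modules/2.6.18-400.1.1.el5/modules.dep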

Submit your favorite tidbits in comments!