Fun in the Lab: Troubleshooting DMVPN Per-Tunnel QoS

In Playing in the Lab: DMVPN and Per-Tunnel QoS we looked at DMVPN per-tunnel QoS: how to configure it, how the vendor private extensions in RFC 2332 are used in the NHRP registration request, and how to see, via show dmvpn detail, which DMVPN spokes have per-tunnel QoS applied to them.  If you have not yet read Playing in the Lab: DMVPN and Per-Tunnel QoS, I would advise you to do so before taking it to the next level with troubleshooting.

Ready to play some more?  We are going to have a pretend trouble ticket we have to try to solve.  🙂  I’ll take you along for the ride.  But first, let’s review together the big picture view of the issue DMVPN per-tunnel QoS is trying to address.

CPOC_IWAN_Pod1_OnePoP_MPLS_ONLY3

As we see in the diagram above, the Hub site has a much bigger pipe than the spokes.  Obviously, if the Hub site WAN router isn’t aware of the bandwidth limitations for what the remote branch can receive from its service provider, this can be a problem: the hub can easily send more than the branch link can accept.

But this really isn’t anything new or different to us in IT. This has been the case for years.  But back in those days (e.g. Frame-Relay) we typically had subinterfaces up on the Hub WAN router connecting to the spokes.  On those subinterfaces we would configure some type of traffic shaping or rate limiting QoS policy.

Now that you have read Playing in the Lab: DMVPN and Per-Tunnel QoS, you have seen how to configure DMVPN per-tunnel QoS and how to verify in show dmvpn detail that the per-tunnel QoS is applied to a spoke.  But what about troubleshooting it?  What if someone called up and said that Branch1 is having intermittent issues with traffic coming from the hub site?  How would you check?  What is the DMVPN per-tunnel QoS equivalent to Frame-Relay’s show policy-map interface Serial x/y.subint command?


 

PLAYTIME!!!!!!

CPOC_IWAN_Pod1_OnePoP_MPLS_ONLY3

  • 3 branches: all 3 branch routers are an ISR 4xxx flavor running IOS XE code 3.16.1
  • Hub: ASR-1002x also running IOS XE code 3.16.1
  • MPLS Bandwidth
    • Headend: 10Mbps for both Tx and Rx
    • Branches: all 3 are 1.5Mbps for both Tx and Rx
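To put numbers on that mismatch, here is a tiny sketch using only the lab bandwidths listed above. The function names are my own, made up for illustration; it just shows how far the hub can overrun a single branch if nothing on the hub shapes traffic toward it.

```python
# Link speeds from the lab description above, in bits per second.
HUB_BPS = 10_000_000
BRANCH_BPS = 1_500_000


def oversubscription(hub_bps: int, branch_bps: int) -> float:
    """How many times faster the hub can send than one branch can receive."""
    return hub_bps / branch_bps


def excess_bps(hub_bps: int, branch_bps: int) -> int:
    """Traffic that cannot fit on the branch link if the hub bursts at line rate."""
    return max(0, hub_bps - branch_bps)


print(round(oversubscription(HUB_BPS, BRANCH_BPS), 2))  # 6.67
print(excess_bps(HUB_BPS, BRANCH_BPS))                  # 8500000
```

In other words, without a per-branch shaper the hub could offer roughly 6.67 times what a branch can take in.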

Trouble ticket comes in – Branch1 is complaining that they have recently been having intermittent issues across varying applications at the branch location. AND they just had another one.

Facts:

  • There is no 2nd link at any of the remote branch locations.  There is only the MPLS
  • All traffic is between the Hub and the Spokes.  No traffic is spoke to spoke.
  • Branch2 and Branch3 are not complaining about issues
  • Branch1 just hired a bunch more employees.  AND there is a MS Office 365 pilot program going on at that location.

So, given these facts let’s go to the DMVPN head end.

Question: Are we, on the DMVPN head end router, doing per-tunnel QoS to Branch1?

Answer: Yes.

answer1

Question: Is this policy the right policy to apply for this branch location?

Answer: Yes.  It is the IT template nested QoS policy that was created for all the 1.5Mbps attached branch sites.

answer2

Question: Does the head end show we are dropping anything in the QoS policy towards Branch1 in the EF class?


Before I Answer That: 

With Frame-Relay you just needed to make sure you used the full serial subinterface (show policy-map interface Serial x/y.subint) for that branch location. But DMVPN doesn’t have a subinterface number. Wait… so how do we even apply different per-tunnel policies to the different branches?

Answer me this – If the hub were to send unidirectional traffic bound for 10.1.x.y (Branch 1 subnet) would it also show up at Branch2 and Branch3?  Absolutely not. So clearly there must be some IDB (interface descriptor block) type construct that we use to send traffic destined for 10.1.x.y to Branch1 and only Branch1, despite Branch2 and Branch3 also being accessible out that Tunnel 10 interface at the DMVPN hub.  So if the IDB construct thingy is already there… why not apply the QoS to that branch right there?  Well, that’s what we do.

What is the command for seeing the QoS to just one branch?  show policy-map multipoint Tunnel {#} {tunnel destination transport address}
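If you keep a list of branches and their transport addresses, a small helper can build that command for you. This is only a sketch: the helper name and the branch-to-address map are hypothetical, and the addresses are documentation placeholders (RFC 5737), not the ones from my lab.

```python
import ipaddress

# HYPOTHETICAL branch -> transport (NBMA) address map.
# These are RFC 5737 documentation addresses, not real lab values.
BRANCH_NBMA = {
    "Branch1": "192.0.2.11",
    "Branch2": "192.0.2.12",
    "Branch3": "192.0.2.13",
}


def per_tunnel_qos_command(tunnel_number: int, nbma_address: str) -> str:
    """Build the per-spoke QoS show command in the form given above."""
    # Raises ValueError if the address is malformed.
    ipaddress.ip_address(nbma_address)
    return f"show policy-map multipoint Tunnel {tunnel_number} {nbma_address}"


print(per_tunnel_qos_command(10, BRANCH_NBMA["Branch1"]))
```

The address check is there so a typo gets caught before the command is pasted into a router session.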


For those who now forgot the question.  🙂   Ooooo…look… a pretty shiny squirrel. Oh… right

Question: Does the head end show we are dropping anything in the QoS policy towards Branch1 in the EF class?

Answer:  Hmmmm…. “dropping” as in present tense?  No.  But “have dropped” yes.  The current drop rate is 0 and the drop counters are not increasing.
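The "dropping" versus "have dropped" distinction comes down to comparing two readings of the same drop counter taken some time apart. Here is a minimal sketch of that logic, assuming you have already pulled the total-drops number out of the show output by hand or by script. The class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class DropSample:
    timestamp_s: float   # when the counter was read, in seconds
    total_drops: int     # cumulative drop counter at that time


def classify_drops(earlier: DropSample, later: DropSample) -> str:
    """Tell present-tense dropping apart from drops that happened in the past."""
    if later.timestamp_s <= earlier.timestamp_s:
        raise ValueError("samples must be in time order")
    delta = later.total_drops - earlier.total_drops
    if delta < 0:
        return "counters reset"        # e.g. someone cleared the counters
    if delta > 0:
        return "dropping now"          # counter is still climbing
    if later.total_drops > 0:
        return "dropped in the past"   # non-zero, but not increasing
    return "no drops"


# Illustrative values only, not taken from the lab output.
print(classify_drops(DropSample(0, 1200), DropSample(60, 1200)))
```

A non-zero counter that does not move between two readings is exactly the "have dropped" case we are looking at here.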

answer3_morea2

As we can see from the <snip> in the pic above there is more to this output.  Let’s jump down a little and see where the drops were happening.  Looks like the drops are ONLY happening in our business critical traffic – voice dscp EF.

answer3b

It is always hard to find the intermittent, not-happening-when-you-look-at-it type of loss.  So where do we go from here?

Two options

  • Wait for it to happen again and troubleshoot then
  • Go back in time with a tool that can tell you what applications and flows were at what BW and bit rate going to Branch1 with a DSCP of EF.

In my lab, I happen to have something called LiveAction set up and running.  I like it and know how to get around easily on it… so I’ll go see what it saw. It keeps historical information about applications and flows and I find that really quite useful when the problem is not happening right now.  There are other tools out there and I haven’t (honestly) played with them much.  A friend happened to know LiveAction really well… I find it super easy.  So voila.  I use LiveAction.

What really matters (my 2cents) at times like this is to be doing more than just waiting for the failure to happen again.

Okay… off to LiveAction.

First, I have a few filters I already created.  The one I want to use is the filter called EF_Hub_to_Branch1.  But I want to modify it for this.  So I’ll change the Match IP to a match on destination only and sourced from ANY.

LA_filter

Now I want to go to reports within Application.

LA_application1

Where were the drops we saw on the per-tunnel QoS?

Source: iwan-pop1-mpls, Tunnel10, outbound

6 hours: I select past 6 hours cause the trouble ticket was opened less than 2 hours ago and I want a “broad” view first.

Bit Rate: I want to get the report via bit rate cause I’m thinking something was exceeding the threshold we had set for EF and I want that to “pop” easily on the screen.

Lastly… I apply the filter EF_Hub_to_Branch1 that I modified which should match on any flow from any source that is destined to a 10.1.0.0/16 IP address and is marked with DSCP EF.
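If it helps to see the filter logic spelled out, here is roughly what that match amounts to, written as a small sketch. This is not how LiveAction implements it; the function name is mine, and the only facts used are the 10.1.0.0/16 destination and DSCP EF (decimal 46).

```python
import ipaddress

DSCP_EF = 46  # Expedited Forwarding, decimal value of the DSCP field
BRANCH1_NET = ipaddress.ip_network("10.1.0.0/16")


def matches_ef_to_branch1(dst_ip: str, dscp: int) -> bool:
    """True for any flow, from any source, destined to Branch1 and marked EF."""
    return ipaddress.ip_address(dst_ip) in BRANCH1_NET and dscp == DSCP_EF


print(matches_ef_to_branch1("10.1.20.5", 46))  # True
print(matches_ef_to_branch1("10.2.20.5", 46))  # False: not a Branch1 address
print(matches_ef_to_branch1("10.1.20.5", 0))   # False: not marked EF
```

Note that the source address is never checked, which is the "sourced from ANY" change made to the filter.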

Click execute report and…. well…. ewwwwwwwwwww.  Yea… well that peak bit rate of 518Kbps is NOT going to be making it thru our per-tunnel QoS policy.

LA_application2
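To make the "not going to make it" part concrete, here is a sketch of the comparison. The 518 kbps peak comes from the report and the 1.5 Mbps shape rate from the branch template. The priority percentage is a HYPOTHETICAL placeholder, since the real value lives in the policy itself; swap in whatever your priority class actually shows.

```python
def ef_limit_kbps(shape_kbps: float, priority_percent: float) -> float:
    """EF bandwidth if the child priority class is a percent of the parent shaper."""
    return shape_kbps * priority_percent / 100.0


def ef_overage_kbps(peak_kbps: float, limit_kbps: float) -> float:
    """How far the observed peak exceeds the EF allowance (0 if it fits)."""
    return max(0.0, peak_kbps - limit_kbps)


SHAPE_KBPS = 1500              # assuming the shaper matches the 1.5 Mbps branch link
PEAK_EF_KBPS = 518             # peak bit rate from the report
ASSUMED_PRIORITY_PERCENT = 20  # HYPOTHETICAL, read the real value from your policy

limit = ef_limit_kbps(SHAPE_KBPS, ASSUMED_PRIORITY_PERCENT)
print(limit, ef_overage_kbps(PEAK_EF_KBPS, limit))  # 300.0 218.0
```

With that assumed 20 percent, the EF allowance would be 300 kbps and the peak would overshoot it by 218 kbps. Your numbers will differ, but the check is the same.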

Have you solved the crime and found your “who done it”? Only time will tell.  But you for sure have found a very likely suspect that is at the very least responsible for some of the drops we saw in the DMVPN per-tunnel QoS show results.  And very likely a contributing factor to the most recent impact the people at Branch1 were feeling.

What now? Well, now I would start what I call “questioning the suspect”.  Most tools that can show you something like the graph above would also (I assume) allow you (like LiveAction) to drill down deeper and get more info about the flows themselves.  You will then have more details about the IP addresses of the varying suspects and you can go start your questioning.

🙂