New Year, New Networking Lab

I’ve decided to start off the new year by wiping the slate clean on my lab configuration and putting together a new one that will allow me to work on and test out some of the technologies that are currently relevant to me.  I’ve had many lab configurations over the years, and most of them have been pretty small and focused on working out a particular problem or on prepping for the exam of the month.

This iteration of my lab will be a little different, in that I am aiming to mock up, as closely as I can with the resources at hand, an enterprise network complete with the traditional network layers, a data center, a DMZ, WAN connections to remote offices, DMVPN over the Internet, remote access VPN, etc.  The end goal is quite large, and it will take some time to get it completely up and running, but it will provide me with a testbed for working with many aspects of enterprise networking.

As I work through the setup, I’ll be posting entries on progress, and specific configurations and tests I’ve completed.  Please leave any comments or suggestions for things to try or test out.

Goals for the Lab

Here is a short sampling of things I’m looking forward to setting up in the lab.

  • Cisco AnyConnect 3.0
  • Dynamic Access Policies
  • Secure Mobile Device Access – iPad, Laptop, Android, etc
  • 802.1x
  • MACSec
  • CiscoWorks LMS 4.0, Cisco Security Manager
  • DMVPN WAN Backup
  • Latest IOS Versions (ASA 8.3/8.4, IOS 15.x)
  • Cisco Office Extend Access Point (OEAP)
  • Cisco CleanAir
  • Anything else I can get my hands on

Lab Diagram – Draft

Here is the network diagram I put together for what I’m looking to create in the lab.  It isn’t complete, or fully detailed, but it does provide a good representation of what I’m working on.  I’ll also be using it as a working draft and will update it as the lab comes together.

Wireless Networking, where more is not always better

I was sent out to help a new client with a wireless networking problem that had been getting progressively worse and that their own efforts had failed to resolve.

The problem network was covering a large warehouse in which Symbol scanners were used by the employees to process their product and shipments. This company had recently been purchased/merged with another company, and as part of the merger, the wireless network was completely changed over from about 12 Cisco Autonomous Access Points to a similar number of Cisco APs connected to a Wireless LAN Controller.

Shortly after making the change in gear, the workers began experiencing a much higher number of scanner disconnects and delays. The internal networking team attacked the problem in a typical way by doubling the number of access points, and raising the power level of all APs to maximum. After making these changes, the problems didn’t improve, and may have actually gotten worse. They then reached out for some external assistance.

My first visit to the site was a quick one to gather some information and do a walk-through of the space to familiarize myself with the network. Upon deeper inspection, I began to notice that the access points reported significant channel interference measurements, and that the suggested power levels were much lower than the hard-set maximum that had been configured. I scheduled another site visit to perform some more in-depth RF analysis of the areas.

What I found during the next visit was that in any one area of the warehouse, my scanner picked up very solid signal strength from anywhere between 4 and 8 access points. Because this network was supporting older 802.11b scanners, this meant that there was significant co-channel interference almost everywhere, as well as the potential for client confusion with so many “good” choices for access points to connect to.
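
The co-channel problem follows from simple counting. The 2.4 GHz band used by 802.11b offers only three non-overlapping channels in North America (1, 6, and 11). As a rough sketch, assuming every audible AP sits on one of those three channels (the function name here is just for illustration), the minimum number of APs forced to share the busiest channel can be worked out like this:

```python
import math

# Assumption: all APs use only the three non-overlapping
# 2.4 GHz channels (1, 6, 11).
NON_OVERLAPPING_CHANNELS = 3


def min_aps_on_busiest_channel(audible_aps, channels=NON_OVERLAPPING_CHANNELS):
    """Pigeonhole bound: at least this many audible APs share one channel."""
    return math.ceil(audible_aps / channels)


# The range of audible APs seen during the survey.
for n in (4, 8):
    print(n, "audible APs ->", min_aps_on_busiest_channel(n), "or more on one channel")
```

Even with a perfect channel plan, hearing 8 strong APs means at least 3 of them are competing on the same channel, which is why adding APs at full power made things worse rather than better.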

With this information, I suggested re-enabling the automatic power level control (RRM and TPC) available on Cisco’s Wireless LAN Controller. Shortly after making this change, as well as some other best practice adjustments, wireless scans in the area looked much better, and instances of client disconnects dropped to nearly none.

What did I learn…

This is a great example of how in wireless networking, more and stronger can actually have a significant negative impact on network performance. Though adding access points and raising the power levels would intuitively seem to be a good idea, either of these choices can actually cause significant adverse effects on the network.

Key to a healthy wireless network is a good site survey and RF analysis. Today, this is most easily accomplished by using the intelligence built into the wireless control systems. Both Cisco’s RRM and Aruba’s ARM features can make both channel and power level assignments and adjustments very simple.

Cross Vendor Troubleshooting and Bug Finding

I am currently installing a new network core/backbone and campus wide wireless infrastructure at one of my customers.  Within this network and project there are three main technology vendors with equipment in the network that all need to talk with each other.  The new wireless network uses Aruba gear, the new core and backbone is built on HP Procurve, and at the edge of the network are Cisco switches that have been in place for several years.

This network isn’t particularly complicated, but whenever more than one vendor is supplying gear, the design must make use of standard protocols for all communications.  With this in mind, the design called for using LACP to build the link trunks between gear.  LACP is defined in 802.3ad, has been around for about a decade, and is supported by most vendors and platforms.

My expectation was that implementing this technology would be a breeze and a quick check-off on the to-do list.  I was sadly mistaken.

First, the Aruba to HP trunk…

This trunk is made up of two 10 gigabit fiber connections from a 6000 series Aruba controller to an HP E5400 series switch.  To start off, I only had one connection up and linked.  If you’ve never configured an LACP trunk on a Procurve, there isn’t much to the configuration: “trunk B3-B4 trk1 LACP”.  The Aruba uses a very Cisco-esque interface subcommand, “lacp group 1 mode active”.  It is a fairly straightforward configuration, and initially all was looking pretty good.

Checking status on the HP revealed

PORT   LACP      TRUNK     PORT      LACP      LACP
----   -------   -------   -------   -------   -------
B3     Active    Trk1      Up        Yes       Success
B4     Active    Trk1      Down      No        Success

but directly after sending some traffic it changed to

PORT   LACP      TRUNK     PORT      LACP      LACP
----   -------   -------   -------   -------   -------
B3     Active    Trk1      Blocked   Yes       Failure
B4     Active    Trk1      Down      No        Success
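
When a switch has many trunk ports, a small script can pick out the unhealthy rows from output like this. The following is a minimal sketch that only handles the six-column layout shown above. The column meanings (port, LACP mode, trunk group, port status, LACP partner, LACP status) are my reading of the header, and the function names are made up for this example.

```python
def parse_hp_lacp(text):
    """Turn the six-column LACP status table into a list of dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and the dashed separator.
        if len(fields) != 6 or fields[0] == "PORT" or set(fields[0]) == {"-"}:
            continue
        port, mode, trunk, status, partner, lacp_status = fields
        rows.append({
            "port": port, "mode": mode, "trunk": trunk,
            "status": status, "partner": partner, "lacp_status": lacp_status,
        })
    return rows


def problem_ports(rows):
    """Ports that are blocked or whose LACP status is not Success."""
    return [r["port"] for r in rows
            if r["status"] == "Blocked" or r["lacp_status"] != "Success"]


SAMPLE = """\
PORT   LACP      TRUNK     PORT      LACP      LACP
----   -------   -------   -------   -------   -------
B3     Active    Trk1      Blocked   Yes       Failure
B4     Active    Trk1      Down      No        Success
"""

print(problem_ports(parse_hp_lacp(SAMPLE)))
```

Run against the second table above, this flags B3, the port that went to Blocked/Failure once traffic started flowing.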

A blocked connection is never a good thing, and traffic stopped completely.  After double and triple checking the configurations, I opened up a support case with both Aruba and HP Procurve.  While troubleshooting, we tried building the trunk as a protocol-less trunk and were successful with that method; however, we were never able to get the LACP trunk up and running.  Though the first goal is to make it work, I still wanted to know why LACP wasn’t working.  In digging deeper on the Aruba, I found that it wasn’t receiving any LACPDUs on the link.

LACP Counter Table
Port     LACPDUTx  LACPDURx  MrkrTx  MrkrRx  MrkrRspTx  MrkrRspRx  ErrPktRx
----     --------  --------  ------  ------  ---------  ---------  --------
XG 0/10  12        0         0       0       0          0          0
XG 0/11  0         0         0       0       0          0          0
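
The telltale sign in this table is a port that transmits LACPDUs but never receives any. As a rough sketch (the parsing is based only on the layout shown above, and the helper names are hypothetical), that check could be automated like this:

```python
# Short names for the seven counter columns, in the order shown in the table.
COUNTERS = ("tx", "rx", "mrkr_tx", "mrkr_rx", "mrkr_rsp_tx", "mrkr_rsp_rx", "err_rx")


def parse_lacp_counters(text):
    """Map each port name to its LACP counters."""
    ports = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) <= len(COUNTERS):
            continue  # title line or blank line
        try:
            values = [int(f) for f in fields[-len(COUNTERS):]]
        except ValueError:
            continue  # header row or dashed separator
        # Port names such as "XG 0/10" contain a space, so rejoin the rest.
        name = " ".join(fields[:-len(COUNTERS)])
        ports[name] = dict(zip(COUNTERS, values))
    return ports


def one_way_ports(ports):
    """Ports that send LACPDUs but have never received one."""
    return [name for name, c in ports.items() if c["tx"] > 0 and c["rx"] == 0]


SAMPLE = """\
LACP Counter Table
Port     LACPDUTx  LACPDURx  MrkrTx  MrkrRx  MrkrRspTx  MrkrRspRx  ErrPktRx
----     --------  --------  ------  ------  ---------  ---------  --------
XG 0/10  12        0         0       0       0          0          0
XG 0/11  0         0         0       0       0          0          0
"""

print(one_way_ports(parse_lacp_counters(SAMPLE)))
```

Here only XG 0/10 is reported. XG 0/11 is ignored because it has not sent anything either, which fits with the second link being down.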

Though this was interesting, even with HP and Aruba’s assistance we weren’t able to get the LACP-based trunk operational, and we have left it configured as a simple aggregate link/port-channel.

And the Cisco to HP Trunk…

This network makes use of two separate trunks to connect end to end.  The network looks something like this:

[HP Procurve] ========= [Cisco Catalyst] ========= [HP Procurve]

Between each pair of switches are two gigabit fiber connections.  For the initial configuration here, I dug right in and configured the LACP trunk on the HPs as above, and used the command “channel-group 2 mode active” on the Cisco switch.  The status on the switches immediately showed something amiss, though.

The HP showed

 PORT   LACP      TRUNK     PORT      LACP      LACP
 ----   -------   -------   -------   -------   -------
 21     Active    Trk1      Up        No        Success

And the Cisco logged a message of

1w2d: %EC-5-L3DONTBNDL2: Gi0/24 suspended: LACP currently not enabled on the remote port.
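
Cisco IOS system messages follow the pattern %FACILITY-SEVERITY-MNEMONIC: text, and the mnemonic is usually the best thing to search on. A small sketch of pulling those pieces out of a logged line (the function name is only for illustration, and treating the first word of the text as the interface is an assumption that happens to hold for this message):

```python
import re

# %FACILITY-SEVERITY-MNEMONIC: message text
SYSLOG_RE = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+): (?P<text>.*)"
)


def parse_ios_message(line):
    """Return the parts of an IOS system message, or None if no match."""
    match = SYSLOG_RE.search(line)
    return match.groupdict() if match else None


LINE = ("1w2d: %EC-5-L3DONTBNDL2: Gi0/24 suspended: "
        "LACP currently not enabled on the remote port.")

msg = parse_ios_message(LINE)
# Assumption: for this message the text begins with the interface name.
interface = msg["text"].split()[0]
print(msg["facility"], msg["severity"], msg["mnemonic"], interface)
```

For the message above this yields facility EC, severity 5, mnemonic L3DONTBNDL2, and interface Gi0/24, which gives a compact search string for bug databases and vendor forums.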

Clearly neither of the switches was successfully seeing the other in this configuration.  Having already opened a ticket with HP related to the trunking with the Aruba, I added this problem to the same case.  I spent much time gathering logs and details and sending them onwards, but eventually Google pointed me in the right direction for this one.

Cisco Bug CSCsh97848 was the culprit here.  The gist of the problem is that though LACP is supposed to use the configured native VLAN for control traffic to build and maintain the link, Cisco switches running code 12.2 only allow VLAN 1 to be used as the native VLAN across an LACP trunk.  Once I reconfigured the link to use VLAN 1 as the native on both sides, the LACP trunk came right up.

What did I learn…

Though definitely not the first time I’ve had this experience, it was another case of Google being one of the best troubleshooting tools out there.  Even with cases opened with two of the vendors, the resolution ended up coming from my own efforts at running down the problem.  To be honest, though, at least for the HP to Cisco link, had I been able to open a case with Cisco TAC, I expect they would have quickly identified the troublesome bug.

Though I rarely have pushed a single vendor solution for the sake of being single vendor, I can see some truth to the adage “one throat to choke” in this case. Because the problem was with the interoperability between vendor gear, I found this troubleshooting process to be a little slower and I did have a few instances where the vendors seemed to be pointing fingers at each other.  Overall though, I was very pleased with the way the Aruba and HP engineers worked together at sharing information and attempting to resolve the problem.  It would have been even better had the LACP problem between the HP and Aruba devices been resolved.

And lastly, even when using standard protocols, there can be problems and differences in implementation and features.  Just because the same standard is listed on two separate feature lists doesn’t necessarily mean the devices will work together when connected.