Savvius Blog

The Savvius Network Analysis and Monitoring Blog covers enterprise networking news from recent standards, such as 802.11n, and upcoming technologies, such as 100G, to pressing everyday issues around wireless, VoIP, security, and network performance management.

How-To: Kibana Plugins on Savvius Insight

I am on the team that develops the Savvius Insight appliance. During the development of Insight 1.0, I helped define the value proposition and the use cases for Insight. From there I helped make choices about the hardware, the analysis, and the reporting. For the reporting we went with Splunk, and I designed the dashboards. For Insight 1.0 we decided to put a Splunk Forwarder on the appliance but left it up to users to provide their own Splunk server.

We recently released Insight 2.0, which mainly added the ELK stack for long-term reporting. This is great because with ELK on the Insight appliance, there is a full-blown reporting solution built right in. In fact, the default mode for Insight 2.0 is to start capturing network traffic on the inline bridge ports, do the network flow and application analysis, output the analysis to CSV, feed the analysis through Logstash and into Elasticsearch, and make the analysis available through the Kibana dashboards, which, if I do say so myself, look pretty nice. Going with the dark theme was definitely the right choice.

Insight Dash

Our analysis is mostly flow-based, with DPI allowing for application (layer 7) analysis. This provides metrics on applications that run over HTTP, such as Salesforce, Google, CNN, and Amazon. It also makes it possible to distinguish between applications that are critical, those that are not, and even those that are not allowed.

We also generate expert events on network behavior that may be the cause of network security and performance issues. All of this analysis is written to CSV files at a 1-minute interval. The interval can be changed by the user, but 1-minute is a reasonable default, providing the right balance of performance, history, and granularity for most network monitoring use cases.

But I digress. What I really want to talk about at the moment is our data, and what continues to amaze me about how it can be visualized in Kibana. The first step is knowing the data. During the development of Insight 2.0, I thought I knew pretty well what data was going into the CSV files and being picked up by Logstash. And I did know it well enough to put together a fairly rich set of dashboards. But in Insight 2.0 I was limited to the built-in visualizations, which limited my thinking about the data.

Since 2.0, I have been looking at Kibana plugins, which have really opened my eyes to different ways of visualizing our data. The great thing about these plugins, and really the reason I am writing this, is that they can be installed directly on an Insight 2.0 appliance and used to create new and exciting dashboards and visualizations. And if you have Insight 1.0, the hardware did not change for Insight 2.0, so you can easily upgrade by going to the web config page. If your Insight has access to the Internet, it will inform you that an upgrade is available and provide a button to push. If your Insight is not on the Internet, there are easy instructions on the Insight portal for downloading the latest version and upgrading the device.

Back to the Kibana plugins. There are two kinds of plugins that I have been experimenting with on Insight. One type is an application plugin that has its own UI. Examples of these are Sense, Timelion, and Graph. These plugins cannot be used to create visualizations in a dashboard, but they can be used to ask interesting multi-dimensional questions about your data and visualize the result in ways that look amazing, and they may also give you some major insight into the behavior of your network. The other type of plugin adds visualizations that can be mapped to your data and added to dashboards. Some of these include Timeline, Sankey, and HTML. I even wrote my own, from instructions of course, that puts a real-time clock into my dashboard. I look forward to writing more plugins of this type.

Now I am going to talk a little about the plugins I have played with and the data I used in them. I recommend that you make the most of your Insight device and add these plugins as well. But before you add any of these plugins, you have to enable PERSIST so that they will still be there after a reboot. To enable it, open the /boot/grub/menu.lst file, add the word PERSIST to the end of the kernel line, and reboot the device. Also, a word of caution: installing Kibana plugins requires that you SSH into your Insight device and run some commands, so you need to know at least the basics of tools like PuTTY and Linux.
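If you would rather script the PERSIST edit than make it by hand, here is a minimal Python sketch of the idea. It only transforms the text of a menu.lst file; the sample contents below are hypothetical and made up for illustration, so check the result against the real /boot/grub/menu.lst on your device (and keep a backup) before writing anything back.

```python
def add_persist(menu_lst_text: str) -> str:
    """Append PERSIST to each GRUB 'kernel' line that does not already end up with it."""
    out = []
    for line in menu_lst_text.splitlines():
        words = line.split()
        # Only touch lines whose first word is exactly 'kernel'; comments start with '#'.
        if words[:1] == ["kernel"] and "PERSIST" not in words:
            line = line.rstrip() + " PERSIST"
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical menu.lst contents, for illustration only.
sample = (
    "default 0\n"
    "title Insight\n"
    "root (hd0,0)\n"
    "kernel /boot/vmlinuz root=/dev/sda1 ro\n"
    "initrd /boot/initrd.img\n"
)

updated = add_persist(sample)
print(updated)
```

Running the function a second time on its own output changes nothing, so it is safe to re-run.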

A list of both types of plugins I mentioned can be found on GitHub: https://github.com/elastic/kibana/wiki/Known-Plugins. There are lots of others out there as well, and I suspect we will be seeing many more in the near future. So far my favorite visualization plugin is Swimlane. Swimlane was easy to install and apply to our network analysis on Savvius Insight. Below is a screenshot of the Swimlane visualization applied to application best response times.

Looks nice, right? And clearly Dropbox has the worst response time. But how did I create this visualization and map the application response times to it? Well, first I have to know what data is available. To understand that, I can go to the Kibana Discover tab and explore the data. The Savvius data is separated into different types that are prefixed with sv_. For application data, there is a type called sv_expert_apps. If you type ‘type:sv_expert_apps’ into the filter field, you will only see events of this type. You can then open one and see the available fields. For my Swimlane visualization, I just need the Name and Response Time fields. The available response time fields are Best Response Time, Worst Response Time, and Average Response Time. Since we have filtered the events, let’s go ahead and save the result as a search. To do this, select the Save Search icon in the upper right and give it a name. Mine is called Expert Apps.
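For the curious, the same filter can be expressed outside of Kibana as an Elasticsearch query body. The sketch below only builds that body in Python using the standard query_string query; how you send it (index name, host, client) is not covered here and would be specific to your setup.

```python
import json


def type_filter_query(event_type: str, size: int = 1) -> dict:
    """Build a search body that matches events of one type, like the Discover filter."""
    return {
        "size": size,  # one hit is enough to inspect the available fields
        "query": {"query_string": {"query": f"type:{event_type}"}},
    }


body = type_filter_query("sv_expert_apps")
print(json.dumps(body, indent=2))
```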

Now that we understand the data a bit and have saved a search, let’s head over to the Kibana Visualization tab. I have a couple of monitors, so I usually leave a browser open to the Discover tab showing my data fields, and open a separate browser window on another monitor to create or edit a visualization. If you have already installed Swimlane, you should see it as a visualization choice in the Visualizations tab.

Select Swimlane, choose “From a saved search” from the “Select a Search Source” window, and select “Expert Apps” from the list of searches, if that is what you called your search for “type:sv_expert_apps”. In the visualization editor, select the Aggregation in the metrics section, which can be any one of the options provided in the pulldown menu. Having said that, Count does not make much sense.

In the Field pulldown, select any of the Response Time fields. In fact, it can be any of the numeric fields in the sv_expert_apps events.

In the buckets section, select Terms from the Aggregation pulldown menu, and Name.raw from the Field pulldown menu. You can also change the number of entries to display and whether they are displayed in Ascending or Descending order. Descending usually makes the most sense.

We are almost there. In the Time field section, use the defaults, which should be Sub Aggregation: Date Histogram, Field: @timestamp, and Interval: Auto.
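If it helps to see these settings as data, the metric, bucket, and time field choices correspond roughly to a nested Elasticsearch aggregation: a terms bucket per application name, a date histogram inside each bucket, and a metric inside each time slice. The Python sketch below builds such a request body. It is an approximation for illustration, not the request the Swimlane plugin actually sends. The field name "Average Response Time" is assumed from the field list in Discover, the fixed "1m" interval stands in for Kibana's Auto setting, and the "interval" key is the form used by Elasticsearch versions of that era (newer releases use different parameter names).

```python
import json


def swimlane_like_aggs(metric_field, name_field="Name.raw", top_n=5, interval="1m"):
    """Terms per application -> date histogram over @timestamp -> average of one metric."""
    return {
        "size": 0,  # only the aggregation results are needed, not the raw events
        "query": {"query_string": {"query": "type:sv_expert_apps"}},
        "aggs": {
            "by_app": {
                "terms": {"field": name_field, "size": top_n},
                "aggs": {
                    "over_time": {
                        "date_histogram": {"field": "@timestamp", "interval": interval},
                        "aggs": {"metric": {"avg": {"field": metric_field}}},
                    }
                },
            }
        },
    }


# "Average Response Time" is an assumed field name; check Discover for the exact one.
body = swimlane_like_aggs("Average Response Time")
print(json.dumps(body, indent=2))
```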

In any of the sections, you can add a Custom Label and use the Advanced JSON Input to perform further calculations on the displayed data.

Finally, click the Green Arrow at the top. You should see the Swimlane visualization showing some number of application response times over time. Some visualizations have Options. In the Swimlane visualization, you can change the thresholds, or the color that will be displayed at different value ranges.

Now save that visualization, and either add it to an existing dashboard, or create a new one for it. I created a new dashboard, and added separate visualizations for Best, Worst, and Average Response Times. If you want to make your new dashboard easily accessible from the other dashboards, edit the Dashboards panel, and add it right in.

Well, I hope that was as fun and interesting for you as it was for me. I hope it gave you an idea about the power of knowing your data, and trying different Kibana plugins to visualize it. In my next write-up, I will show you how to make really great looking and insightful network graphs with the Kibana Graph plugin. Here is the teaser:

 

Screen 8

Written by: Chris Bloom, Technology Evangelist at Savvius


How to Analyze Microbursts with Savvius Omnipeek

A microburst in nature is a localized column of sinking air (downdraft) within a thunderstorm, usually no more than 2.5 miles in diameter, and typically a lot less. Microbursts can cause extensive damage at the surface, and in some instances, can be life-threatening.

In computer networks, a microburst is a short-term burst of traffic, typically lasting only milliseconds, that saturates the link (Ethernet, Gigabit, 10 Gigabit, etc.). A microburst is a serious network concern, since even short-term saturation means some users are blocked while it lasts. Because the de facto industry standard for measuring network utilization is bits per second (bps), microbursts often go undetected: they get averaged out over a second. In most cases, network monitoring systems don’t alert on the saturation because it doesn’t persist for a full second. The end-user impact can range from nothing, if enough network traffic is buffered, to performance bottlenecks caused by slower throughput or, worse yet, dropped connections.
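To make the averaging effect concrete, here is a small Python sketch with made-up numbers (they are not taken from the capture discussed below): one second of traffic in 1 ms buckets, with a light baseline and a 5 ms burst that fills one direction of a 10 Gbps link.

```python
LINK_CAPACITY_BPS = 10_000_000_000                 # 10 Gbps, one direction
CAPACITY_BITS_PER_MS = LINK_CAPACITY_BPS // 1000   # 10,000,000 bits per millisecond

# One second of traffic in 1 ms buckets: a light 1 Mbit/ms baseline ...
bits_per_ms = [1_000_000] * 1000
# ... plus a 5 ms burst that fills the link completely (illustrative values).
for ms in range(500, 505):
    bits_per_ms[ms] = CAPACITY_BITS_PER_MS

bits_in_second = sum(bits_per_ms)        # what a one-second counter reports
peak_bps = max(bits_per_ms) * 1000       # busiest millisecond scaled to a per-second rate

average_utilization = bits_in_second / LINK_CAPACITY_BPS
peak_utilization = peak_bps / LINK_CAPACITY_BPS

print(f"average over the second: {average_utilization:.2%}")
print(f"busiest millisecond:     {peak_utilization:.2%}")
```

The one-second view shows a link that is only about a tenth utilized, while the millisecond view shows five buckets at line rate.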

To identify a microburst, you need precise measurement of the network traffic on a link at microsecond granularity, along with visualization at millisecond granularity or finer. Here’s a real-world example of how to identify a microburst.

In this example, the measurement point is at a TAP inserted into the 10 Gbps link of a data center connection. We measured 45 seconds of network traffic using a Savvius Omnipliance TL. The Expert system of Omnipeek immediately alerts on irregularities on OSI layers 2 to 7. These alerts can be sorted based on any of the available columns, including by count, layer, etc. In this case we sort by count and see TCP retransmissions, “Non Responsive” peer alerts, slow acknowledgements, etc.

Picture 1: Omnipeek Expert system with flows categorized by protocols/applications and Expert events sorted by number of occurrences.

Picture 2: Graph of overall utilization at one second granularity along with top applications.

When network utilization is graphed using the typical bps as in Picture 2, the maximum full duplex peak is 2.54 Gbps – nothing to worry about on a 10 Gbps link with a full duplex capacity of 20 Gbps (send and receive – 10 Gbps in each direction).

One thing we notice in the Compass Expert Events summary is that there are a fairly large number of events relating to slow network issues, especially for a 45 second capture. Compass can graph the occurrence of Expert events in time, and by doing so it is clear that there is a similarity in the gradient between the Expert events and the overall network utilization:

Picture 3: Omnipeek’s Compass feature can graph the occurrence of Expert events over time.

Since the number of slow network events is large, let us go back to the utilization graph and investigate the spikes more closely. We can drill down to millisecond granularity, where we see multiple spikes of up to 9.845 Mbit per millisecond. Converted to a per-second rate (simply multiplied by 1,000), this is 9.845 Gbps, which, if it occurs in one direction, fills our 10 Gbps link almost completely.
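The unit conversion is simple enough to do in your head, but here it is as a short Python sketch in case you want to apply it to other readings. The function name is just an illustrative helper, and the 10 Gbps capacity is the one-direction capacity of the link in this example.

```python
def mbit_per_ms_to_gbps(mbit_per_ms: float) -> float:
    """Convert a per-millisecond volume in Mbit to an equivalent rate in Gbps."""
    bits_per_ms = mbit_per_ms * 1_000_000   # Mbit -> bit
    bits_per_second = bits_per_ms * 1000    # 1,000 milliseconds per second
    return bits_per_second / 1_000_000_000  # bit/s -> Gbit/s


spike_gbps = mbit_per_ms_to_gbps(9.845)
share_of_link = spike_gbps / 10.0           # one direction of a 10 Gbps link

print(f"{spike_gbps:.3f} Gbps, {share_of_link:.2%} of one direction")
```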

Picture 4: Network utilization in millisecond granularity with multiple spikes close to 10 Mbit per millisecond.

Interestingly, in Picture 4 the top protocol has changed to CIFS. So what happened?

 

Picture 5: The usual utilization with TCP traffic is purple; CIFS spikes are brown.

With normal utilization of up to 6 Mbit per millisecond of TCP traffic, CIFS spikes of up to 6 Mbit per millisecond push utilization to 12 Mbit per millisecond, which simply exceeds the capacity of one direction of a 10 Gbps link. At this point the switches cannot buffer the traffic until the bursts are gone, so packets are dropped, ultimately causing the TCP retransmissions that the Expert events show.
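The same reasoning can be written as a quick check: add the per-protocol volumes for each millisecond and flag the buckets where the sum exceeds what one direction of the link can carry. This is a minimal Python sketch; the sample series are invented for illustration and are not exported from Omnipeek.

```python
CAPACITY_MBIT_PER_MS = 10.0  # one direction of a 10 Gbps link


def saturated_buckets(tcp, cifs, capacity=CAPACITY_MBIT_PER_MS):
    """Return (bucket index, combined Mbit) for each millisecond above capacity."""
    return [
        (i, t + c)
        for i, (t, c) in enumerate(zip(tcp, cifs))
        if t + c > capacity
    ]


# Made-up per-millisecond volumes in Mbit, for illustration only.
tcp = [6.0, 6.0, 5.5, 6.0]
cifs = [0.0, 6.0, 0.5, 4.0]

over = saturated_buckets(tcp, cifs)
print(over)
```

Only the second bucket, where 6 Mbit of TCP and 6 Mbit of CIFS land in the same millisecond, is flagged; a bucket that exactly reaches capacity is not.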

Savvius Omnipeek provides a very intuitive and cost-efficient way to verify whether microbursts occur in your network, and when, where, and how network performance is suffering. To start a free 30-day trial of Omnipeek today visit us here.

Written by: Matthias Lichtenegger

