Today, IT providers must maintain and update an infrastructure capable of carrying unprecedented volumes of both personal and corporate traffic. At the same time, they have to monitor the composition of that traffic, both to provide reliable service and to understand the underlying traffic patterns.
Open Systems finds itself in this demanding position. As a provider of managed network security services, it maintains close to 4000 hosts spread across the globe. These mainly reside in networks of multinational corporations and are operated and maintained around the clock in more than 175 countries. From an operational point of view, the transported network traffic has great intrinsic informative value:
- a targeted traffic analysis enables the identification of network bottlenecks
- routing/configuration errors can be detected by examining traffic flows
- observed clients, protocols and applications provide a clearer understanding of what is and has been exchanged by whom in the administered network
- after-the-fact reporting can identify disallowed traffic to and/or from possibly malicious endpoints
Break it Down
Traffic metadata has to be acquired continuously. Several hosts at central locations forward billions of network packets every day, which renders a per-packet analysis of the traffic infeasible. Raw data contained in the packets must be captured, filtered and broken down into key descriptors that yield a condensed description of the underlying data. These descriptors are consolidated into flows, which need to be stored on disk to allow for historical analyses.
Scale with Grace
Many network monitoring solutions rely on central collectors to which traffic metadata is forwarded. In Open Systems' case, collecting metadata from 4000 hosts at a central location would strain links and increase the chance of losing metadata in transit. Instead, we store the data locally, that is, on the device that forwarded and captured the traffic. Queries are handled the same way: they run on the device that holds the data.
The existing system is deployed and run on every Linux-based host maintained by Open Systems.
goProbe - capture
goProbe continually runs as a background process and captures packets using libpcap and gopacket. From these packets, several attributes are extracted which are used to classify the packet in a flow-like data structure:
- Source and Destination IP (sip, dip)
- IP Protocol (proto)
- Destination Port (if available) (dport)
- Application Layer Protocol (if available) (l7proto)
Available flow counters are:
- Bytes sent and received
- Packets sent and received
goProbe runs as a single process, capturing on each configured interface in a separate thread-like routine (goroutine). Every five minutes, the routines are instructed to write their flow data to disk.
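The capture loop described above can be sketched roughly as follows. This is a minimal illustration, not goProbe's actual code: the type and function names (FlowKey, Counters, Packet, FlowMap, capture) are hypothetical, packets arrive over a channel instead of from libpcap, and the write-out is triggered by a plain signal channel where the real system would use a five-minute timer.

```go
package main

import (
	"fmt"
	"sync"
)

// FlowKey holds the attributes that identify a flow (hypothetical layout).
type FlowKey struct {
	Sip, Dip string
	Proto    uint8
	Dport    uint16
}

// Counters holds the per-flow byte and packet counters.
type Counters struct {
	BytesSent, BytesRcvd uint64
	PktsSent, PktsRcvd   uint64
}

// Packet is a simplified stand-in for a decoded packet.
type Packet struct {
	Key     FlowKey
	Bytes   uint64
	Inbound bool
}

// FlowMap aggregates packets that share the same key into one flow.
type FlowMap map[FlowKey]*Counters

// Add updates the counters of the flow the packet belongs to.
func (m FlowMap) Add(p Packet) {
	c, ok := m[p.Key]
	if !ok {
		c = &Counters{}
		m[p.Key] = c
	}
	if p.Inbound {
		c.BytesRcvd += p.Bytes
		c.PktsRcvd++
	} else {
		c.BytesSent += p.Bytes
		c.PktsSent++
	}
}

// capture aggregates the packets of one interface. On every rotate signal it
// hands over the current flow map and starts a fresh one. When the packet
// channel is closed, it hands over whatever is left and returns.
func capture(pkts <-chan Packet, rotate <-chan struct{}, out chan<- FlowMap) {
	flows := FlowMap{}
	for {
		select {
		case p, ok := <-pkts:
			if !ok {
				out <- flows
				return
			}
			flows.Add(p)
		case <-rotate:
			out <- flows
			flows = FlowMap{}
		}
	}
}

func main() {
	ifaces := []string{"eth0", "eth1"} // hypothetical interface names
	var wg sync.WaitGroup
	for _, name := range ifaces {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			pkts := make(chan Packet)
			rotate := make(chan struct{})
			out := make(chan FlowMap)
			go capture(pkts, rotate, out) // one capture routine per interface

			k := FlowKey{Sip: "10.0.0.1", Dip: "10.0.0.2", Proto: 17, Dport: 53}
			pkts <- Packet{Key: k, Bytes: 80}
			pkts <- Packet{Key: k, Bytes: 120, Inbound: true}
			rotate <- struct{}{} // would be a five-minute tick in practice
			flows := <-out
			close(pkts)
			<-out // drain the final (empty) flow map

			fmt.Printf("%s: %d flow(s), %+v\n", name, len(flows), *flows[k])
		}(name)
	}
	wg.Wait()
}
```

Using a comparable struct as the map key keeps the aggregation to a single map lookup per packet, and swapping in a fresh map on rotation lets the old one be written out without blocking the capture for long.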
goDB - store
No operator wants to wait forever to query and analyze traffic data. Flows need to be stored efficiently to enable swift analyses of the data upon request. To achieve this goal, we designed a small database tailored to the flow data produced by goProbe: goDB. Its major design decisions were:
- Improve I/O performance: load only what is actually needed. The data is partitioned by time and stored in a columnar arrangement, which allows us to pre-select only those columns that are relevant for the query: if sip and dip are involved in a query, only the sip and dip column files are loaded.
- Save space and load efficiently: flow data for the last three months can take up considerable disk space. To mitigate this, every 5-minute block of flow data is compressed before it is written. Compression also reduces the amount of data that has to be read from disk, which in turn improves query performance. The trade-off is clear: load as little as possible from a (usually slow) disk and sacrifice some CPU to (quickly) decompress the data in memory.
- Enable concurrent processing of the data: today's systems are capable of parallel processing, so the data is partitioned in a way that subsets can be processed independently. Block-wise storage of the flow data enables query processing in a Map-Reduce fashion.
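To make these design decisions more concrete, here is a small sketch of time-partitioned, columnar, compressed storage. It rests on several assumptions: the store is an in-memory map rather than files on disk, every column holds uint64 values, and gzip from the Go standard library stands in for whatever compression goDB actually uses. The names Store, WriteBlock and ReadColumns are made up for this example.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/binary"
	"fmt"
	"io"
)

// Store maps block timestamp -> column name -> compressed column data.
// On disk this could be one directory per block and one file per column
// (an assumption made for this sketch).
type Store map[int64]map[string][]byte

// pack serializes and compresses one column.
func pack(vals []uint64) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if err := binary.Write(zw, binary.BigEndian, vals); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// unpack decompresses and deserializes one column.
func unpack(data []byte) ([]uint64, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	raw, err := io.ReadAll(zr)
	if err != nil {
		return nil, err
	}
	vals := make([]uint64, len(raw)/8)
	for i := range vals {
		vals[i] = binary.BigEndian.Uint64(raw[8*i:])
	}
	return vals, nil
}

// WriteBlock compresses every column of a block and stores it under ts.
func (s Store) WriteBlock(ts int64, cols map[string][]uint64) error {
	block := make(map[string][]byte, len(cols))
	for name, vals := range cols {
		data, err := pack(vals)
		if err != nil {
			return err
		}
		block[name] = data
	}
	s[ts] = block
	return nil
}

// ReadColumns decompresses only the requested columns of one block.
func (s Store) ReadColumns(ts int64, names ...string) (map[string][]uint64, error) {
	block, ok := s[ts]
	if !ok {
		return nil, fmt.Errorf("no block for timestamp %d", ts)
	}
	out := make(map[string][]uint64, len(names))
	for _, name := range names {
		data, ok := block[name]
		if !ok {
			return nil, fmt.Errorf("block %d has no column %q", ts, name)
		}
		vals, err := unpack(data)
		if err != nil {
			return nil, err
		}
		out[name] = vals
	}
	return out, nil
}

func main() {
	s := Store{}
	err := s.WriteBlock(1450341300, map[string][]uint64{
		"dport":      {53, 53, 443},
		"bytes_sent": {1200, 800, 64000},
	})
	if err != nil {
		panic(err)
	}
	// A query on dport touches only the dport column of the block.
	cols, err := s.ReadColumns(1450341300, "dport")
	if err != nil {
		panic(err)
	}
	fmt.Println(cols["dport"])
}
```

Because every block and every column is stored and compressed separately, a query pays the decompression cost only for the columns and time range it actually needs.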
goQuery - analyze
goQuery is the front end query tool used to extract and analyze aggregated flow information for a specific period of time. The tool interfaces with goDB and performs aggregations across the provided flow attributes (for example grouping by sip and dip).
Parallel processing is achieved at block level. Each worker grabs a five-minute flow block and performs the appropriate aggregations on it. A central merger routine takes the results from the individual workers and aggregates them into a final result, which is then printed for the analyst.
The query front end supports time ranges, complex conditionals on the data (e.g. drill-down into the data set), sorting of results and exporting into both CSV and JSON.
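The worker and merger pattern can be sketched as below. This is an illustration under assumptions, not goQuery's implementation: the Flow, Block, GroupKey and Result types and the Query function are hypothetical, the aggregation is fixed to summing bytes grouped by sip and dip, and the condition is passed in as a plain Go function.

```go
package main

import (
	"fmt"
	"sync"
)

// Flow is a simplified flow record (hypothetical layout).
type Flow struct {
	Sip, Dip string
	Dport    uint16
	Bytes    uint64
}

// Block holds the flows of one five-minute interval.
type Block []Flow

// GroupKey is the set of attributes the query groups by.
type GroupKey struct{ Sip, Dip string }

// Result maps each group to its accumulated byte count.
type Result map[GroupKey]uint64

// aggregateBlock is the map step: filter and aggregate a single block.
func aggregateBlock(b Block, cond func(Flow) bool) Result {
	res := Result{}
	for _, f := range b {
		if cond(f) {
			res[GroupKey{f.Sip, f.Dip}] += f.Bytes
		}
	}
	return res
}

// Query distributes blocks to workers and merges their partial results.
func Query(blocks []Block, cond func(Flow) bool, numWorkers int) Result {
	if numWorkers < 1 {
		numWorkers = 1
	}
	work := make(chan Block)
	partial := make(chan Result)

	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range work {
				partial <- aggregateBlock(b, cond)
			}
		}()
	}

	// Feed the blocks to the workers, then signal that no more work follows.
	go func() {
		for _, b := range blocks {
			work <- b
		}
		close(work)
	}()

	// Close the result channel once all workers are done.
	go func() {
		wg.Wait()
		close(partial)
	}()

	// Merger (reduce step): combine the partial results into the final one.
	final := Result{}
	for r := range partial {
		for k, v := range r {
			final[k] += v
		}
	}
	return final
}

func main() {
	blocks := []Block{
		{{"10.0.0.1", "10.0.0.2", 53, 100}, {"10.0.0.3", "10.0.0.2", 443, 900}},
		{{"10.0.0.1", "10.0.0.2", 53, 50}},
	}
	dns := func(f Flow) bool { return f.Dport == 53 }
	for k, v := range Query(blocks, dns, 4) {
		fmt.Printf("%s -> %s: %d bytes\n", k.Sip, k.Dip, v)
	}
}
```

Since the blocks are independent of each other, workers never share state; only the merger touches the final result, so no locking is needed.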
The result is a concise, flipping fast overview of what happened recently in your network:
    goquery -i eth0 -f "17.12.2015 09:33" -c "(dport=53 and proto=UDP)" -n 5 sip,dip,proto,dport

    sip             dip             proto  dport  packets  %      data vol.  %
    188.8.131.52    184.108.40.206  UDP    53     76.00    21.05  14.63 kB   17.38
    220.127.116.11  18.104.22.168   UDP    53     46.00    12.74  14.29 kB   16.96
    22.214.171.124  126.96.36.199   UDP    53     48.00    13.30  14.00 kB   16.63
    188.8.131.52    184.108.40.206  UDP    53     32.00    8.86   9.83 kB    11.67
    220.127.116.11  18.104.22.168   UDP    53     16.00    4.43   4.97 kB    5.90
                                                  ...             ...
    Total traffic                                 361.00          84.21 kB

    Timespan / Interface : [2015-12-17 09:33:42, 2015-12-23 11:46:52] / eth0
    Sorted by            : accumulated data volume (sent and received)
    Query stats          : 21 hits in 13ms
    Conditions           : (dport=53&proto=udp)
As the components' names not so subtly suggest, the system is written in Go. This is no coincidence, as Go includes concurrency built-ins on which goProbe and goQuery rely heavily. The language allowed rapid prototyping and enabled us to write a complex, concurrent system with relatively few lines of code.
goProbe and goQuery are 100% open source. Go git it at open-ch/goProbe.
This system was engineered to collect the information required for a master's thesis written in cooperation with the Distributed Computing Group at ETH Zurich. The resulting paper by Lennart Elsen, Fabian Kohn, Christian Decker and Roger Wattenhofer is titled "goProbe: a Scalable Distributed Network Monitoring Solution". It is based on Lennart's thesis, "Optimized Distributed Network Traffic Monitoring in Global Networks".