Contagent

A package of different web server responses

Dedicated to high-quality security solutions, Open Systems constantly seeks ways to improve protection against the unexpected effects that a change to a component might have on the system. Whether that change results from an innovation from Labs or from a version upgrade of one of our carefully selected third-party components, we want to be as certain as we can that the functionality of our products is guaranteed without restrictions and that their security is impeccable at any point in time.

One tool that has resulted from this effort, and that has proved handy for checking how a system handles different web server responses, is Contagent.

General Information

Contagent Overview

Contagent delivers reproducible responses to the same requests and is entirely under your control, since you install it on your own Linux host. It serves HTTP responses like those you might get from broken or misconfigured servers on the internet, e.g. response headers containing null bytes or extremely long headers. It also lets you host servers with insecure certificates of all kinds, e.g. expired ones. On top of that, a dedicated server hosts files for testing archive and media type handling as well as malware detection.

Functionality Overview

Testing your client/proxy using Contagent provides you with answers to questions of the following kind:

  • Are expired certificates blocked?
  • Are expired certificates allowed when on the certificate whitelist?
  • Is it possible to download malware through the proxy?
  • What about executables?
  • Or zip files that are compressed recursively 300 times?

As we realized that this server functionality could be useful in many different scenarios, we decided to make it open source and publish it on GitHub, available for you to clone or download (https://github.com/open-ch/contagent).

How we use Contagent

Contagent Overview

Contagent was developed as part of a testing environment for the Web Proxy service and is tightly integrated into our automated testing framework. In addition to our unit tests, every code change triggers an extensive set of end-to-end tests that make use of Contagent's servers.

How to use Contagent yourself

After installing nginx (the web server), cfssl (for certificate generation), and jq (a handy JSON processing tool), your Linux machine should meet all the requirements for Contagent: clone or download it from GitHub (https://github.com/open-ch/contagent), run the install.sh script, include the freshly installed server configuration in the nginx config file, and you are ready to go.

You are now ready to perform requests against the server and see whether the responses are handled as expected. You might want to test the behaviour of your sophisticated multi-level intermediate infrastructure or just the response handling done by your web browser. That's up to you now.
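
If you prefer to script such checks, a minimal Go sketch like the following can serve as a starting point: it sends a request to one of Contagent's test servers and reports whether the TLS handshake is rejected. The host name expired.contagent.example is a placeholder; substitute whatever name your Contagent installation actually serves (see the README for the available servers).

package main

import (
    "fmt"
    "net/http"
)

func main() {
    // Hypothetical host name: replace it with whatever name your Contagent
    // installation serves (see the repository README for the available servers).
    resp, err := http.Get("https://expired.contagent.example/")
    if err != nil {
        // For a server presenting an expired certificate, the default TLS
        // verification is expected to fail right here.
        fmt.Println("request failed (expected for an invalid certificate):", err)
        return
    }
    defer resp.Body.Close()
    fmt.Println("unexpected success:", resp.Status)
}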

For more information on what Contagent has to offer, feel free to have a look at the README in the repository.

Network Traffic Analysis with goQuery

How to make sense of goProbe's network traffic metadata

In a previous post, we introduced our network monitoring infrastructure goProbe/goQuery. On nearly 4000 hosts, the tool captures and stores metadata of packets flowing through global networks. But what good is all this data if you cannot make sense of it?

The true value of these records lies in what they tell you about your network in general and how it is being used. To this end, we harness the capabilities of goQuery, which have evolved considerably with the latest iteration (v1.05 to v2.0) and aim to provide valuable insights "at a glance". This post will walk you through the most relevant steps and commands used in an in-depth traffic analysis.

For which interfaces do I have data?

goquery -list

      Iface    # of flows      Traffic                   From                  Until
  ---------    ----------    ---------    -------------------    -------------------
       eth0      276.64 k      1.92 GB    2016-02-04 10:11:18    2016-03-24 10:49:06
       eth1       14.49 k    263.64 MB    2016-02-04 10:11:18    2016-03-24 10:49:06
       eth2       14.49 k    263.64 MB    2016-02-04 10:11:18    2016-03-24 10:49:06
       eth3       14.49 k    353.02 MB    2016-02-04 10:11:18    2016-03-24 10:49:06
       eth4        0.00        0.00  B    2016-02-04 10:11:18    2016-03-24 10:49:06
       eth5       14.31 k    280.23 MB    2016-02-04 10:11:18    2016-03-24 10:49:06
   tun4_121       18.56 k     31.26 MB    2016-02-04 10:11:18    2016-03-07 15:28:38
   tun4_124       28.24 k     47.42 MB    2016-02-04 10:11:18    2016-03-24 10:49:06
   tun4_125       28.24 k     47.42 MB    2016-02-04 10:11:18    2016-03-24 10:49:06
   tun4_127       28.24 k     47.42 MB    2016-02-04 10:11:18    2016-03-24 10:49:06
   tun4_129       14.12 k     21.54 MB    2016-02-04 10:11:18    2016-03-24 10:49:06
   tun4_134        4.84 k      9.54 MB    2016-03-07 15:33:25    2016-03-24 10:49:06

      Total      456.67 k      3.25 GB

This lists the interfaces on which goProbe collected metadata. The number of collected flows and the total traffic volume are shown together with the data collection time frame.

Dive into the Data

Consider the following command:

goquery -i eth1 -f -1d -c 'snet=10.0.0.0/8' -n 5 sip,dip,dport,proto

Explanation:

  • -i eth1: I am interested in traffic that went over eth1...
  • -f -1d: ...in the last 24 hours (-f: first occurrence)...
  • -c "snet=...": ...and originated from my private IP range.
  • -n 5: Only show me the top 5 results...
  • sip,dip,dport,proto: ...and group them by source, destination IP, destination port and IP protocol.
goquery -i eth1 -f -1d -c 'snet=10.0.0.0/8' -n 5 sip,dip,dport,proto

                                     packets   packets            bytes     bytes
       sip       dip  dport  proto        in       out      %        in       out      %
  10.0.0.2  10.0.0.1      0    ESP   17.05 k   17.05 k  97.40   2.61 MB   2.62 MB  97.31
  10.0.0.1  10.0.0.2    500    ESP  439.00    440.00     2.51  68.88 kB  69.16 kB   2.51
  10.0.0.1  10.0.0.2    500    UDP   10.00     10.00     0.06   2.70 kB   3.46 kB   0.11
  10.0.0.2  10.0.0.1    500    UDP    6.00      6.00     0.03   2.00 kB   1.68 kB   0.07

                                     17.50 k   17.50 k          2.68 MB   2.69 MB

   Totals:                                     35.01 k                    5.37 MB

Timespan / Interface : [2016-03-23 11:12:57, 2016-03-24 11:14:06] / eth1
Sorted by            : accumulated data volume (sent and received)
Query stats          : 4.00   hits in 5ms

There's a lot going on here. We see that communication is restricted exclusively to two endpoints (10.0.0.{1,2}) and that the traffic consists entirely of IPsec (VPN) traffic: destination port 500 for key negotiation (IKE) and no destination port for the Encapsulating Security Payload (dport 0 and protocol ESP), i.e. the actual encrypted VPN traffic. The traffic totals also show that IKE accounts for only a small portion of the exchanged packets.

You will also notice that traffic volume and packet counts are now split into incoming and outgoing. This immediately shows whether the traffic was bi-directional, which in turn can help you identify whether traffic was blocked by your host or by a particular endpoint.

Find the needle in the haystack

One of the most important parameters for targeted analyses is -c, which specifies conditions on the data. goQuery supports a complete set of logical and comparison operators that let you drill down into the data and extract a very specific set of flows observed in the indicated time frame:

Base    Description            Other representations

COMPARATIVE
   =    equal to               eq, -eq, equals, ==, ===
  !=    not equal to           neq, -neq, ne, -ne
  <=    less or equal to       le, -le, leq, -leq
  >=    greater or equal to    ge, -ge, geq, -geq
   <    less than              less, l, -l, lt, -lt
   >    greater than           greater, g, -g, gt, -gt

LOGICAL
   !    unary negation         not
   &    and                    and, &&, *
   |    or                     or, ||, +

Precedence can be enforced by parenthesizing condition chains appropriately, e.g.

! (( dport = 8080 | dport = 443 ) & proto = TCP )

This will negate the entire conditional and cause the dport conditions to be evaluated before the conjunction with proto.

Case Study

Disclaimer: for obvious reasons of confidentiality, all public IPs below were changed to randomly chosen ones.

During the analysis of a webserver in the DMZ of one of our customers, we noticed a peculiar traffic pattern. The picture shows how the webserver is connected to the internet and the customer's global WAN.

Customer Webserver

In this case study, the webserver's IP is 215.118.162.132. Two regional ISPs are connected on interfaces eth0 and eth2. The DMZ is located behind eth7.

When analyzing the traffic on the DMZ interface, we observed that the client producing the most traffic was 191.189.126.221. We wanted to find out how requests and responses were routed at the time, so the external interfaces were included in the analysis.

Thanks to the added support for multi-interface queries and the finer separation of incoming and outgoing traffic, all it took was "one query to rule them all":

goquery -i eth0,eth2,eth7 -f -5d -n 5 -c 'sip = 191.189.126.221 & dip = 215.118.162.132' iface,sip,dip

                                          packets   packets             bytes      bytes
iface              sip              dip        in       out      %         in        out      %
 eth7  191.189.126.221  215.118.162.132  108.70 k    6.20 k  50.00  148.62 MB  413.83 kB  50.00
 eth2  191.189.126.221  215.118.162.132    0.00    108.70 k  47.30    0.00  B  148.62 MB  49.86
 eth0  191.189.126.221  215.118.162.132    6.20 k    0.00     2.70  413.83 kB    0.00  B   0.14

                                         114.90 k  114.90 k         149.03 MB  149.03 MB

  Totals:                                          229.80 k                    298.05 MB

Timespan / Interface : [2016-02-17 20:08:17, 2016-02-22 09:53:17] / eth0,eth2,eth7
Sorted by            : accumulated data volume (sent and received)
Query stats          : 3.00   hits in 290ms
Conditions           : sip = 191.189.126.221 & dip = 215.118.162.132

The first row shows the bi-directional traffic for the DMZ. The other two rows show that requests came in via eth0 (ISP A) and responses were routed out via eth2 (ISP B). There is a straightforward explanation for this: the webserver's IP was announced via ISP A, while the default route of the customer host pointed to ISP B.

To gain a little more information about the connecting client, the -resolve flag was quite handy:

goquery -i eth7 -f -5d -n 5 -c 'sip = 191.189.126.221' -resolve sip,dip
                                                           sip              dip
                       221.126.189.191.dynamic.wless.ispa.tld.  215.118.162.132 ...

The reverse lookup performed by goQuery showed that this was a client in a residential wireless network.

In summary, goQuery enabled us to understand complex routing decisions after the fact. Having access to the complete history of routing paths in your network proves to be a powerful foundation for traffic analyses and for understanding the internet as a whole.

Start Analyzing

We just released a major version upgrade to both the goProbe and goQuery software. goProbe and goQuery are 100% open source. Go git them at open-ch/goProbe. We are looking forward to your contributions.

Significant improvements have been incorporated by Lorenz Breidenbach in the scope of an internship at Open Systems. Interested in contributing? Just drop us an email.

This system has been engineered to collect the required information for a master's thesis in cooperation with the Distributed Computing Group at ETH Zurich. The resulting paper by Lennart Elsen, Fabian Kohn, Christian Decker and Roger Wattenhofer can be found here: goProbe: a Scalable Distributed Network Monitoring Solution. It is based on Lennart's thesis Optimized Distributed Network Traffic Monitoring in Global Networks.

Addendum: type goquery -help for a complete overview of the analytical features.

ClientHA

A Linux NSS Module for Client-Side High Availability Failover

At Open Systems we manage more than 3000 hosts acting as proxies, mail gateways, VPN devices, etc., all around the world. These hosts need to communicate with various central services, such as update servers or monitoring systems.

A challenge that we faced is the optimal selection of these central servers. As an example, let's take ClamAV anti-virus updates: we want the software doing the updates to choose one of our caching proxies that is online and as close as possible to where it is running.

We want to have "optimal server selection" for all software that we use on our devices, but not all software has built-in support for that. A general solution that always works is to implement the server selection by influencing the name resolution of the server. If ClamAV wants to fetch updates via an update server called updates.open.ch, we only need to make sure that the returned IP address is good, and we don't need any failover mechanisms in ClamAV itself.

One way to implement that is to use the built-in failover mechanisms of DNS.

DNS-based Failover

DNS based failover for updates

The DNS standard, as defined by RFC 1034 and RFC 1035, specifies that a zone is served by multiple "NS" servers and that, when the zone is resolved, resolvers fall back to the remaining NS servers if one or more of them are unavailable. It is also recommended that the best server be selected based on response time.

The DNS-based failover mechanism described here uses that fact: we define a DNS zone, called updates.open.ch in this article, which is served by two NS servers, updates1.open.ch and updates2.open.ch.

If one of the DNS servers, updates1 or updates2, becomes unavailable, then DNS queries will still continue to work thanks to the NS failover behavior implemented in DNS clients.

The trick we used is to also operate the update service on those same two servers and to configure the DNS zone on each host differently, so that each host always returns its own IP address when queried for the name updates.open.ch (A or AAAA records).
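
To observe the mechanism from a client's point of view, a small Go sketch using only the standard library can query each NS server of the zone directly and print the address it returns; with the configuration described above, every server answers with its own IP. This is just an illustration of the behavior, not part of our tooling.

package main

import (
    "context"
    "fmt"
    "net"
    "time"
)

func main() {
    const zone = "updates.open.ch" // the example name used in this article

    // Look up the NS servers of the zone (updates1/updates2 in our setup).
    nsRecords, err := net.LookupNS(zone)
    if err != nil {
        fmt.Println("NS lookup failed:", err)
        return
    }

    // Ask each NS server directly for the address of the zone apex. With the
    // configuration described above, every server answers with its own IP.
    for _, ns := range nsRecords {
        server := ns.Host
        r := &net.Resolver{
            PreferGo: true,
            Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
                d := net.Dialer{Timeout: 2 * time.Second}
                return d.DialContext(ctx, network, net.JoinHostPort(server, "53"))
            },
        }
        addrs, err := r.LookupHost(context.Background(), zone)
        fmt.Printf("%-25s -> %v (err: %v)\n", server, addrs, err)
    }
}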

Et voilà, a very cheap and effective failover mechanism, thanks to the built-in mechanisms of the DNS protocol.

This worked well for us, but we still faced some issues:

  • We couldn't easily configure a managed host to use a different set of IP addresses, without using a different name.
  • The mechanism requires access to a functioning DNS. This is not always the case, especially for certain internal firewalls, backend services, IDS systems, etc.
  • The DNS resolver mostly chooses the nearest server, but we couldn't make sure that this was always the case.

That's why we implemented a different approach: ClientHA.

ClientHA

ClientHA-based server selection

If you use Linux, you might know the nsswitch.conf file: it defines, among other things, which databases to consult when resolving hostnames, and is typically configured as follows:

hosts:  files dns

With the Linux glibc, this works under the hood by using "NSS modules", which are plugins that take care of the name resolution. There is an "nss_files" module that looks up hostnames in /etc/hosts, and there is an nss_dns module, which uses DNS. nsswitch.conf describes what modules should be used, and in which order.

ClientHA is also an NSS module and the nsswitch.conf on our managed hosts looks as follows:

hosts:  files clientha dns

ClientHA only knows how to resolve a few names, such as updates.open.ch, and simply ignores any other name resolution request. The beauty of it is that we can implement whatever behavior we want for resolving those names: failover and server selection work exactly the way we want them to.

Currently, we have implemented a server selection based on Pingmachine, which pings all servers regularly and chooses the nearest available one.
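
ClientHA itself is implemented as a glibc NSS module and bases its choice on Pingmachine's latency measurements. Purely as a conceptual sketch of "choose the nearest available server", the following Go snippet times TCP connections to a set of candidates and returns the fastest reachable one; the host names and the probe port are made up for illustration.

package main

import (
    "fmt"
    "net"
    "time"
)

// nearest returns the candidate with the lowest connect latency and skips
// servers that are unreachable. Hostnames and the probe port are illustrative
// only; ClientHA itself bases its decision on Pingmachine's measurements.
func nearest(candidates []string, port string, timeout time.Duration) (string, error) {
    var best string
    var bestRTT time.Duration
    for _, host := range candidates {
        start := time.Now()
        conn, err := net.DialTimeout("tcp", net.JoinHostPort(host, port), timeout)
        if err != nil {
            continue // server unavailable, try the next one
        }
        conn.Close()
        if rtt := time.Since(start); best == "" || rtt < bestRTT {
            best, bestRTT = host, rtt
        }
    }
    if best == "" {
        return "", fmt.Errorf("no candidate reachable")
    }
    return best, nil
}

func main() {
    server, err := nearest([]string{"updates1.open.ch", "updates2.open.ch"}, "443", 2*time.Second)
    if err != nil {
        fmt.Println("selection failed:", err)
        return
    }
    fmt.Println("selected", server)
}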

This solution is simple and generic, yet much more flexible than the DNS-based approach. It also keeps the main failover logic on the client side, making sure that it is based on real availability measurements as seen from the host that needs to use the service.

Maybe it could be useful for you too! Have a look; it's open source.

Network Traffic Monitoring with goProbe

How to capture and analyze metadata of 50 TB of traffic per day

Today, IT providers have to maintain and update an infrastructure capable of carrying both personal and corporate traffic in unprecedented quantities, while simultaneously monitoring the composition of this traffic in order to provide both reliable service and an understanding of the underlying traffic patterns.

Open Systems finds itself in this demanding position. As a provider of managed network security services, it maintains close to 4000 hosts spread across the global network. These mainly reside in networks of multinational corporations and are operated and maintained around the clock in more than 175 countries. From an operational point of view, the transported network traffic has great intrinsic informative value:

  • a targeted traffic analysis enables the identification of network bottlenecks
  • routing and configuration errors can be detected by examining traffic flows
  • observed clients, protocols and applications provide a clearer understanding of what is and has been exchanged by whom in the administered network
  • reporting after the fact becomes possible, e.g. identifying disallowed traffic to and/or from possibly malicious endpoints

Break it Down

Traffic metadata has to be acquired continuously. Several hosts at central locations forward billions of network packets on a daily basis, which renders a per-packet analysis of the traffic infeasible. The raw data contained in the packets must be captured, filtered and broken down into key descriptors that yield a condensed description of the underlying data. These descriptors are consolidated into flows, which need to be stored on disk to allow for historical analyses.

Scale with Grace

Many network monitoring solutions rely on central collectors to which traffic metadata is forwarded. In Open Systems' case, collecting metadata from 4000 hosts at a central location would strain links and increase the chance of losing metadata in transit. Instead, we store the data locally, that is, on the device which forwarded and captured the traffic. The same applies to queries, which run directly on the device that holds the data.

The Analysis Pipeline

The existing system is deployed and run on every Linux-based host maintained by Open Systems.

goProbe - capture

goProbe runs continuously as a background process and captures packets using libpcap and gopacket. From these packets, several attributes are extracted and used to classify each packet into a flow-like data structure:

  • Source and Destination IP (sip, dip)
  • IP Protocol (proto)
  • Destination Port (if available) (dport)
  • Application Layer Protocol (if available) (l7proto)

Available flow counters are:

  • Bytes sent and received
  • Packets sent and received

goProbe runs as a single process, capturing on each configured interface in a separate thread-like routine (goroutine). Every five minutes, the routines are instructed to write out their flow data to disk.
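
To make this more concrete, here is a heavily condensed Go sketch of such a capture loop built on gopacket: it extracts the attributes listed above from each packet, updates an in-memory flow map, and flushes the map every five minutes. It is a conceptual illustration only, not goProbe's actual implementation; the interface name is a placeholder, counters are not split into sent/received, and application-layer protocol detection is omitted.

package main

import (
    "fmt"
    "time"

    "github.com/google/gopacket"
    "github.com/google/gopacket/layers"
    "github.com/google/gopacket/pcap"
)

// flowKey holds the attributes used to classify packets into flows. This is a
// simplified sketch; goProbe's real data structures differ.
type flowKey struct {
    sip, dip string
    dport    uint16
    proto    uint8
}

type counters struct {
    packets, bytes uint64
}

func main() {
    // "eth0" is a placeholder; goProbe captures on every configured interface
    // in its own goroutine.
    handle, err := pcap.OpenLive("eth0", 128, true, pcap.BlockForever)
    if err != nil {
        panic(err)
    }
    defer handle.Close()

    flows := make(map[flowKey]*counters)
    flush := time.NewTicker(5 * time.Minute) // flows are written out every five minutes
    packets := gopacket.NewPacketSource(handle, handle.LinkType()).Packets()

    for {
        select {
        case pkt := <-packets:
            nl := pkt.NetworkLayer()
            if nl == nil {
                continue
            }
            key := flowKey{
                sip: nl.NetworkFlow().Src().String(),
                dip: nl.NetworkFlow().Dst().String(),
            }
            if ip4, ok := nl.(*layers.IPv4); ok {
                key.proto = uint8(ip4.Protocol)
            } else if ip6, ok := nl.(*layers.IPv6); ok {
                key.proto = uint8(ip6.NextHeader)
            }
            switch t := pkt.TransportLayer().(type) {
            case *layers.TCP:
                key.dport = uint16(t.DstPort)
            case *layers.UDP:
                key.dport = uint16(t.DstPort)
            }
            c, ok := flows[key]
            if !ok {
                c = &counters{}
                flows[key] = c
            }
            c.packets++
            c.bytes += uint64(pkt.Metadata().Length)
        case <-flush.C:
            // This is where the flow map would be handed over to the database
            // writer (goDB) and reset.
            fmt.Printf("flushing %d flows to disk\n", len(flows))
            flows = make(map[flowKey]*counters)
        }
    }
}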

goDB - store

No operator wants to wait forever to query and analyze traffic data. Flows need to be stored efficiently to enable swift analyses of the data upon request. To achieve this goal, we designed a small database tailored to the flow data produced by goProbe -- goDB. Its major design decisions, illustrated by a short sketch after the list, were:

  • Improve I/O performance: load only what is actually needed, facilitated by time-based partitioning of the data in a columnar arrangement. This allows us to pre-select only those columns which are relevant for the query: if sip and dip are involved in a query, only the sip and dip column files are loaded.
  • Save space and load efficiently: flow data for the last three months can take up considerable disk space when written to disk. To mitigate this, every 5-minute block of flow data is compressed before it is written. This directly improves data reading performance, which in turn positively impacts query performance. The trade-off is clear: load as little as possible from (a usually slow) disk and sacrifice some CPU to (quickly) decompress the data in memory.
  • Enable concurrent processing of the data: today's systems are capable of parallel processing, so the data must be partitioned in a way that allows subsets to be processed independently. Block-wise storage of the flow data enables query processing in a Map-Reduce fashion.

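The following Go sketch illustrates the columnar, per-block layout described above: each attribute of a five-minute block is compressed and written to its own file. The file names, the gzip codec and the encoding are illustrative assumptions; goDB's actual on-disk format differs.

package main

import (
    "compress/gzip"
    "encoding/binary"
    "fmt"
    "os"
    "path/filepath"
    "time"
)

// A five-minute block of flow data, kept column by column. Only two columns
// are shown here; a real block would hold one column per attribute and counter.
type block struct {
    dport []uint16
    bytes []uint64
}

// writeColumn compresses one column and writes it to its own file, so that a
// later query touching only e.g. dport never has to read the bytes column.
func writeColumn(dir, name string, encode func(*gzip.Writer) error) error {
    f, err := os.Create(filepath.Join(dir, name+".col.gz"))
    if err != nil {
        return err
    }
    defer f.Close()

    zw := gzip.NewWriter(f)
    if err := encode(zw); err != nil {
        return err
    }
    return zw.Close()
}

func (b *block) flush(baseDir string, ts time.Time) error {
    // Time-based partitioning: one directory per five-minute block.
    dir := filepath.Join(baseDir, fmt.Sprintf("%d", ts.Unix()))
    if err := os.MkdirAll(dir, 0o755); err != nil {
        return err
    }
    if err := writeColumn(dir, "dport", func(w *gzip.Writer) error {
        return binary.Write(w, binary.LittleEndian, b.dport)
    }); err != nil {
        return err
    }
    return writeColumn(dir, "bytes", func(w *gzip.Writer) error {
        return binary.Write(w, binary.LittleEndian, b.bytes)
    })
}

func main() {
    b := &block{dport: []uint16{443, 53}, bytes: []uint64{1500, 300}}
    if err := b.flush("/tmp/godb-sketch", time.Now()); err != nil {
        fmt.Println("flush failed:", err)
    }
}
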
goQuery - analyze

goQuery is the front-end query tool used to extract and analyze aggregated flow information for a specific period of time. The tool interfaces with goDB and performs aggregations across the provided flow attributes (for example grouping by sip and dip).

Parallel processing is achieved at the block level. Each worker grabs a five-minute flow block and performs the appropriate aggregations on it. A central merger routine takes the results from the individual workers and aggregates them into a final result, which is then printed for the analyst.
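
Conceptually, the aggregation works like the following Go sketch: worker goroutines aggregate individual blocks in parallel, and a merger combines their partial results. The real goQuery additionally reads and decompresses the blocks from goDB and handles conditions, sorting and output formatting.

package main

import (
    "fmt"
    "sync"
)

// One row of a five-minute flow block, reduced to the attributes we group by.
type flow struct {
    sip, dip string
    bytes    uint64
}

type key struct{ sip, dip string }

// aggregate sums up the traffic of one block, grouped by (sip, dip).
func aggregate(block []flow) map[key]uint64 {
    out := make(map[key]uint64)
    for _, f := range block {
        out[key{f.sip, f.dip}] += f.bytes
    }
    return out
}

func main() {
    blocks := [][]flow{
        {{"10.0.0.2", "10.0.0.1", 1500}, {"10.0.0.1", "10.0.0.2", 800}},
        {{"10.0.0.2", "10.0.0.1", 4200}},
    }

    results := make(chan map[key]uint64)
    var wg sync.WaitGroup

    // Map phase: one worker per block performs the aggregation independently.
    for _, b := range blocks {
        wg.Add(1)
        go func(b []flow) {
            defer wg.Done()
            results <- aggregate(b)
        }(b)
    }
    go func() { wg.Wait(); close(results) }()

    // Reduce phase: a central merger combines the partial results.
    final := make(map[key]uint64)
    for partial := range results {
        for k, v := range partial {
            final[k] += v
        }
    }
    for k, v := range final {
        fmt.Printf("%s -> %s: %d bytes\n", k.sip, k.dip, v)
    }
}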

Query Processing Pipeline

The query front end supports time ranges, complex conditions on the data (e.g. for drilling down into the data set), sorting of results, and export to both CSV and JSON.

The result is a concise, flipping fast overview of what happened recently in your network:

goquery -i eth0 -f "17.12.2015 09:33" -c "(dport=53 and proto=UDP)" -n 5 sip,dip,proto,dport

            sip            dip  proto  dport   packets      %  data vol.      %
  210.15.216.97   192.36.72.41    UDP     53   76.00    21.05   14.63 kB  17.38
  210.15.216.97    192.5.12.31    UDP     53   46.00    12.74   14.29 kB  16.96
  210.15.216.97     192.6.18.7    UDP     53   48.00    13.30   14.00 kB  16.63
  210.15.216.97    192.5.25.41    UDP     53   32.00     8.86    9.83 kB  11.67
  210.15.216.97   193.3.12.130    UDP     53   16.00     4.43    4.97 kB   5.90
                                                 ...                 ...
   Total traffic                              361.00            84.21 kB

Timespan / Interface : [2015-12-17 09:33:42, 2015-12-23 11:46:52] / eth0
Sorted by            : accumulated data volume (sent and received)
Query stats          : 21 hits in 13ms
Conditions           : (dport=53&proto=udp)

Start Monitoring

As the components' names not so subtly suggest, the system is written in Go. This is no coincidence, as Go provides concurrency built-ins on which goProbe and goQuery heavily rely. The language allowed rapid prototyping and enabled us to write a complex, concurrent system with relatively few lines of code.

goProbe and goQuery are 100% open source. Go git it at open-ch/goProbe.

This system has been engineered to collect the required information for a master's thesis in cooperation with the Distributed Computing Group at ETH Zurich. The resulting paper by Lennart Elsen, Fabian Kohn, Christian Decker and Roger Wattenhofer can be found here: goProbe: a Scalable Distributed Network Monitoring Solution. It is based on Lennart's thesis Optimized Distributed Network Traffic Monitoring in Global Networks.

TLS Certificate Information Collection

Monitoring the Global TLS Landscape

Various attacks and incidents have diminished the trust in the TLS public key infrastructure (PKI). The major problem is that any certificate authority can issue certificates for any domain. Thus, the weakest link in the system determines the security and trustworthiness of the entire PKI. At Open Systems, we monitor the TLS PKI on a global scale to detect mis-issued certificates and rogue certificate authorities.

Flaws of the Current TLS PKI Ecosystem

Nowadays, browsers and operating systems ship with certificates preinstalled in their root certificate stores. Every certificate in such a store acts as a trust anchor for validating certificate chains. Hence, these root certificates are trusted unconditionally.

These root certificates sign the intermediate certificates of the certificate authorities. Certificate authorities use these intermediate certificates to issue leaf certificates for endpoints. A client needs the entire certificate chain to determine the trust status of an endpoint. In addition, clients also consider the validity period and revocation status of all certificates in the chain.

An attacker only needs access to a single trusted issuing certificate to issue arbitrary certificates for his purposes. He is then able to intercept, for example, HTTPS traffic and still present a valid certificate chain to the client. As there are several thousand intermediate certificates in active use, obtaining access to a single one of them is not unrealistic.

Leverage Open Systems Global View

Open Systems secures the Web traffic of many customers with the Web Proxy service. This enables us to passively collect a variety of information from a large number of HTTPS/TLS connections on a global scale. At the moment, we extract the following information on all our Web Proxy services (a sketch of such a record follows the list):

  • Timestamp: The time of the connection.
  • FQDN: The Fully Qualified Domain Name (FQDN) of the connection.
  • TLS Version: Which SSL/TLS version was used.
  • Cipher: Which cipher was used.
  • Validity: Whether the certificate chain was deemed valid.
  • Errors: A list of errors encountered during certificate chain validation, if any.
  • Chain: A list of fingerprints specifying the certificate chain.
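
For illustration, the record for a single observed connection could look roughly like the following Go struct; the field names, types and sample values are ours and do not reflect the exact schema used on the proxies.

package main

import (
    "fmt"
    "time"
)

// TLSConnInfo is an illustrative record for one observed TLS connection;
// the actual schema used on our proxies may differ.
type TLSConnInfo struct {
    Timestamp  time.Time // time of the connection
    FQDN       string    // fully qualified domain name of the connection
    TLSVersion string    // e.g. "TLSv1.2"
    Cipher     string    // negotiated cipher suite
    Valid      bool      // whether the certificate chain was deemed valid
    Errors     []string  // validation errors, if any
    Chain      []string  // certificate fingerprints, leaf to root
}

func main() {
    rec := TLSConnInfo{
        Timestamp:  time.Now(),
        FQDN:       "www.example.com",
        TLSVersion: "TLSv1.2",
        Cipher:     "ECDHE-RSA-AES128-GCM-SHA256",
        Valid:      true,
        Chain:      []string{"ab12cd34", "ef56ab78"},
    }
    fmt.Printf("%+v\n", rec)
}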

TLS Information Collection

All information is stored locally on every proxy host and eventually reported to a centralized application. There, all TLS connection information is stored in an aggregated form for advanced analysis. This enables Open Systems to leverage its global view of the TLS PKI in order to detect possible attacks and take the required actions. Furthermore, we can also derive various statistical information about how the TLS PKI evolves over time.

In future articles, we will present deeper insights into our findings and the operational benefits this TLS observatory offers us and our customers.

This system has been engineered to collect the required information for a master's thesis in cooperation with ETH Zurich. The resulting thesis by Fabian Zeindler can be found here: Passive Collection and Analysis of SSL/TLS Connections and Certificates.