Shane Alcock's Blog
Finished up the proof-of-concept CAPWAP parser. ITS seemed pretty happy with the results so far, so I will probably be asked to develop a production version at some stage.
Turned my thoughts back to anomaly detection in noisy time series data. Measuring the autocorrelation of errors suggested that Holt-Winters forecasting alone was unlikely to be useful for our purposes in the long run. Started learning about using wavelets to denoise the data so that forecasting techniques might work better. I'm part of the way there -- I can apply a couple of wavelet transformations and get smoother data but I seem to start adding noise if I go any further than that.
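The denoising step can be sketched roughly as follows — a single-level Haar transform with soft thresholding of the detail coefficients. This is purely illustrative (the function name and threshold value are made up, not the actual code in use), but it shows why small fluctuations get smoothed away while larger structure survives:

```python
def haar_denoise(data, threshold):
    """One level of Haar wavelet denoising: transform, shrink the
    detail coefficients, then invert.  Input length must be even."""
    assert len(data) % 2 == 0
    # Forward Haar transform: pairwise averages (approximation)
    # and pairwise half-differences (detail).
    approx = [(a + b) / 2.0 for a, b in zip(data[::2], data[1::2])]
    detail = [(a - b) / 2.0 for a, b in zip(data[::2], data[1::2])]

    # Soft-threshold the detail coefficients; small differences
    # between neighbouring points are assumed to be noise.
    def shrink(d):
        if abs(d) <= threshold:
            return 0.0
        return d - threshold if d > 0 else d + threshold

    detail = [shrink(d) for d in detail]

    # Inverse transform: reconstruct each pair from (approx, detail).
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out
```

Applying further levels of the transform repeats the same shrink step on the approximation coefficients, which is where over-smoothing (and the apparent "added noise") can creep in.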
Was interviewed by Radio NZ on the topic of my research into Internet usage following the introduction of the CAA.
Had a good chat with Sam Russell from REANNZ on Tuesday when he and Steve Cotter came to visit the group.
Wrote an additive Holt-Winters forecaster for use with the decomposed time series data. This one attempts to set the initial seasonal components correctly rather than just ignoring the seasonal behaviour in the training data. It's still not great with my test data but might still be useful with non-decomposed time series.
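The seasonal initialisation idea can be sketched like this — the level, trend and seasonal components are all seeded from the first two seasons of training data instead of starting the seasonals at zero. Parameter values and names here are hypothetical, not the real implementation:

```python
def holt_winters_additive(series, period, alpha=0.3, beta=0.05, gamma=0.1):
    """Additive Holt-Winters with the seasonal components initialised
    from the first season of training data.  Requires at least two
    full seasons of data."""
    assert len(series) >= 2 * period
    # Initial level: mean of the first season.
    level = sum(series[:period]) / period
    # Initial trend: average per-step change between the first two seasons.
    second = sum(series[period:2 * period]) / period
    trend = (second - level) / period
    # Initial seasonal components: each slot's deviation from the mean,
    # rather than a vector of zeroes.
    seasonal = [series[i] - level for i in range(period)]

    forecasts = []
    for i, obs in enumerate(series):
        s = seasonal[i % period]
        # One-step-ahead forecast from the current state.
        forecasts.append(level + trend + s)
        # Standard additive Holt-Winters update equations.
        last_level = level
        level = alpha * (obs - s) + (1 - alpha) * (level + trend)
        trend = beta * (level - last_level) + (1 - beta) * trend
        seasonal[i % period] = gamma * (obs - level) + (1 - gamma) * s
    return forecasts
```

On a series whose seasonal pattern is already present in the training window, this starts forecasting accurately from the first observation instead of spending a season or two "learning" the pattern.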
Started writing a proof-of-concept program to analyse CAPWAP traffic and track the amount of traffic observed on a wireless network at the AP, SSID and individual user levels. This is part of a possible project for ITS to allow them to keep historical statistics of the AP usage around campus.
Our paper on libtrace entitled "Libtrace: A Packet Capture and Analysis Library" has been officially published in this month's edition of ACM Computer Communication Review.
It has been a bit of a battle over the years to find a venue that was willing to publish a paper on libtrace, as the direct scientific contribution of libtrace itself is subtle. It was also difficult to articulate exactly why libtrace is so much easier and more pleasant to work with compared to other trace analysis libraries. Often the improvements present in libtrace were dismissed out of hand as being nice but not necessary.
For example, capture format agnosticism was dismissed by some reviewers as mostly pointless because they never needed to work with a trace format other than pcap. The performance enhancements were similarly dismissed because it was easier to just "buy a faster CPU" or to use a separate zcat process to decompress the trace instead (hence the paper's explicit discussion of the difference between using a separate process + pipe versus the threaded approach employed by libtrace).
As a result, we often had to go back to the drawing board and think more carefully about how to "sell" each of the enhancements in libtrace and clearly explain the reasoning behind each design decision. Eventually we managed to find the right combination of venue and tone that allowed us to finally get a submission accepted.
Hopefully this will lead to more network researchers learning about libtrace and adopting it for use in their own research and analysis tasks.
A copy of the paper can be downloaded from here.
Another rather fragmented week. Continued helping out where I could with the funding proposals, particularly finding references and tidying up some of the wording. Now we just have to wait and see if we actually get any of the funding we're asking for.
Taught 513 this week - we covered the recently published libtrace paper. I think I did a reasonable job of selling the students on libtrace. Wrote a possible libtrace programming assignment for the class which will be set if Richard gives it the go-ahead.
Prepared a 1.0.3 release for libtcpcsm. I've sent the release candidate off to a user who has been using the library quite a bit for testing prior to an actual release.
Started preparing for a new libprotoident release as well.
On the time series front, decomposing the time series seems to produce a trend line that can highlight genuine events in the data but there are still some caveats. In particular, none of our existing detectors work that well with the resulting data and it isn't clear that we can do the decomposition reliably when running live.
Due to the impending deadline for MSI funding proposals, last week was quite a mixed bag of tasks.
Developed another event detector that tries to detect obvious spikes in a relatively constant time series. The likelihood that a spike will be treated as an event is inversely correlated with the amount of noise in the time series, i.e. a spike in noisy data won't register as an event but a smaller spike following a long period of consistency would. Also started looking at decomposing time series with R again.
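The core idea — a detection threshold that scales with the amount of recent noise — might look something like this sketch (the window size, scaling factor and noise floor are all made-up parameters, not the actual detector):

```python
import statistics

def detect_spikes(series, window=10, factor=5.0, min_noise=0.1):
    """Flag points that jump well clear of the recent history.  The
    threshold scales with the stddev of the recent window, so the
    noisier the series, the larger a spike must be before it counts
    as an event; a small spike after a long flat period still fires."""
    events = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        mean = statistics.fmean(recent)
        # Noise floor stops the threshold collapsing to zero on
        # perfectly constant data.
        noise = max(statistics.pstdev(recent), min_noise)
        if abs(series[i] - mean) > factor * noise:
            events.append(i)
    return events
```

A jump of 2 units on a flat series registers as an event, while the same jump on a series that routinely swings by 2 units does not — which matches the inverse correlation described above.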
Wrote a lab exercise for 312 on configuring a DNS server. Spent a couple of hours in R block during the designated 312 lab time to help out students, although they were mostly working on previous labs (or wasting time looking at meme pictures).
Went over the methodology sections of both MSI proposals with Jamie and Brendon. Rewrote the methodologies to better suit the requirements, i.e. more emphasis on the research tasks that we will be carrying out.
Short week - on holiday until Thursday.
Caught up with various support requests once I got back. Had a long chat with Andreas about time series and how we might be able to get better results when analysing the data produced by AMP and libprotoident.
Concluded that we need to start by making sure we can deal with the more obvious cases properly - in particular, time series where the reported value is mostly constant, which we commonly get from AMP. The detectors we have at the moment are based on standard deviation, which doesn't work well when the stddev approaches zero. Developed a detector that works much better in those cases and also started adding code that will choose an appropriate detector depending on the type of time series we have observed.
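One way to sketch a detector for those near-constant series: rather than comparing against the standard deviation, compare each point against the most common value in the recent window, and only trust that comparison when the window really is mostly constant. Again, this is illustrative only — the window size and 80% cut-off are invented parameters:

```python
from collections import Counter

def constant_series_detector(series, window=20, tolerance=0.0):
    """Detector for series that are normally flat (e.g. latency
    measurements from AMP).  Compare each point to the modal value of
    the recent window; any departure beyond the tolerance is an event."""
    events = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        common, count = Counter(recent).most_common(1)[0]
        # Only fire if the window really is mostly constant; on a
        # noisy window this detector declines to make a call.
        if count >= 0.8 * window and abs(series[i] - common) > tolerance:
            events.append(i)
    return events
```

On a flat series this fires on any change at all, which is exactly where a stddev-based threshold degenerates; on a noisy series the mode never dominates the window, so the detector stays quiet and some other detector should be used instead.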
Started looking at Andreas' code in more detail by throwing a few different time series at it and seeing what anomalies it detects. Was not entirely happy with the results and spent a fair bit of time delving deeper into the code than I would have liked in order to figure out what was going on.
This also involved spending a bit of time with R and its time series decomposition functions to see if that would shed any light on what we should be finding in the time series data.
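R's decompose() performs a classical additive decomposition into trend, seasonal and remainder components. A rough Python equivalent of the same steps (assuming an even seasonal period, as with the centred moving average R uses; names and structure here are illustrative):

```python
def decompose(series, period):
    """Classical additive decomposition, in the style of R's decompose():
    trend (centred moving average), seasonal (per-slot mean of the
    detrended series), and remainder.  Assumes an even period."""
    n = len(series)
    half = period // 2
    # Centred moving average over period + 1 points, with the two end
    # points weighted by a half, as R's decompose() does for even periods.
    trend = [None] * n
    for i in range(half, n - half):
        w = series[i - half:i + half + 1]
        trend[i] = (w[0] / 2 + sum(w[1:-1]) + w[-1] / 2) / period
    # Seasonal component: average detrended value for each slot in the
    # cycle, centred so the seasonal effects sum to zero.
    slots = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            slots[i % period].append(series[i] - trend[i])
    means = [sum(s) / len(s) for s in slots]
    grand = sum(means) / period
    seasonal = [m - grand for m in means]
    # Remainder: whatever the trend and seasonal terms don't explain.
    remainder = [series[i] - trend[i] - seasonal[i % period]
                 if trend[i] is not None else None
                 for i in range(n)]
    return trend, seasonal, remainder
```

The attraction for anomaly detection is that genuine events should end up in the remainder (or as sudden shifts in the trend), away from the routine seasonal swings — though, as noted above, doing this reliably on a live stream is another matter, since the centred moving average needs data from both sides of each point.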
Spent Thursday and Friday at the cricket.
Released libtrace 3.0.14 - mostly just a bug fix release. I also separated the I/O stuff into a separate library so that it can be used outside of libtrace.
Took a quick look at maji again to see if we can use it as part of the MSI project. Fixed up some bugs that became apparent when exporting lots of flow records. Also decided that maji would work a lot better if it underwent a major design change, but resisted the temptation to do so for now.
Secured the RT exporter connected to the live capture point so that only WAND machines can connect to it - someone from a lightwire address had connected to it and sent something invalid which broke the whole wdcap process. The RT exporter also now handles invalid client responses better :)
Started looking at Andreas' time series anomaly detection code. The existing system only really works with offline data, so the first goal is to get it running against a "live" input source.
Libtrace 3.0.14 has been released.
This release fixes a few bugs in the previous release and adds a few minor improvements. Most notably, libtrace no longer assert fails when reading corrupt pcap trace files.
The full list of changes in this release can be found in the libtrace ChangeLog.
You can download the new version of libtrace from the libtrace website.
Released a new version of BSOD client on Tuesday.
Did some planning with Brendon, thinking about how we're going to bring all the components of the MSI project together into something usable.
Played around with a live libprotoident application, getting it to write results into a PostgreSQL database and an RRD. PostgreSQL required a fair bit of revision of SQL and database theory; the RRD was much easier to get up and running.
Continued improvements to libprotoident - trying to get that accuracy rate up even further!