Weekly Report -- 27/07/2012
My IMC paper on the effect of the Copyright Amendment Act was accepted! However, it looks like I have a fair bit of work to do on it, mainly softening the conclusions. The reviewers felt the results suggested, but did not prove, that the CAA was the cause of the observed behaviour, which I feel is a fair response.
It was a case of one step forwards, two steps back with the event detection this week. I had added a new dataset to my testing, only to run into an old problem where a sharp change in the time series would cause the ARIMA modelling to behave badly. A large residual would enter the prediction calculations, which would cause the next prediction to be way off, which would cause a new large residual to enter the calculations, and so on.
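A toy sketch of this feedback loop, assuming a plain MA(1) model as a stand-in for the full ARIMA model. The function name, the coefficient and the data are all made up for illustration. A single spike in an otherwise flat series keeps echoing through the later predictions:

```python
def ma1_residuals(series, mean, theta):
    """One-step-ahead residuals for a toy MA(1) predictor.

    Each prediction is mean + theta * (previous residual), so a large
    residual feeds straight into the next prediction.
    """
    residuals = []
    previous = 0.0
    for value in series:
        prediction = mean + theta * previous
        previous = value - prediction
        residuals.append(previous)
    return residuals


# Flat series at 100 with one spike to 200 (hypothetical data).
series = [100.0] * 3 + [200.0] + [100.0] * 4
residuals = ma1_residuals(series, mean=100.0, theta=0.9)
for observed, residual in zip(series, residuals):
    print(f"observed={observed:6.1f}  residual={residual:8.2f}")
```

Even though the series returns to 100 straight after the spike, the residuals keep alternating in sign and only shrink by a factor of theta each step.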
To deal with this, I adjusted the ARIMA modelling to use only a small proportion of each large residual when updating the model. The proportion is calculated logarithmically, so that very large residuals contribute a much smaller proportion. This resulted in a much better model that responds to changes in the time series in a slower and smoother manner.
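One possible shape for such a damping function, as a minimal sketch only. The exact formula is not given here, so the threshold, the 1 / (1 + log(...)) form and the function name are all assumptions:

```python
import math


def damped_residual(residual, threshold):
    """Return the part of `residual` that is fed back into the model.

    Residuals within `threshold` pass through unchanged. Larger ones are
    scaled by a proportion that shrinks logarithmically with their size
    (an assumed formula, chosen only for illustration).
    """
    magnitude = abs(residual)
    if magnitude <= threshold:
        return residual
    proportion = 1.0 / (1.0 + math.log(magnitude / threshold))
    return residual * proportion


for r in (5.0, 100.0, 1000.0, -1000.0):
    print(r, round(damped_residual(r, threshold=10.0), 2))
```

With this form the proportion is 1 at the threshold and falls steadily beyond it, while the damped value itself never decreases, so a bigger change still moves the model further than a smaller one.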
Previously, the response was very rapid and we detected events by looking for a single large residual (because the model would adapt so quickly, we usually only got one shot at seeing the change). Now, we tend to get several large (but much smaller than before) residuals as the predictive model slowly catches up with the change in traffic level produced by the event. Unfortunately, this meant that all of the event detection rules I had developed over the past month were useless, but I've been able to quickly adapt to the new approach and am getting results that aren't too different from what I was getting before I made this change.
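A rule of this kind could look something like the sketch below, which flags an event when several consecutive residuals exceed a threshold in the same direction. The threshold, the minimum run length and the function name are hypothetical, and this is not the actual set of detection rules:

```python
def detect_events(residuals, threshold, min_run=3):
    """Return the start index of each run of at least `min_run`
    consecutive residuals beyond +/- `threshold` with the same sign.

    Assumes `threshold` is positive.
    """
    events = []
    run_start = None
    run_sign = 0
    # A trailing 0.0 closes any run that reaches the end of the data.
    for i, r in enumerate(list(residuals) + [0.0]):
        sign = (r > threshold) - (r < -threshold)
        if sign != 0 and sign == run_sign:
            continue  # still inside the current run
        if run_sign != 0 and i - run_start >= min_run:
            events.append(run_start)
        run_start = i if sign != 0 else None
        run_sign = sign
    return events


print(detect_events([1, 2, 30, 25, 20, 3, 40, 1], threshold=10))
```

Here the run of three large residuals starting at index 2 is reported, while the isolated large residual at index 6 is ignored.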
One benefit of this change that I'm still investigating is that the smoother modelling may mean that we can drop the wavelet transform step. This was used to smooth the original data to remove random noise but had the downside of requiring over 20 measurements ahead to produce the smoothed value for a single point. In practical terms, this meant I couldn't report an event until 20 or more minutes after it had happened (assuming one measurement per minute). If this works, I can report events much closer to the time that they happen.
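The cost of look-ahead smoothing can be seen in a small sketch. It uses a centred moving average as a simple stand-in for the wavelet transform (not the same technique, but it has the same dependence on future measurements), and the function name and window size are assumptions:

```python
from collections import deque


def smooth_with_lookahead(stream, lookahead):
    """Yield (point_index, arrival_index, smoothed_value).

    The smoothed value for a point needs `lookahead` later measurements,
    so it is only available `lookahead` arrivals after the point itself.
    The first `lookahead` points are skipped (no full window).
    """
    window = deque(maxlen=2 * lookahead + 1)
    for arrival_index, value in enumerate(stream):
        window.append(value)
        if len(window) == window.maxlen:
            point_index = arrival_index - lookahead
            yield point_index, arrival_index, sum(window) / len(window)


# With one measurement per minute, the gap between the two indices
# is the reporting delay in minutes.
for point, arrival, value in smooth_with_lookahead(range(45), lookahead=20):
    print(f"point {point} smoothed at minute {arrival}: {value}")
```

Every smoothed point lags its measurement by exactly `lookahead` arrivals, which is the delay that dropping the smoothing step would remove.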