Weekly Report for week ending 13 July 2012

17 Jul 2012

Spent some more time working on building useful groups of events for
RTT/loss data. I'm trying to find a compromise between including all
events that happen about the same time and grouping only those events that
are obviously related, while allowing events to be in multiple groups
where that makes sense. Some of these issues are coming about because my
sample data extraction program doesn't guarantee strictly increasing
timestamps in the warm-up phase while fetching historical data.
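
One simple way to tolerate out-of-order timestamps is to sort the events
before grouping them. The following is a minimal sketch only, not the
grouping method described above: it assumes events are (timestamp, stream)
tuples and uses a hypothetical max_gap parameter to decide when a new group
starts. It puts each event in exactly one group, so it does not address the
multiple-group case.

```python
from typing import List, Tuple

# Hypothetical event shape: (timestamp in seconds, stream identifier).
Event = Tuple[float, str]


def group_by_time(events: List[Event], max_gap: float) -> List[List[Event]]:
    """Group events that occur within max_gap seconds of the previous event.

    Events are sorted first, so the input does not need to arrive with
    strictly increasing timestamps.
    """
    groups: List[List[Event]] = []
    for event in sorted(events, key=lambda e: e[0]):
        # Extend the current group if this event is close enough to the
        # last event placed in it; otherwise start a new group.
        if groups and event[0] - groups[-1][-1][0] <= max_gap:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups
```

A larger max_gap pulls in everything that happens at about the same time,
while a smaller one keeps only tightly clustered events together, which is
the trade-off being weighed here.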

Tidied up some error messages in the ICMP test in AMP where non-echoreply
responses were being incorrectly examined for the embedded triggering
packet. It should now properly index into those packets and record the
correct error type codes.
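
For illustration, the indexing involved looks roughly like the sketch below.
This is not the AMP code; it is a standalone Python example with
hypothetical names, assuming IPv4 and a buffer that starts at the ICMP
header. ICMP error messages carry the IP header of the triggering packet
(whose length varies with the IHL field) followed by at least the first 8
bytes of its payload, so the embedded packet is only looked for when the
type is an error type.

```python
import struct
from typing import Optional, Tuple

ICMP_HEADER_LEN = 8
MIN_IP_HEADER_LEN = 20
# ICMP types that embed the triggering packet: destination unreachable,
# source quench, redirect, time exceeded, parameter problem.
ICMP_ERROR_TYPES = {3, 4, 5, 11, 12}


def parse_icmp(icmp: bytes) -> Tuple[int, int, Optional[bytes]]:
    """Return (type, code, first 8 bytes of the triggering payload).

    The third element is None for non-error messages such as echo reply.
    """
    if len(icmp) < ICMP_HEADER_LEN:
        raise ValueError("truncated ICMP header")
    icmp_type, icmp_code = struct.unpack_from("!BB", icmp, 0)
    if icmp_type not in ICMP_ERROR_TYPES:
        return icmp_type, icmp_code, None

    # The embedded IP header starts straight after the 8 byte ICMP header.
    embedded = icmp[ICMP_HEADER_LEN:]
    if len(embedded) < MIN_IP_HEADER_LEN:
        raise ValueError("truncated embedded IP header")
    # IHL is the low nibble of the first byte, counted in 32 bit words.
    ihl = (embedded[0] & 0x0F) * 4
    if ihl < MIN_IP_HEADER_LEN or len(embedded) < ihl + 8:
        raise ValueError("truncated embedded packet")
    # Skip the embedded IP header to reach the original ICMP echo header.
    return icmp_type, icmp_code, embedded[ihl:ihl + 8]
```

The returned 8 bytes are the header of the original echo request, so its
identifier and sequence number can be matched against the probe that was
sent.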

Noticed that sometimes the AMP tput test was failing to run in both
directions on some nodes and tried to investigate why. Running the tests
manually works, but scheduling them through AMP often fails to get the
return path test to run. Looks like there is some sort of timing issue
where the connection takes a long time to close and this prevents it from
being re-established in the other direction (the single threaded server is
still waiting for close() to return). Have yet to figure out an answer to
this.

Spent some time with Shane, Brad and Jamie poking at the Network
Diagnostic Tool (NDT) used by perfsonar, mlab and as part of the
nzbt.org.nz broadband test. Some of the results we were getting weren't of
the quality we were expecting, so we put together our own little test lab
to see how it works. Our initial tests using a virtualised server couldn't
sustain gigabit speeds across the network bridge in one direction, despite
working fine in the other (and NDT performed less than half as well as
iperf). With two physical, directly connected machines we finally managed
to get the expected TCP performance, but the extra analysis that NDT
performed was still bogus: it reported network limits that were much lower
than they actually were (and lower than what the test had just observed!),
RTT values that were 500 times larger than they actually were, etc.