Many of you are pretty much aware of the fact, that you should never judge the performance / load times of your site testing from the Local LAN. This is actually pretty common knowledge, as the results may be skewed due to the fact, that your LAN is often connected via a big pipe to the site you are working on.
But, there is actually more to it. The results might be be skewed in the opposite direction as well, and here I would like to point out, what reasons there might be. And also, why you should care anyway, even though that you already know, that testing from your LAN is not recommended.
So let’s answer the second question first. I am responsible for the second time in my career for a rather large portal. And the second time, it was much slower form the local LAN compared to what normal customers see. The reason we did and do care is simply the reason of doubt of our INTERNAL customers. Being in a tech department, our internal customers are Marketing and Customer Services. And these employees (as the rest of our company. Like the CEO for example) of course might (and some indeed are) thinking: “WTF, they are celebrating how fast our portal is, and even though I am almost directly connected to it, it is f*king slow!”
There are times, when you have luck, and they confront you with that. And then you might have some good WPT.org Videos under your belt, “proving” that the customer experience is much better. But I can assure you, doubts will remain (“They came back with some lame techie excuses”). And sometimes they don’t confront you with that. So you don’t even have the chance to defend yourself. We just had that recently, when we had a relaunch of our Portal, announcing big performance improvements, and we got some pretty harsh responses by our colleagues. So this is the reason you maybe SHOULD care about it, that it is at least not SLOWER than customers perception.
After we covered now the motivation, let’s have a look now at the root causes:
Debugging this was difficult, as workstations in the LAN a) rarely do have admin priviliges so some of your tools might be difficult to get running and b) are under the protection of data privacy laws, so tools like Wireshark might be forbidden. In our case most of the analysis was done using Fiddler.
Things we found, sorted by priority:
- Internet Explorer: This thing actually has a couple of issues. In my company IE8 is the mandatory Browser, and it is directed to a corporate proxy. The impact on performance is massive:
- IE 6 to IE 8 is limiting the amount of TCP Connections when connecting through a proxy down to 2! As we shard our Portal across three domains, this means for IE 8 a difference of 18 vs. 2 connections.
- IE 6 to IE 8 is by default downgrading from HTTP 1.1 to HTTP 1.0 when connecting through a Proxy! This is massive. You won’t have persistent connections, which is extremely painful with SSL (which is the case with our Portal), but you also lose the ability to use your carefully crafted Cache-Control Headers!The first issue can be solved via some Registry Key, the second one is a Browser Setting. Especially regarding the persistent connection be aware that you have to check the whole chain (Browser, Proxy, Webserver), that none of them is configured to downgrade to HTTP 1.0! Eric Law from Microsoft has written for example an excellent Blogpost on that.
- Security: Within our LAN we actually have two kind of proxies. One for unknown domains, and one for “known secure” domains. Which means some kind of white list. Of course our portal is on it :-) The proxy for unknown domains checks each and every object for Viruses. Now when we introduced with our relaunch of the portal 2 sharded domains, we forgot to put them on the white list. Resulting in all objects fetched from the sharded domains (~90%) went through a time consuming Virus scan!
- DNS: As we found out, in our corporate setup one device in front of our local DNS Servers was configured to drop traffic on TCP Port 53. Unfortunately the workstations in our LAN were trying to resolve our Portal domains using TCP first, and only after a time out, fell back to using UDP. So we had a nice lag in Time-to-Render right at the beginning. This behaviour has been in the past apparantly so common, that they published an RFC to halt people from thinking, UDP Port 53 is enough to support the DNS System.
So… well, we fixed the issues, and now they (our colleagues and the CEO) lived happily ever after. Testing, though, we still don’t do from our local LAN :-)
A big “Thanks” go out to Diemo S., Lars W. and Holger T. who actually DID the research and the fixes that I was just blogging about :-)