Posts Tagged Internet Explorer

Objects in the LAN may appear SLOWER than they are…


Many of you are pretty much aware of the fact, that you should never judge the performance / load times of your site testing from the Local LAN. This is actually pretty common knowledge, as the results may be skewed due to the fact, that your LAN is often connected via a big pipe to the site you are working on.

But, there is actually more to it. The results might be be skewed in the opposite direction as well, and here I would like to point out, what reasons there might be. And also, why you should care anyway, even though that you already know, that testing from your LAN is not recommended.

So let’s answer the second question first. I am responsible for the second time in my career for a rather large portal. And the second time, it was much slower form the local LAN compared to what normal customers see. The reason we did and do care is simply the reason of doubt of our INTERNAL customers. Being in a tech department, our internal customers are Marketing and Customer Services. And these employees (as the rest of our company. Like the CEO for example) of course might  (and some indeed are) thinking: “WTF, they  are celebrating how fast our portal is, and even though I am almost directly connected to it, it is f*king slow!”

There are times, when you have luck, and they confront you with that. And then you might have some good Videos under your belt, “proving” that the customer experience is much better. But I can assure you, doubts will remain (“They came back with some lame techie excuses”). And sometimes they don’t confront you with that. So you don’t even have the chance to defend yourself. We just had that recently, when we had a relaunch of our Portal, announcing big performance improvements, and we got some pretty harsh responses by our colleagues. So this is the reason you maybe SHOULD care about it, that it is at least not SLOWER than customers perception.

After we covered now the motivation, let’s have a look now at the root causes:

Debugging this was difficult, as workstations in the LAN a) rarely do have admin priviliges so some of your tools might be difficult to get running and b) are under the protection of data privacy laws, so tools like Wireshark might be forbidden. In our case most of the analysis was done using Fiddler.

Things we found, sorted by priority:

  1. Internet Explorer: This thing actually has a couple of issues. In my company IE8 is the mandatory Browser, and it is directed to a corporate proxy. The impact on performance is massive:
    IE 6 to IE 8 is limiting the amount of TCP Connections when connecting through a proxy down to 2! As we shard our Portal across three domains, this means for IE 8 a difference of 18 vs. 2 connections.
    IE 6 to IE 8 is by default downgrading from HTTP 1.1 to HTTP 1.0 when connecting through a Proxy! This is massive. You won’t have persistent connections, which is extremely painful with SSL (which is the case with our Portal), but you also lose the ability to use your carefully crafted Cache-Control Headers!The first issue can be solved via some Registry Key, the second one is a Browser Setting. Especially regarding the persistent connection be aware that you have to check the whole chain (Browser, Proxy, Webserver), that none of them is configured to downgrade to HTTP 1.0! Eric Law from Microsoft has written for example an excellent Blogpost on that.
  2. Security: Within our LAN we actually have two kind of proxies. One for unknown domains, and one for “known secure” domains. Which means some kind of white list. Of course our portal is on it 🙂 The proxy for unknown domains checks each and every object for Viruses. Now when we introduced with our relaunch of the portal 2 sharded domains, we forgot to put them on the white list. Resulting in all objects fetched from the sharded domains (~90%) went through a time consuming Virus scan!
  3. DNS: As we found out, in our corporate setup one device in front of our local DNS Servers was configured to drop traffic on TCP Port 53. Unfortunately the workstations in our LAN were trying to resolve our Portal domains using TCP first, and only after a time out, fell back to using UDP. So we had a nice lag in Time-to-Render right at the beginning. This behaviour has been in the past apparantly so common, that they published an RFC to halt people from thinking, UDP Port 53 is enough to support the DNS System.

So… well, we fixed the issues, and now they (our colleagues and the CEO) lived happily ever after. Testing, though, we still don’t do from our local LAN 🙂

A big “Thanks” go out to Diemo S., Lars W. and Holger T. who actually DID the research and the fixes that I was just blogging about 🙂

Leave a Comment

Domain-Sharding and SSL

Hi there!

Some time ago I noticed some weird behaviour on our website so I thought it would be a good idea to revisit this issue. Part of the above mentioned website is a selfcare area, where our customers are able to view their bill, to change the product options etc. etc. This area requires a login and so the login itself, as well as the subsequent pages are secured by SSL.

SSL and Performance Optimization was for quite some time a very difficult beast to debug, but fortunately with Pat Meenans WebPageTest this was solved at least for IE. In the old days you could use HTTPWatch for example, to see the HTTP interaction within the SSL tunnel, but you were unable to view the TCP Connection flows. With Microsofts Visual Roundtrip Analyzer you were able to see the TCP Connection flows, but you were unable to see the HTTP Conversation within the SSL Tunnel. Pat’s tool was the first (maybe still is the only one) that allowed to see both, the TCP Connection flow as well as what happens within the SSL Tunnel.

So, back to our page. In one of our former optimization iterations we decided to do some domain sharding. With this we wanted to improve performance especially for IE6 and IE7 users, as these Browsers only open up 2 TCP Connections per domain. Resulting in a performance limit, that is often lower than what your bandwidth is able to provide.

We implemented that in a way, that we could follow another performance rule, which is “Make static content cookie-free”. So we set up another Domain,, which we used for the above mentioned domain sharding, as well as keeping most of the content cookie-free.

There are some reasonable concerns regarding domain sharding in conjunction with SSL, as you not only have an additional DNS lookup and TCP handshake, but also an additional SSL handshake, which can be quite time-consuming. But we were pretty confident, as our customers are solely based in small Germany with a RTT of ~50 ms, that the benefit of 4 connections would outweigh the impact of that additional DNS lookup and TCP/SSL Handshake.

So, when I did a run with, the result in IE7 looked like this:

Hmm… that was weird… As you can see, you have some 2 connection waterfall behaviour, but for the first 7 objects, it seems like the TCP / SSL handshake is closed after each object, and re-established for the next one. Now THAT is much more painful, than what we would have expected…

So I was almost picking up the phone to call the Webserver Admin, that probably the Connection: Keep-alive Headers were not set correctly in Apache, but I decided first to check the HTTP Headers. And they looked like this:

Request Headers:

GET /selfcare/content/staticcode/tls/kundencenter/1025300/2010-06-23-09-49-15/ui.all.css HTTP/1.1
Accept: */*
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; PTST 2.257)
Connection: Keep-Alive

Response Headers:

HTTP/1.1 200 OK
Date: Thu, 13 Jan 2011 07:46:20 GMT
Server: Apache
Expires: Fri, 14 Jan 2011 07:45:33 GMT
Cache-Control: public, max-age=3628800
Content-Language: de-DE
Vary: Accept-Encoding
Content-Encoding: gzip
Age: 47
Content-Length: 5122
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/css;charset=UTF-8

As you can see, BOTH ends have a “Connection: Keep-Alive Header” using HTTP1.1. So why in the hell was the connection closed???

So I fired up Wireshark to look at the raw packet flow. Wireshark normally does not allow me to see the content within the SSL Tunnel, but in this case it didn’t matter, as I only wanted to see, WHO is closing the connection.

So it was ME (Or my machine) which was closing the connection, not the Server. As you can see in the trace, I am the one sending the TCP FIN.

So far, so bad.

Thinking about a solution / workaround I was wondering: Only the CSS files are affected. The Headers are correct and the same as the other assets, which are downloaded later on from the same domain. And with these assets I see a perfect persistent connection… Maybe it simply is due to being hosted on another domain?

Setting up another test page I dropped the idea of domain sharding, and used just one domain. The domain, where the base page resides. And the result was:

And the issue is gone… Meanwhile the Time-to-Render has improved from 2.2 to 1.6.

As it seems, the buggy TCP Connection behaviour seems to show up, when you are delivering CSS files VIA SSL from a different domain, than the base page.

I did a further test (waterfall coming soon), where I have shifted only the CSS files back to the domain, where the basepage resides. Result is

As you can see, the connection was kept-alive during the CSS downloads. BUT, later on, when the CSS-Images were downloaded from the sharded domain, suddenly these CSS-Images showed the Connection: Close behaviour!

Now, which IE Versions are affected? IE 7, as you can see above. IE 6 as well, IE 8 and IE 9 are not also affected. (Though I have to check with IE 8 and IE 9, if the issue would only be visible, if I add a seventh CSS file in the HEAD. But then you have another issue anyway). With IE 8 and IE 9, though, the issue is a lot less impacting, as you will be hit ONLY if you have more than 6 Stylesheets referenced in HEAD. Something which you should avoid anyway.

Firefox is not affected.

Finally: Who should care (If my observations/assumptions are true. Feel free to comment)?

– If it is not SSL, you have no problem.

– If you have inlined CSS, you have no problem.

– If your CSS files and CSS Images are on the same domain as the base page, you have no problem.

– If you have just a single CSS file via SSL in the HEAD, it is cacheable and your customers have a rather low latency, you shouldn’t worry too much.

– BUT, if you have multiple CSS files, or different CSS files on each page, which are on a different domain, then it might be an issue. More so, if they are not cacheable, and even more so, when your customers DO have high latency. Remember, NO RENDERING, until all CSS files from the HEAD are received! Blank Page!

So, even though there seem to be quite a lot of  conditions that have to be met, my assumption is that quite a few pages might be hit by this issue. I coincidentally saw that on Gateway’s site for example (CSS Images via SSL on a sharded Domain. Look at the bottom of the waterfall and the corresponding HTTP Headers). And it sounds reasonable. When you follow some of the performance best practices (Domain sharding, Make static assets cookie-free, Use a CDN) and are using SSL, you probably run automatically in this scenario.

Finally a big thank you to Thomas G., Daniel G. and Michael S. who supported me in this analysis!




I got in contact with Microsofts IE Team via their blog and indeed, they confirmed the above mentioned behaviour. So this is not a unique fail case with our domain, but actually a bug in IE, present in versions 6 to 9. In the same response they said they are looking to fix this in an upcoming IE version.

See the last 2 comments:

Comments (2)