Posts Tagged Performance

Transfer-Encoding: Chunked Debugging w/ WPT – 101

Hi there!

Just recently an interesting performance issue was raised in the WPT forum that serves really well as a showcase for the rich debugging capabilities of WPT, and in this blog post I will focus on Transfer-Encoding: Chunked.

Transfer-Encoding: Chunked was defined in the HTTP/1.1 RFC and can be a nice performance improvement regarding Time-to-Render.

In HTTP/1.0 the default behaviour of browser-server interaction was to establish a TCP connection to a webserver using TCP's well-known 3-way handshake. After the connection was established, the browser fired its GET request and waited for the response. The requested object was sent down the wire from the server to the browser, and as soon as all bytes were transmitted, the server closed the TCP connection. The browser would then establish a new TCP connection to request the next object, and so on.

With HTTP/1.1, so-called persistent connections became the default, meaning that a browser could request multiple objects sequentially, one after another, on a single TCP connection (though it would have to wait for each request to be fulfilled before it could request the next object; otherwise it would be pipelining).

Now, looking at performance, HTTP/1.1 defined a transport method called Transfer-Encoding: Chunked. The reason for this was that sometimes the webserver would already have a couple of bytes ready to send, but would not yet know the total length (size) of the whole answer. Without chunking, the webserver would have to wait until the last byte of the answer had been handed to it, so that it could write the correct Content-Length header for the full answer into the HTTP header.

Otherwise, how would the browser know that the received answer is complete and that it is allowed to send the next request on this very same TCP connection?

Transfer-Encoding: Chunked solves this problem. It basically tells the browser: “Here are the first bytes of the answer to your request, but more will come. I do not know yet how many, but I will mark the end of the byte stream with a defined marker.” How exactly this is marked can be seen in the RFC linked above. The performance improvement is that the browser can already “work” with this partial answer to start rendering or to request other objects. How to use this technique best is described here by Steve Souders.
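
To give a rough idea of what this looks like on the wire, here is a simplified sketch of a chunked response (the chunk sizes are made-up example values): each chunk is preceded by its size in hexadecimal on a line of its own, and a chunk of size 0 marks the end of the response.

HTTP/1.1 200 OK
Content-Type: text/html
Transfer-Encoding: chunked

3e8
...the first 0x3e8 = 1000 bytes of the document...
1f4
...another 0x1f4 = 500 bytes...
0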

So, now that we have some basic understanding of the why and how of Transfer-Encoding: Chunked, let's get to the issue of this specific site:

In the waterfall below you can see how this website was loading. You can see that the base page, the HTML document, takes a very slow 15 seconds to load.

What might be the reason? I repeated the test a couple of times to make sure this wasn't just a single lost packet. But the pattern remained stable.

As you can see in the bandwidth utilization below the waterfall, it is most definitely not a bandwidth issue. Most of the time the bandwidth consumption is close to zero. And the blue bar shows that the content started to come down very early, but took a REALLY long time to complete.

So, clicking on the object tells us a little bit more about it:

First you see the size of the object: a small 3.3 KByte. So the size of the object is not the reason for the long transfer time. And again you see that the first bytes of the object arrived in under 1 second, but it took 15 seconds for the full answer.

Additionally you see that the browser allowed the content to be gzipped, and the server indeed not only gzipped it, but also applied Transfer-Encoding: Chunked.

So, how can you dig deeper into this? A good idea is to use WPT's tcpdump feature, which allows you to see each and every byte and its timing on the wire. Doing this, you get the following picture:

Now you get some more information about what the issue is. You can see in the middle pane that the answer consisted of 3 chunks, of which the first 2 arrived almost instantly (frames 8 and 28), and then, after 15 seconds, the remaining 444 bytes arrived in frame 242. So it seems the beginning of the object could be sent very fast, but the webserver then had to wait 15 seconds for the last part to be generated. Unfortunately the content is gzipped, so you can't see what the content of these last 444 bytes was. You can only guess that the code causing the delay is somewhere in the last 15% of the document.

But fortunately there is Pat. Pat Meenan. He gave me the advice: you might want to look at the setHeader command from WPT's scripting capabilities. Bingo! setHeader allows you to modify/override any of the browser's GET request headers. And that's exactly what you can do. Simply use the script:

setHeader Accept-Encoding: None
navigate http://www.example.com

and you're done!

Now, when repeating this test, you indeed see in the headers that no gzip was applied:

And in Wireshark you can see the content of the last chunk in plain text!

So now you CAN (or rather COULD) see what code was in the delayed frame(s). Could, because in this real-life example the site owner fixed the issue before I was able to apply the Accept-Encoding: None header. 🙂

Anyway, two scenarios would have been possible:

a) Issue still present -> You can see the causing code piece.

b) Issue gone -> You have a problem with flushing gzip buffers.

Or something totally wild, like server-side malware that tried and failed to attach malicious code to the bottom of the object… 😉

Quiz: Guess the impact of 50 KByte on Page load via DSL

As you might recall, www.alice-dsl.de is one of the web properties we're responsible for. And because of this we had a rather busy week; the reason can be found here. But I do not want to comment on that, rather on the technical outcome 🙂 Because of “the story” we had to redo quite a few of the graphics on our website.

And THAT was actually one thing I was eagerly waiting for.

Before “the story” we had a page header with a rather difficult image. Our current design language pretty often challenges us (or rather, our agencies) quite a bit, as the graphics often consist of a photo-realistic part in front of a colour-gradient background. So the different compression methods fail one way or the other. If we compress using PNG8, the quality of the photo-realistic part degrades rather badly. If we compress using JPG, the colour gradients and sharp edges become really ugly, making it necessary to compress with high quality settings, resulting in rather large files. If we worked with 2 files using transparency and different compression methods, well, we would have 2 HTTP requests instead of 1.

So, to make a long story short, this header image was formerly 72 KByte in size. I asked my colleagues to make sure that the new one would be much smaller, and to really push for that. What we got back was an 18 KByte image. I wasn't totally satisfied with the result, as the agency used JPG again, even though the new image wasn't really suitable for JPG. With PNG8 I was able to reduce it further to 8 KByte instead of 18 KByte. Nevertheless I was happy enough with the reduction of ~50 KByte.

Now back to the quiz question: how much faster do you think our site reaches its time to visually complete due to that reduction? (Using the Frankfurt node of WPT @ 1.5 MBit/s, 50 ms RTT and IE8.)

The answer might surprise some people (at least it did within our company). Normally you might do a napkin calculation like this: 50 KByte = 400 KBit. 400 KBit on a 1.5 MBit/s line should be transmitted in ~1/4th of a second = 250 ms. So the page load time might decrease by 250 ms.

But this omits TCP Slow Start from the equation! If you are unfamiliar with TCP Slow Start, the VERY, VERY simplified and brief explanation is: when a TCP connection is established, it doesn't utilize all the available bandwidth from the beginning, but instead “slowly” increases its bandwidth utilization to probe the available bandwidth. A TCP connection “can't know” the available bandwidth, so in order to not overload the network, it starts slowly and increases over time.

A much better and longer explanation is here by Steve Souders, an excellent Video from Velocity 2010 can be found here, and a really great animation visualizing it can be found here.

Sooo… What was the question again? Oh, right! The benefit of the image size reduction! Getting back to our napkin calculation: the former image was ~70 KByte in size, which is roughly 0.5 MBit, which should load in ~333 ms over a 1.5 MBit/s line. Right?

Again I used WPT with its tcpdump feature and loaded the image. And the result, without DNS resolution, is…: ~666 ms! 🙂 So it is roughly double! Why? You guessed it: the reason is TCP Slow Start.

As you can see in the image above, using Wireshark's TCP bandwidth statistics, it takes close to 400 ms before this TCP connection has reached its bandwidth limit!
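
To make the effect a bit more tangible, here is a very rough back-of-the-envelope model. This is my own sketch, not something measured in the test: it assumes an initial congestion window of 3 segments, an MSS of 1460 bytes and a window that doubles every round trip, and it ignores the 1.5 MBit/s bandwidth cap and delayed ACKs, so it undershoots the measured ~666 ms. But it shows nicely why the transfer is dominated by round trips at the beginning and not by the line speed.

// Very rough model of TCP Slow Start: how many round trips are needed to
// deliver `bytes`, if the congestion window doubles every round trip?
// Assumed values (not taken from the trace): initial cwnd = 3 segments, MSS = 1460 bytes.
function slowStartRoundTrips(bytes, initialCwnd, mss) {
  var cwnd = initialCwnd; // congestion window in segments
  var sent = 0;           // bytes delivered so far
  var rtts = 0;           // round trips used
  while (sent < bytes) {
    sent += cwnd * mss;   // one full window per round trip
    cwnd *= 2;            // the window roughly doubles during slow start
    rtts += 1;
  }
  return rtts;
}

slowStartRoundTrips(70 * 1024, 3, 1460); // = 5 round trips = ~250 ms at 50 ms RTT,
                                         // before adding the TCP handshake and the
                                         // actual serialization time on the wire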

Now the problem is that the header image is referenced quite close to the beginning of the HTML base page, and it therefore gets loaded on a rather “cold” TCP connection. With IE8 opening up to 6 connections per server, you will start close to the beginning of the page load with 5 “cold” TCP connections.

Just recently a lot of smart people started working on circumventing the limitations of TCP Slow Start in different areas. SPDY, for example, multiplexes requests on a single TCP connection, thereby going through TCP Slow Start only once. Firefox now prefers to reuse the connections with the highest CWND. And starting with Linux kernel 2.6.39 the initial CWND has been increased from 3 to 10.

But as you can't force your visitors to use a specific browser, and you might not be able to choose your Linux kernel, your best bet is still:

Reduce Bytes!
And don't be fooled by the fact that 50 KByte on a 1.5 MBit/s line sounds negligible.

While rambling, I almost forgot the initial question: the impact of these 50 KByte saved on time to visually complete. See for yourself! 🙂

Going Async!

A lot of things have changed since my last post. In the meantime my company (Hansenet aka Alice) was sold and is now part of Telefonica, which sails under the “O2” brand in Germany. My responsibilities have also changed quite a bit, and the O2 portal is now part of them.

We had a big relaunch of the site some 6 weeks ago, and are now doing some more (performance) tweaks here and there.

One of the tweaks on our agenda was going async for the newly integrated social media widgets of Facebook, Twitter and Google+. The idea actually came up via a tweet from Steve Souders announcing that the Google Plus button had gone async.

This was further fueled by a post from Stoyan Stefanov, in which he showed a way to go async for practically all of the most prominent widgets.

So maybe you are asking yourself: what is so important about async that some of the brightest guys are working on it and evangelizing it so much? The reason is the way browsers behave. They are single threaded, and quite a few browsers will simply stop doing anything else (even downloading other assets in parallel) while they are downloading and executing JavaScript. So a) it comes at a performance price for some browsers, as no other asset is downloaded in the meantime, which blocks and delays page load and rendering, and b) due to the above-mentioned behaviour the script becomes a single point of failure (SPOF). If an image loads slowly (or not at all), it is not a big issue. If a JavaScript file loads slowly (or not at all), it becomes really ugly for the above-mentioned reasons.
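
For illustration, this is the general shape of the async pattern the widget vendors published (a simplified sketch, not our exact production snippet, and the widget URL is just an example): the script element is created and inserted via the DOM, so the browser does not block parsing or other downloads on it.

<script>
  (function () {
    // Create the script element ourselves instead of writing a blocking
    // script tag with a src attribute directly into the document.
    var s = document.createElement('script');
    s.src = 'https://platform.twitter.com/widgets.js'; // example widget URL
    s.async = true;
    // Insert it before the first script element already on the page.
    var first = document.getElementsByTagName('script')[0];
    first.parentNode.insertBefore(s, first);
  }());
</script>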

And with the social widgets it is even more difficult: If your image Server is unreachable, you can politely call your engineer. If Twitter is unavailable… well… you can’t even tweet that.

And just recently Pat Meenan wrote an excellent blog post on what the result can look like, visible in this video. You see how the Business Insider page remains blank for a full 20 seconds (white page! Nothing!) just because Twitter is unavailable. And being the nice chap that he is, Pat has already implemented the possibility to check your site for these SPOFs in WebPagetest.
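
The mechanism behind the SPOF check is simple DNS blackholing, and you can do the same thing yourself with a WPT script. A sketch along these lines (the widget hostname is just an example, and blackhole.webpagetest.org is the deliberately non-responding server Pat describes in his post):

// route the widget host to a server that accepts connections but never answers
setDnsName platform.twitter.com blackhole.webpagetest.org
navigate http://www.example.com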

So I tried it with our site and the result was this (left: Twitter unavailable, right: Twitter available):

Video Before

What you see here is not as bad as Business Insider. Nevertheless the browser keeps spinning its wheels for more than 25 seconds, indicating that the page hasn't completed. Even more problematic: the slider (arrow >) at the right of the screen does not show up until 24 seconds into the page load. And THAT is a real problem, as the slider functionality of the main teaser is quite an important feature of our site.

So overnight 🙂 we (actually others, see below) did some magic and today the widgets have gone async! Result? (Left: Twitter unavailable, Right: Twitter available)

Video After

As you can see, the page doesn't take any longer if any of the social widget providers is unavailable! (Actually it is even a bit faster due to less data to download.) And as a goodie, our PageSpeed score has improved by +1 thanks to ~150 KB of JavaScript having been deferred.

Thanks to Alex K., Sasa S. and Siegfried H. for the quick fix, and as always, to the community (and this time specifically to Steve, Stoyan and Pat) for continuously developing tools and sharing research results and best practices.

So I recommend test-driving your site with WebPagetest and making sure it doesn't break!

Domain-Sharding and SSL

Hi there!

Some time ago I noticed some weird behaviour on our website http://www.alice-dsl.de, so I thought it would be a good idea to revisit the issue. Part of the above-mentioned website is a self-care area where our customers can view their bill, change their product options, etc. This area requires a login, and so the login itself, as well as the subsequent pages, is secured by SSL.

SSL performance optimization was for quite some time a very difficult beast to debug, but fortunately Pat Meenan's WebPageTest solved this, at least for IE. In the old days you could use HTTPWatch, for example, to see the HTTP interaction within the SSL tunnel, but you were unable to view the TCP connection flows. With Microsoft's Visual Round Trip Analyzer you were able to see the TCP connection flows, but not the HTTP conversation within the SSL tunnel. Pat's tool was the first (and maybe still is the only one) that lets you see both the TCP connection flow and what happens within the SSL tunnel.

So, back to our page. In one of our former optimization iterations we decided to do some domain sharding. With this we wanted to improve performance especially for IE6 and IE7 users, as these browsers only open 2 TCP connections per domain, resulting in a performance limit that is often lower than what your bandwidth would be able to provide.

We implemented it in a way that also let us follow another performance rule, “Make static content cookie-free”. So we set up another domain, static.alice.de, which we used for the above-mentioned domain sharding as well as for keeping most of the static content cookie-free.

There are some reasonable concerns regarding domain sharding in conjunction with SSL, as you not only have an additional DNS lookup and TCP handshake, but also an additional SSL handshake, which can be quite time-consuming. But since our customers are located solely in small Germany, with an RTT of ~50 ms, we were pretty confident that the benefit of 4 connections would outweigh the impact of that additional DNS lookup and TCP/SSL handshake.
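
To put a rough number on that extra cost: with ~50 ms RTT, the DNS lookup is on the order of one round trip, the TCP handshake another, and the SSL handshake typically two more, so the sharded domain pays very roughly 4 x 50 ms = ~200 ms of one-time setup cost (give or take, depending on DNS caching and the exact handshake variant) in exchange for doubling the number of parallel connections for IE6/IE7.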

So, when I did a run with Webpagetest.org, the result in IE7 looked like this:

Hmm… that was weird… As you can see, you get the expected 2-connection waterfall behaviour, but for the first 7 objects it seems like the TCP/SSL connection is closed after each object and re-established for the next one. Now THAT is much more painful than what we had expected…

I was almost picking up the phone to call the webserver admin, assuming the Connection: Keep-Alive headers were not set correctly in Apache, but I decided to first check the HTTP headers. And they looked like this:

Request Headers:

GET /selfcare/content/staticcode/tls/kundencenter/1025300/2010-06-23-09-49-15/ui.all.css HTTP/1.1
Accept: */*
Referer: https://www.alice-dsl.de/errorsides/test/rechnung2.htm
Accept-Language: en-us
UA-CPU: x86
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; PTST 2.257)
Host: static.alice.de
Connection: Keep-Alive

Response Headers:

HTTP/1.1 200 OK
Date: Thu, 13 Jan 2011 07:46:20 GMT
Server: Apache
Expires: Fri, 14 Jan 2011 07:45:33 GMT
Cache-Control: public, max-age=3628800
Content-Language: de-DE
Vary: Accept-Encoding
Content-Encoding: gzip
Age: 47
Content-Length: 5122
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/css;charset=UTF-8

As you can see, BOTH ends send a “Connection: Keep-Alive” header using HTTP/1.1. So why on earth was the connection closed???

So I fired up Wireshark to look at the raw packet flow. Wireshark normally does not allow me to see the content within the SSL tunnel, but in this case that didn't matter, as I only wanted to see WHO was closing the connection.

It was ME (or rather my machine) that was closing the connection, not the server. As you can see in the trace, I am the one sending the TCP FIN.

So far, so bad.

Thinking about a solution/workaround, I noticed: only the CSS files are affected. Their headers are correct and the same as those of the other assets, which are downloaded later on from the same domain, and with those assets I see a perfectly persistent connection… Maybe it is simply because the CSS is hosted on a different domain than the base page?

So I set up another test page where I dropped the idea of domain sharding and used just one domain: the domain where the base page resides. And the result was:

And the issue is gone… At the same time the Time-to-Render has improved from 2.2 to 1.6 seconds.

So it seems the buggy TCP connection behaviour shows up when you deliver CSS files VIA SSL from a different domain than the base page.

I did a further test (waterfall coming soon), where I shifted only the CSS files back to the domain where the base page resides. The result is:

As you can see, the connection was kept alive during the CSS downloads. BUT later on, when the CSS images were downloaded from the sharded domain, suddenly these CSS images showed the connection-close behaviour!

Now, which IE versions are affected? IE 7, as you can see above, and IE 6 as well. IE 8 and IE 9 are also affected, but the impact there is a lot smaller, as you will be hit ONLY if you have more than 6 stylesheets referenced in the HEAD, something you should avoid anyway. (Though I still have to check with IE 8 and IE 9 whether the issue only becomes visible once I add a seventh CSS file to the HEAD. But then you have another issue anyway.)

Firefox is not affected.

Finally: who should care (if my observations/assumptions are true; feel free to comment)?

– If it is not SSL, you have no problem.

– If you have inlined CSS, you have no problem.

– If your CSS files and CSS Images are on the same domain as the base page, you have no problem.

– If you have just a single CSS file via SSL in the HEAD, it is cacheable, and your customers have rather low latency, you shouldn't worry too much.

– BUT if you have multiple CSS files, or different CSS files on each page, served from a different domain, then it might be an issue. More so if they are not cacheable, and even more so when your customers DO have high latency. Remember: NO RENDERING until all CSS files from the HEAD are received! Blank page!

So even though there seem to be quite a lot of conditions that have to be met, my assumption is that quite a few pages might be hit by this issue. I coincidentally saw it on Gateway's site, for example (CSS images via SSL on a sharded domain; look at the bottom of the waterfall and the corresponding HTTP headers). And it sounds reasonable: when you follow some of the performance best practices (domain sharding, make static assets cookie-free, use a CDN) and are using SSL, you probably run into this scenario automatically.

Finally a big thank you to Thomas G., Daniel G. and Michael S. who supported me in this analysis!

-Markus

 

UPDATE:

I got in contact with Microsoft's IE team via their blog, and indeed they confirmed the above-mentioned behaviour. So this is not a unique failure case with our domain, but actually a bug in IE, present in versions 6 to 9. In the same response they said they are looking to fix it in an upcoming IE version.

See the last 2 comments:

http://blogs.msdn.com/b/ieinternals/archive/2011/03/26/https-and-connection-close-is-your-apache-modssl-server-configuration-set-to-slow.aspx

Taming IE 8 Javascript download order

Hi there!

While starting to analyze how IE 8 interacts with our site http://www.alice-dsl.de in my recent post here, I observed another behaviour that struck me as strange. We have, of course, read Steve Souders' book High Performance Web Sites, and therefore decided to place as many external JavaScript files as possible at THE BOTTOM of the page. For various reasons, though, we were forced to keep some external JavaScript files in the HEAD, to be precise: 2 of them. Everything else went to the bottom.

So, when I looked at the IE 7 waterfall diagram, again plugging Pat's great Webpagetest.org, everything seemed to be fine.

You see the 2 JS assets in the HEAD being downloaded sequentially (as this is IE 7), and then the download continuing with the images as defined in the HTML document.

So, just as a comparison, I did the same thing using IE 8. And boy, was I surprised (which is a frequent feeling when doing web performance optimization). The waterfall looked like this:

It seems as if all our optimization efforts had never taken place: again we see all the JS being downloaded first! This left me a little bit puzzled… So I looked around at webpagetest.org and saw other waterfalls showing the same behaviour. But also others that didn't.

So I built a really simple page and played around a little bit. The first attempt was to reproduce the results with a page that was as simple as possible but still showed this behaviour. I succeeded with this page, and the waterfall indeed showed the same pattern:

A small JS script 01 is placed in the HEAD (splitting the initial payload 🙂 ). This JS script is actually empty, nothing in there. But even though in the HTML document all images are placed right at the TOP of the BODY and the JS files 02-09 are at the BOTTOM of the BODY, IE 8 fetches these JS resources before it starts to download the images.
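
The structure of the test page is roughly this (a simplified sketch from memory with made-up file names, not the exact test page):

<html>
<head>
  <!-- JS 01: an empty external script - this alone triggers the reordering -->
  <script type="text/javascript" src="js01.js"></script>
</head>
<body>
  <!-- the images sit right at the top of the body -->
  <img src="image01.jpg">
  <img src="image02.jpg">
  <!-- ... more images ... -->
  <!-- JS 02-09 sit at the very bottom of the body -->
  <script type="text/javascript" src="js02.js"></script>
  <script type="text/javascript" src="js03.js"></script>
  <!-- ... up to js09.js -->
</body>
</html>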

Rendering, though, is not blocked by these “elevated” JS files. But the images of this rather… image-centric 🙂 page are not visible until ~4.75 seconds.

So while it does NOT block rendering, it CLOGS UP your available TCP connections.

By fiddling around, I found that the issue is the small JS in the HEAD of the document. When I remove this JS 01 from the HEAD, the waterfall in IE 8 suddenly changes dramatically to this:

In comparison to the first example, this page displays the first image after just 1.5 seconds instead of 4.75 seconds. That is three times as fast! And the difference is a ~0 byte external JavaScript placed in the HEAD! (Well, okay, my example might be a little bit stretched and rather rare in the wild 🙂 )

Also being an avid reader of Steve's “Even Faster Web Sites”, I remembered that even though IE 8 does not block downloads of images while fetching JS, this might be a different story for CSS and IFRAMEs.

I did further tests, and to make a long story short: while IE 8 “elevates” the JS from the bottom to the top, these “elevated” JS files DO NOT block IFRAMEs or CSS files, as can be seen here:

(As you can see, even though I placed the CSS file at the TOP of the BODY and not in the HEAD, IE 8 pulls it first, and afterwards the “elevated” JS files)

and here:

So, now that we have checked out the behaviour of IE 8 rather thoroughly, what can you do about this if you simply NEED an external JS in your HEAD?

Well, you can, for instance, use the method explained here by Nicholas Zakas. Using this method, my test page waterfall in IE 8 now looks like this:

Apart from finally being able to tame the IE 8 JavaScript download order, this also gives you the benefit of “unblocking” JS downloads in IE 7.
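
For completeness, the technique boils down to dynamic script injection. A minimal sketch along the lines Zakas describes (the URL and the callback body are placeholders, see his article for the full version):

function loadScript(url, callback) {
  var script = document.createElement('script');
  script.type = 'text/javascript';
  if (script.readyState) {
    // IE (up to version 8) signals completion via readystatechange
    script.onreadystatechange = function () {
      if (script.readyState === 'loaded' || script.readyState === 'complete') {
        script.onreadystatechange = null;
        callback();
      }
    };
  } else {
    // other browsers fire a load event
    script.onload = callback;
  }
  script.src = url;
  document.getElementsByTagName('head')[0].appendChild(script);
}

// placeholder usage: load the file that previously sat in the HEAD via a script tag
loadScript('/js/head-script.js', function () {
  // runs once the script has arrived, without having blocked other downloads
});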

My takeaway from these tests is the following:

With IE 8, if an external JS file is placed in the HEAD via a conventional <script> tag, all other external JS assets in the BODY that are fetched via <script> tags will be elevated in the download order. These assets do not block rendering, but they do clog up your connections.

This can be avoided with the above-mentioned method (and there are probably other methods as well).

IE 7 does not show this behaviour, and neither does FF 3.6.3.

Markus
