Status report for February

March 7, 2011

February in Mozilla Services land saw the killing of our last Firefox 4 blockers, performance and efficiency tweaks for Sync, and the beginnings of new endeavours.

Performance tweaks in Sync

Firefox Sync has been showing some rather poor performance, particularly during the very first sync on mobile devices: it sometimes blocks the main thread for far too long, leaving the UI unresponsive. The first step was to get a good measurement of how long these blocks last:

  1. I added a small in-memory logger that writes to an nsIStorageStream. This little gem is one of many undocumented but marvelous XPCOM components. It creates an in-memory stream pipe. You can get its nsIOutputStream, hand it off to some code that wants to write to streams, and then later get the corresponding nsIInputStream to inspect the data, write it to disk or whatever.
  2. Next I recorded the timestamp every time Sync would get called back on the main event loop and every time it yielded back. The difference would then be written to the in-memory log, ignoring anything below a certain threshold to avoid huge log files.
  3. At the end of a sync, I dumped the in-memory log to disk. For the location I chose the browser’s downloads directory, as available from nsIDownloadManager.userDownloadsDirectory, so that the file would end up on the USB-mountable partition that Android phones typically have. The copying is easily done with NetUtil.asyncCopy(input, output), where input is the in-memory nsIInputStream and output is an nsIFileOutputStream (the safe variant).
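The three steps above boil down to a small pattern. The following is not the actual Sync code (that was JavaScript on top of XPCOM), just a minimal Python sketch of the same idea. The class name, the 10 ms threshold and the one-number-per-line log format are assumptions made for illustration, and io.StringIO merely stands in for nsIStorageStream.

```python
import io
import time


class BlockingLogger:
    """Records in memory how long each stretch of main-thread work took."""

    def __init__(self, threshold_ms=10.0, clock=time.monotonic):
        self.threshold_ms = threshold_ms  # hypothetical cutoff, not Sync's value
        self._clock = clock               # injectable so the logger can be tested
        self._buffer = io.StringIO()      # stand-in for the in-memory stream
        self._entered_at = None

    def enter(self):
        """Call when the work gets called back on the event loop."""
        self._entered_at = self._clock()

    def leave(self):
        """Call when the work yields back; logs the elapsed time if long enough."""
        if self._entered_at is None:
            return
        elapsed_ms = (self._clock() - self._entered_at) * 1000.0
        self._entered_at = None
        # Ignore short blocks to keep the log small.
        if elapsed_ms >= self.threshold_ms:
            self._buffer.write(f"{elapsed_ms:.1f}\n")

    def dump(self, path):
        """Write the whole in-memory log to disk, e.g. at the end of a sync."""
        with open(path, "w", encoding="utf-8") as out:
            out.write(self._buffer.getvalue())
```

Wrapping every callback in enter() and leave() and calling dump() once at the end gives one line per blocking event, which is all the later analysis needs.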

This produced an overwhelming amount of data for each sync. After all, each event that blocks the main event loop (and there are many) was recorded, so a little bit of statistical analysis was in order. The easiest way to visualize data like this is to draw a histogram. Finally all those years of doing Particle Physics came in handy! After a little bit of Python and gnuplot, I was able to visualize Fennec’s responsiveness during a first sync on various phones, e.g.:

A first sync on the Motorola Droid 2
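The binning behind a histogram like this is simple enough to sketch. This is not the original analysis script. It assumes a log with one blocking duration in milliseconds per line, and the function names and the 50 ms bin width are made up for the example.

```python
from collections import Counter


def read_durations(path):
    """Read one duration in milliseconds per line, skipping blank lines."""
    with open(path, encoding="utf-8") as log:
        return [float(line) for line in log if line.strip()]


def histogram(durations_ms, bin_width_ms=50):
    """Count durations per bin; keys are the lower edge of each bin in ms."""
    counts = Counter()
    for duration in durations_ms:
        lower_edge = int(duration // bin_width_ms) * bin_width_ms
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


def to_gnuplot_data(counts):
    """Format the bins as 'lower_edge count' rows, one bin per line."""
    return "\n".join(f"{edge} {count}" for edge, count in counts.items())
```

Written to a file, the two-column output can be drawn in gnuplot with something like plot "bins.dat" using 1:2 with boxes. A cluster of long events, such as the one near 500ms discussed below, shows up as a bump far out on the right.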

As you can probably tell from the graph, it was pretty terrible. What we didn’t expect was that history sync would still produce so many blocking events after we made all its I/O async. Further investigation revealed that most of the events were in fact so long because of garbage collection kicking in. On the Motorola Droid 2 this was nicely visible because many events clustered around the 500ms mark which seemed to be the order of magnitude of a GC free operation. Clearly the problem was that we were allocating far too many objects.

With the help of Andreas Gal, we managed to instrument the JS engine to tell us whenever a new object was created. Analyzing the results of that instrumentation gave us a list of top offenders which we then proceeded to knock off one by one. Within a week we managed to reduce the number of objects created during a first sync to less than a quarter of what it was before. There’s still room for more improvement, but we’ve now reached a point where further reduction will require changes to the underlying APIs such as Places. We definitely want to do this work for Firefox 5, though.
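Getting from raw allocation events to a list of top offenders is essentially a counting exercise. The sketch below assumes the instrumentation emits one call-site label per created object. That format is hypothetical and says nothing about what the JS engine patch actually printed.

```python
from collections import Counter


def top_allocation_sites(events, limit=10):
    """Return the call sites that created the most objects, largest first.

    `events` is any iterable of call-site labels, one per allocation
    (a hypothetical format chosen for this example).
    """
    return Counter(events).most_common(limit)
```

The sites at the top of that list are the ones worth knocking off first.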

New projects

Our team is now researching and prototyping ideas on how to improve identity handling, sharing and notifications in the browser. Some of those things we hope to get into the product as soon as Firefox 5. For instance, we’ve joined forces with the guys from Mozilla Messaging who came up with the awesome F1 add-on to evolve it into a product that can land in Firefox eventually.

Our awesome interns Alex and Shane are prototyping a push notification system for the browser. For this we’ve started fleshing out the design and APIs for the cloud service and the client embedded into Firefox. The initial idea is that through this service the cloud can notify the browser of things like new emails without the need for open tabs or even background tabs like in Google Chrome. I’m sure they’ll soon be blogging about their progress themselves, so I won’t spoil too many details here.


4 Responses to “Status report for February”

  1. How does the nested event loop spinning interact with the data gathering? Specifically, did you count it as a yield when you started spinning the loop and count it as a blocking acquisition when you stopped spinning and returned to your flow control?

    Also, is one of the projects the team is working on the removal of the nested event loop spinning? Thunderbird has various efforts where it would be neat to use sync, but nested event loop spinning is likely much more dangerous on Thunderbird…

    • philikon Says:

      Yes, spinning the loop counts as a yield. In fact, for once the nested event loop spinning that Sync does worked for us here because it made the data mining much easier.

      To answer your second question: yes, removing the event loop spinning is a top priority post Firefox 4.

  2. I’ve got some related work happening in bug 606574, where I’m simply instrumenting the event loop itself to determine the latency of servicing events. Our plan is to use this in a set of Talos tests so we can perform some known actions and observe the impact on responsiveness, and then hopefully make the largest ones go away.

    • philikon Says:

      Thanks, this is awesome. Once we have a functional test suite for sync running on tinderboxes, we can use the same thing to instrument Sync’s responsiveness! Go go go go! 🙂
