Recently we analyzed Firefox Sync’s performance and found that it created a shockingly large number of objects during an initial sync, i.e. when all the data on the server is downloaded and applied locally. Millions of objects in fact. This caused us to not only allocate more memory than we really needed, it also meant that the garbage collection (GC) would have to step in a lot and block the main thread for several hundred milliseconds.

Here are some tips and lessons learned. (Many thanks to Andreas Gal for guiding me through the JS engine internals!)

Profiling

You can use operating system level debug and profiling tools such as Apple Instruments, but they will usually not produce useful data that will help you optimize your JavaScript. Fortunately there are several ways to instrument Spidermonkey’s GC at various levels.

How often and for how long does GC happen?

Set the javascript.options.mem.log preference to true. This will write a log message to the Error Console every time a GC happens. It will also tell you how long each GC run took, e.g.:

GC timestamp: 1298535350097039, duration: 321 ms.

This is a good way to find out if your GC is overworked, but really not much more.

What does the GC spend its time on?

To find out more, you can recompile Firefox with --enable-gctimer which defines MOZ_GCTIMER. This will dump a file called gcTimer.dat into your current working directory. This is what it looked like for a first sync before we started our optimization work:

  AppTime,  Total,   Mark,  Sweep, FinObj, FinStr,  Destroy,  newChunks, destroyChunks
      0.0,  222.7,  141.5,   59.1,   45.9,    4.4,      8.8,          7,          0 
  11805.8,  301.9,  176.2,   64.8,   52.7,    7.3,      4.8,          1,          0 
  24492.8,  258.9,  185.1,   63.4,   49.5,    7.3,      6.6,          1,          0 
  35807.6,  233.2,  195.7,   26.8,   18.5,    4.5,      3.8,          0,          0 
  48767.3,  230.5,  189.5,   26.3,   19.6,    3.8,      2.9,          0,          0 
 113195.6,  281.6,  203.6,   54.4,   32.3,   15.4,      6.7,          1,          0 
 125005.3,  338.1,  217.5,   94.8,   69.1,   14.9,     10.8,          1,          0 
 137759.5,  374.5,  199.8,  131.7,   85.8,   22.8,     23.1,          5,          0 
 149515.7,  312.2,  211.2,   75.6,   48.8,    8.8,     18.0,          0,          0 
 201876.5, 1384.0,  243.5,  754.7,  488.4,   46.1,    220.3,         40,          0 
 258690.2, 1525.4,  300.9,  822.2,  556.4,   47.6,    218.1,          0,          0 
 309871.2, 1336.6,  242.1,  713.5,  450.0,   50.3,    213.2,          0,          0 
 362310.4, 1363.3,  248.1,  731.4,  465.5,   47.0,    218.9,          1,          0 
 417596.2, 1332.4,  264.1,  702.4,  450.2,   45.4,    206.7,          0,          0 
 472853.8, 1399.9,  254.4,  762.7,  486.2,   50.1,    226.5,          0,          0 
 540223.6, 1548.2,  328.8,  848.4,  486.1,   72.0,    290.3,          0,          0 
 599972.6, 1363.9,  269.3,  729.1,  466.6,   51.3,    211.2,          0,          0 
 655834.2, 1384.6,  265.1,  737.2,  468.2,   52.5,    216.5,          0,          0 
 764844.8,  811.9,  241.0,  363.6,  233.7,   50.8,     79.1,          0,          0 
 768540.4,  213.5,  170.4,   34.1,   24.2,    5.3,      4.6,          0,          0 
 768905.3,  372.0,  134.2,  230.1,  218.1,    4.4,      7.7,          0,          0 
 769290.4,  303.0,   13.6,  258.8,  220.1,    5.0,     33.7,          0,          0 
 769742.4,   58.7,    4.9,   50.0,   24.5,    2.9,     22.6,          0,         30 
 769802.3,   12.1,    5.5,    6.1,    4.0,    2.0,      0.1,          0,          0 

This can give you a good idea of when and where the GC spends its time.

Note: the numbers in the first couple of columns are in some JS engine internal time format (they’re really some sort of cycle count, expressed in millions I believe). The Mark and Sweep times will roughly add up to the Total time. The FinObj and FinStr times indicate how long it took to free objects and strings, respectively. These numbers will roughly add up to the Sweep time.

Also, each chunk in the two rightmost columns is 1MB large, just to give you an idea of how much data we’re dealing with here.

Last resort: printf and gdb

As a last resort, you can instrument js_NewGCObject and the like in js/src/jsgcinlines.h, for instance by adding the following lines of code to the top of the function:

    if (cx->hasfp() && cx->fp()->isScriptFrame()) {
        printf("NewGCObject %s:%d\n", cx->fp()->script()->filename,
               js_PCToLineNumber(cx, cx->fp()->script(), cx->fp()->pc(cx)));
    }

Before you run Firefox with this change, make sure you disable the JIT though by setting the javascript.options.tracejit.* and javascript.options.methodjit.* preferences to false.

Also, this change will make Firefox very chatty. Best you direct this output to a file, grep for the resource or chrome URLs you’re interested in and then use a simple script to accumulate and sort the data. I ended up with something like this (output shortened a bit):

18187 resource://services-sync/engines/history.js:505
18187 resource://services-sync/engines/history.js:506
18533 resource://services-sync/record.js:264
18578 resource://services-sync/record.js:259
18613 resource://services-sync/ext/Sync.js:185
18613 resource://services-sync/ext/Sync.js:186
18615 resource://services-sync/util.js:609
18616 resource://services-crypto/WeaveCrypto.js:522
18616 resource://services-crypto/WeaveCrypto.js:523
56393 resource://services-sync/util.js:229
56393 resource://services-sync/util.js:234
56393 resource://services-sync/util.js:235
56393 resource://services-sync/util.js:236
192622 resource://services-sync/log4moz.js:361
192622 resource://services-sync/log4moz.js:365
260624 resource://services-crypto/WeaveCrypto.js:713
674818 resource://services-sync/util.js:408

Now you have a list of top offenders and you can look at them one by one. As a bonus, you also see which places in your code are called how often. In the above example, I was syncing ca. 18,000 history records, so it was easy to see which places were called once, twice or many more times for each record.

Often it will be obvious which expression on a given line of code creates an object and how to avoid (not always as obvious, more on that below.) However, in some cases the line will look completely innocent. Then you can use gdb to see what the interpreter is doing:

  1. Add a Math.sin(0) call or similar just before the line in question.
  2. Start Firefox in gdb (firefox -g) and set a breakpoint on the math_sin function.
  3. Run your code. Once you hit the math_sin breakpoint, set a breakpoint for js_NewGCObject and continue. That way you will (hopefully!) break exactly on the object creation you can’t identify.
  4. Look at the backtrace. You might see things like NewArguments which means you accessed the arguments object on that line for the first time (more on that below). Or you might see something entirely different. Fun!

How to avoid creating objects

Here are some of the patterns I encountered that create objects when not necessary. This isn’t a complete list by far and I invite you all to contribute more. Ed Lee used the above profiling techniques on his code already and came up with some useful tips. Also note that all this only applies to code that gets called a massive number of times, in a short period of time, thus putting lots of pressure on the GC.

  • Put constant things in global consts. This one is pretty “duh,” but you often don’t think about it. We had lots of code like this:
    function somewhere_in_sync() {
      let data = queryDatabase(sql_statement, ["column1", "column2", "column3"]);
      ...
    }
    

    That array of strings can obviously be pulled out of the function. Not a big deal if you call this once. Big deal if you call this 20,000 times.

  • Similarly, avoid creating arrays just to use their join method. We had lots of logging code do this:
    log.debug(["Associated", record.id, "with", record.guid].join(" "));
    

    Also, as a rule of thumb, creating strings rather than objects is better. The GC timer data as shown above has proven this as well, where string sweeping time typically amounts to a fraction of object sweeping time.

  • Reuse arrays. Avoid Array.map, Array.filter, Array.concat, Array.reduce, Array.slice, Array.splice because they create new arrays. Instead just modify the existing array in-place. As a bonus you also get to avoid a closure (see below.) For instance, we had code like this:
    let records = Array.filter(function(record) {
      return determine_if_record_should_stay(record);
    });
    

    We replaced this with

    let i, k;
    for (i = 0, k = 0; i < records.length; i++) {
      let record = records[k] = records[i]; // replace filtered items
      if (determine_if_record_should_stay(record)) {
        k += 1;
      }
    }
    records.length = k; // truncate
    

    Pretty it’s not. But avoids creating new arrays.

  • Similarly, reuse objects and functions. Lots of APIs have you pass in listener or observer objects, e.g.:
    function doStuff() {
      let value1, value2;
      someAPICall({
        onSuccess: function onSuccess() {
          // do stuff with value1, value2
        },
        onError: function onError() {
          // do stuff with value1, value2
        }
      });
    }
    

    If you can, only create this object and the functions once and put the values they need on the object, e.g.:

    const gSomeHandler = {
      onSuccess: function onSuccess() {
        // do stuff with this.value1, this.value2
      },
      onError: function onError() {
        // do stuff with this.value1, this.value2
      }
    };
    function doStuff() {
      gSomeHandler.value1 = value1;
      gSomeHandler.value2 = value2;
      someAPICall(gSomeHandler);
    }
    

    Of course this only works if you can guarantee that this function won’t be called multiple times in parallel. But even then you can at least save on creating the functions by using the global object as a prototype:

    function doStuff() {
      let handler = {};
      handler.__proto__ = gSomeHandler;
      handler.value1 = value1;
      handler.value2 = value2;
      someAPICall(handler);
    }
    
  • The magic arguments object is created lazily on the first access. Once created, it has to be garbage collected, though. Thus avoid patterns that have you do things like
      func.apply(this, arguments);
    

    or even inspect the contents or length of arguments. If you must, use

      func.call(this, arg1, arg2, ...)
    
  • Tracemonkey will try to optimize closures, so even if you define a function within a function that gets called many times, it recognizes all those functions to be the same. The problem arises when you mutate those functions, e.g. set an attribute on them. Then Tracemonkey has to clone the function object so that you end up with a unique one (Brendan explain this very well in this episode of A Minute With Brendan Eich.)

    So try to avoid mutating closure functions. Actually, as Ed points out, try to avoid closures anywhere in code that gets called a lot.

  • Lastly, when designing new XPCOM APIs, optimize for the least number of objects necessary in JavaScript. For instance, instead of requiring JS code to create an nsIURI object, accept a simple string and do the validation in native code. Moreover, allow JavaScript to opt into callbacks rather than forcing it to pass no-ops.

Conclusions

By using the profiling techniques and modifying the code as described above, we managed to eliminate more than 75% of all object allocations during an initial sync. It has been a lot of fun hunting down and killing object allocations, but there are some serious lessons learned here:

  • We need better instrumentation tools. Instrumenting functions within the JS engine isn’t available to everyone, particularly to authors of cross-browser JS frameworks. As a first step it would be great to get the number of object allocations per source code line, much like I was able to produce with printf and a little bit of script code.
  • Many async patterns involve creating new objects one way or the other, either implicitly through func.apply(this, arguments) or through generator, deferred or event objects. Neither pattern will be available to us in the guts of Sync as we’ll have to guarantee that it remains optimized for tens of thousands of calls. That’s ok, though. There’s nothing wrong with simple callbacks. We’ll just have to keep this in mind whenever we feel tempted to use syntactic sugar to replace ease the use of asynchronous APIs.

February in Mozilla Services land saw the killing of our last Firefox 4 blockers, performance and efficiency tweaks for Sync, and the beginnings of new endeavours.

Performance tweaks in Sync

Particularly for the very first sync on mobile devices, Firefox Sync has been showing some rather poor performance, sometimes blocking the main thread for far too long, causing the UI to be unresponsive. The first step was to get a good measurement of how long we these blocks occur:

  1. I added a small in-memory logger that writes to an nsIStorageStream. This little gem is one of many undocumented but marvelous XPCOM components. It creates an in-memory stream pipe. You can get its nsIInputStream, hand it off to some code that wants to write to streams, and then later you get the corresponding nsIOutputStream to inspect the data, write it to disk or whatever.
  2. Next I recorded the Date.now() timestamp every time Sync would get called back on the main event loop and every time it yielded back. The difference would then be written to the in-memory log, ignoring anything below a certain threshold to avoid huge log files.
  3. At the end of a sync, I dumped the in-memory log to disk. For the location I chose the browser’s downloads directory as available from nsIDownloadManager.userDownloadsDirectory so that the file would be available on the USB-mountable partition that Android phones typically have. The copying is easily done with NetUtil.asyncCopy(input, output) where input is the in-memory nsIInputStream and output is a nsIFileOutputStream (the safe variant)

This produced an overwhelming amount of data for each sync. After all, each event that blocks the main event loop — of which there are many — was recorded, so a little bit of statistical analysis was in order. The easiest way to visualize data like this is to draw a histogram — finally all those years of doing Particle Physics came in handy! After little bit of Python and gnuplot, I was able to visualize Fennec’s responsiveness during a first sync on various phones, e.g.:

A first sync on the Motorola Droid 2

As you can probably tell from the graph, it was pretty terrible. What we didn’t expect was that history sync would still produce so many blocking events after we made all its I/O async. Further investigation revealed that most of the events were in fact so long because of garbage collection kicking in. On the Motorola Droid 2 this was nicely visible because many events clustered around the 500ms mark which seemed to be the order of magnitude of a GC free operation. Clearly the problem was that we were allocating far too many objects.

With the help of Andreas Gal, we managed to instrument the JS engine to tell us whenever a new object was created. Analyzing the results of that instrumentation gave us a list of top offenders which we then proceeded to knock off one by one. Within a week we managed to reduce the number of objects created during a first sync to less than a quarter of what it was before. There’s still room for more improvement, but we’ve now reached a point where further reduction will require changes to the underlying APIs such as Places. We definitely want to do this work for Firefox 5, though.

New projects

Our team is now researching and prototyping ideas on how to improve identity handling, sharing and notifications in the browser. Some of those things we hope to get into the product as soon as Firefox 5. For instance, we’ve joined forces with the guys from Mozilla Messaging who came up with the awesome F1 add-on to evolve it into a product that can land in Firefox eventually.

Our awesome interns Alex and Shane are prototyping a push notification system for the browser. For this we’ve started fleshing out the design and APIs for the cloud service and the client embedded into Firefox. The initial idea is that through this service the cloud can notify the browser of things like new emails without the need for open tabs or even background tabs like in Google Chrome. I’m sure they’ll soon be blogging about their progress themselves, so I won’t spoil too many details here.

Follow

Get every new post delivered to your Inbox.