JavaScript perf: avoid creating objects

March 9, 2011

Recently we analyzed Firefox Sync’s performance and found that it created a shockingly large number of objects during an initial sync, i.e. when all the data on the server is downloaded and applied locally. Millions of objects in fact. This caused us to not only allocate more memory than we really needed, it also meant that the garbage collection (GC) would have to step in a lot and block the main thread for several hundred milliseconds.

Here are some tips and lessons learned. (Many thanks to Andreas Gal for guiding me through the JS engine internals!)

Profiling

You can use operating system level debug and profiling tools such as Apple Instruments, but they will usually not produce useful data that will help you optimize your JavaScript. Fortunately there are several ways to instrument Spidermonkey’s GC at various levels.

How often and for how long does GC happen?

Set the javascript.options.mem.log preference to true. This will write a log message to the Error Console every time a GC happens. It will also tell you how long each GC run took, e.g.:

GC timestamp: 1298535350097039, duration: 321 ms.

This is a good way to find out if your GC is overworked, but really not much more.

What does the GC spend its time on?

To find out more, you can recompile Firefox with --enable-gctimer which defines MOZ_GCTIMER. This will dump a file called gcTimer.dat into your current working directory. This is what it looked like for a first sync before we started our optimization work:

  AppTime,  Total,   Mark,  Sweep, FinObj, FinStr,  Destroy,  newChunks, destroyChunks
      0.0,  222.7,  141.5,   59.1,   45.9,    4.4,      8.8,          7,          0 
  11805.8,  301.9,  176.2,   64.8,   52.7,    7.3,      4.8,          1,          0 
  24492.8,  258.9,  185.1,   63.4,   49.5,    7.3,      6.6,          1,          0 
  35807.6,  233.2,  195.7,   26.8,   18.5,    4.5,      3.8,          0,          0 
  48767.3,  230.5,  189.5,   26.3,   19.6,    3.8,      2.9,          0,          0 
 113195.6,  281.6,  203.6,   54.4,   32.3,   15.4,      6.7,          1,          0 
 125005.3,  338.1,  217.5,   94.8,   69.1,   14.9,     10.8,          1,          0 
 137759.5,  374.5,  199.8,  131.7,   85.8,   22.8,     23.1,          5,          0 
 149515.7,  312.2,  211.2,   75.6,   48.8,    8.8,     18.0,          0,          0 
 201876.5, 1384.0,  243.5,  754.7,  488.4,   46.1,    220.3,         40,          0 
 258690.2, 1525.4,  300.9,  822.2,  556.4,   47.6,    218.1,          0,          0 
 309871.2, 1336.6,  242.1,  713.5,  450.0,   50.3,    213.2,          0,          0 
 362310.4, 1363.3,  248.1,  731.4,  465.5,   47.0,    218.9,          1,          0 
 417596.2, 1332.4,  264.1,  702.4,  450.2,   45.4,    206.7,          0,          0 
 472853.8, 1399.9,  254.4,  762.7,  486.2,   50.1,    226.5,          0,          0 
 540223.6, 1548.2,  328.8,  848.4,  486.1,   72.0,    290.3,          0,          0 
 599972.6, 1363.9,  269.3,  729.1,  466.6,   51.3,    211.2,          0,          0 
 655834.2, 1384.6,  265.1,  737.2,  468.2,   52.5,    216.5,          0,          0 
 764844.8,  811.9,  241.0,  363.6,  233.7,   50.8,     79.1,          0,          0 
 768540.4,  213.5,  170.4,   34.1,   24.2,    5.3,      4.6,          0,          0 
 768905.3,  372.0,  134.2,  230.1,  218.1,    4.4,      7.7,          0,          0 
 769290.4,  303.0,   13.6,  258.8,  220.1,    5.0,     33.7,          0,          0 
 769742.4,   58.7,    4.9,   50.0,   24.5,    2.9,     22.6,          0,         30 
 769802.3,   12.1,    5.5,    6.1,    4.0,    2.0,      0.1,          0,          0 

This can give you a good idea of when and where the GC spends its time.

Note: the numbers in the first couple of columns are in some JS engine internal time format (they’re really some sort of cycle count, expressed in millions I believe). The Mark and Sweep times will roughly add up to the Total time. The FinObj and FinStr times indicate how long it took to free objects and strings, respectively. These numbers will roughly add up to the Sweep time.

Also, each chunk in the two rightmost columns is 1MB large, just to give you an idea of how much data we’re dealing with here.

Last resort: printf and gdb

As a last resort, you can instrument js_NewGCObject and the like in js/src/jsgcinlines.h, for instance by adding the following lines of code to the top of the function:

    if (cx->hasfp() && cx->fp()->isScriptFrame()) {
        printf("NewGCObject %s:%d\n", cx->fp()->script()->filename,
               js_PCToLineNumber(cx, cx->fp()->script(), cx->fp()->pc(cx)));
    }

Before you run Firefox with this change, make sure you disable the JIT though by setting the javascript.options.tracejit.* and javascript.options.methodjit.* preferences to false.

Also, this change will make Firefox very chatty. Best you direct this output to a file, grep for the resource or chrome URLs you’re interested in and then use a simple script to accumulate and sort the data. I ended up with something like this (output shortened a bit):

18187 resource://services-sync/engines/history.js:505
18187 resource://services-sync/engines/history.js:506
18533 resource://services-sync/record.js:264
18578 resource://services-sync/record.js:259
18613 resource://services-sync/ext/Sync.js:185
18613 resource://services-sync/ext/Sync.js:186
18615 resource://services-sync/util.js:609
18616 resource://services-crypto/WeaveCrypto.js:522
18616 resource://services-crypto/WeaveCrypto.js:523
56393 resource://services-sync/util.js:229
56393 resource://services-sync/util.js:234
56393 resource://services-sync/util.js:235
56393 resource://services-sync/util.js:236
192622 resource://services-sync/log4moz.js:361
192622 resource://services-sync/log4moz.js:365
260624 resource://services-crypto/WeaveCrypto.js:713
674818 resource://services-sync/util.js:408

Now you have a list of top offenders and you can look at them one by one. As a bonus, you also see which places in your code are called how often. In the above example, I was syncing ca. 18,000 history records, so it was easy to see which places were called once, twice or many more times for each record.

Often it will be obvious which expression on a given line of code creates an object and how to avoid (not always as obvious, more on that below.) However, in some cases the line will look completely innocent. Then you can use gdb to see what the interpreter is doing:

  1. Add a Math.sin(0) call or similar just before the line in question.
  2. Start Firefox in gdb (firefox -g) and set a breakpoint on the math_sin function.
  3. Run your code. Once you hit the math_sin breakpoint, set a breakpoint for js_NewGCObject and continue. That way you will (hopefully!) break exactly on the object creation you can’t identify.
  4. Look at the backtrace. You might see things like NewArguments which means you accessed the arguments object on that line for the first time (more on that below). Or you might see something entirely different. Fun!

How to avoid creating objects

Here are some of the patterns I encountered that create objects when not necessary. This isn’t a complete list by far and I invite you all to contribute more. Ed Lee used the above profiling techniques on his code already and came up with some useful tips. Also note that all this only applies to code that gets called a massive number of times, in a short period of time, thus putting lots of pressure on the GC.

  • Put constant things in global consts. This one is pretty “duh,” but you often don’t think about it. We had lots of code like this:
    function somewhere_in_sync() {
      let data = queryDatabase(sql_statement, ["column1", "column2", "column3"]);
      ...
    }
    

    That array of strings can obviously be pulled out of the function. Not a big deal if you call this once. Big deal if you call this 20,000 times.

  • Similarly, avoid creating arrays just to use their join method. We had lots of logging code do this:
    log.debug(["Associated", record.id, "with", record.guid].join(" "));
    

    Also, as a rule of thumb, creating strings rather than objects is better. The GC timer data as shown above has proven this as well, where string sweeping time typically amounts to a fraction of object sweeping time.

  • Reuse arrays. Avoid Array.map, Array.filter, Array.concat, Array.reduce, Array.slice, Array.splice because they create new arrays. Instead just modify the existing array in-place. As a bonus you also get to avoid a closure (see below.) For instance, we had code like this:
    let records = Array.filter(function(record) {
      return determine_if_record_should_stay(record);
    });
    

    We replaced this with

    let i, k;
    for (i = 0, k = 0; i < records.length; i++) {
      let record = records[k] = records[i]; // replace filtered items
      if (determine_if_record_should_stay(record)) {
        k += 1;
      }
    }
    records.length = k; // truncate
    

    Pretty it’s not. But avoids creating new arrays.

  • Similarly, reuse objects and functions. Lots of APIs have you pass in listener or observer objects, e.g.:
    function doStuff() {
      let value1, value2;
      someAPICall({
        onSuccess: function onSuccess() {
          // do stuff with value1, value2
        },
        onError: function onError() {
          // do stuff with value1, value2
        }
      });
    }
    

    If you can, only create this object and the functions once and put the values they need on the object, e.g.:

    const gSomeHandler = {
      onSuccess: function onSuccess() {
        // do stuff with this.value1, this.value2
      },
      onError: function onError() {
        // do stuff with this.value1, this.value2
      }
    };
    function doStuff() {
      gSomeHandler.value1 = value1;
      gSomeHandler.value2 = value2;
      someAPICall(gSomeHandler);
    }
    

    Of course this only works if you can guarantee that this function won’t be called multiple times in parallel. But even then you can at least save on creating the functions by using the global object as a prototype:

    function doStuff() {
      let handler = {};
      handler.__proto__ = gSomeHandler;
      handler.value1 = value1;
      handler.value2 = value2;
      someAPICall(handler);
    }
    
  • The magic arguments object is created lazily on the first access. Once created, it has to be garbage collected, though. Thus avoid patterns that have you do things like
      func.apply(this, arguments);
    

    or even inspect the contents or length of arguments. If you must, use

      func.call(this, arg1, arg2, ...)
    
  • Tracemonkey will try to optimize closures, so even if you define a function within a function that gets called many times, it recognizes all those functions to be the same. The problem arises when you mutate those functions, e.g. set an attribute on them. Then Tracemonkey has to clone the function object so that you end up with a unique one (Brendan explain this very well in this episode of A Minute With Brendan Eich.)

    So try to avoid mutating closure functions. Actually, as Ed points out, try to avoid closures anywhere in code that gets called a lot.

  • Lastly, when designing new XPCOM APIs, optimize for the least number of objects necessary in JavaScript. For instance, instead of requiring JS code to create an nsIURI object, accept a simple string and do the validation in native code. Moreover, allow JavaScript to opt into callbacks rather than forcing it to pass no-ops.

Conclusions

By using the profiling techniques and modifying the code as described above, we managed to eliminate more than 75% of all object allocations during an initial sync. It has been a lot of fun hunting down and killing object allocations, but there are some serious lessons learned here:

  • We need better instrumentation tools. Instrumenting functions within the JS engine isn’t available to everyone, particularly to authors of cross-browser JS frameworks. As a first step it would be great to get the number of object allocations per source code line, much like I was able to produce with printf and a little bit of script code.
  • Many async patterns involve creating new objects one way or the other, either implicitly through func.apply(this, arguments) or through generator, deferred or event objects. Neither pattern will be available to us in the guts of Sync as we’ll have to guarantee that it remains optimized for tens of thousands of calls. That’s ok, though. There’s nothing wrong with simple callbacks. We’ll just have to keep this in mind whenever we feel tempted to use syntactic sugar to replace ease the use of asynchronous APIs.
About these ads

13 Responses to “JavaScript perf: avoid creating objects”


  1. [...] &#1072t th&#1110&#1109 time: JavaScript perf: avoid m&#1072k&#1110n&#609 objects « philiKON – a journal Related Posts:Overriding C++ xpcom objects using Javascript – MesQuilla One of the more [...]


  2. So, I’m presuming the last of these examples is the most efficient:

    doStuff: function(element) { var self = this; element.addEventListener(“event”, function(event) { self.handleEvent(event); }, false); },

    doStuff: function(element) { element.addEventListener(“event”, this.handleEvent.bind(this), false); },

    doStuff: function(element) { element.addEventListener(“event”, this, false); },

  3. Tom Schuster Says:

    I don’t know if sync code is jitted, but Jägermonkey, doesn’t create the arguments object for this specific code
    func.apply(this, arguments);

    • philikon Says:

      Sync code, like all chrome code, is not method-jitted right now. But it’s good to know that this will help!

    • Andreas Says:

      That depends whether the arguments object escapes or not. Both JITs try to do some optimization of common cases, but if something triggers the arguments object to be materialized, there is no way avoiding the allocation.


  4. [...] here to read the rest: JavaScript perf: avoid creating objects « philiKON – a journal This entry was posted in Java script, US and tagged and-profiling, apple, help-you, not-produce, [...]

  5. Anonymous Says:

    Sounds to me like the JavaScript engine needs some of the optimizations applied in functional language runtimes. You shouldn’t need to avoid Array.filter; the runtime should just figure out that you filter an Array and never use the unfiltered array again, so it can replace the unfiltered array with the filtered one. Similarly, constant values should get let-floated out of functions automatically; that applies to the array case as well as the function case you used gSomeHandler for.

    And if you don’t already, you should seriously consider using a generational copying collector. Such a collector avoids fragmentation, makes malloc effectively free (just add and return a pointer), and optimizes for quick creation and destruction of many objects (via the nursery) as well as some long-lived objects.

    • Andreas Says:

      Escape analysis is an impure language like JS is easier said than done. Most functional languages don’t have complications like prototype chains, implicit conversion, dynamic types and getters/setters. On top of that, JS VMs have to been extremely fast compiling code. JS often runs for a few milliseconds only. This makes performing expensive analysis like escape analysis difficult to implement. We will get to it eventually, but we would like to ship a well working sync code now, not in a couple months whenever the VM is better :)

      • Anonymous Says:

        At a minimum, the faster GC would help greatly, particularly for the usage model you describe.

  6. Johan Says:

    If GC is garbage collection, what is CC then?

  7. Jurij Says:

    Please return to develop BarTab! This addon don’t work with FF4.0 RC1 :-(


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: