We always appreciate it when users let us know about problems they’re experiencing with Sync because it means we can improve it! Here are some guidelines on how to file a good Sync bug:

  1. Get the Sync log on the machine you’re having problems on (these steps assume you’re running the latest version of Firefox; if you’re not, please upgrade):
    1. Go to about:sync-log in Firefox.
    2. Find the last log file entry in the list and click on it. It should be called something like 1307991874508.log or error-1307991874508.log.
    3. Save it to a file on disk so you can upload to the bug later.
    4. If the list is empty or only contains outdated logs (you can tell by the dates), go to about:config, find the services.sync.log.appender.file.logOnSuccess preference, set it to true, and sync again. Then repeat these steps.
  2. File a bug
    1. Even if you’ve found an existing bug that looks like the one you’re experiencing, do not add your comments to it. Please always file a new bug; we will dupe it appropriately once we’ve determined the root issue. If you want, you can clone the existing bug or make a cross-reference to it in the summary.
    2. Attach the log file you saved earlier to the bug by uploading an attachment. Do not copy & paste the text into the comment field; it won’t hold enough text!

We’re hoping to automate much of this in upcoming versions of Firefox, so that a Sync problem can be reported to us automatically, or at least with just a few clicks.

Today exactly one year ago was my first day at Mozilla. I started it by joining IRC from my flat in Glasgow, where I had just given up a PhD position and a career as a physicist. Shortly after that I moved to Berlin for a few months and when I got my H1B visa, I moved to San Francisco.

People frequently ask me how I like it “here.” Well, it’s a fucking blast. I get to call many incredibly smart and fun people not only my colleagues but also my friends. I live in one of the most brilliant cities in the world and occasionally get to travel to the remaining ones. I race cars now, for crying out loud.

Yup, life is good.

Thanks, everybody who’s been in my life this past year. It’s been one hell of a ride.

Recently we analyzed Firefox Sync’s performance and found that it created a shockingly large number of objects during an initial sync, i.e. when all the data on the server is downloaded and applied locally. Millions of objects, in fact. Not only did this cause us to allocate more memory than we really needed, it also meant that garbage collection (GC) would have to step in a lot and block the main thread for several hundred milliseconds.

Here are some tips and lessons learned. (Many thanks to Andreas Gal for guiding me through the JS engine internals!)

Profiling

You can use operating system level debug and profiling tools such as Apple Instruments, but they will usually not produce useful data that will help you optimize your JavaScript. Fortunately there are several ways to instrument SpiderMonkey’s GC at various levels.

How often and for how long does GC happen?

Set the javascript.options.mem.log preference to true. This will write a log message to the Error Console every time a GC happens. It will also tell you how long each GC run took, e.g.:

GC timestamp: 1298535350097039, duration: 321 ms.

This is a good way to find out if your GC is overworked, but really not much more.

What does the GC spend its time on?

To find out more, you can recompile Firefox with --enable-gctimer which defines MOZ_GCTIMER. This will dump a file called gcTimer.dat into your current working directory. This is what it looked like for a first sync before we started our optimization work:

  AppTime,  Total,   Mark,  Sweep, FinObj, FinStr,  Destroy,  newChunks, destroyChunks
      0.0,  222.7,  141.5,   59.1,   45.9,    4.4,      8.8,          7,          0 
  11805.8,  301.9,  176.2,   64.8,   52.7,    7.3,      4.8,          1,          0 
  24492.8,  258.9,  185.1,   63.4,   49.5,    7.3,      6.6,          1,          0 
  35807.6,  233.2,  195.7,   26.8,   18.5,    4.5,      3.8,          0,          0 
  48767.3,  230.5,  189.5,   26.3,   19.6,    3.8,      2.9,          0,          0 
 113195.6,  281.6,  203.6,   54.4,   32.3,   15.4,      6.7,          1,          0 
 125005.3,  338.1,  217.5,   94.8,   69.1,   14.9,     10.8,          1,          0 
 137759.5,  374.5,  199.8,  131.7,   85.8,   22.8,     23.1,          5,          0 
 149515.7,  312.2,  211.2,   75.6,   48.8,    8.8,     18.0,          0,          0 
 201876.5, 1384.0,  243.5,  754.7,  488.4,   46.1,    220.3,         40,          0 
 258690.2, 1525.4,  300.9,  822.2,  556.4,   47.6,    218.1,          0,          0 
 309871.2, 1336.6,  242.1,  713.5,  450.0,   50.3,    213.2,          0,          0 
 362310.4, 1363.3,  248.1,  731.4,  465.5,   47.0,    218.9,          1,          0 
 417596.2, 1332.4,  264.1,  702.4,  450.2,   45.4,    206.7,          0,          0 
 472853.8, 1399.9,  254.4,  762.7,  486.2,   50.1,    226.5,          0,          0 
 540223.6, 1548.2,  328.8,  848.4,  486.1,   72.0,    290.3,          0,          0 
 599972.6, 1363.9,  269.3,  729.1,  466.6,   51.3,    211.2,          0,          0 
 655834.2, 1384.6,  265.1,  737.2,  468.2,   52.5,    216.5,          0,          0 
 764844.8,  811.9,  241.0,  363.6,  233.7,   50.8,     79.1,          0,          0 
 768540.4,  213.5,  170.4,   34.1,   24.2,    5.3,      4.6,          0,          0 
 768905.3,  372.0,  134.2,  230.1,  218.1,    4.4,      7.7,          0,          0 
 769290.4,  303.0,   13.6,  258.8,  220.1,    5.0,     33.7,          0,          0 
 769742.4,   58.7,    4.9,   50.0,   24.5,    2.9,     22.6,          0,         30 
 769802.3,   12.1,    5.5,    6.1,    4.0,    2.0,      0.1,          0,          0 

This can give you a good idea of when and where the GC spends its time.

Note: the numbers in the first couple of columns are in some JS engine internal time format (they’re really some sort of cycle count, expressed in millions I believe). The Mark and Sweep times will roughly add up to the Total time. The FinObj and FinStr times indicate how long it took to free objects and strings, respectively. These numbers will roughly add up to the Sweep time.

Also, each chunk in the two rightmost columns is 1MB large, just to give you an idea of how much data we’re dealing with here.

Last resort: printf and gdb

As a last resort, you can instrument js_NewGCObject and the like in js/src/jsgcinlines.h, for instance by adding the following lines of code to the top of the function:

    if (cx->hasfp() && cx->fp()->isScriptFrame()) {
        printf("NewGCObject %s:%d\n", cx->fp()->script()->filename,
               js_PCToLineNumber(cx, cx->fp()->script(), cx->fp()->pc(cx)));
    }

Before you run Firefox with this change, make sure you disable the JIT though by setting the javascript.options.tracejit.* and javascript.options.methodjit.* preferences to false.

Also, this change will make Firefox very chatty. It’s best to direct this output to a file, grep for the resource or chrome URLs you’re interested in, and then use a simple script to accumulate and sort the data. I ended up with something like this (output shortened a bit):

18187 resource://services-sync/engines/history.js:505
18187 resource://services-sync/engines/history.js:506
18533 resource://services-sync/record.js:264
18578 resource://services-sync/record.js:259
18613 resource://services-sync/ext/Sync.js:185
18613 resource://services-sync/ext/Sync.js:186
18615 resource://services-sync/util.js:609
18616 resource://services-crypto/WeaveCrypto.js:522
18616 resource://services-crypto/WeaveCrypto.js:523
56393 resource://services-sync/util.js:229
56393 resource://services-sync/util.js:234
56393 resource://services-sync/util.js:235
56393 resource://services-sync/util.js:236
192622 resource://services-sync/log4moz.js:361
192622 resource://services-sync/log4moz.js:365
260624 resource://services-crypto/WeaveCrypto.js:713
674818 resource://services-sync/util.js:408

Now you have a list of top offenders and you can look at them one by one. As a bonus, you also see which places in your code are called how often. In the above example, I was syncing ca. 18,000 history records, so it was easy to see which places were called once, twice or many more times for each record.
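The accumulate-and-sort step can be sketched in a few lines of JavaScript (a hypothetical stand-in for the script I used, assuming the grepped printf output is available as an array of lines):

```javascript
// Tally how often each "file:line" allocation site appears in the
// grepped printf output, then sort ascending by count.
// (Hypothetical sketch, not the original script.)
function tallyAllocations(lines) {
  var counts = {};
  for (var i = 0; i < lines.length; i++) {
    // Each line looks like "NewGCObject resource://...:123".
    var site = lines[i].replace(/^NewGCObject\s+/, "");
    counts[site] = (counts[site] || 0) + 1;
  }
  return Object.keys(counts).map(function (site) {
    return counts[site] + " " + site;
  }).sort(function (a, b) {
    return parseInt(a, 10) - parseInt(b, 10);
  });
}
```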

Often it will be obvious which expression on a given line of code creates an object and how to avoid it (not always as obvious; more on that below). However, in some cases the line will look completely innocent. Then you can use gdb to see what the interpreter is doing:

  1. Add a Math.sin(0) call or similar just before the line in question.
  2. Start Firefox in gdb (firefox -g) and set a breakpoint on the math_sin function.
  3. Run your code. Once you hit the math_sin breakpoint, set a breakpoint for js_NewGCObject and continue. That way you will (hopefully!) break exactly on the object creation you can’t identify.
  4. Look at the backtrace. You might see things like NewArguments which means you accessed the arguments object on that line for the first time (more on that below). Or you might see something entirely different. Fun!

How to avoid creating objects

Here are some of the patterns I encountered that create objects unnecessarily. This is by no means a complete list, and I invite you all to contribute more. Ed Lee has already used the above profiling techniques on his code and came up with some useful tips. Also note that all this only applies to code that gets called a massive number of times in a short period of time, thus putting lots of pressure on the GC.

  • Put constant things in global consts. This one is pretty “duh,” but you often don’t think about it. We had lots of code like this:
    function somewhere_in_sync() {
      let data = queryDatabase(sql_statement, ["column1", "column2", "column3"]);
      ...
    }
    

    That array of strings can obviously be pulled out of the function. Not a big deal if you call this once. Big deal if you call this 20,000 times.
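    Hoisted out, it might look like this (a sketch with a stub queryDatabase, since the real function isn’t shown here):

```javascript
// Stub stand-in for the real database call, just to make this runnable.
function queryDatabase(sql, columns) {
  return { sql: sql, columns: columns };
}

// Hoisted to file scope: the array is allocated once, no matter how
// often somewhere_in_sync() runs (names are illustrative).
const kColumns = ["column1", "column2", "column3"];

function somewhere_in_sync(sql_statement) {
  return queryDatabase(sql_statement, kColumns);
}
```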

  • Similarly, avoid creating arrays just to use their join method. We had lots of logging code do this:
    log.debug(["Associated", record.id, "with", record.guid].join(" "));
    

    Also, as a rule of thumb, creating strings rather than objects is better. The GC timer data shown above bears this out: string sweeping time (FinStr) is typically only a fraction of object sweeping time (FinObj).
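    The same message built by plain string concatenation (with stub log and record objects so the sketch runs on its own):

```javascript
// Stubs so this sketch is self-contained; the real code used log4moz
// loggers and Sync records.
var messages = [];
var log = { debug: function (msg) { messages.push(msg); } };
var record = { id: "abc123", guid: "xyz789" };

// Plain concatenation builds the string without allocating a temporary
// array just to join it.
log.debug("Associated " + record.id + " with " + record.guid);
```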

  • Reuse arrays. Avoid Array.map, Array.filter, Array.concat, Array.reduce, Array.slice, Array.splice because they create new arrays. Instead just modify the existing array in-place. As a bonus you also get to avoid a closure (see below.) For instance, we had code like this:
    records = records.filter(function (record) {
      return determine_if_record_should_stay(record);
    });
    

    We replaced this with

    let i, k;
    for (i = 0, k = 0; i < records.length; i++) {
      let record = records[k] = records[i]; // replace filtered items
      if (determine_if_record_should_stay(record)) {
        k += 1;
      }
    }
    records.length = k; // truncate
    

    Pretty it’s not. But it avoids creating new arrays.
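    The loop generalizes into a small reusable helper (my naming, not from the Sync code):

```javascript
// Filter an array in place: keep the elements for which predicate()
// returns true, truncate the rest, and allocate no new array.
function filterInPlace(array, predicate) {
  var k = 0;
  for (var i = 0; i < array.length; i++) {
    var item = array[k] = array[i]; // replace filtered items
    if (predicate(item)) {
      k += 1;
    }
  }
  array.length = k; // truncate
  return array;
}
```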

  • Similarly, reuse objects and functions. Lots of APIs have you pass in listener or observer objects, e.g.:
    function doStuff() {
      let value1, value2;
      someAPICall({
        onSuccess: function onSuccess() {
          // do stuff with value1, value2
        },
        onError: function onError() {
          // do stuff with value1, value2
        }
      });
    }
    

    If you can, only create this object and the functions once and put the values they need on the object, e.g.:

    const gSomeHandler = {
      onSuccess: function onSuccess() {
        // do stuff with this.value1, this.value2
      },
      onError: function onError() {
        // do stuff with this.value1, this.value2
      }
    };
    function doStuff() {
      gSomeHandler.value1 = value1;
      gSomeHandler.value2 = value2;
      someAPICall(gSomeHandler);
    }
    

    Of course this only works if you can guarantee that this function won’t be called multiple times in parallel. But even then you can at least save on creating the functions by using the shared handler object as a prototype:

    function doStuff() {
      let handler = {};
      handler.__proto__ = gSomeHandler;
      handler.value1 = value1;
      handler.value2 = value2;
      someAPICall(handler);
    }
    
  • The magic arguments object is created lazily on the first access. Once created, it has to be garbage collected, though. Thus avoid patterns that have you do things like
      func.apply(this, arguments);
    

    or even inspect the contents or length of arguments. If you must, use

      func.call(this, arg1, arg2, ...)
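    For example, a forwarding wrapper with a fixed signature avoids materializing arguments entirely (illustrative names, not from the Sync code):

```javascript
// A callee with a fixed signature (illustrative names).
function logRecord(level, id, guid) {
  return level + ": " + id + " " + guid;
}

// Bad: touching `arguments` forces the engine to create (and later
// garbage collect) an arguments object on every call.
function debugSlow(level, id, guid) {
  return logRecord.apply(this, arguments);
}

// Better: forward explicit parameters with call(); no arguments object.
function debugFast(level, id, guid) {
  return logRecord.call(this, level, id, guid);
}
```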
    
  • TraceMonkey will try to optimize closures, so even if you define a function within a function that gets called many times, it recognizes all those functions to be the same. The problem arises when you mutate those functions, e.g. set an attribute on them. Then TraceMonkey has to clone the function object so that you end up with a unique one (Brendan explains this very well in an episode of A Minute With Brendan Eich).

    So try to avoid mutating closure functions. Actually, as Ed points out, try to avoid closures anywhere in code that gets called a lot.

  • Lastly, when designing new XPCOM APIs, optimize for the smallest number of objects necessary in JavaScript. For instance, instead of requiring JS code to create an nsIURI object, accept a simple string and do the validation in native code. Moreover, allow JavaScript to opt into callbacks rather than forcing it to pass no-ops.

Conclusions

By using the profiling techniques and modifying the code as described above, we managed to eliminate more than 75% of all object allocations during an initial sync. It has been a lot of fun hunting down and killing object allocations, but there are some serious lessons learned here:

  • We need better instrumentation tools. Instrumenting functions within the JS engine isn’t available to everyone, particularly to authors of cross-browser JS frameworks. As a first step it would be great to get the number of object allocations per source code line, much like I was able to produce with printf and a little bit of script code.
  • Many async patterns involve creating new objects one way or the other, either implicitly through func.apply(this, arguments) or through generator, deferred or event objects. None of these patterns will be available to us in the guts of Sync, as we’ll have to guarantee that it remains optimized for tens of thousands of calls. That’s ok, though. There’s nothing wrong with simple callbacks. We’ll just have to keep this in mind whenever we feel tempted to use syntactic sugar to ease the use of asynchronous APIs.

February in Mozilla Services land saw the killing of our last Firefox 4 blockers, performance and efficiency tweaks for Sync, and the beginnings of new endeavours.

Performance tweaks in Sync

Particularly for the very first sync on mobile devices, Firefox Sync has been showing some rather poor performance, sometimes blocking the main thread for far too long and causing the UI to be unresponsive. The first step was to get a good measurement of how long and how often these blocks occur:

  1. I added a small in-memory logger that writes to an nsIStorageStream. This little gem is one of many undocumented but marvelous XPCOM components. It creates an in-memory stream pipe. You can get its nsIOutputStream, hand it off to some code that wants to write to streams, and then later get the corresponding nsIInputStream to inspect the data, write it to disk or whatever.
  2. Next I recorded the Date.now() timestamp every time Sync would get called back on the main event loop and every time it yielded back. The difference would then be written to the in-memory log, ignoring anything below a certain threshold to avoid huge log files.
  3. At the end of a sync, I dumped the in-memory log to disk. For the location I chose the browser’s downloads directory as available from nsIDownloadManager.userDownloadsDirectory so that the file would be available on the USB-mountable partition that Android phones typically have. The copying is easily done with NetUtil.asyncCopy(input, output) where input is the in-memory nsIInputStream and output is an nsIFileOutputStream (the safe variant).
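The measurement in step 2 can be sketched roughly like this (a simplified stand-in: the real code hooked into Sync’s event-loop callbacks and wrote to the in-memory stream described above, not to an array):

```javascript
var blockLog = [];
var THRESHOLD_MS = 1; // ignore tiny events to keep the log small

// Wrap a main-thread callback so that the time it blocks the event
// loop is recorded whenever it exceeds the threshold.
function instrument(callback) {
  return function () {
    var start = Date.now();
    try {
      callback();
    } finally {
      var elapsed = Date.now() - start;
      if (elapsed >= THRESHOLD_MS) {
        blockLog.push(start + " blocked for " + elapsed + " ms");
      }
    }
  };
}
```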

This produced an overwhelming amount of data for each sync. After all, each event that blocks the main event loop — of which there are many — was recorded, so a little bit of statistical analysis was in order. The easiest way to visualize data like this is to draw a histogram — finally all those years of doing Particle Physics came in handy! After a little bit of Python and gnuplot, I was able to visualize Fennec’s responsiveness during a first sync on various phones, e.g.:

A first sync on the Motorola Droid 2

As you can probably tell from the graph, it was pretty terrible. What we didn’t expect was that history sync would still produce so many blocking events after we made all its I/O async. Further investigation revealed that most of the events were in fact so long because of garbage collection kicking in. On the Motorola Droid 2 this was nicely visible because many events clustered around the 500ms mark, which seemed to be the order of magnitude of a GC free operation. Clearly the problem was that we were allocating far too many objects.
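The binning behind a histogram like the one above is simple; here is a JavaScript sketch of it (the actual analysis used Python and gnuplot):

```javascript
// Bin event durations (in ms) into fixed-width buckets; the bucket
// counts are what the responsiveness histograms plot.
function histogram(durations, binWidth) {
  var bins = {};
  for (var i = 0; i < durations.length; i++) {
    var bin = Math.floor(durations[i] / binWidth) * binWidth;
    bins[bin] = (bins[bin] || 0) + 1;
  }
  return bins;
}
```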

With the help of Andreas Gal, we managed to instrument the JS engine to tell us whenever a new object was created. Analyzing the results of that instrumentation gave us a list of top offenders which we then proceeded to knock off one by one. Within a week we managed to reduce the number of objects created during a first sync to less than a quarter of what it was before. There’s still room for more improvement, but we’ve now reached a point where further reduction will require changes to the underlying APIs such as Places. We definitely want to do this work for Firefox 5, though.

New projects

Our team is now researching and prototyping ideas on how to improve identity handling, sharing and notifications in the browser. Some of those things we hope to get into the product as soon as Firefox 5. For instance, we’ve joined forces with the guys from Mozilla Messaging who came up with the awesome F1 add-on to evolve it into a product that can land in Firefox eventually.

Our awesome interns Alex and Shane are prototyping a push notification system for the browser. For this we’ve started fleshing out the design and APIs for the cloud service and the client embedded into Firefox. The initial idea is that through this service the cloud can notify the browser of things like new emails without the need for open tabs or even background tabs like in Google Chrome. I’m sure they’ll soon be blogging about their progress themselves, so I won’t spoil too many details here.

Status report for January

January 31, 2011

The month of January in Sync land was dominated by two things:

  • sorting out the aftermath of Sync 1.6, which unfortunately reshuffled and duplicated a lot of users’ bookmarks, and
  • finishing up all the work we need to do for Firefox 4.

Bookmarks

My colleague Richard Newman and I sat down and stared at the bookmark sync code for quite a bit. We identified and fixed a bug introduced in 4.0b8 that caused the reordering problems people had reported. We also made bookmark sync less aggressive about reparenting bookmarks which should alleviate many of the bookmarks-are-moved-to-the-unsorted-folder and bookmarks-are-duplicated issues. These were fixed for Firefox 4.0b10 / Sync 1.6.2. We also identified a bunch of old bugs, particularly related to the restore-from-backup functionality. These are already fixed and will go out in Firefox 4.0b11 / Sync 1.6.3.

Many of those bugs were due to parallel structures that were not synchronized properly, assumptions that certain values would not change, and similar things. This prompted me to quote Phil Karlton’s famous bon mot a couple of times:

There are only two hard things in Computer Science: cache invalidation and naming things.

Firefox 4

Apart from a few UI improvements to the Sync preference window and its dialogs, the remaining work for Firefox 4 mostly consisted of making history sync use the new asynchronous bulk API that Shawn Wilsher implemented. This speeds up sync when you’re syncing down a lot of history (e.g. the first sync on a second device) while improving the UI’s responsiveness at the same time.

As a byproduct, we also fixed a long standing problem in Sync: when a particular record could not be added to the local profile, Sync would stop altogether for that engine, leaving all other changes, including any local changes, unsynced.

As we’re getting closer to shipping Firefox 4, fewer of us have any blockers to work on while the pressure increases on the few people who do. Here are some thoughts and experiences on how to balance that load.

This past month we picked up a few blockers in Sync that had people’s bookmarks duplicated and reordered. An embarrassing and unfortunate scenario, especially since it was hard to reproduce and our unit tests were all passing, of course. We had gotten reports of duplicated bookmarks in the past. So while we had recently made some changes to bookmark sync that may have made the situation worse, it certainly looked like an old lurking bug.

After two days of staring at the code, writing tests that passed and getting nowhere, I asked my colleague Richard Newman to help me out. In previous projects I’ve had extremely good experiences with pair programming. In this technique, two developers sit down at one machine with one text editor. The “navigator” dictates the general direction while the “driver” writes the actual code. Developing code this way has a lot of advantages:

  • Two sets of eyes see more than one. Simple mistakes that can slow an individual developer down, such as typos, off-by-one errors, etc., happen less often.
  • Code is reviewed before it’s checked in. Granted, we already do that at Mozilla.
  • Code is reviewed instantly. Less developer and reviewer time is wasted with an asynchronous review cycle and less time has to be spent explaining why a particular change was made this or that way.

However, Richard works remotely, nearly a thousand miles away, although fortunately he’s in the same timezone. As it turns out, pair programming over a distance works great when you have the right tools. Here’s what we did:

  • We both had hands-free phones available, so we got on the phone and started walking through the code together.
  • As we bounced ideas off each other, we collected our findings in an Etherpad. This allowed us to write down our ideas in a structured manner, while allowing the other one to reorder and restructure them as we filed bugs.
  • On some of those bugs, we paired up in the traditional pair-programming sense. To compensate for the lack of a common screen to stare at, we used a shared GNU screen session in which I brought up an instance of emacs so that we could take turns in “driving.”

We ended up doing this for three whole afternoons. We found potential culprits, filed bugs and fixed a bunch. The experience very much reminded me of on-site pair programming. I suggest you try it some time, be it with a colleague who’s remote or not. You’ll feel extremely productive even if you’re not writing a single line of code yourself. Not only that, you’ll end the session with a much better understanding of what the code does (instant r+!).

Let’s kill those blockers dead.

Bookmark sync issues

December 28, 2010

We’ve been getting a lot of reports from users about bookmark sync being somewhat unreliable in Sync 1.6 and Firefox 4.0b8. Symptoms include duplicate bookmarks, erroneous re-ordering and bookmarks occasionally being moved to the Unsorted Bookmarks folder. We’re trying hard to track down these issues, but since they appear seemingly randomly, we’re mostly poking in the dark.

How to fix

Fortunately, Sync makes backups of your bookmarks before each sync. So if your bookmarks have been messed up, you can easily restore them from a backup. Simply go to the Bookmarks menu and choose Show All Bookmarks (Organize Bookmarks on Firefox 3.x). In the window that opens, click on the Import and Backup button in the toolbar (it’s a star icon on OS X) and open the Restore submenu. It will give you the option to restore your bookmarks to a previous state.

How to help us

If you’re one of the affected users and would like to help us track down this issue, here’s what you can do:

  1. Let’s say you have computer A and computer B. On both computers, enable the sync log (about:sync-log) by going to about:config and setting the services.sync.log.appender.debugLog.enabled preference to true. In addition to that please make the log for bookmarks extra chatty by setting the services.sync.log.logger.engine.bookmarks preference to Trace. Make sure to restart the browser after making these changes.
  2. Now on computer A, restore your bookmarks to a good state using the above steps. Then choose Reset Sync under Sync Options and choose the Replace all other devices option.
  3. On computer B simply sync and observe whether you’re seeing any of the symptoms again. If you do, please send us your logs from both computers. They’re at about:sync-log and about:sync-log.1. You can just save those files and attach them to a new bug. Note that those logs will contain your bookmark data, so if you’re concerned about privacy, we completely understand. In that case feel free to send your logs to philikon@mozilla.com.

Please be assured that we’re trying hard to fix these problems and we’re looking forward to your help. Thanks for bearing with us and apologies if you’ve been affected.
