Thursday, October 7, 2010

HTML5 offline webapps: a practical example

Web apps that work offline are still uncommon, and there aren't a lot of "real life" implementation examples on the Web.
So, we figured we should share our experience making Cube work offline.
This is not only about "keep it working on airplanes" scenarios. Offline support can make an app load faster, cause less load on the server, and be way more resilient to tough network conditions.

This isn't an introduction to HTML5 - for that, we'd like to recommend Mark Pilgrim's Dive Into HTML5 as a great introduction to what's possible with the newer specs. If you have access to Apple's WWDC 2010 videos, check session 512, about offline web apps - we also found it really helpful.

Making an app work offline means making two components available: the application assets (the base HTML pages, images, CSSs and JavaScript), and the user's data.

Offline User Data

Let's start with local user data storage, since our current implementation is quite straightforward. To go further than cookies and store data on the browser there are two widely deployed solutions, and a third one in the works.

The first solution is supported on all modern browsers - and I mean all of them, even Internet Explorer 8: localStorage - a basic key/value store.
Second is the Web SQL database - a spec that WebKit-based browsers and Opera support - which is a thin layer on top of SQLite, giving you full relational power. The problem is that Mozilla has said Firefox will never implement it.
The third option, IndexedDB, is not yet deployed at all, so it's not really an option.

So, given these options, we chose localStorage. The values stored need to be strings, but that's not actually a big deal - you can serialize data to a JSON string and store that.
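To make the "serialize to JSON" idea concrete, here's a minimal sketch (not Cube's actual code). The helper names saveJson and loadJson, the in-memory stand-in and the example key are all made up for illustration; in a browser you would pass window.localStorage as the storage argument.

```javascript
// Store structured data in a string-only key/value store.
// `storage` is any object with the Web Storage getItem/setItem methods.
function saveJson(storage, key, value) {
  storage.setItem(key, JSON.stringify(value));
}

function loadJson(storage, key) {
  var raw = storage.getItem(key);
  if (raw === null || raw === undefined) {
    return null; // nothing stored under this key
  }
  try {
    return JSON.parse(raw);
  } catch (e) {
    return null; // treat a corrupted entry as missing
  }
}

// Hypothetical in-memory stand-in so the sketch also runs outside a browser.
function createMemoryStorage() {
  var items = {};
  return {
    getItem: function (key) {
      return Object.prototype.hasOwnProperty.call(items, key) ? items[key] : null;
    },
    setItem: function (key, value) {
      items[key] = String(value); // like localStorage, keep strings only
    }
  };
}

var storage = createMemoryStorage();
saveJson(storage, "/tasks/?page=1", { results: [{ id: 1, name: "Write post" }] });
var cached = loadJson(storage, "/tasks/?page=1");
```

Keeping the storage object as a parameter is just a convenience for trying the code out; it also makes it easy to fall back to a dummy store on browsers without localStorage.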
Cube does most of the template rendering client side; a GET is made to the server to fetch a list of results, and that response, in JSON format, is rendered on the fly to the HTML you see on your screen.
We're caching that response (the key is the full GET URL, including parameters); next time we need that data, if we have it in the cache we show it immediately and continue with the HTTP request in the background, updating the interface if/when we get a response.
If you're offline, the request will fail, but you'll see the last cached version of the data; if you are online the page will seem to load much faster.
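The flow described above could be sketched roughly like this. It's an illustration under assumptions, not our production code: loadWithCache, the callback-style fetchJson(url, onSuccess, onError) helper and the render(data, source) function are hypothetical names, and skipping the second render when the response is unchanged is a choice made for this sketch.

```javascript
// Show the cached copy right away, then refresh from the network.
function loadWithCache(url, storage, fetchJson, render) {
  var cachedText = storage.getItem(url); // the key is the full GET URL
  var cachedData = null;
  if (cachedText !== null) {
    try {
      cachedData = JSON.parse(cachedText);
    } catch (e) {
      cachedText = null; // ignore a corrupted entry
    }
  }
  if (cachedData !== null) {
    render(cachedData, "cache");
  }
  fetchJson(url, function (data) {
    var freshText = JSON.stringify(data);
    if (freshText !== cachedText) {
      storage.setItem(url, freshText);
      render(data, "network"); // update the interface with fresh data
    }
  }, function () {
    // Offline or server error: keep showing whatever came from the cache.
  });
}

// Hypothetical stand-ins to exercise the flow outside a browser.
var memory = {};
var fakeStorage = {
  getItem: function (key) {
    return Object.prototype.hasOwnProperty.call(memory, key) ? memory[key] : null;
  },
  setItem: function (key, value) { memory[key] = String(value); }
};
var rendered = [];
function render(data, source) { rendered.push(source + ":" + data.results.length); }
function onlineFetch(url, onSuccess, onError) { onSuccess({ results: [1, 2, 3] }); }
function offlineFetch(url, onSuccess, onError) { onError(new Error("offline")); }

loadWithCache("/tasks/?page=1", fakeStorage, onlineFetch, render);  // first visit, online
loadWithCache("/tasks/?page=1", fakeStorage, offlineFetch, render); // later visit, offline
```

On the first call nothing is cached, so the data is rendered once the (fake) network answers; on the second call the request fails, but the cached copy has already been rendered.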

We don't support offline editing. Our iOS apps do that - you can change data offline, stuff gets queued, when you're back online the queue is sent to the server, conflicts are dealt with if necessary. However, on iOS we have the benefits of a full SQL database. We're keeping an eye on the IndexedDB/Web SQL story, as we'd really like to have offline editing working, but we feel localStorage currently isn't enough.

Offline Application Assets

For the application assets HTML5 defines an application cache: based on a manifest you determine which files should be available offline. You can read more about it here.

Quick recap: there are three sections in the manifest - CACHE (the default section, stuff to be stored offline), NETWORK (stuff that we should always retrieve from the network) and FALLBACK (what to show if we're offline and try to fetch something from the network). Also, every page that points to the cache manifest is automatically added to the cache as a MASTER entry, which for all practical purposes behaves as if it were in the CACHE section.

So, without further ado, here's a slightly simplified version of our cache manifest, pointed to by all our offline-enabled pages:

CACHE MANIFEST

# Cube Offline Manifest

# pmd: {{ request.person.modification_date_fmt }}
# cmd: {{ request.company.modification_date_fmt }}
/media/{% fingerprint "main.min.css" %}
/media/{% fingerprint "all.min.js" %}
http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js
{% if request.company.logo_url %}
{{ request.company.logo_url }}
{% endif %}
/jsi18n/?lang={{ request.LANGUAGE_CODE|default:'' }}&lang_hash={{ request.LANGUAGE_CODE_HASH|default:'' }}
/media/{% fingerprint "favicon.ico" %}
/media/{% fingerprint "ui/bullet_arrow_down.png" %}
# ... more UI assets
/media/{% fingerprint "images/ui-icons_cd0a0a_256x240.png" %}
http://www.google.com/jsapi
http://www.google.com/uds/api/visualization/1.0/efdb9398400374ac755c95e00e6250bc/default%2Cbrowserchart%2Ccolumnchart%2Cpiechart.I.js
http://s3.amazonaws.com/getsatisfaction.com/javascripts/feedback-v2.js

FALLBACK:
/ /offline.html

NETWORK:
*

First of all, notice all the {{ }} and {% %} stuff. That's not in the spec - and that's because our cache manifest is dynamic.

We use Django's templating engine for server side templating. In our example, the most important part is that {{ some.expression }} outputs the value of the expression - so, for instance {{ request.person.modification_date_fmt }} outputs the modification date of the currently logged in user. There's no dependency at all on Django for the stuff I'm about to describe - you could do it just as well in JSP or ASP.

So, why have a dynamic cache manifest? Well, for that we need to discuss how the browser interacts with the network when a cache manifest is specified.

Imagine you arrive at http://cube.bitrzr.com/, having previously logged in and enabled offline mode. The page will be present in the offline cache, so the browser will not even try to load it from the server; it will display it right away.
In the background, the browser will download the manifest and check if it matches byte by byte with the version it has stored. Even comments are taken into account. If it does match, well, that's it, job well done.
But if it doesn't, the new manifest is parsed, and all the cached resources will be redownloaded - actually, the normal HTTP flow takes over, so if you have your expiration headers configured appropriately only files that actually changed will be downloaded.

Remember when we said that most templating is done client side? Well, some stuff still comes from the server embedded in the base HTML page - for instance, the links on the bottom of the page, or the logout link on the top left. And, no, we don't really want to change that, at least in the near future.

So imagine you have a non-dynamic cache manifest, and the user goes to the settings page, and changes the language from English to French. He then reloads the page and - bang! - the logout link is still in English.

Why? Because the cache manifest didn't change. And since it didn't change, http://cube.bitrzr.com/ wasn't downloaded again.

So ideally what we would like would be something like a function that said "Hey browser, I know that my cache just got stale. Next time, please do the whole checking process, not just the cache manifest download thingy".
Sadly, that function doesn't exist. You can tell the browser to go check the manifest right now, so if it has changed the resources will be downloaded right away and will be ready for next time, but if the manifest didn't change, nothing will happen.

So, here's what we actually do. You save the settings. That does a POST to the server, and changes the value on the user's profile. And, as a side effect, it updates the last modified date of the user's profile. If the request succeeds, we tell the browser "I have a feeling the cache needs to be updated", by calling window.applicationCache.update(). And take a look at this line in the manifest:

# pmd: {{ request.person.modification_date_fmt }}

This will contain the user's profile last modification date - so, if the user changes a setting, this line will change, so the manifest won't match with the cached version, and the whole thing will be rechecked using the regular HTTP workflow!
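As a rough sketch of the client-side half of this trick (the function name requestCacheRefresh and the fake objects are made up for illustration; only the update() call is the real application cache API):

```javascript
// Ask the browser to re-check the manifest after a successful settings POST.
// `appCache` would be window.applicationCache in a browser.
function requestCacheRefresh(appCache) {
  if (!appCache) {
    return false; // this browser has no application cache support
  }
  try {
    appCache.update(); // re-downloads the manifest in the background
    return true;
  } catch (e) {
    // update() throws when the page has no application cache associated
    // with it, e.g. the user never enabled offline mode.
    return false;
  }
}

// Hypothetical stand-ins so the sketch runs outside a browser.
var updateCalls = 0;
var cachedPage = { update: function () { updateCalls += 1; } };
var uncachedPage = { update: function () { throw new Error("InvalidStateError"); } };

var outcomes = [
  requestCacheRefresh(cachedPage),   // offline mode enabled
  requestCacheRefresh(uncachedPage), // page without an application cache
  requestCacheRefresh(undefined)     // browser without the API
];
```

In the page itself, the call would sit in the success callback of the settings POST, as requestCacheRefresh(window.applicationCache). Because the manifest now carries the new modification date, the browser sees a different manifest and re-checks every entry.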

Far Future Expires

Before we continue looking at our manifest, let's take a second to discuss HTTP resource caching.
If you tell a browser that a resource won't expire until the end of time, the browser won't even check if the resource has changed, and so you save a round trip to the server. Problem is, what if the resource actually changes?

Well, a way to work around that is to make sure the resource's URL also changes. So, for instance, let's say all your app's code is in a file called all.js; instead of serving it from /media/all.js, you could serve it from /media/version1/all.js. If you made changes to the file, you could drop it into /media/version2/all.js and change all references to that new URL. The browser would actually see a new resource, and whatever caching was specified for version 1 wouldn't be in effect.

A slightly smarter and less error-prone way to do this would be to calculate a checksum of the file and use that instead of the version; the process could then be automated, and no mistakes would be possible. So the file would be served from /media/ABCD5443/all.js and if it was changed, the fingerprint would also change, and it would start being served from something completely different, like /media/EFAB3431/all.js.

That's exactly what we're doing. During our build process, after we minify resources et al, we calculate the fingerprint of each of the static resources, and store it in a dictionary. Later on, a custom Django tag, fingerprint, reads the value from that dictionary, and outputs the tagged URL. Back to the cache manifest, where you see:

/media/{% fingerprint "main.min.css" %}

The server will actually send the client something like:

/media/343423ef/main.min.css

And that's also the URL being referenced from the HTML files. So, even without using the fancy HTML5 app cache, we could reduce the number of requests to the server, and speed up page loading.
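Our real build step and fingerprint tag live on the Django side, so take the following as a stand-in sketch of the idea rather than the actual implementation. The function names are invented, the file contents are passed in as strings, and the checksum is a simple FNV-1a-style hash over the string's character codes, chosen only because it needs no libraries; a real build would more likely hash the file bytes with something like MD5 or SHA-1.

```javascript
// Simple 32-bit FNV-1a-style checksum, returned as 8 hex characters.
function checksum(content) {
  var hash = 0x811c9dc5; // FNV offset basis
  for (var i = 0; i < content.length; i++) {
    hash ^= content.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime, 32-bit multiply
  }
  return ("0000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Build step: map each static file name to the checksum of its content.
function buildFingerprints(files) {
  var fingerprints = {};
  Object.keys(files).forEach(function (name) {
    fingerprints[name] = checksum(files[name]);
  });
  return fingerprints;
}

// What a template tag like {% fingerprint "all.min.js" %} would output.
function fingerprintPath(fingerprints, name) {
  return fingerprints[name] + "/" + name;
}

var v1 = buildFingerprints({ "all.min.js": "var a = 1;", "main.min.css": "body{}" });
var v2 = buildFingerprints({ "all.min.js": "var a = 2;", "main.min.css": "body{}" });

var oldUrl = "/media/" + fingerprintPath(v1, "all.min.js");
var newUrl = "/media/" + fingerprintPath(v2, "all.min.js");
var cssUrlUnchanged =
  fingerprintPath(v1, "main.min.css") === fingerprintPath(v2, "main.min.css");
```

Only the URL of the file whose content changed moves; main.min.css keeps its URL, so browsers that already have it never ask for it again.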

Offline Resource Versioning

Back to the offline app manifest - what happens if you change a JavaScript file that's available offline? Well, nothing - the browser won't check for changes if the manifest doesn't change, right?

That's why some people recommend you include a comment with a version number on your manifest, something like this:

# Cube Offline Manifest v34

The idea is that when you make changes to the resources you increase this number. Well, if you go back to our manifest you'll notice we don't include a version field. And we think you also shouldn't.

The problem with this approach is that it's fragile. What happens if someone changes a file but forgets to update the version? You could come up with a way to automatically update it, but we suggest a different approach: implement the far future expires method explained above.

That way, each time you change a resource its fingerprint will change, and that will cause the manifest file to change, triggering the resource checking. And since the static resources will be cacheable till the end of time only the file that actually changed will be redownloaded.
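To see why no version comment is needed, here's a small sketch that renders a stripped-down manifest from a fingerprint dictionary. The renderManifest function and the fingerprint values are made up for the example; our real manifest is rendered by the Django template shown earlier.

```javascript
// Render a minimal manifest from a { fileName: fingerprint } dictionary.
function renderManifest(fingerprints) {
  var lines = ["CACHE MANIFEST", ""];
  Object.keys(fingerprints).sort().forEach(function (name) {
    lines.push("/media/" + fingerprints[name] + "/" + name);
  });
  lines.push("", "FALLBACK:", "/ /offline.html", "", "NETWORK:", "*", "");
  return lines.join("\n");
}

// Same files before and after a deploy that only touched all.min.js.
var before = renderManifest({ "all.min.js": "abcd5443", "main.min.css": "343423ef" });
var after = renderManifest({ "all.min.js": "efab3431", "main.min.css": "343423ef" });

// The browser compares manifests byte by byte, so this is all it takes
// to trigger the update check.
var manifestChanged = before !== after;

// Lines present in the new manifest but not in the old one.
var beforeLines = before.split("\n");
var changedLines = after.split("\n").filter(function (line) {
  return beforeLines.indexOf(line) === -1;
});
```

Here changedLines ends up holding only the new all.min.js URL: the manifest differs, the browser re-checks every entry, and main.min.css is served straight from the HTTP cache thanks to its far future expiration.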

Fallback URLs

One thing we're not especially satisfied with is the way we're handling offline fallbacks. You see, what we'd like to have is a way to say that if we are offline and the browser is asking for a JSON resource we'd like to return a specific file (in JSON format), else we'd like to return an HTML file.

Since that kind of MIME type sniffing is not possible, and since we didn't want to explicitly list all the different JSON or HTML endpoints in the cache manifest (careful: there's an implicit * at the end of each entry, and you can't have conflicts - each URL must be covered by only one rule), we're including a magic meta tag in our offline.html file, and if a JSON call fails with a parse error we check for that value, and act accordingly.
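The check could look something like the sketch below. The meta tag name, the OFFLINE_MARKER constant and the interpretResponse function are hypothetical; the post doesn't show what the actual tag in offline.html looks like.

```javascript
// Hypothetical marker that offline.html would carry in its <head>.
var OFFLINE_MARKER = '<meta name="cube-offline" content="true">';

// Decide whether a response body is real JSON or the offline fallback page.
function interpretResponse(responseText) {
  try {
    return { offline: false, data: JSON.parse(responseText) };
  } catch (parseError) {
    if (responseText.indexOf(OFFLINE_MARKER) !== -1) {
      // We asked for JSON but got the FALLBACK page: we are offline.
      return { offline: true, data: null };
    }
    throw parseError; // genuinely malformed response, let the caller see it
  }
}

var online = interpretResponse('{"results": []}');
var fallback = interpretResponse(
  "<html><head>" + OFFLINE_MARKER + "</head><body>Offline</body></html>"
);
```

A caller would render online.data as usual and, when offline is true, keep showing whatever it already has in localStorage.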

Conclusion

We're quite happy with our current offline implementation; it gives our users significant functionality and performance advantages.

How do you feel about the techniques we used? Are there better ways to do it? Let us know in the comments!

5 comments:

  1. I like your idea of including a checksum in the URL path. I've also thought about ways to make the App Cache process smarter.

    Curious what you think about http://blog.sethladd.com/2010/10/proposal-to-enhance-html5-app-cache.html

  2. Hi Seth,

    Actually I find your proposal is similar in spirit to what I'm advocating, regarding avoiding 304s when you already know you'll be getting them.

    Embedding the ETags/Last Modified Dates would also handle versioning of assets -> if the asset changes, those change, and the manifest changes - great!

    But I see two advantages on my approach using fingerprints + far future expires:
    - it'll work even with browsers that don't support appcache, or users that don't enable the functionality (assuming you make it opt in)
    - we can start using it today - without waiting for new browser releases (not a problem with Chrome, but still a problem with the rest of the ecosystem) or for the W3C to include it in the standard.

    Btw, how do you feel about the workaround we're using for not having a way to manually invalidate an entry from the cache (using the user profile's last modified date, etc)?
    I'm more and more convinced that MASTER entries should have a way to be refreshed on demand - since it's not feasible to change their URL, as I'm doing for CACHE entries.

    Pedro Morais

  3. Hi, thanks for this writeup. It's fantastic. I have a question: do you know how to PREVENT cache updating on app launch? The reason I ask is that on some devices it takes a LONG TIME to check for updates (i.e. a mobile device with a slow connection), and the whole time this is happening, the user cannot interact with the app. While the browser is checking the manifest, the app is frozen. I've tried to set my expires and cache-control headers on the manifest to 1 week, hoping that updating will occur once/week, but this did nothing. Thanks in advance for any insight on this problem.

  4. This is a nice writeup. I really like the idea of "Far Future Expires". Although, I have a question. Will browser treat /media/checksumA/all.js and /media/checksumB/all.js as two different files and store them separately? Browser will not understand the tags and those are two different URLs for it. Still, I am not sure. Thanks!

  5. That's cool and sounds reasonable. That's the best I found about this subject on the net. With it, my first cache manifest started to work. There is so much wrong information on it out there...

    Would you be willing to share your custom django fingerprint tag and how you create the fingerprint files? Please! I am not into django, unfortunately, but am very interested in your Far Future expires solution.

    Thanks anyway!

    Warm regards
    Micsi
