With the emergence of JavaScript- and AJAX-heavy frameworks and the growing popularity of AngularJS, Ember, Backbone.js, CanJS, and even jQuery, making sites and single-page apps crawlable by search engines is becoming increasingly difficult. It doesn't have to be.
This presentation takes a look at some of the largest and trending publishers and some of the AJAX features they employ.
9. Mar 2004: “Googlebot/Test” External JS
Mar 2006: Googlebot Uses Onsite Live Chat
Nov 2007: Spider’s View on Web 2.0
Oct 2009: AJAX Crawlability _escaped_fragment_
June 2010: Caffeine (Full Rollout)
Nov 2010: Instant Preview
May 2012: Matt PSA: Don’t Block JS & CSS
May 2013: Matt Video: Googlebot & AJAX
May 2014: GWT Fetch & Render
45. Quote conflated from my favorite ruby XML parser » http://nokogiri.org/
Speed, Performance, and Human Perception » https://www.youtube.com/watch?v=7ubJzEi3HuA
SERoundtable Timeline Links »
http://www.seroundtable.com/google-javascript-webmaster-tools-18602.html
Googlebot/Test External JS » http://www.seroundtable.com/archives/000236.html
Googlebot Uses Onsite Live Chat » http://www.seroundtable.com/archives/003492.html
Spider’s View on Web 2.0 »
http://googlewebmastercentral.blogspot.com/2007/11/spiders-view-of-web-20.html
AJAX Crawlability Proposal »
http://googlewebmastercentral.blogspot.com/2009/10/proposal-for-making-ajax-crawlable.html
Caffeine Rollout »
http://googlewebmastercentral.blogspot.com/2010/06/our-new-search-index-caffeine.html
Instant Previews »
http://googleblog.blogspot.com/2010/11/beyond-instant-results-instant-previews.html
http://googlewebmastercentral.blogspot.com/2010/11/instant-previews.html
http://googlewebmastercentral.blogspot.com/2011/11/get-post-and-safely-surfacing-more-of.html
https://sites.google.com/site/webmasterhelpforum/en/faq-instant-previews
Matt Cutts PSA: Don’t Block JS & CSS »
http://www.seroundtable.com/googlebot-javascript-css-14930.html
Matt Cutts Video: How Does Googlebot handle content loaded via AJAX? »
https://www.youtube.com/watch?v=_6mtiwQ3nvw
REFERENCES
46. GWT Fetch & Render »
http://googlewebmastercentral.blogspot.com/2014/05/rendering-pages-with-fetch-as-google.html
Google Blog: Infinite Scroll Recommendations & Example »
http://googlewebmastercentral.blogspot.com/2014/02/infinite-scroll-search-friendly.html
LA Times Reimagined by Code and Theory »
http://www.codeandtheory.com/things-we-make/the-los-angeles-times-reimagined
Google Blog: Specify your canonical »
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
Google Blog: Pagination with rel=“next” and rel=“prev” »
http://googlewebmastercentral.blogspot.com/2011/09/pagination-with-relnext-and-relprev.html
Google Blog: Video about Pagination »
http://googlewebmastercentral.blogspot.com/2012/03/video-about-pagination-with-relnext-and.html
One Page Wonder: Coverage on QZ »
http://www.foliomag.com/2013/one-page-wonder-infinite-scroll
The Next Web Redesign Coverage »
http://www.niemanlab.org/2012/10/the-next-web-redesigns-to-be-more-app-like/
The Next Web Press Release »
http://thenextweb.pr.co/010a893a11df2bb61d981b2b0607c1b6784a5ab07b5ab100790b2bb3168a35f8
REFERENCES
47. USA Today Redesign »
http://blog.f-i.com/usatoday-com-redesigning-one-of-americas-most-popular-news-site/
http://designenvy.aiga.org/usa-today-website-redesign-fantasy-interactive/
http://www.businessinsider.com/usa-todays-homepage-redesigns-2012-9
Gawker 1 Year Later Success »
http://thenextweb.com/insider/2012/02/02/remember-that-gawker-redesign-a-years-worth-of-data-says-it-worked/
http://www.businessinsider.com/nick-denton-loses-bet-that-the-gawker-redesign-wouldnt-hurt-traffic-2011-10
http://www.businessinsider.com/gawker-media-traffic-numbers-2011-4
Gawker Failed Coverage »
http://www.catchmyfame.com/2013/05/02/how-gawker-sabotaged-their-own-network-with-a-horrible-new-layout/
http://www.theatlantic.com/technology/archive/2011/04/gawkers-traffic-numbers-are-worse-than-anyone-anticipated/237594/
http://www.webmonkey.com/2011/02/gawker-learns-the-hard-way-why-hash-bang-urls-are-evil/
Paul Irish to Matt Cutts Video » https://www.youtube.com/watch?v=yiAF9VdvRPw
Google Developer Documentation on AJAX Crawlability »
https://developers.google.com/webmasters/ajax-crawling/
Browser Compatibility Chart » http://caniuse.com/#search=history
Breaking The Web With Hash Bangs »
http://isolani.co.uk/blog/javascript/BreakingTheWebWithHashBangs
REFERENCES
48. Vox Cards: Legalization of Marijuana »
http://www.vox.com/cards/marijuana-legalization/learn-more-about-marijuana-legalization
Bing’s Duane Forrester says still no rel=canonical in http headers »
https://twitter.com/DuaneForrester/status/459387860358295552
Google Blog: A Faster Image Search »
http://googlewebmastercentral.blogspot.com/2013/01/faster-image-search.html
Google Says It’s Better for Webmasters »
http://www.seroundtable.com/google-image-search-design-16259.html
Ilya Grigorik discussion around <plaintext> injection »
https://plus.google.com/+IlyaGrigorik/posts/S6j45VxNESB
Vox Workflow for Creating SVG Images »
http://product.voxmedia.com/2013/11/25/5426880/polygon-feature-design-svg-animations-for-fun-and-profit
One Solution to Responsive Images »
http://www.smashingmagazine.com/2014/02/03/one-solution-to-responsive-images/
Truly Responsive Images » http://davidwalsh.name/responsive-design
AngularJS NYC Meetup: Server-side Template Rendering by HBO »
http://youtu.be/iB7hfvqyZpg?t=58m20s
REFERENCES
49. Serious Angular SEO » http://www.ng-newsletter.com/posts/serious-angular-seo.html
REFERENCES
50. Josh Kadis Quartz on VIP WordPress Video »
http://vip.wordpress.com/2013/09/26/josh-kadis-qz-wordpress/
https://docs.google.com/file/d/0B2Z4K6ynFLg5TVdvWVV1aTRmYUU/edit?pli=1
AirBNB: Our First Node.js App »
http://nerds.airbnb.com/weve-launched-our-first-nodejs-app-to-product/
AirBNB: Rendr (Backbone in the Browser and Node) »
http://nerds.airbnb.com/weve-open-sourced-rendr-run-your-backbonejs-a/
StackOverflow: PushState, Backbone, and Node »
http://stackoverflow.com/questions/7098130/reusing-backbone-views-routes-on-the-server-when-using-backbone-js-pushstate-for
Google: How do I create an HTML Snapshot (HIJAX) »
https://developers.google.com/webmasters/ajax-crawling/docs/html-snapshot
REFERENCES
Editor's Notes
Me on the Web
Places I’ve worked & their sites
More and more publishers are using AJAX for everything.
citations:
Quote conflated from my favorite ruby XML parser » http://nokogiri.org/
Why more AJAX?
Speed: the smaller the better. Think of the 10k Challenge.
Performance: under 100 ms is roughly the average threshold of human reaction time
Human Perception: a frame every ~16 ms == 60 FPS for silky smooth movement
citations:
Speed, Performance, and Human Perception » https://www.youtube.com/watch?v=7ubJzEi3HuA
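The 16 ms figure is just the per-frame time budget at 60 FPS. A minimal sketch of the arithmetic (the function names here are illustrative, not from the talk):

```typescript
// Time available to produce one frame at a given frame rate.
function frameBudgetMs(fps: number): number {
  return 1000 / fps;
}

// How many whole frames fit inside a time window, e.g. the ~100 ms
// reaction-time threshold mentioned in the notes.
function framesWithin(windowMs: number, fps: number): number {
  return Math.floor((windowMs * fps) / 1000);
}

console.log(frameBudgetMs(60).toFixed(2)); // "16.67"
console.log(framesWithin(100, 60));        // 6
```

So a response that lands inside the 100 ms window has roughly six frames at 60 FPS to show something.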
Chances are you’re using at least jQuery on your sites, and if you have or are thinking about having a Single Page App (SPA) or an AJAX-heavy site, you might be using Backbone, Angular, or Ember in the near future.
I’ve sped up my site, what does that mean for SEO?
SPIN spent March – May speeding up the sites + other “basic” SEO improvements (wasn’t just speed)
Over the next 3 months, the number of pages crawled per day increased 80%
WHY?
Not entirely sure, but there are a number of factors … we assume they’re due to secondary search signals.
Increased # of PV/V results in additional social shares which lead to additional links.
Decreased % of bounces results in fewer search refinements
Brand equity increases over time which results in higher CTR and branded searches.
What’s the problem?
While Googlebot can technically crawl JavaScript, it doesn’t get everything all the time.
Running a headless browser at webscale is nuts when you consider the events, callbacks, and triggers
Citations:
SERoundtable Timeline Links » http://www.seroundtable.com/google-javascript-webmaster-tools-18602.html
Googlebot/Test External JS » http://www.seroundtable.com/archives/000236.html
Googlebot Uses Onsite Live Chat » http://www.seroundtable.com/archives/003492.html
Spider’s View on Web 2.0 » http://googlewebmastercentral.blogspot.com/2007/11/spiders-view-of-web-20.html
AJAX Crawlability Proposal » http://googlewebmastercentral.blogspot.com/2009/10/proposal-for-making-ajax-crawlable.html
Caffeine Rollout » http://googlewebmastercentral.blogspot.com/2010/06/our-new-search-index-caffeine.html
Instant Previews »
http://googleblog.blogspot.com/2010/11/beyond-instant-results-instant-previews.html
http://googlewebmastercentral.blogspot.com/2010/11/instant-previews.html
http://googlewebmastercentral.blogspot.com/2011/11/get-post-and-safely-surfacing-more-of.html
https://sites.google.com/site/webmasterhelpforum/en/faq-instant-previews
Matt Cutts PSA: Don’t Block JS & CSS » http://www.seroundtable.com/googlebot-javascript-css-14930.html
Matt Video: How Does Googlebot handle content loaded via AJAX? » https://www.youtube.com/watch?v=_6mtiwQ3nvw
GWT Fetch & Render » http://googlewebmastercentral.blogspot.com/2014/05/rendering-pages-with-fetch-as-google.html
Infinite scroll is one of the most common AJAX features in publishing. The basic example is navigational pages / pagination.
For responsive sites, the infinite scroll on a mobile experience is really a great time saver and a great user experience.
Think default WordPress Blog
Check out the Google Example
http://googlewebmastercentral.blogspot.com/2014/02/infinite-scroll-search-friendly.html
Citations:
Google Blog: Infinite Scroll Recommendations & Example » http://googlewebmastercentral.blogspot.com/2014/02/infinite-scroll-search-friendly.html
At minimum have a crawlable link to the next page.
The load more button doesn’t need to be constantly present. Think Old Skool Facebook.
LA Times does a nice job of linking deeper
Citations:
LA Times Reimagined by Code and Theory » http://www.codeandtheory.com/things-we-make/the-los-angeles-times-reimagined
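The "crawlable link to the next page" can be as simple as a plain anchor that JavaScript later enhances into a load-more or infinite-scroll trigger. A minimal sketch, assuming WordPress-style `/page/N/` URLs (the URL scheme, the `load-more` class, and the function name are assumptions for illustration):

```typescript
// Builds the markup for a "load more" control that is a real link.
// A crawler follows the href; client-side code can intercept the click
// and fetch the same URL via AJAX instead of doing a full page load.
function loadMoreLink(basePath: string, currentPage: number, totalPages: number): string {
  if (currentPage >= totalPages) {
    return ""; // last page: nothing more to link to
  }
  const href = `${basePath}page/${currentPage + 1}/`;
  return `<a class="load-more" href="${href}">Load more</a>`;
}

console.log(loadMoreLink("/news/", 1, 5));
// <a class="load-more" href="/news/page/2/">Load more</a>
```

The link does not have to stay visible once the script takes over, as long as it is in the HTML the crawler receives.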
For the crawlable pages and for series pages like navigational pages, use rel=next/prev + canonical to consolidate.
See Maile’s great video on the topic
Citations:
Google Blog: Specify your canonical » http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html
Google Blog: Pagination with rel=“next” and rel=“prev” » http://googlewebmastercentral.blogspot.com/2011/09/pagination-with-relnext-and-relprev.html
Google Blog: Video about Pagination » http://googlewebmastercentral.blogspot.com/2012/03/video-about-pagination-with-relnext-and.html
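As a rough illustration of rel=next/prev plus canonical on a paginated series, the head tags for each page could be generated like this. It is a sketch under assumptions: the `/page/N/` URL scheme and the self-referencing canonical are choices made for the example, not something the talk prescribes.

```typescript
// Returns the <link> tags for page `page` of a series with `totalPages` pages.
function paginationHeadTags(baseUrl: string, page: number, totalPages: number): string[] {
  // Assumed URL scheme: page 1 lives at the base URL, later pages at /page/N/.
  const urlFor = (n: number): string => (n === 1 ? baseUrl : `${baseUrl}page/${n}/`);

  const tags: string[] = [`<link rel="canonical" href="${urlFor(page)}">`];
  if (page > 1) {
    tags.push(`<link rel="prev" href="${urlFor(page - 1)}">`);
  }
  if (page < totalPages) {
    tags.push(`<link rel="next" href="${urlFor(page + 1)}">`);
  }
  return tags;
}

console.log(paginationHeadTags("http://example.com/news/", 2, 3).join("\n"));
```

The first page gets no rel="prev" and the last page gets no rel="next", so the series has clear ends.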
Pointer URLs are a recommended best practice from both a re-crawlability and a user experience standpoint …
But they can require a good amount of technical overhead to do correctly. Also, skipping them doesn’t currently seem to have a tremendous negative impact on crawl discoverability.
So in practice, I wouldn’t use these for publishing sites, at least not for the navigational pages, since most users rarely share pagination pages
However, I’d imagine it’s different in the eCommerce world.
The more interesting infinite scroll experiences that are emerging are on
Quartz, Mashable, Gawker, LA Times, VOX, USA Today, TheNextWeb
The idea that “Every page is a homepage”
Gawker was a “big failure” but made out in the end.
Same with USA Today, TheNextWeb, and I’d imagine the LA Times will be a similar story.
Citations:
One Page Wonder: Coverage on QZ » http://www.foliomag.com/2013/one-page-wonder-infinite-scroll
The Next Web Redesign Coverage » http://www.niemanlab.org/2012/10/the-next-web-redesigns-to-be-more-app-like/
The Next Web Press Release » http://thenextweb.pr.co/010a893a11df2bb61d981b2b0607c1b6784a5ab07b5ab100790b2bb3168a35f8
USA Today Redesign »
http://blog.f-i.com/usatoday-com-redesigning-one-of-americas-most-popular-news-site/
http://designenvy.aiga.org/usa-today-website-redesign-fantasy-interactive/
http://www.businessinsider.com/usa-todays-homepage-redesigns-2012-9
Gawker 1 Year Later Success »
http://thenextweb.com/insider/2012/02/02/remember-that-gawker-redesign-a-years-worth-of-data-says-it-worked/
http://www.businessinsider.com/nick-denton-loses-bet-that-the-gawker-redesign-wouldnt-hurt-traffic-2011-10
http://www.businessinsider.com/gawker-media-traffic-numbers-2011-4
Gawker Failed Coverage »
http://www.catchmyfame.com/2013/05/02/how-gawker-sabotaged-their-own-network-with-a-horrible-new-layout/
http://www.theatlantic.com/technology/archive/2011/04/gawkers-traffic-numbers-are-worse-than-anyone-anticipated/237594/
http://www.webmonkey.com/2011/02/gawker-learns-the-hard-way-why-hash-bang-urls-are-evil/
How do you do it right?
Personal preference against _escaped_fragment_:
Don’t tend to see lots of hash values in SERPs
It’s ugly & confusing
Don’t like serving something different to just Googlebot (this can be a slippery slope)
History.pushState FTW! (Paul Irish to Matt Cutts » https://www.youtube.com/watch?v=yiAF9VdvRPw)
Like a stack of index cards, pushState adds more cards on top. ReplaceState swaps it out.
Graceful degradation … don’t AJAX.
Reminder: For continuous content, you really don’t want to use rel=next/prev unless they’re truly in a series you want to consolidate together
Citations:
Paul Irish to Matt Cutts Video » https://www.youtube.com/watch?v=yiAF9VdvRPw
Google Developer Documentation on AJAX Crawlability » https://developers.google.com/webmasters/ajax-crawling/
Browser Compatibility Chart » http://caniuse.com/#search=history
Breaking The Web With Hash Bangs » http://isolani.co.uk/blog/javascript/BreakingTheWebWithHashBangs
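The index-card picture of pushState and replaceState can be modeled without a browser. `CardStack` below is a hypothetical stand-in for the session history, not the browser's History API; in particular its `back()` throws the top card away, whereas a real browser keeps it for forward navigation.

```typescript
// A toy model of session history as a stack of index cards (URLs).
class CardStack {
  private cards: string[];

  constructor(initialUrl: string) {
    this.cards = [initialUrl];
  }

  // pushState: put a new card on top of the stack.
  pushState(url: string): void {
    this.cards.push(url);
  }

  // replaceState: swap the top card for a different one.
  replaceState(url: string): void {
    this.cards[this.cards.length - 1] = url;
  }

  // Back button: remove the top card (simplified) and show the one beneath.
  back(): string {
    if (this.cards.length > 1) {
      this.cards.pop();
    }
    return this.current();
  }

  current(): string {
    return this.cards[this.cards.length - 1];
  }

  size(): number {
    return this.cards.length;
  }
}

const stack = new CardStack("/story-one");
stack.pushState("/story-two");         // reader scrolls into the next story
stack.replaceState("/story-two?p=2");  // same story, updated URL
console.log(stack.current(), stack.size()); // "/story-two?p=2" 2
console.log(stack.back());                  // "/story-one"
```

For continuous content, the practical question is whether scrolling into the next story should add a card (so Back returns to the previous story) or swap the current one.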
Moar AJAX!
SpinMedia saw increase in PV/V, Reduced Bounce Rate, and Flat Time on Site (we’re faster … more PV/V)
Galleries are pretty much the same as Continuous Content
This is where you’ll use rel=next/prev for each slide
You can even transition into the next gallery just like Continuous Content
The next button should be the link to the next slide
The prev button to the previous slide
Can load the whole JS bundle onto the page ahead of time or pull via JSON
Citations:
Vox Cards: Legalization of Marijuana » http://www.vox.com/cards/marijuana-legalization/learn-more-about-marijuana-legalization
2 types:
Lazy Loading
Responsive Images
Both great for mobile
I wish Googlebot and Bingbot supported rel=canonical in HTTP headers for images, but they don’t. Bingbot doesn’t support the HTTP header at all.
(Trust we tried really hard to make this work)
Citations:
Bing’s Duane Forrester says still no rel=canonical in http headers » https://twitter.com/DuaneForrester/status/459387860358295552
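For reference, rel=canonical in an HTTP header uses the standard `Link` header syntax. A minimal sketch of building that header for several image variants that share one canonical URL (the function name and file names are made up for the example; per the notes above, search engines did not honor this for images at the time):

```typescript
// Builds a [name, value] pair for a canonical Link header.
function canonicalLinkHeader(canonicalUrl: string): [string, string] {
  return ["Link", `<${canonicalUrl}>; rel="canonical"`];
}

// Hypothetical resized variants that should all point at one canonical image.
const variants = ["/img/photo-320.jpg", "/img/photo-640.jpg"];
const header = canonicalLinkHeader("http://example.com/img/photo.jpg");

for (const path of variants) {
  console.log(`${path} -> ${header[0]}: ${header[1]}`);
}
```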
Reminder:
We’re not making this optimization for Google Image search traffic.
Google’s Jan 2013 Image Page Redesign that’s “better for the UX”
We do it for the better UX which leads to secondary search signals. Because sadly, there’s no good crawlability option to date.
Although UX from Image Search Sessions improved, the overall net was worse.
PageViews / Session
Citations:
Google Blog: A Faster Image Search » http://googlewebmastercentral.blogspot.com/2013/01/faster-image-search.html
Google Says It’s Better for Webmasters » http://www.seroundtable.com/google-image-search-design-16259.html
When lazy loading there are many options.
1x1s (should set image size)
Skeleton Screens are a cool “human perception” experience
Don’t sweat the navigational / aggregate pages.
Make sure images are fully crawlable on the article / story / gallery page
This is still a mess.
There is no good standard. <picture> and srcset seem to be the way of the future, but it’s still limited.
Srcset
Javascript
Browser detection
CSS Queries for Double Density
SVG solutions (but this isn’t quite practical at this time): Vox Workflow
Creative Solution: Inject <plaintext>
Citations:
Ilya Grigorik discussion around <plaintext> injection » https://plus.google.com/+IlyaGrigorik/posts/S6j45VxNESB
Vox Workflow for Creating SVG Images » http://product.voxmedia.com/2013/11/25/5426880/polygon-feature-design-svg-animations-for-fun-and-profit
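Of the options listed, srcset is the one with a simple, declarative shape: a comma-separated list of image URLs with density descriptors, plus a plain `src` as the fallback. A minimal sketch of generating it (the `Candidate` type, function names, and file names are illustrative; alt text and URLs are assumed to be already safe for HTML):

```typescript
interface Candidate {
  url: string;
  density: number; // 1 for standard screens, 2 for double density, etc.
}

// Builds the srcset attribute value, lowest density first.
function buildSrcset(candidates: Candidate[]): string {
  return candidates
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => a.density - b.density)
    .map((c) => `${c.url} ${c.density}x`)
    .join(", ");
}

// The plain src keeps the image crawlable and visible without srcset support.
function responsiveImg(fallbackSrc: string, candidates: Candidate[], alt: string): string {
  return `<img src="${fallbackSrc}" srcset="${buildSrcset(candidates)}" alt="${alt}">`;
}

console.log(
  responsiveImg(
    "/img/hero.jpg",
    [
      { url: "/img/hero@2x.jpg", density: 2 },
      { url: "/img/hero.jpg", density: 1 },
    ],
    "Hero image"
  )
);
```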
The most scalable quick solution for now is NOSCRIPT
Caution: <noscript> has traditionally been a spammy place … but it’s probably still worth the risk. Just like display:none and -9999px
Use libraries that take advantage of data-src attributes.
Citations:
One Solution to Responsive Images » http://www.smashingmagazine.com/2014/02/03/one-solution-to-responsive-images/
Truly Responsive Images » http://davidwalsh.name/responsive-design
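Putting the lazy-loading pieces together (a 1x1 placeholder with explicit dimensions, the real URL in data-src, and a noscript copy for crawlers), the markup might look like the output of this sketch. The placeholder path `/img/1x1.gif` and the function name are assumptions; a client-side library would be responsible for copying data-src into src when the image nears the viewport.

```typescript
// Generates lazy-load markup with a crawlable <noscript> fallback.
// Inputs are assumed to be already escaped for use in HTML attributes.
function lazyImageMarkup(src: string, width: number, height: number, alt: string): string {
  const placeholder = "/img/1x1.gif"; // hypothetical 1x1 transparent image
  const size = `width="${width}" height="${height}"`; // reserve layout space up front
  const lazy = `<img src="${placeholder}" data-src="${src}" ${size} alt="${alt}">`;
  const fallback = `<noscript><img src="${src}" ${size} alt="${alt}"></noscript>`;
  return lazy + fallback;
}

console.log(lazyImageMarkup("/img/story.jpg", 640, 360, "Story photo"));
```

The noscript copy carries the same real URL, which is what keeps the image discoverable on the article, story, or gallery page.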
An interesting concept … I’m not sure I’d fully go this route, but worth looking at.
They’ve had problems with SEO … “affiliate” links and incentives
Looking at their HTML, it doesn’t appear as though they spent a large amount of time on on-site SEO.
However, they do well and have an interesting user experience.
Search for some lyrics
Get a cool focused, and targeted UX.
Used to be done with referrer sniffing before “Not Provided”
JavaScript redirects … something I wouldn’t recommend or do on my own sites … it’s too close to what blackhats use.
However, I couldn’t say I’d have another solution to achieve their UX.
If you’re not the implementer, chances are you’ll have to convince your engineering team of what the right possible solutions might be.
Make friends with your engineering team, and know what you’re talking about before requesting it.
2 camps
Pre-render or Server Side Render
Neither is right or wrong, just different. Pick what works for your technology.
I’ll cover the most popular implementations right now, but with tech anything goes. Make it what you want.
Pros:
Single MVC / MVW
Single Routing Logic
Cons:
Cron / Cache Expiration Headache
Render Could be Different
Potentially Serving Something different for Googlebot
_escaped_fragment_
Caveat: I’ve never used any of these services listed … proceed with caution.
Citations:
AngularJS NYC Meetup: Server-side Template Rendering by HBO » http://youtu.be/iB7hfvqyZpg?t=58m20s
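In the pre-render camp, whatever serves the snapshots has to recognize the crawler's `_escaped_fragment_` request and map it back to the `#!` URL it stands for. A minimal sketch of that mapping, assuming the scheme from Google's AJAX crawling documentation (the function name is illustrative, and looking up the stored snapshot is left out):

```typescript
// Converts the "ugly" URL a crawler requests back into the "pretty" #! URL.
// Example: /?_escaped_fragment_=key=value  ->  /#!key=value
function prettyUrlFromEscapedFragment(uglyUrl: string): string {
  const marker = "_escaped_fragment_=";
  const at = uglyUrl.indexOf(marker);
  if (at < 1) {
    return uglyUrl; // not a crawler-style request; leave untouched
  }
  const base = uglyUrl.slice(0, at - 1); // drop the "?" or "&" before the parameter
  const fragment = decodeURIComponent(uglyUrl.slice(at + marker.length));
  return `${base}#!${fragment}`;
}

console.log(prettyUrlFromEscapedFragment("http://example.com/?_escaped_fragment_=key=value"));
// http://example.com/#!key=value
```

This is also where the "serving something different to Googlebot" concern comes from: the snapshot returned for the ugly URL is only as trustworthy as the process that generated it.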
QZ Architecture
Citations:
Josh Kadis Quartz on VIP Wordpress Video »
http://vip.wordpress.com/2013/09/26/josh-kadis-qz-wordpress/
https://docs.google.com/file/d/0B2Z4K6ynFLg5TVdvWVV1aTRmYUU/edit?pli=1
Clean URLs aren’t necessarily specific to Server Side Rendering, you can have them with pre-rendering … but it’s not common with pre-rendering solutions.
No more secondary caching headache. Expires on data update or by standard tested practices.
QZ is the main example where there are 2 templates for the view.
The proposed alternate solution is to consolidate to 1 template.
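Consolidating to one template means the same render function produces the HTML on the server (for the first response and for crawlers) and in the browser (for AJAX updates). A minimal sketch of the idea; the `Article` shape and function names are hypothetical and this is not how QZ or any cited site implements it:

```typescript
interface Article {
  url: string;
  title: string;
  body: string;
}

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// The single template: used by both the server and the client.
function renderArticle(article: Article): string {
  return (
    `<article data-url="${escapeHtml(article.url)}">` +
    `<h1>${escapeHtml(article.title)}</h1>` +
    `<div class="body">${escapeHtml(article.body)}</div>` +
    `</article>`
  );
}

// Server side: wrap the shared template in a full document at a clean URL.
function renderPage(article: Article): string {
  return `<!doctype html><html><body>${renderArticle(article)}</body></html>`;
}

const article: Article = { url: "/2014/ajax-seo", title: "AJAX & SEO", body: "Hello" };
console.log(renderPage(article));
// Client side, the same renderArticle(article) output would be appended to
// the page when the next story is loaded via AJAX.
```

Because both paths call `renderArticle`, the crawler and the reader cannot drift apart the way two separately maintained templates can.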