SES London is underway, and Vanessa Fox, who oversees the Sitemaps team, spoke today on the Successful Site Architecture panel. She was kind enough to offer up some tips on building crawlable sites for those who weren’t in London.
By the way, Vanessa is extremely bright. I have had the pleasure of talking with her about site architecture at Webmaster World in the past; she knows her stuff.
Make sure visitors and search engines can access the content
- Check the Crawl errors section of Webmaster Tools for any pages Googlebot couldn’t access due to server or other errors. If Googlebot can’t access those pages, they won’t be indexed, and visitors likely can’t access them either.
- Make sure your robots.txt file doesn’t accidentally block search engines from content you want indexed. You can see a list of the files Googlebot was blocked from crawling in Webmaster Tools, and you can use the robots.txt analysis tool to confirm you’re blocking and allowing the files you intend.
- Check the Googlebot activity reports to see how long it takes to download a page of your site, so you can rule out network slowness issues.
- If pages of your site require a login and you want the content from those pages indexed, ensure you include a substantial amount of indexable content on pages that aren’t behind the login. For instance, you can put several content-rich paragraphs of an article outside the login area, with a login link that leads to the rest of the article.
Make sure your content is viewable
- How accessible is your site? How does it look in mobile browsers and screen readers? It’s well worth testing your site under these conditions and ensuring that visitors can access the content of the site using any of these mechanisms.
- Ensure the important text and navigation on your site is in HTML, not in images, and make sure all images have ALT text that describes them.
- If you use Flash, use it only where needed. In particular, don’t put all of your site’s text in Flash. An ideal Flash-based site has pages with HTML text and Flash accents. If you use Flash for your home page, make sure the navigation into the site is in HTML.
- Make sure each page has a unique title tag and meta description tag that aptly describe the page.
- Make sure the important elements of your pages (for instance, your company name and the main topic of the page) are in HTML text.
- Make sure the words that searchers will use to look for you are on the page.
Keep the site crawlable
- If possible, avoid frames. Frame-based sites don’t allow for unique URLs for each page, which makes indexing each page separately problematic.
- Ensure the server returns a 404 status code for pages that aren’t found. Some servers are configured to return a 200 status code instead, particularly with custom error pages, and this can result in search engines spending time crawling and indexing non-existent pages rather than the valid pages of the site.
- Avoid infinite crawls. For instance, if your site has an infinite calendar, add a nofollow attribute to links to dynamically-created future calendar pages. Each search engine may interpret the nofollow attribute differently, so check with the help documentation for each. Alternatively, you could use the nofollow meta tag to ensure that search engine spiders don’t crawl any outgoing links on a page, or use robots.txt to prevent search engines from crawling URLs that can lead to infinite loops.
- If your site uses session IDs or cookies, ensure those are not required for crawling.
- If your site is dynamic, avoid using excessive parameters and use friendly URLs when you can. Some content management systems enable you to rewrite URLs to friendly versions.
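As a sketch of the nofollow options mentioned above (the calendar URL and link text are made up, not from the post):

```html
<!-- Per-link: keep crawlers off dynamically generated future dates. -->
<a href="/calendar/2038/01" rel="nofollow">January 2038</a>

<!-- Page-wide: tell spiders not to follow any outgoing link on this page. -->
<meta name="robots" content="nofollow">
```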
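The robots.txt option mentioned above can also stop infinite crawls at the source. A hypothetical file that keeps crawlers out of a dynamically generated calendar while leaving the rest of the site open might look like:

```
# Hypothetical robots.txt: block crawl paths that lead to infinite
# loops (a dynamically generated calendar) but allow everything else.
User-agent: *
Disallow: /calendar/
```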
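For the friendly-URLs point, here is a minimal rewrite sketch, assuming Apache’s mod_rewrite; the path and parameter names are illustrative:

```
# Hypothetical mod_rewrite rule: serve the friendly URL /products/42
# from the underlying dynamic script without exposing its parameters.
RewriteEngine On
RewriteRule ^products/([0-9]+)/?$ /product.php?id=$1 [L,QSA]
```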
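The soft-404 point above is easy to check yourself. Here is a minimal sketch using only Python’s standard library; the helper name and example URL are illustrative, not from Vanessa’s talk:

```python
# Minimal sketch: check that a missing page returns a real 404,
# not a "soft 404" (a custom error page served with status 200).
from urllib.request import urlopen
from urllib.error import HTTPError

def status_of(url: str) -> int:
    """Return the HTTP status code the server sends for url."""
    try:
        return urlopen(url).status  # 2xx/3xx responses land here
    except HTTPError as e:
        return e.code               # 4xx/5xx responses raise HTTPError

# Example (hypothetical URL): a page that doesn't exist should be 404.
# status_of("http://www.example.com/no-such-page")
```

If that call returns 200 for a page you know doesn’t exist, your custom error page is being served as a soft 404.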
A most excellent set of tips from Vanessa, with a lot of great resources.