Content & Structure Optimization
To strategically organize web content, one of the primary questions to ask is, "Is the site easy to navigate and can users find the information they seek?"
To improve the chances of your website ranking higher in the search engines, the site must be easy to navigate. This is accomplished by arranging information in a logical pattern: grouping content into thematic silos through URL structure and internal linking, applying appropriate keywords to give each page a clear theme, and so on. The result is a structured site rather than a scattering of random articles and disorganized trains of thought. The purpose of good website organization is to consolidate content into a solid, logical structure that is straightforward to navigate, both for users and for search engines as they crawl and index your pages.
Topic Scattering Analysis
Topic Scattering Analysis is the process of analyzing the existing content on your website to identify areas where content for specific topics is scattered across multiple pages/sections, and purposefully organizing that content into larger cohesive units of information. The analysis begins by checking whether or not the topic is being covered in multiple silos or sections of the site, then organizing the information and content in a way that makes sense and is easy to digest. Topic Scattering Analysis is crucial for consolidating the information and content on your website and providing search engine spiders logically structured information to digest.
As you evaluate the topics on each page, look for opportunities to merge information into one location. For example, if one page covers details about a wooden table and another page covers details about a red table, it may be helpful to consolidate the information on a single page, provided the content does not benefit from having two distinct pages for fairly similar information. Combining content enables Google to consolidate ranking signals on important pages and to crawl your site more effectively.
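One way to look for merge candidates is to compare the keyword sets of pages and flag pairs that overlap heavily. The sketch below is only an illustration: the page URLs, the keyword sets, and the 0.5 threshold are hypothetical, and in practice the keywords would come from your own content audit.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of keywords two pages have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def merge_candidates(pages: dict[str, set[str]], threshold: float = 0.5):
    """Return (url_a, url_b, score) for page pairs whose overlap meets the threshold."""
    pairs = []
    for (url_a, kw_a), (url_b, kw_b) in combinations(sorted(pages.items()), 2):
        score = jaccard(kw_a, kw_b)
        if score >= threshold:
            pairs.append((url_a, url_b, round(score, 2)))
    return pairs


# Hypothetical pages and keywords, loosely based on the table example above.
pages = {
    "/tables/wooden.html": {"table", "dining", "furniture", "wood"},
    "/tables/red.html": {"table", "dining", "furniture", "red"},
    "/chairs/office.html": {"chair", "office", "furniture"},
}

print(merge_candidates(pages))
```

A high score only suggests that two pages deserve a closer look; whether they should actually be merged remains an editorial decision.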
Theming Evaluation (Siloing)
Theming, or siloing, refers to organizing a website's content by concentrating related topics within a well-thought-out directory structure that houses content targeting keywords with progressive specificity. For example:
- www.example.com/widgets : A section which would target top-level and generic keywords about widgets.
- www.example.com/widgets/counterfeit.html : A section which would target secondary keywords having to do only with counterfeit widgets.
- www.example.com/widgets/counterfeit/how-to-recognize.html : A very specific page targeting keywords which have to do with learning how to identify counterfeit widgets.
This applies the same concept of logically arranging information on your website, except that instead of consolidating data onto one page, it looks at where content should be separated. When too much information is packed onto one page, evaluate how the content is distributed and whether some of it would be better placed elsewhere on the site. Analyze which topics might be expanded and which subjects could be split onto separate pages to make the website more user-friendly.
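To audit an existing silo structure, it can help to break each URL into its directory levels and group pages by their top-level section. The following is a minimal sketch that uses the example widget URLs from the list above; the function names are made up for illustration, and it assumes pages end in .html.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def silo_path(url: str) -> list[str]:
    """Split a URL path into its silo levels, dropping a trailing .html extension."""
    # URLs written without a scheme are prefixed with // so the host is parsed correctly.
    path = urlsplit(url if "://" in url else "//" + url).path
    parts = [p for p in path.split("/") if p]
    if parts and parts[-1].endswith(".html"):
        parts[-1] = parts[-1][: -len(".html")]
    return parts


def group_by_silo(urls):
    """Group URLs under their top-level directory, ordered from generic to specific."""
    silos = defaultdict(list)
    for url in urls:
        levels = silo_path(url)
        top = levels[0] if levels else "(root)"
        silos[top].append((len(levels), url))
    return {name: sorted(items) for name, items in silos.items()}


urls = [
    "www.example.com/widgets/counterfeit/how-to-recognize.html",
    "www.example.com/widgets",
    "www.example.com/widgets/counterfeit.html",
]

for depth, url in group_by_silo(urls)["widgets"]:
    print(depth, url)
```

The depth value gives a rough indication of how specific a page's target keywords are expected to be within its silo.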
Duplicate Content
Search engines consider every URL to be a unique object or page. Duplicated content, regardless of the purpose of the page, can negatively affect rankings if search engines are allowed to crawl it. It is sometimes necessary to have two (or more) pages with the same content; however, even if the content is helpful to users and makes sense, its presence in the search engine indices can cause ranking problems. It is recommended to exclude exact (or even similar) copies of any content from the search engines or, if possible, to avoid having duplicate content to begin with.
Duplicate content can be caused by a number of things, including URL parameters, printer-friendly versions of pages, session IDs, and sorting functions. These kinds of pages tend to be a normal, helpful part of a website, but they still need to be addressed to avoid serving duplicate pages to the search engines. There are several recommended methods for fixing duplicate content: 301 redirects, the rel="canonical" attribute, robots.txt exclusions, and the noindex meta tag.
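Before choosing a fix, it is useful to find out which crawled URLs are really the same page. The sketch below strips query parameters that do not change the content and groups the URLs that collapse to the same address. The parameter names (sessionid, sort, print) are assumptions chosen for illustration; substitute the ones your site actually uses.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that only affect presentation or tracking.
NON_CONTENT_PARAMS = {"sessionid", "sort", "print"}


def normalize(url: str) -> str:
    """Drop non-content parameters and the fragment so duplicate URLs compare equal."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CONTENT_PARAMS
    ]
    query = urlencode(sorted(kept))
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, query, ""))


def duplicate_groups(urls):
    """Map each normalized URL to the crawled URLs that collapse onto it (groups of 2+)."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


crawled = [
    "http://www.example.com/widgets.html",
    "http://www.example.com/widgets.html?sort=price",
    "http://www.example.com/widgets.html?sessionid=abc123",
    "http://www.example.com/about.html",
]

for original, duplicates in duplicate_groups(crawled).items():
    print(original, "<-", duplicates)
```

Each group found this way is a candidate for one of the fixes described in this section.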
A 301 redirect, or permanent redirect, sends both users and spiders who arrive on a duplicate page directly to the original content page. These redirects can be used across subfolders, subdomains, and entire domains.
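At the HTTP level, a 301 redirect is simply a response with status code 301 and a Location header naming the original page. The sketch below shows that decision in isolation; the redirect map and the respond function are hypothetical, and a real site would normally configure this in its web server or framework rather than in hand-written code.

```python
from http import HTTPStatus

# Hypothetical map from duplicate paths to the original content page.
REDIRECTS = {
    "/widgets-print.html": "/widgets.html",
    "/old/widgets.html": "/widgets.html",
}


def respond(path: str):
    """Return (status code, headers) for a request path."""
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 Moved Permanently: browsers and spiders are both sent to the target.
        return HTTPStatus.MOVED_PERMANENTLY.value, {"Location": target}
    return HTTPStatus.OK.value, {}


print(respond("/widgets-print.html"))
print(respond("/widgets.html"))
```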
The rel="canonical" attribute acts similarly to a 301 redirect, with a few key differences. The first is that while the 301 redirect points both spiders and humans to a different page, the canonical attribute is strictly for search engines. With this method, webmasters can still track visitors to unique URLs without incurring any penalty. The tag that carries the canonical attribute is structured as follows.
<link rel="canonical" href="http://www.example.com/original-content.html" />
The tag is placed in the <head> of the HTML document that needs to attribute its content to the page the search engines should deem the original. Webmasters can also exclude specific pages from search engines through the use of a noindex meta tag. Using the noindex meta tag, webmasters can ensure the content of that page will not be indexed and displayed in the search result pages. Adding nofollow, as in the second example, additionally tells spiders not to follow the links on the page.
<meta name="robots" content="noindex" />
<meta name="robots" content="noindex,nofollow" />
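To verify that a page actually carries the intended robots meta tag, the markup can be parsed and the directives collected. This is a small sketch built on Python's standard html.parser module; the class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html: str) -> set[str]:
    """Return the set of robots meta directives present in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex,nofollow" /></head></html>'
print("noindex" in robots_directives(page))
```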
The final recommended method involves using a robots.txt file. Using robots.txt, webmasters can provide directives to search engine spiders to keep them from crawling certain parts of a website. The URLs of these pages may still show up in some search engine indexes, but only if the URL of the page is searched for specifically. Tip: While official search engine bots (spiders) will follow the robots.txt protocol, malicious bots often ignore it entirely.
If placed within the robots.txt file, the directive below would prohibit the Bing spider from crawling the 'widgets' directory.
User-agent: bingbot
Disallow: /widgets/
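A robots.txt rule of this kind can be checked before it is deployed by using Python's standard urllib.robotparser module. In the sketch below, the rule text is an assumption about what such a directive would look like (the bingbot user-agent token and the /widgets/ path), and the URLs are the example ones used in this section.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule: keep Bing's crawler out of the widgets directory.
ROBOTS_TXT = """\
User-agent: bingbot
Disallow: /widgets/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked = parser.can_fetch("bingbot", "http://www.example.com/widgets/counterfeit.html")
allowed = parser.can_fetch("bingbot", "http://www.example.com/about.html")
other_bot = parser.can_fetch("Googlebot", "http://www.example.com/widgets/counterfeit.html")

print(blocked, allowed, other_bot)  # False True True
```

Because the rule names only one user agent, other spiders remain free to crawl the directory unless a matching rule is added for them.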
Thin Content
Thin content describes both pages which have very little content and pages which may have a lot of content of little value. The latter is the more accurate description, as there can be pages with very little content which are useful (e.g., if a topic takes only a few sentences to cover, it is not necessary to generate encyclopedic volumes of content for it).
According to Matt Cutts, then head of Google's web spam team, thin content contributes either very little or no new information to a given search. This problem is particularly common for e-commerce sites that may have hundreds or thousands of pages for different products with only minimal product details and information.
The best long-term solution to this problem is simply to create unique content for every web page which might contain duplicate or lackluster information. By supplementing repeated information with sections of unique text, like a thorough description, review, opinion, video, or brief editorial, webmasters can increase their website's relevance to search engines.
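One rough way to spot pages that rely mostly on repeated text is to measure how much of each page's wording appears on no other page. The sketch below is a heuristic under loud assumptions: the pages, their text blocks, and the idea of comparing whole blocks verbatim are all illustrative, and a low ratio is a prompt for review rather than proof that a page is thin.

```python
from collections import Counter


def unique_text_ratio(pages: dict[str, list[str]]) -> dict[str, float]:
    """For each page, the share of its words found in blocks that no other page repeats."""
    # Count on how many pages each text block appears.
    page_counts = Counter(block for blocks in pages.values() for block in set(blocks))
    ratios = {}
    for url, blocks in pages.items():
        total = sum(len(block.split()) for block in blocks)
        unique = sum(len(block.split()) for block in blocks if page_counts[block] == 1)
        ratios[url] = unique / total if total else 0.0
    return ratios


# Hypothetical product pages that share one boilerplate block.
product_pages = {
    "/products/1.html": [
        "Free shipping on all orders.",
        "Blue widget.",
    ],
    "/products/2.html": [
        "Free shipping on all orders.",
        "A hand-finished oak widget with brass fittings, reviewed by our staff.",
    ],
}

for url, ratio in unique_text_ratio(product_pages).items():
    print(url, f"{ratio:.0%}")
```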
Canonical Tag
The canonical tag can be used to help avoid creating duplicate content by specifying the original publication page of a piece of content. One specific use for the canonical tag would be on a page which lists products and has sorting functions which produce different URLs depending on how the products are being sorted. In this case, any variation in sorting from the default presentation can utilize a canonical tag to indicate that the original URL is the only one that should be indexed.
For example, if a webpage lists a variety of widgets at the URL www.example.com/widgets.html and offers a sorting link (www.example.com/widgets.html?sort=price) that allows the widgets to be sorted by price, then it becomes necessary to use the canonical tag on www.example.com/widgets.html?sort=price to indicate that the original content is housed at www.example.com/widgets.html and that www.example.com/widgets.html?sort=price should not be indexed. The canonical tag on www.example.com/widgets.html?sort=price would be placed in the <head> of the document and would look like this.
<link rel="canonical" href="http://www.example.com/widgets.html">
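On a site with many sortable listing pages, the canonical tag is usually generated by the page template rather than written by hand. The sketch below builds the tag by dropping the query string from the requested URL. It assumes that every query parameter on the page is presentation-only (such as a sort order); a page whose parameters select different content would need a more selective rule.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_tag(url: str) -> str:
    """Build a canonical link tag that points at the URL without query or fragment."""
    parts = urlsplit(url)
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # Escape the URL so it is safe to place inside an HTML attribute.
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'


print(canonical_tag("http://www.example.com/widgets.html?sort=price"))
# <link rel="canonical" href="http://www.example.com/widgets.html">
```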