Getting search engines to crawl and index more of a site's content is one of the biggest headaches in SEO. A page that is not indexed cannot rank at all. Once a website reaches a certain scale, especially one with hundreds of thousands of pages, getting it fully indexed is hard. No matter how well the site structure is designed and optimized, search engines will never index 100% of it; all you can do is try to maximize the indexation rate.
Sometimes, though, the problem is the opposite: keeping search engines from indexing certain content. Confidential information, duplicated content, and advertising links are all examples of content you may want to hide from search engines. In the past, such content could be hidden with password protection, by placing it behind a form, by loading it with JS / Ajax, or by putting it in Flash. However, according to a recent article on the Google Webmaster blog, these methods are no longer 100% effective.
Flash
Google began trying to crawl Flash content a few years ago. Simple text inside Flash can already be crawled, and links inside Flash files can be followed as well.
Forms
Google's spiders can not only fill in forms but also crawl the pages returned by POST requests. This has been evident from the Google blog for some time.
JS / Ajax
Links inside JavaScript were long considered unfriendly to search engines, and the common view was that they could stop spiders from crawling. But a couple of years ago we saw that JS links do not stop Google's spiders from crawling the URLs that appear in them. Not only are the links in JS crawled, but simple JS can also be executed to discover more URLs.
A few days ago, it was noticed that comments on many websites using the Facebook Comments plug-in had been crawled and indexed, even though the plug-in itself is loaded via AJAX. This is good news. Most websites can benefit greatly from the Facebook Comments plug-in; the only problem was that, because the comments are implemented with AJAX, they could not be crawled, and getting product reviews indexed (as original content) is one of the main reasons to collect them. In the past, the only solution was to run the Facebook Comments plug-in alongside the shopping cart's own built-in comment feature. Now that Facebook comments can be indexed, we no longer need two separate comment systems.
Robots File
Another way to keep content out of the index is to add a Disallow rule to the robots.txt file. The drawback is a loss of page strength: even though the content is not indexed, the blocked page still receives page strength from the links pointing to it but cannot pass any of it on.
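For readers who want to check how a Disallow rule behaves, here is a minimal sketch using Python's standard `urllib.robotparser` module. The rule, the domain `www.example.com`, and the `/private/` path are hypothetical stand-ins, and Python's parser only approximates how Googlebot itself interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block every crawler from the /private/ directory.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
# parse() takes the file as a list of lines, so no network request is made.
rp.parse(ROBOTS_TXT.splitlines())


def is_crawlable(url, user_agent="Googlebot"):
    """Return True if the parsed robots.txt rules allow user_agent to fetch url."""
    return rp.can_fetch(user_agent, url)


print(is_crawlable("https://www.example.com/private/report.html"))  # False
print(is_crawlable("https://www.example.com/products/widget.html"))  # True
```

Note that this only tells you whether the page may be crawled; it says nothing about the page strength the blocked page collects but cannot pass on.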
Nofollow
Nofollow does not guarantee that a page stays out of the index. Even if every link to the page from your own site carries nofollow (NF), there is no guarantee that other sites will not find the page and link to it, so search engines can still discover it.
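To make the point concrete, here is a minimal sketch in Python (standard-library `html.parser` only) that collects the followable links from two snippets: one from your own site, where the link carries `rel="nofollow"`, and one from a third-party page that links to the same URL without it. The URL `https://www.example.com/hidden-offer` and both snippets are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, is_nofollow) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if href is None:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))


def followable_links(html):
    """Return the set of hrefs in html that are not marked nofollow."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return {href for href, nofollow in parser.links if not nofollow}


# Hypothetical snippets: your own page versus someone else's page.
OWN_SITE = '<a href="https://www.example.com/hidden-offer" rel="nofollow">Offer</a>'
THIRD_PARTY = '<p>Check out <a href="https://www.example.com/hidden-offer">this deal</a></p>'

discoverable = followable_links(OWN_SITE) | followable_links(THIRD_PARTY)
print(sorted(discoverable))  # ['https://www.example.com/hidden-offer']
```

Your own nofollow link contributes nothing, but the single plain link on the other site is enough to make the URL discoverable.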
Meta Noindex + Follow
To pass page strength while keeping a page out of the index, we can use meta noindex together with meta follow on that page. The page will not be indexed but can still pass page strength. This is a good solution; the only drawback is that spiders waste crawl time on pages that will never be indexed.
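The tag itself is `<meta name="robots" content="noindex, follow">` in the page's `<head>`. As a rough illustration of how a crawler reads it, here is a minimal Python sketch (standard-library `html.parser` only) that extracts the meta robots directives and reports whether the page may be indexed and whether its links may be followed. The sample page is hypothetical, and real search engines support more directives than the two checked here.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects the comma-separated directives of <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() != "robots":
            return
        for token in (attr_map.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_policy(html):
    """Return whether a page may be indexed and its links followed.

    With no meta robots tag, both default to True (index, follow).
    """
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }


# Hypothetical page that should stay out of the index but still pass page strength.
PAGE = """<html><head>
<meta name="robots" content="noindex, follow">
<title>Archived listing</title>
</head><body><a href="/category/widgets">Widgets</a></body></html>"""

print(robots_policy(PAGE))  # {'index': False, 'follow': True}
```

Notice that the directive can only be read after the page has been fetched and parsed, which is exactly why this approach still costs the spider crawl time.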
How to keep a page out of the index is a question worth careful thought. If you don't see why it matters, consider how much duplicate and low-quality content your site has, and whether having it indexed could get your site penalized. Sometimes you may not want search engines to index certain pages but still want to keep the content because users find it convenient and useful.