
Friday, January 25, 2013

25 Killer Combos for Google's Site: Operator

There’s an app for everything – the problem is that we’re so busy chasing the newest shiny toy that we rarely stop to learn to use simple tools well. As a technical SEO, one of the tools I seem to never stop finding new uses for is the site: operator. I recently devoted a few slides to it in my BlueGlassX presentation, but I realized that those 5 minutes were just a tiny slice of all of the uses I’ve found over the years.
People often complain that site:, by itself, is inaccurate (I’ll talk about that more at the end of the post), but the magic is in the combination of site: with other query operators. So, I’ve come up with 25 killer combos that can help you dive deep into any site.

1. site:example.com

Ok, this one’s not really a combination, but let’s start with the basics. Paired with a root domain or sub-domain, the [site:] operator returns an estimated count of the number of indexed pages for that domain. The “estimated” part is important, but we’ll get to that later. For a big picture, I generally stick to the root domain (leave out the “www”, etc.).
Each combo in this post will have a clickable example (see below). I'm picking on Amazon.com in my examples, because they're big enough for all of these combos to come into play.
You’ll end up with two bits of information: (1) the actual list of pages in the index, and (2) the count of those pages (circled in purple below):
Screenshot - site:amazon.com
I think we can all agree that 273,000,000 results is a whole lot more than most of us would want to sort through. Even if we wanted to do that much clicking, Google would stop us after 100 pages. So, how can we get more sophisticated and drill down into the Google index?

2. site:example.com/folder

The simplest way to dive deeper into this mess is to provide a sub-folder (like “/blog”) – just append it to the end of the root domain. Don’t let the simplicity of this combo fool you – if you know a site’s basic architecture, you can use it to drill down into the index quickly and spot crawl problems.
3. site:sub.example.com
You can also drill down into specific sub-domains. Just use the full sub-domain in the query. I generally start with #1 to sweep up all sub-domains, but #3 can be very useful for situations like tracking down a development or staging sub-domain that may have been accidentally crawled.

4. site:example.com inurl:www

The "inurl:" operator searches for specific text in the indexed URLs. You can pair “site:” with “inurl:” to find the sub-domain in the full URL. Why would you use this instead of #3? On the one hand, "inurl:" will look for the text anywhere in the URL, including the folder and page/file names. For tracking sub-domains this may not be desirable. However, "inurl:" is much more flexible than putting the sub-domain directly into the main query. You'll see why in examples #5 and #6.

5. site:example.com -inurl:www

Adding [-] to most operators tells Google to search for anything but that particular text. In this case, by separating out "inurl:www", you can change it to "-inurl:www" and find any indexed URLs that are not on the "www" sub-domain. If "www" is your canonical sub-domain, this can be very useful for finding non-canonical URLs that Google may have crawled.

6. site:example.com -inurl:www -inurl:dev -inurl:shop

I'm not going to list every possible combination of Google operators, but keep in mind that you can chain most operators. Let's say you suspect there are some stray sub-domains, but you aren't sure what they are. You are, however, aware of "www.", "dev." and "shop.". You can chain multiple "-inurl:" operators to remove all of these known sub-domains from the query, leaving you with a list of any stragglers.
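If you find yourself building these chained queries a lot, a few lines of Python can save some typing. This is just a minimal sketch, not part of the original post – the helper name and the sub-domain list are made up for illustration:

```python
# Minimal sketch: compose a site: query that excludes a list of known
# sub-domains, then URL-encode it for pasting into a Google search URL.
from urllib.parse import quote_plus

def build_subdomain_sweep(domain, known_subdomains):
    # Start with the root-domain sweep, then chain one "-inurl:" per known sub-domain.
    parts = [f"site:{domain}"] + [f"-inurl:{sub}" for sub in known_subdomains]
    return " ".join(parts)

query = build_subdomain_sweep("example.com", ["www", "dev", "shop"])
print(query)   # -> site:example.com -inurl:www -inurl:dev -inurl:shop
print("https://www.google.com/search?q=" + quote_plus(query))
```

The same pattern works for the parameter combos in #8 and #9 below – just swap the sub-domain list for a list of parameter names to include or exclude.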

7. site:example.com inurl:https

You can't put a protocol directly into "site:" (e.g. "https:", "ftp:", etc.). Fortunately, you can put "https" into an "inurl:" operator, allowing you to see any secure pages that Google has indexed. As with all "inurl:" queries, this will find "https" anywhere in the URL, but it's relatively rare to see it somewhere other than the protocol.

8. site:example.com inurl:param

URL parameters can be a Panda's dream. If you're worried about something like search sorts, filters, or pagination, and your site uses URL parameters to create those pages, then you can use "inurl:" plus the parameter name to track them down. Again, keep in mind that Google will look for that name anywhere in the URL, which can occasionally cause headaches.
Pro Tip: Try out the example above, and you'll notice that "inurl:ref" returns any URL with "ref" in it, not just traditional URL parameters. Be careful when searching for a parameter that is also a common word.

9. site:example.com -inurl:param

Maybe you want to know how many search pages are being indexed without sorts or how many product pages Google is tracking with no size or color selection – just add [-] to your "inurl:" statement to exclude that parameter. Keep in mind that you can combine "inurl:" with "-inurl:", specifically including some parameters and excluding others. For complex, e-commerce sites, these two combos alone can have dozens of uses.

10. site:example.com text goes here

Of course, you can always combine the "site:" operator with a plain-old text query. This will search the contents of the entire page within the given site. Like standard queries, this is essentially a logical [AND], but it's a bit of a loose [AND] – Google will try to match all terms, but those terms may be separated on the page, or you may get back results that only include some of the terms. You'll see that the example below matches the phrase "free Kindle books" but also phrases like "free books on Kindle".

11. site:example.com “text goes here”

If you want to search for an exact-match phrase, put it in quotes. This simple combination can be extremely useful for tracking down duplicate and near-duplicate copy on your site. If you're worried about one of your product descriptions being repeated across dozens of pages, for example, pull out a few unique terms and put them in quotes.

12. site:example.com/folder “text goes here”

This is just a reminder that you can combine text (with or without quotes) with almost any of the combinations previously discussed. Narrow your query to just your blog or your store pages, for example, to really target your search for duplicates.

13. site:example.com this OR that

If you specifically want a logical [OR], Google does support the "OR" operator in queries (it has to be uppercase). In this case, you'd get back any pages indexed on the domain that contained either "this" or "that" (or both, as with any logical [OR]). This can be very useful if you've forgotten exactly which term you used or are searching for a family of keywords.

14. site:example.com “top * ways”

The asterisk [*] can be used as a wildcard in Google queries to replace unknown text. Let's say you want to find all of the "Top X" posts on your blog. You could use "site:" to target your blog folder and then "Top *" to query only those posts.
Pro Tip: The wildcard [*] operator will match one or multiple words. So, "top * books" can match "Top 40 Books" or "Top Career Management Books". Try the sample query above for more examples.

15. site:example.com “top 7..10 ways”

If you have a specific range of numbers in mind, you can use "X..Y" to return anything in the range from X to Y. While the example above is probably a bit silly, you can use ranges across any kind of on-page data, from product IDs to prices.

16. site:example.com ~word

The tilde [~] operator tells Google to find words related to the word in question. Let's say you wanted to find all of the posts on your blog related to the concept of consulting – just add "~consulting" to the query, and you'll get the wider set of terms that Google thinks are relevant.

17. site:example.com ~word -word

By using [-] to exclude the specific word, you can tell Google to find any pages related to the concept that don't specifically target that term. This can be useful when you're trying to assess your keyword targeting or create new content based on keyword research.

18. site:example.com intitle:”text goes here”

The "intitle:" operator only matches text that appears in the <TITLE></TITLE> tag. One of the first spot-checks I do on any technical SEO audit is to use this tactic with the home-page title (or a unique phrase from it). It can be incredibly useful for quickly finding major duplicate content problems.

19. site:example.com intitle:”text * here”

You can use almost any of the variations mentioned in (12)-(17) with "intitle:" – I won't list them all, but don't be afraid to get creative. Here's an example that uses the wildcard search in #14, but targets it specifically to page titles.
Pro Tip: Remember to use quotes around the phrase after "intitle:", or Google will view the query as a one-word title search plus straight text. For example, "intitle:text goes here" will look for "text" in the title plus "goes" and "here" anywhere on the page.

20. intitle:”text goes here”

This one's not really a "site:" combo, but it's so useful that I had to include it. Are you suspicious that other sites may be copying your content? Just put any unique phrase in quotes after "intitle:" and you can find copies across the entire web. This is the fastest and cheapest way I've found to find people who have stolen your content. It's also a good way to make sure your article titles are unique.

21. “text goes here” -site:example.com

If you want to get a bit more sophisticated, you can use "-site:" and exclude mentions of copy on any domain (including your own). This can be used with straight text or with "intitle:" (like in #20). Including your own site can be useful, just to get a sense of where your ranking ability stacks up, but subtracting out your site allows you to see only the copies.

22. site:example.com intext:”text goes here”

The "intext:" operator looks for keywords in the body of the document, but doesn't search the <TITLE> tag. The text could appear in the title, but Google won't look for it there. Oddly, "intext:" will match keywords in the URL (seems like a glitch to me, but I don't make the rules).

23. site:example.com ”text goes here” -intitle:"text goes here"

You might think that #22 and #23 are the same, but there's a subtle difference. If you use "intext:", Google will ignore the <TITLE> tag, but it won't specifically remove anything with "text goes here" in the title. If you specifically want to remove any title mentions from your results, then use "-intitle:".

24. site:example.com filetype:pdf

One of the drawbacks of "inurl:" is that it will match any string in the URL. So, for example, searching on "inurl:pdf", could return a page called "/guide-to-creating-a-great-pdf". By using "filetype:", you can specify that Google only search on the file extension. Google can detect some filetypes (like PDFs) even without a ".pdf" extension, but others (like "html") seem to require a file extension in the indexed document.

25. site:.edu “text goes here”

Finally, you can target just the Top-Level Domain (TLD), by leaving out the root domain. This is more useful for link-building and competitive research than on-page SEO, but it's definitely worth mentioning. One of our community members, Himanshu, has an excellent post on his own blog about using advanced query operators for link-building.

Why No Allintitle: & Allinurl:?

Experienced SEOs may be wondering why I left out the operators "allintitle:" and "allinurl:" – the short answer is that I've found them increasingly unreliable over the past couple of years. Using "intitle:" or "inurl:" with your keywords in quotes is generally more predictable and just as effective, in my opinion.

Putting It All to Work

I want to give you a quick case study to show that these combos aren't just parlor tricks. I once worked with a fairly large site that we thought was hit by Panda. It was an e-commerce site that allowed members to spin off their own stores (think Etsy, but in a much different industry). I discovered something very interesting just by using "site:" combos (all URLs are fictional, to protect the client):

(1) site:example.com = 11M

First, I found that the site had a very large number (11 million) of indexed pages, especially relative to its overall authority. So, I quickly looked at the site architecture and found a number of sub-folders. One of them was the "/stores" sub-folder, which contained all of the member-created stores:

(2) site:example.com/stores = 8.4M

Over 8 million pages in Google's index were coming just from those customer stores, many of which were empty. I was clearly on the right track. Finally, simply by browsing a few of those stores, I noticed that every member-created store had its own internal search filters, all of which used the "?filter" parameter in the URL. So, I narrowed it down a bit more:

(3) site:example.com/stores inurl:filter = 6.7M

Over 60% of the indexed pages for this site were coming from search filters on user-generated content. Obviously, this was just the beginning of my work, but I found a critical issue on a very large site in less than 30 minutes, just by using a few simple query operator combos. It didn't take an 8-hour desktop crawl or millions of rows of Excel data – I just had to use some logic and ask the right questions.
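As a quick sanity check on those (fictional) counts, the percentages fall out of a little arithmetic. Here's the back-of-the-envelope version as a Python sketch:

```python
# Back-of-the-envelope check of the (fictional) case-study counts, in millions.
total_indexed = 11.0   # (1) site:example.com
store_pages   = 8.4    # (2) site:example.com/stores
filter_pages  = 6.7    # (3) site:example.com/stores inurl:filter

print(f"Member stores as a share of the index:  {store_pages / total_indexed:.0%}")   # ~76%
print(f"Search filters as a share of the index: {filter_pages / total_indexed:.0%}")  # ~61%
```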

How Accurate Is Site:?

Historically, some SEOs have complained that the numbers you get from "site:" can vary wildly across time and data centers. Let's cut to the chase: they're absolutely right. You shouldn't take any single number you get back as absolute truth. I ran an experiment recently to put this to the test. Every 10 minutes for 24 hours, I automatically queried the following (a rough sketch of that kind of automated check follows the list):
  1. site:seomoz.org
  2. site:seomoz.org/blog
  3. site:seomoz.org/blog intitle:spam
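Here's a rough sketch of that kind of loop – not the exact script from the experiment. It assumes Google's results page includes an "About N results" string (that parsing is fragile and can change at any time), it uses the third-party "requests" library, and keep in mind that automated querying is exactly the sort of thing that triggers the CAPTCHA mentioned at the end of this post:

```python
# Rough sketch: poll the estimated result count for each query every 10 minutes
# for 24 hours. The "About N results" regex and the User-Agent header are
# assumptions about Google's HTML, which changes often and may block automation.
import re
import time
from urllib.parse import quote_plus

import requests

QUERIES = [
    "site:seomoz.org",
    "site:seomoz.org/blog",
    "site:seomoz.org/blog intitle:spam",
]

def estimated_count(query):
    url = "https://www.google.com/search?q=" + quote_plus(query)
    html = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}).text
    match = re.search(r"About ([\d,]+) results", html)
    return int(match.group(1).replace(",", "")) if match else None

for _ in range(144):  # 144 passes = 24 hours at one pass per 10 minutes
    for q in QUERIES:
        print(time.strftime("%Y-%m-%d %H:%M"), q, estimated_count(q))
    time.sleep(600)
```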
Even using a fixed IP address (single data center, presumably), the results varied quite a bit, especially for the broad queries. The range for each of the "site:" combos across 24 hours (144 measurements) was as follows:
  1. 67,700 – 114,000
  2. 8,590 – 8,620
  3. 40 – 40
Across two sets of IPs (unique C-blocks), the range was even larger (see the "/blog" data):
  1. 67,700 – 114,000
  2. 4,580 – 8,620
  3. 40 – 40
Does that mean that "site:" is useless? No, not at all. You just have to be careful. Sometimes, you don't even need the exact count – you're just interested in finding examples of URLs that match the pattern in question. Even if you need a count, the key is to drill down. The narrowest range in the experiment was completely consistent across 24 hours and both data centers. The more you drill down, the better off you are.
You can also use relative numbers. In my example above, it didn't really matter if the 11M total indexed page count was accurate. What mattered was that I was able to isolate a large section of the index based on one common piece of site architecture. Presumably, the margin of error for each of those measurements was similar – I was only interested in the relative percentages at each step. When in doubt, take more than one measurement.
Keep in mind that this problem isn't unique to the "site:" operator – all search result counts on Google are estimates, especially the larger numbers. Matt Cutts discussed this in a recent video, along with how you can sometimes use the page 2 count to reduce the margin of error.

The True Test of An SEO

If you run enough "site:" combos often enough, even by hand, you may eventually be greeted with this:
Google Captcha
If you managed to trigger a CAPTCHA without using automation, then congratulations, my friend! You're a real SEO now. Enjoy your new tools, and try not to hurt anyone.

Posted: 23 Jan 2013 05:13 AM PST
Posted by JackieRae
We sure do love feedback at Moz. One of our biggest sources of feedback is our Customer Advisory Board (which we lovingly call CAB). Who doesn’t love sharing the work they do with a group of awesome people, hearing their insights, and learning how to provide the most value to users based on their feedback?
Dana Lookadoo sporting her Customer Advisory Board shirt at MozCon.
A few weeks ago, our VP of Growth Marketing, Joanna Lord, did a Whiteboard Friday about 10 Ways to Get Feedback. I’d love to expand on this topic and share how we developed our CAB, what's worked well, and how we've improved.

Framing the CAB

We started the process of creating our Board over a year ago. Fortunately, our Director of Product, Samantha Britney, already had a fabulous framework formulated for us (how’s that for alliteration). It was important for us to flesh out and gather ideas in a document so we could define the purpose of the CAB, understand what it would take to be successful, and mitigate any risks that might occur.
First, we defined the purpose. Our Board members would weigh in on and validate product decisions, and they would provide feedback during the early stages of product planning and design. The feedback gathered from the Board would need to be both strategic and tactical, and would be used to help uncover any issues that might arise and expand on ideas we might not have thought of yet. We also wanted to develop relationships with folks in the industry in order to better understand our customers' needs. Finally, if CAB members love the work that we do, they may, over time, turn into our biggest evangelists.
Once our goals were set, it was time to move on to the "what ifs" of implementing this new program. There were a few risks we acknowledged before creating the Customer Advisory Board, which allowed us to think critically about the feedback we would receive. For example, many of our contacts are relatively close to our company, brand, and/or product. This is awesome! However, their feedback might be swayed by their preconceptions about us and potentially lead to “group think.” We took steps to move away from these notions and help keep our CAB members neutral.
It was also important to set goals to ensure that our CAB was functioning as we intended. We defined indicators to use as benchmarks, such as participation of the CAB, quality of the feedback loops, and the amount of CAB members' time we use (that last one was important, because we didn’t want to violate any promises we made to those helping us). This allowed us to gauge our success and to determine when it was time to revise our original framework.

Selecting the team

The final piece of our framework was to list the types of candidates we wanted to engage with. We used written "personas" to group potential candidates into more manageable sectors during our selection process. We originally called for 20-25 people (although the number has since been upped to 35). These CAB members are open and honest with us, even if they provide negative feedback. They represent a diverse segment of our users and work in small to large companies, from in-house SEOs to independent consultants.
To make the selection process as neutral as possible, we compiled a list of candidates from internal recommendations, active community members, and folks who had given feedback to our Product Team in the past, and made our final selections from that list. With our final group, we were sure that we would not only hear praise, but would also get a significant amount of the "tough love" that was needed to make the CAB project a success.
CAB members challenge us, allowing us to make better products. 

Getting to know our CAB

Now that you know the driving factors behind our process, it's time to learn a little bit more about who makes up our Customer Advisory Board.
The folks on our Board have very different backgrounds, areas of expertise, and passions, which means their feedback can be quite diverse. When a CAB member first joins, we ask them to fill out a little “getting to know you” survey. The goal is to understand that no two people are going to want the same thing, and it helps set context around the feedback that we receive.
We have 35 CAB members from eight different countries: the U.S., Portugal, Spain, England, Austria, Australia, Brazil, and Canada. They’ve been customers of SEOmoz for anywhere from one to six years, and some have more experience with our brand and product than others. They also work in a variety of different environments, so we can field customer needs from all sides of the inbound marketing process.
Every CAB member has different expectations at work.

So, have we been successful?

Success? I think YES! We’ve had several formal feedback loops (11 and counting), and many more informal conversations with our Customer Advisory Board members. From these meetings, we've been able to collect the following data:
  • Of the original 15 members that joined, all 15 are still active members.
  • We have an average of 9 out of 10 Customer Advisory Board members giving feedback when asked (although we’re getting closer to 9.5 out of 10!).
  • Every member who opts in to a feedback loop (whether it's a survey, an email, or an in-person interview) has finished that loop.
  • We've found the feedback is so useful that we went a step further and created a Local Customer Advisory Board for GetListed last month.
We've been thrilled with the success of our CAB so far. Though it took some time to get this process in place, the gains from gathering and acting on our CAB's feedback have far outweighed the time and effort it took to get it up and running.
