The Googlebots are coming: 4 steps to fine-tuning your technical SEO guide for starters
Posted by Greg McLoughlin
June 22nd, 2018
In an age where everybody’s looking for an immediate, gimme-dat-sweet-ROI-ASAP marketing strategy for their business, SEO can often be left in the dark.
The issue is that a lot of marketers aren’t thinking about the bigger picture.
Where channels like social media and PPC can get you a bulk of traffic for €x per click and conversions at a healthy cost, organic doesn’t exactly offer the same immediate luxury.
As an SEO and inbound buff in a Content Marketing agency, it’s always difficult to convey to clients that their investment in SEO won’t necessarily start working for them straight away.
They want return and they want it now.
What they don’t realise is: once they make that investment and commit to it – that’s it!
Investing in an SEO strategy will help grow your site’s authority on search engines and keep traffic and leads coming in. Now, we all know that regularly creating high-quality content for your website is a big plus for organic, but search engines take a bunch of other factors into consideration.
If you have a website with a lot of moving parts, a technical SEO strategy ensures that you’re abiding by Google’s rules and guidelines – and that any positive results you’ve amassed over the years will be maintained.
At first, this can be pretty intimidating, but I’m here to give you the jumpstart you need to get motoring! Here are some technical SEO tips to get you started.
1. Get Google Search Console for your website
I thought I’d start off with a simple one.
If you’re not already familiar, Google Search Console is the SEO’s go-to tool for tracking the search performance of a website.
You can check for search analytics (SERP CTR, keywords vs clicks, pages…), search errors, backlinks to your site, crawl stats, index status updates, educational resources and so, so much more.
If you don’t have Google Search Console for your site, you’re really missing a trick. Before you read on or do anything else (anything ellllllllllllllse) with your life, I’d highly recommend that you get it set up pronto.
2. Create an XML sitemap (and submit it to Google Search Console!)
A sitemap is pretty standard across all websites, but even then, you’d be surprised by how many sites still don’t have one.
An XML sitemap contains the important pages of a site to help both search engines and users determine its structure. You can find out if you have one by typing /sitemap.xml at the end of your root domain.
In order for Google to find and index your pages, they send crawlers to your site. These ‘Googlebots’ discover and scan websites by following links from one webpage to another. New pages or sites, changes to existing pages, and dead links are noted and used to update the majestic Google index.
As a result, crawlers are vital for telling Google that your content is both optimised and SEO-friendly.
While sitemaps don’t guarantee that your most important pages are crawled every time, they provide additional info for crawlers and give you a better chance of being crawled, so go to the effort and get one if you don’t already have one.
Most good CMSs have the ability to generate sitemaps. Alternatively, you could always use a plugin or a server-side program. My personal favourite is Yoast’s free WordPress plugin, which automatically creates your sitemaps and categorises them by subdirectory. That can also give crawlers an easier-peasier-lemon-squeezier ride.
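If your CMS doesn’t generate a sitemap for you, the server-side route can be pretty small. Here’s a minimal sketch in Python, assuming you already have a list of the pages you care about. The build_sitemap function and the example URLs are made up for illustration; only the namespace comes from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages, for illustration only.
pages = [
    "https://example.com/",
    "https://example.com/mens-shoes",
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Save the output as sitemap.xml at your root domain and it’s ready to submit. A real site would probably pull the page list from its CMS or database rather than hard-coding it.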
Lastly, since you implemented Google Search Console before reading this part (*shakes fist at you if you haven’t*), you can submit your XML sitemap to it. Again, this is all about making your important pages more accessible to Google.
3. Increase crawler accessibility and maximise crawl budget
Google once defined crawl budget as “prioritising what to crawl, when, and how much resource the server hosting the website can allocate to crawling. [It’s] more important for bigger websites, or those that auto-generate pages based on URL parameters…”
In short, crawlers simply can’t scan every single page on the internet, so in order for their Googlebots to work efficiently, they allocate a crawl budget for each website. Google takes a number of factors into consideration, but for starters, the top 3 are:
- Duplicate content.
- Redirect chains, 404s and soft 404s.
- Ratio of low quality content to high quality content.
Google doesn’t disclose the budget that’s available to you, but in order to maximise it, you need to make sure you’re dotting the i’s and crossing the t’s and maintaining your site with crawlers in mind.
For starters, you need a tool that can give you this information quite quickly. Honestly, SEO tool Screaming Frog is the only one you’ll ever need. I couldn’t recommend it enough. For its price (£149 per year), it goes beyond the call of duty to give you what you need.
Screaming Frog can crawl your whole site and extract URLs, images, response codes, metadata and a lot more. It’s pretty tricky to get to grips with, but once you do, you’ll never go back.
So let’s go through each factor above a little deeper.
The technical SEO checklist (or: avoid at all costs)
Duplicate content is a nightmare for SEO buffs.
It’s defined as different URLs serving up the exact same content.
That’s bad because your crawl budget is wasted on URLs that don’t need to be crawled – which can potentially stop crawling and hinder SEO.
*In best Darth Vader voice* NOOOOOOOOOOOOOOOOOOOOOOO.
Crawlers don’t simply start crawling from the root domain; they can start anywhere they freaking like! So it’s super important that they crawl the definitive pages every time.
For smaller websites, the best way to sort this is by adding a rel=canonical attribute to the HTML of the duplicate page. You’re telling Google that you want to attribute all of the SEO value to the preferred page. For example:
If you have a page with the URL: https://example.com/mens-shoes
And another URL with a parameter that’s serving up the exact same content: https://example.com/mens-shoes?size=9
If you add the rel=canonical tag <link rel="canonical" href="https://example.com/mens-shoes" /> to the HTML of the latter page, you’re telling Google that the preferred version is https://example.com/mens-shoes.
This means that all SEO value from the duplicate page is fed to the definitive URL.
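If your pages come out of a template, you can generate that tag rather than typing it by hand. Here’s a rough sketch in Python, assuming every query parameter on the page (like size) leaves the content unchanged, so the canonical URL is simply the URL without its query string. The canonical_link_tag function is a made-up helper for illustration, and you’d need something smarter if some of your parameters genuinely change what’s on the page.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(url):
    """Build a canonical <link> tag pointing at the URL without its query string."""
    parts = urlsplit(url)
    # Drop the query string and fragment so every variant maps to one URL.
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(clean, quote=True)}" />'


tag = canonical_link_tag("https://example.com/mens-shoes?size=9")
print(tag)
# <link rel="canonical" href="https://example.com/mens-shoes" />
```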
For bigger sites (more than 10,000 pages), the best way to control crawl budget is to use your website’s robots.txt file (find it by typing /robots.txt after your root domain).
This is the very first place crawlers check before crawling your site and acts as a set of rules that crawlers must follow, so you can tell Googlebots what you don’t want crawled (while we're here, since crawlers visit this first, add that XML sitemap URL to your robots.txt file to increase chances of better crawling).
Like above, URL parameters always cause a headache for crawlers, so you can disallow URLs with certain parameter trails from being crawled by search engines.
(Before you go the robots.txt route, remember that any existing backlinks to disallowed pages will no longer hold value to your site, among a bunch of consequences. Not knowing how to leverage your robots.txt file can be detrimental to your site, so I’d highly recommend that you consult an experienced SEO specialist before proceeding.)
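To make the parameter rules a bit more concrete, here’s a sketch of what they could look like, plus a small Python check so you can test a path before anything goes live. The robots.txt content is hypothetical, and the matcher is a simplified stand-in that only handles the * wildcard and the $ end anchor. It ignores User-agent groups and Allow rules, so treat it as a sanity check rather than a copy of how Googlebot evaluates the file.

```python
import re

# Hypothetical robots.txt, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*?size=
Disallow: /*&size=
Sitemap: https://example.com/sitemap.xml
"""


def disallow_patterns(robots_txt):
    """Collect Disallow patterns (this sketch ignores User-agent groups and Allow rules)."""
    patterns = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns


def is_blocked(path, patterns):
    """Return True if the path matches any pattern; '*' is a wildcard, '$' anchors the end."""
    for pattern in patterns:
        anchored = pattern.endswith("$")
        core = pattern[:-1] if anchored else pattern
        # Escape the literal pieces and let each '*' match any run of characters.
        regex = ".*".join(re.escape(piece) for piece in core.split("*"))
        if anchored:
            regex += "$"
        if re.match(regex, path):
            return True
    return False


rules = disallow_patterns(ROBOTS_TXT)
print(is_blocked("/mens-shoes?size=9", rules))  # True
print(is_blocked("/mens-shoes", rules))         # False
```

Running a handful of your real URLs through a check like this is a cheap way to catch a rule that blocks more than you meant it to.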
Redirect chains, 404s and soft 404s
Every time a crawler gets a 3xx response code, it’s wasting crawl budget. In the past, I’ve seen redirect chains of six URLs, so that’s five more URLs that a crawler had to go through to get to the final one.
Ultimately, on-site redirect chains and 4xx errors cause unnecessary work for crawlers, so always ensure that only the final URL is present on your website.
Screaming Frog can extract on-site redirects for you pretty easily, but Ayima has a great Chrome extension so you can track redirects on the go.
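If you want to see how a chain gets counted, here’s a small Python sketch. To keep it runnable without hitting a live site, it assumes you pass in a lookup function that returns a status code and a redirect target for each URL. The FAKE_RESPONSES table below is invented for illustration; in practice you’d swap it for real HTTP requests.

```python
def trace_redirects(start_url, lookup, max_hops=10):
    """Follow redirects from start_url and return every URL visited, in order.

    lookup(url) is a caller-supplied function returning (status_code, location).
    """
    chain = [start_url]
    url = start_url
    for _ in range(max_hops):
        status, location = lookup(url)
        # Stop as soon as the response is not a 3xx redirect.
        if not (300 <= status < 400) or not location:
            break
        if location in chain:
            # Redirect loop: record it once and stop.
            chain.append(location)
            break
        chain.append(location)
        url = location
    return chain


# Hypothetical responses standing in for real HTTP requests.
FAKE_RESPONSES = {
    "http://example.com/shoes": (301, "https://example.com/shoes"),
    "https://example.com/shoes": (301, "https://example.com/mens-shoes"),
    "https://example.com/mens-shoes": (200, None),
}

chain = trace_redirects("http://example.com/shoes", FAKE_RESPONSES.get)
print(" -> ".join(chain))
print(f"{len(chain) - 1} redirect(s) before the final URL")
```

Any chain longer than two URLs is a candidate for cleaning up: point the first URL straight at the last one.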
Ratio of low-quality/thin content to high-quality content
Pages with little-to-no HTML on the page will ultimately do more harm than good.
Again, URL parameters or stand-alone blog topic pages can do this. Overall, the more low-quality pages that crawlers encounter, the lower the quality Google attributes to your website, which decreases your crawl budget.
If these pages aren’t doing anything for your site (either backlink-wise or traffic-wise), simply 404 the heckin' heck out of them! Alternatively, you can canonicalise or utilise your robots.txt file to control crawling.
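One rough way to shortlist thin pages is to count the visible words on each one. Here’s a Python sketch using only the standard library. The 200-word threshold is an arbitrary number picked for this example, not anything Google has published, and word count is only a proxy for quality, so treat the result as a list of pages to review rather than a verdict.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())


def word_count(html):
    """Return the number of visible words in an HTML document."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return len(parser.words)


THIN_THRESHOLD = 200  # arbitrary cut-off for this sketch, not a Google figure

# Hypothetical page, for illustration only.
page = "<html><body><h1>Tag: shoes</h1><script>var x = 1;</script></body></html>"
count = word_count(page)
print(count, "thin" if count < THIN_THRESHOLD else "ok")
```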
This is where blogging is vital for SEO; Google likes high-quality content. Regularly feeding content to Google lets it know that your site holds value and it's worth crawling more regularly.
Whether it’s blogs, whitepapers, videos, tools… Don’t leave your website dead in the water. Always feed it quality content for both search engines and users. The higher the quality, the better your site will perform.
Google recently announced that site speed is in fact a ranking signal for both desktop and mobile (ESPECIALLY on mobile). Although it only affects a small percentage of queries, you can bet your bottom diddly-dollar that it will be even more important in the future.
Simply put, site speed refers to the amount of time it takes for a page to load. In the interest of both users and crawlers, you want your pages to load as fast as possible.
For starters, Google's PageSpeed Insights tool gives you a pretty comprehensive breakdown of the current speed of your website and places where you can improve. Simply paste your URL in and you'll get everything you need to get started. Here's a quick breakdown of what you'll need to do.
Compress files and images wherever possible
For images, web server applications tend to sacrifice the quality of the image for file size, so you're better off using a tool like Photoshop to control the overall quality. Also, make sure to size images to fit your website's dimensions. Any bigger will just create more work for loading!
Removing any unnecessary or redundant data from your code will give your site speed a good boost. Google recommends using CSSNano (for CSS) and UglifyJS (for JavaScript), but you can also install Google's PageSpeed module, which integrates with an Apache or Nginx web server.
Just like crawl budget, the more redirects that are present in a URL trail, the more work crawlers have to do to get to the finish line.
Make sure that redirects are at the absolute minimum so the page can load as quickly as possible.
Leverage browser caching
Read up on Google's recommendations for server cache control. All in all, setting your 'expires' header to one year is a reasonable time period.
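For the curious, here’s roughly what a one-year cache policy looks like as response headers, sketched in Python. The cache_headers function is a made-up helper, and pairing Expires with a matching Cache-Control max-age is my own assumption about a sensible setup. How you actually attach these headers depends on your server or framework.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def cache_headers(now=None):
    """Return response headers that let browsers cache an asset for one year."""
    now = now or datetime.now(timezone.utc)
    expires = now + timedelta(seconds=ONE_YEAR_SECONDS)
    return {
        "Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}",
        # HTTP dates are always expressed in GMT.
        "Expires": format_datetime(expires, usegmt=True),
    }


headers = cache_headers(datetime(2018, 6, 22, tzinfo=timezone.utc))
for name, value in headers.items():
    print(f"{name}: {value}")
```

A year only makes sense for files that rarely change, like logos or versioned CSS and JavaScript. Pages that update often want a much shorter window.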
Looking for a knockout SEO strategy?
We’re here to help! Our experienced SEO professionals will give you exactly what you need to shoot up the SERPs with high-quality content. Get in touch to find out more about our services.