Syndicate Old Blog Posts in Jekyll without Screwing Your SEO


There's no clear explanation online of how to do this properly, so I built my own solution.

Those who know me well are probably tired of hearing me say that if you're not part of the solution, you're part of the problem. I firmly believe it and I repeat it constantly. Most likely, when I die, I will have that written on my tombstone.

The Problem

I have been learning marketing on my own for the past few years, and my main finding is that marketers tend to exaggerate A LOT.

Marketers like to sell their expertise on things that are, in most cases, just common sense. That's it, I dropped the bomb, but I stand firmly by this statement. And while I still haven't accomplished anything too big, marketing-wise, I must say I am fairly content with what I've done with both MarsBased and Startup Grind Barcelona.

So many studies, so many rules, so many theories, workshops, webinars and astrology about SEO, and no one has quite figured out how to syndicate old blog posts correctly. That is the case for this very blog, which I started in late March 2016. I want to aggregate here all the blog posts I've produced for MarsBased, Startup Grind, Barcelona Ventures and other sites I've written for throughout the years.

That would have been simple if we were not talking about risking a Google penalty for content duplication! Serious stuff here.

So, how are we supposed to syndicate our own old stuff into a new site?

The Solution Part 1: The Tech Stuff

Full disclaimer: if you came for the SEO stuff, skip this section.

First and foremost, I am going to describe how I implemented this in Jekyll, my static site generator of choice. And yours, if you're right about stuff.

I'm going to assume you know how to run a blog on Jekyll and that you've got basic coding skills. No rocket science required, though.

For this, we add a new set of variables called original to the Jekyll front matter, like the one highlighted below:

[Screenshot: Jekyll front matter]

I have all sorts of useful variables there, but let's focus on original. You can either write it yourself, or you can copy it from here (make sure you respect the indentation):

original:
  link: https://medium.com/@lexrodba/i-love-telling-stories-97367d98caf1#.6hipt2gsc
  date: 2014-12-11
  site: Medium

This gives us a variable called original that contains three nested variables, described here:

  • link: Contains the URL of the original post or content.
  • date: Contains the original post or content's publishing date.
  • site: Contains the name of the site where the post or content was published.

You will create a file named post-original.erb under the _includes folder with the following content:

{% if page.original %}

  {% if include.param == 'header' %}

    <link rel="canonical" href="{{ page.original.link }}" />

  {% elsif include.param == 'body' %}

    <p class="original-post">This post was originally published on {{ page.original.site }} on {{ page.original.date | date: "%b %-d, %Y" }}: <a href="{{ page.original.link }}" title="{{ page.title }}" target="_blank">{{ page.title }}</a>.</p>

  {% endif %}

{% endif %}

This file checks whether the blog post has the 'original' variable set (that is, whether it's a syndicated post) and, if so, outputs HTML according to the parameter passed in the include call.

Why this? Well, I wanted to group the logic in a simple file, instead of calling two different files. We will call this file twice per post, once in the header and once after the content. More details in the marketing section below.

Then, locate _layouts/posts.html and add the two calls that you see in the snippet below. As mentioned above, one goes in the header and one after the content, each with its corresponding parameter.

[Screenshot: Jekyll SEO stuff]
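
As a minimal sketch of what those two calls could look like: I'm assuming here that the layout file contains the head element itself. In many themes the head lives in a default layout or in a separate include, in which case the first call would go there instead. Everything around the two include lines is a placeholder, not my actual layout.

```html
<!DOCTYPE html>
<html>
  <head>
    <title>{{ page.title }}</title>
    <!-- First call: prints the rel="canonical" link, only for syndicated posts -->
    {% include post-original.erb param='header' %}
  </head>
  <body>
    <article>
      <h1>{{ page.title }}</h1>
      {{ content }}
      <!-- Second call: prints the visible reference to the original post -->
      {% include post-original.erb param='body' %}
    </article>
  </body>
</html>
```

Posts without the original variable in their front matter get nothing from either call, so regular posts are left untouched.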

Customize to your heart's content from here.

The Solution Part 2: The Marketing Stuff

Let's start with the basics: Google does not approve of duplicated content. What's more: if they find out you duplicate content, they can potentially penalise you.

What's the best way to avoid being penalised for duplicating content? Obviously, if you want to play it 100% safe, don't duplicate content. However, if you really need to do it and are a little adventurous yourself, then follow me.

Most marketers will agree that you should never publish using the old date. Even if you're re-posting stuff, you should do it using today's date, linking back to the original post.

Screw that: even if Google uses bots to crawl the sites, your visitors are humans. Use common sense to make their life easier and their experience more enjoyable.

In the previous section we built an include that will add the following two crucial code snippets to your reused blog post:

  • A rel=canonical link tag in the header.
  • A written reference to the original post.

The first one is a hidden reference to the original post using a link tag in the page's head. Tags in the head don't show up to visitors, but stay in the source code, where bots and crawlers can read them.

To understand this tag better, I suggest you read these two articles:

This hidden code snippet will appear like this in your source code:

[Screenshot: Canonical link. HTML code output.]
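
For reference, with the example front matter from the tech section, the header call should render a line like this one inside the head (surrounding whitespace may differ depending on your layout):

```html
<link rel="canonical" href="https://medium.com/@lexrodba/i-love-telling-stories-97367d98caf1#.6hipt2gsc" />
```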

The second code snippet is a written reference to the original content, and will appear visibly at the end of the content. Both visitors and crawlers will understand it.

[Screenshot: Canonical link. Simple and elegant.]
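
In HTML terms, and again using the example front matter from the tech section, the body call should render something like the following. The post title "I Love Telling Stories" is an assumption made up for this illustration, since the title comes from each post's own front matter:

```html
<p class="original-post">This post was originally published on Medium on Dec 11, 2014: <a href="https://medium.com/@lexrodba/i-love-telling-stories-97367d98caf1#.6hipt2gsc" title="I Love Telling Stories" target="_blank">I Love Telling Stories</a>.</p>
```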

This way, we have developed a solution that's both functional and elegant, and that will let you syndicate old stuff on your site without SEO issues or penalties for content duplication.

Let me know in the comments section below if I accidentally skipped anything, or what you did to solve this problem before reading this!

Now Playing: Nacho Vegas - Dry Martini, S.A.