Archived Writing, Syndication and SEO Black Magic

I've watched as many of the projects I've worked on in the past have come to an end. In each case I've usually contributed some amount of writing to a blog about that project. One annoying side effect of the end of these projects is that my writing oftentimes ends up committed to the void, never to be seen again. You could archive the content and reconstitute it somewhere else (I did that with my old grazr blog posts here) but that's not always possible or practical. While not a major issue, it's been annoying enough that I wanted to come up with a solution.

Another motivation involves the new company I've been working on, TenZeroLab. TenZeroLab is founded partially on the concept of experimenting with different ideas, building minimum viable products, and learning from each iteration. Logically, each project should have some kind of blog. I was reluctant, however, to split my writing across multiple blogs, especially with the possibility that some projects may only exist for a short time.

Solution - Syndication

I decided that a simple solution would be to continue writing at mikepk.com and then syndicate the content specific to each project to its own standalone blog. I've started doing that with snapmyinfo and its blog. In theory, this will have the positive effect of keeping all relevant posts about a project grouped together, tied to that project, while also keeping a record of my own writing here on my personal blog. Those interested in everything I'm doing can subscribe and read along here, and those interested just in subprojects can follow the individual project blogs. This also means that as other people contribute to the projects, they can write specifically on the project blogs and not have it show up on my personal blog.

To syndicate content across multiple blogs, I'm using the FeedWordPress WordPress plugin. For each project blog, I point the source feed at that project's category feed on mikepk.com. Then when I post to mikepk.com, I add the correct project category to the post and sync the individual blogs so they pull in the content relevant to each of them. Essentially what I have is a mini blog network.
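To make the "category feed" idea concrete, here is a minimal sketch of what those source feed URLs might look like. It assumes pretty permalinks with WordPress's default "category" base, and the helper name category_feed_url and the category slugs are hypothetical. Inside WordPress itself, get_category_feed_link() would be the more reliable way to get the real URL.

```php
<?php
// Hypothetical helper: builds a category feed URL, assuming pretty
// permalinks and WordPress's default "category" base.
function category_feed_url($site_url, $category_slug) {
    return rtrim($site_url, '/')
        .'/category/'.rawurlencode($category_slug)
        .'/feed/';
}

// One source feed per project blog (the slugs are assumptions).
foreach (array('snapmyinfo', 'tenzerolab', 'pybald') as $slug) {
    echo category_feed_url('http://mikepk.com', $slug), "\n";
}
```

Each project blog would then be given the matching URL as its source feed in the syndication plugin.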

[Image: blog network]

It's not perfect, but it addresses my primary concern.

Hold on Cowboy! What about SEO!?

In the past, I haven't generally worried about SEO. Even though several of the projects I've been involved in were, in some ways, SEO tools, SEO for my own projects has usually involved just the basics (nice permalinks, some meta tags, etc.). For my new blogs I've been using the Thesis WordPress theme, which I highly recommend, especially since it does some amount of basic SEO out of the box.

Thinking about this solution, though, I became concerned about its SEO implications. This is essentially duplicated content, and the mini blog network could easily be construed as a spam blog depending on how a search engine reads it. Search engines, and especially Google, are black boxes, so there's no sure way to know how to safely syndicate your own content. The last thing I wanted to do was anger Google and get banned from their index, so I went looking for best practices for this scenario and did find some discussion on the topic.

Most of the advice seems to revolve around using canonical link tags in the header to give Google the hint as to which page is the "main" content page. I decided that the pages on the project blogs should be the canonical pages. Pointing search engines at the projects seemed more logical, and better for the individual projects.

This led me down a WordPress rathole trying to come up with a solution I was happy with. I ended up with this custom PHP function, added to the Thesis custom_functions.php file on mikepk.com. I don't usually program in PHP or hack WordPress, so there is more than likely a more elegant way to do this. It's basically the quickest, dirtiest hack I could come up with that seems to work :). The function checks the categories on individual posts, and if there's a match in the 'syndication_domains' array, it rewrites the canonical url to the project's blog url.

function canonical_urls() {
    # categories that get remapped to other project blogs
    $syndication_domains = array(
        'snapmyinfo'=>'snapmyinfo.com/blog',
        'tenzerolab'=>'tenzerolab.com/blog',
        'pybald'=>'pybald.com/blog');
    # this is a single post
    if (is_singular()) {
        # check the array for a category hit on subdomain syndication
        # expects only one of the syndication categories to hit
        # start with no match so the check below never reads an undefined variable
        $new_host = null;
        foreach (get_the_category() as $category) {
            if (isset($syndication_domains[$category->name])) {
                $new_host = $syndication_domains[$category->name];
            }
        }
        # this is a post that's syndicated to another blog
        # I want the canonical to point there rather than here
        if ($new_host) {
            # parse the permalink
            $parsed = parse_url( get_permalink() );
            # this is a hack to change the /month/date pattern of my usual permalinks into
            # the prettier no date form on the sub project blogs
            # obviously if you're using different permalink structures, you should change this
            $new_path = preg_replace('/\/\d+\/\d+/', '', $parsed['path']);
            # replace the host with the new host (and sub path)
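            # obviously this doesn't work with password, port or other more
            # complicated url permutations
            # parse_url() drops the '?' and '#' separators, so add them back
            # only when a query string or fragment is actually present
            $url = 'http://'.$new_host
                .$new_path
                .(isset($parsed['query']) ? '?'.$parsed['query'] : '')
                .(isset($parsed['fragment']) ? '#'.$parsed['fragment'] : '');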
        }
        # no syndicated category
        else {
            $url = get_permalink();
        }
    }
    # these canonical bits lifted from thesis normal canonical url parsing
    # function
    elseif (is_author()) {
            $author = get_userdata(get_query_var('author'));
            $url = get_author_link(false, $author->ID, $author->user_nicename);
    }
    elseif (is_category())
        $url = get_category_link(get_query_var('cat'));
    elseif (is_tag()) {
        $tag = get_term_by('slug', get_query_var('tag'), 'post_tag');
        if (!empty($tag->term_id))
            $url = get_tag_link($tag->term_id);
    }
    elseif (is_day())
        $url = get_day_link(get_query_var('year'), get_query_var('monthnum'), get_query_var('day'));
    elseif (is_month())
        $url = get_month_link(get_query_var('year'), get_query_var('monthnum'));
    elseif (is_year())
        $url = get_year_link(get_query_var('year'));
    # home page
    elseif (is_home()) {
        $url = get_bloginfo('wpurl');
    }
    # only emit the tag when one of the branches above produced a url
    if (!empty($url)) {
        echo '<link href="'.esc_url($url).'" rel="canonical"/>';
    }
}
# remove the default canonical url function
remove_action( 'wp_head','rel_canonical');
# add the custom one
add_action('wp_head', 'canonical_urls');
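
To see the url rewriting step from the function above in isolation, here is a small sketch that runs outside WordPress. The function name rewrite_to_project_url and the sample permalink are hypothetical, and it assumes permalinks with two leading numeric path segments, as in the hack above.

```php
<?php
// Hypothetical standalone version of the rewriting step: moves a permalink
// onto a project blog host and strips the two numeric path segments.
function rewrite_to_project_url($permalink, $new_host) {
    $parsed = parse_url($permalink);
    $path = isset($parsed['path']) ? $parsed['path'] : '/';
    // drop the numeric segments, e.g. /2010/03/some-post/ becomes /some-post/
    $new_path = preg_replace('/\/\d+\/\d+/', '', $path);
    // parse_url() returns query and fragment without their separators
    $query = isset($parsed['query']) ? '?'.$parsed['query'] : '';
    $fragment = isset($parsed['fragment']) ? '#'.$parsed['fragment'] : '';
    return 'http://'.$new_host.$new_path.$query.$fragment;
}

// Sample permalink is made up for illustration.
echo rewrite_to_project_url('http://mikepk.com/2010/03/some-post/', 'snapmyinfo.com/blog'), "\n";
// prints http://snapmyinfo.com/blog/some-post/
```

Keeping the rewrite in a plain function like this makes it easy to try a few permalinks by hand before wiring anything into wp_head.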

I'm likely going to follow the other piece of advice as well: adding the "noindex" robots meta tag to the syndicated posts on mikepk.com. I'm less concerned about these posts showing up under mikepk.com in Google's index than I am about keeping them as a permanent record.
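
A rough sketch of how that noindex step might look, reusing the same category names as the canonical function. The helper name syndicated_robots_tag is hypothetical, and the "noindex,follow" value is my assumption about a reasonable setting, not something I've tested against Google. The WordPress wiring only runs when add_action is available.

```php
<?php
// Hypothetical helper: returns a robots meta tag when any of the post's
// category names is one of the syndicated project categories.
function syndicated_robots_tag(array $category_names, array $syndicated) {
    foreach ($category_names as $name) {
        if (in_array($name, $syndicated, true)) {
            // "noindex,follow" is an assumed value for this sketch
            return '<meta name="robots" content="noindex,follow"/>';
        }
    }
    return '';
}

// WordPress wiring, skipped when this file runs outside WordPress.
if (function_exists('add_action')) {
    add_action('wp_head', function () {
        if (!is_singular()) {
            return;
        }
        $names = array();
        foreach (get_the_category() as $category) {
            $names[] = $category->name;
        }
        echo syndicated_robots_tag($names, array('snapmyinfo', 'tenzerolab', 'pybald'));
    });
}
```

Posts without a project category get no tag at all, so they stay indexable on mikepk.com as before.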

Comments

The last piece of the puzzle I haven't quite figured out yet is synchronizing comments. Ideally, I'd like the project pages and the posts here on mikepk.com to share the same comments. Since I'm using the third-party Disqus comment system, I thought I'd be able to accomplish this somehow. Unfortunately, I haven't found a good solution, so I'd be interested if anyone has ideas on this.

A Good Solution?

Only time will tell if this works or not. Unfortunately, a lot of this SEO stuff is black magic, and the internet is filled with tales of woe from those who have been blacklisted by Google.