
Silly WordPress Tricks, Part I: Exporting Blog Posts as HTML

Dear Reader,

I’m making some changes to my blog. Specifically, I am moving most of my Public Speaking blog posts to my mailing list. I’ve got more of those than I realized. Being a programmer, my thought process was of course “Why spend an hour copying and pasting all of these posts when I can spend two hours and write a script?” :) So I hammered out some bash that uses WP-CLI to gather the information and get the job done. I am proud that other than wp-cli, I did not have to resort to any additional PHP code. It was tempting at times, but I did it.

Yes, I am aware that wp-cli will export to a WXR file. I wanted something simpler.

No, this is not a complete solution. It doesn’t deal with attachments, comments, or metadata. I don’t need those for this project.

Purpose

This bash script will export all of the blog posts in a given WordPress category into individual HTML files. There is no templating to control how they are output; it is not that smart. It takes no parameters; everything is hard coded.

Assumptions

  1. You have wp-cli installed on your machine, it is named wp, and it is in your path.
  2. You have a WordPress blog

Here is the script. Below, I will break it down line-by-line in case it’s not clear.

 

#!/bin/bash
HOME_DIR=~
WP_DIR=/path/to/public_html/
CATEGORY=speaking
mkdir -p $HOME_DIR/blog/$CATEGORY
cd $WP_DIR
 
for LINE in $(wp post list --category_name=$CATEGORY --fields=ID,post_name --format=csv| tail -n +2); do
        ID=$(echo $LINE | cut -f1 -d,)
        SLUG=$(echo $LINE | cut -f2 -d,)
        TITLE=$(wp post get $ID --field=post_title)
        POST_DATE=$(date -d "$(wp post get $ID --field=post_date)" +"%Y-%m-%d")
        AUTHOR=$(wp user get $(wp post get $ID --field=author) --field=display_name)
        echo "Processing $TITLE"
        echo "<h1>$TITLE</h1>" > $HOME_DIR/blog/$CATEGORY/$SLUG.blogpost.txt
        echo "<strong>Author: </strong>$AUTHOR<br />" >> $HOME_DIR/blog/$CATEGORY/$SLUG.blogpost.txt
        echo "<strong>Date Published : </strong>$POST_DATE<br />" >> $HOME_DIR/blog/$CATEGORY/$SLUG.blogpost.txt
        wp post get $ID --field=post_content >> $HOME_DIR/blog/$CATEGORY/$SLUG.blogpost.txt
done

2: This is the home directory. A directory named blog/CATEGORY will be created under this directory. It is set to the user’s home directory.

3: This is the root of your WordPress installation.

4: This is the category that you want to export.

5: Create the directory to hold the posts

8: Get a list of the posts in the given category and execute lines 9-18 once for each post. The wp command in the for loop returns a CSV list of post IDs and post names (the slugs); tail drops the CSV header row.

9: Get the post ID from the CSV line using cut.

10: Get the slug.

11: Use wp to get the title of the post.

12: Use wp to get the post date. Use date and a format of YYYY-MM-DD to strip off the time.

13: Use wp to get the author id and feed that to wp to get the author’s display name.

14: Let the user know what we are currently processing.

15: Output the Title of the post as an H1

16: Output the By-line.

17: Output the post date

18: Use wp to gather the content of the actual post and output it.

Lather, rinse, repeat.
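The splitting done on lines 8-10 can also be sketched with a while/read loop, which splits each CSV row on the comma without calling cut twice per post. This is only a sketch, not the script I actually ran: parse_post_list and sample_csv are hypothetical names, and the sample rows stand in for the output of the wp post list command above.

```shell
#!/bin/sh
# parse_post_list is a hypothetical helper, not part of wp-cli.
# It reads "ID,post_name" CSV on stdin and prints "ID slug" pairs.
parse_post_list() {
    # tail -n +2 skips the CSV header row; IFS=, splits each row on the comma.
    tail -n +2 | while IFS=, read -r id slug; do
        printf '%s %s\n' "$id" "$slug"
    done
}

# Made-up sample data standing in for:
#   wp post list --category_name=speaking --fields=ID,post_name --format=csv
sample_csv() {
    printf 'ID,post_name\n12,my-first-talk\n34,another-talk\n'
}

sample_csv | parse_post_list
```

In the real script, the body of the while loop would hold the same wp post get calls as lines 11-18.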

 

Conclusion

This is one of those “gets the job done” scripts. It is brittle and it is fragile. There are a lot of ways to break it and there is zero error-handling in it. All that having been said, it gets the job done. More importantly, it illustrates one of the cool things about wp-cli: scriptability of WordPress. I live in the command line, and wp-cli has quickly become one of my most used tools. This, however, is the first time I’ve used it as part of a larger script. I think that’s cool. :)
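As a rough illustration of the kind of checks the script skips, here is a minimal sketch. Both require_cmd and safe_slug are hypothetical helpers I am making up for the example; the original script has nothing like them.

```shell
#!/bin/sh
# require_cmd (hypothetical): succeed only if the named command is on the PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# safe_slug (hypothetical): reject slugs that are empty, contain a slash,
# or start with a dot, since the slug becomes part of an output file name.
safe_slug() {
    case "$1" in
        ''|*/*|.*) return 1 ;;
        *) return 0 ;;
    esac
}

# Warn early instead of failing once per post inside the loop.
if ! require_cmd wp; then
    echo "wp-cli (wp) was not found in your path" >&2
fi
```

Inside the loop, something like `safe_slug "$SLUG" || continue` would skip any post whose slug cannot be used as a file name.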

Until next time,
I <3 |<
=C=

LinkedIn and Let’s Encrypt

Dear Reader,

Last night I was playing around with the LinkedIn REST API and quite by accident, I discovered something. If you have installed a Let’s Encrypt certificate on your site, LinkedIn will not read images included in your OpenGraph tags.

WTFBBG?!?

A little primer for my non-tech friends

Ok, for those of my readers who are not programmers, Open Graph is how sites like Facebook, Twitter, and sometimes even LinkedIn display an image, a title, and a summary of a web page automagically when all you do is share the URL. Let’s look at an example.

If you go to my recent postcard, https://blog.calevans.com/2016/05/16/postcards-life-010/, and view the source of the page, you will eventually find a section with a bunch of meta tags. Some of them will look like this.

<meta property="og:title" content="Postcards From My Life - 010 - Postcards From My Life" />
<meta property="og:url" content="https://blog.calevans.com/2016/05/16/postcards-life-010/" />
<meta property="og:site_name" content="Postcards From My Life" />
<meta property="og:updated_time" content="2016-05-15T14:50:55-05:00" />
<meta property="og:image" content="https://blog.calevans.com/wp-content/uploads/2016/05/palm_beach_001.jpg" />

See the “og:” there? That is your indicator that these are Open Graph tags. They give any site that pays attention vital information that it would otherwise have to grep out of the HTML and attempt to infer. In the case of the tags above:

  • The page’s title
  • The page’s URL
  • The name of the site the URL comes from
  • When it was last updated
  • The image to use when displaying this URL.

That last one is important as it’s the one that LinkedIn is failing on.
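For the developers in the audience, pulling one of these values out of a page can be sketched in a few lines of shell. og_content and sample_html are hypothetical names of my own, and the sed pattern assumes the property attribute comes before content and that both use double quotes, as in the tags above. A real consumer would use a proper HTML parser.

```shell
#!/bin/sh
# og_content (hypothetical): print the content of one Open Graph tag found in
# the HTML on stdin. Assumes property="..." comes before content="...".
og_content() {
    sed -n "s/.*<meta property=\"og:$1\" content=\"\([^\"]*\)\".*/\1/p" | head -n 1
}

# Two of the tags shown above, standing in for a downloaded page.
sample_html() {
    cat <<'EOF'
<meta property="og:title" content="Postcards From My Life - 010 - Postcards From My Life" />
<meta property="og:image" content="https://blog.calevans.com/wp-content/uploads/2016/05/palm_beach_001.jpg" />
EOF
}

sample_html | og_content image
```

Against a live page, the sample would be replaced by something like `curl -s <page URL> | og_content image`.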

Developers, start reading again

Behind the scenes, LinkedIn has a process that reads a webpage, finds the image, and hands all the Open Graph info back to the browser you are using via JavaScript (what we used to call Ajax). Somewhere in that chain

Browser->LinkedIn Service->LinkedIn Image Service->Browser

something is broken. Something doesn’t like Let’s Encrypt. How do I know? Let’s run a quick test.

  1. Open LinkedIn.com in a separate tab and, if you aren’t already, log in.
  2. Click the “Share an Update” button.
  3. Paste in this link: https://blog.calevans.com/2016/05/16/postcards-life-010/
  4. Notice that you see the title and the copy, but not the image. The image is blank.
  5. Ok, abort this update and start a new one.
  6. Paste in this URL: https://voicesoftheelephpant.com/2016/05/10/interview-helen-housandi/
  7. See how the image appears? That is what is supposed to happen.
  8. You can abort this update now as well. (Or post it, it’s a good interview.)

Voices of the ElePHPant has a different cert because Apple doesn’t like Let’s Encrypt either.

Conclusion

If posting to LinkedIn is important to you – and it is not to me – then do not use a Let’s Encrypt certificate. Get you a cheap one from ssls.com.

 

Until next time,
I <3 |<
=C=

What do developers look for when they scan a job ad?

Dear Reader,

In my book “Culture of Respect” I have a section on writing job ads that will attract developers. I am in the process of revising that chapter, so I thought I would ask the people who actually read the job ads what they look for. The results weren’t that surprising to me. Having read a lot of job ads though, I am guessing that the results will be surprising to some managers out there.

I’ll let you read the results for yourself.

Until next time,
I <3 |<
=C=

 

When Giants Battle. Google, Twitter, Apple, and encrypting the web. A podcaster’s story

Dear Reader,

This is a story about my fight to change the feed on my little podcast, Voices of the ElePHPant.

 

[Images: Google's, Twitter's, and Apple's attitudes towards encryption on the web]

As a podcaster, I’m in a quandary. I know that encrypting the web is a good thing. Most of my sites now sport a Let’s Encrypt cert. (See this post for how I automated Let’s Encrypt on CentOS.) However, several giants of the web have different views on encryption.

  • Google – Encrypt everything.
    That’s good, right? I mean encryption is good and Google will reward encrypted sites with better ranking.
  • Twitter – Encrypt Player-cards and all the assets.
    Ok, since I’m encrypting everything, my Player-cards and all their assets are encrypted. I even wrote a plugin to do the Player-cards, and part of it makes sure that everything is served over https.
  • Apple – Don’t encrypt your feed or anything in it.
    Ok, Apple doesn’t come right out and say this. They simply give you a “Can’t read your feed” error if you submit an https URL, and you are left to figure out why. With a little experimentation you will find that not only can your feed not be encrypted, nothing in your feed can be encrypted either.

Welcome to the modern web.

Trying to play nice

iTunes is still the most popular destination for podcast discovery. So you’ve got to have your feed there. iTunes has a lot of rules though. Early on, we had tools like FeedBurner that would “wash” our feeds and make them iTunes friendly. FeedBurner WOULD accept an encrypted URL and spit out unencrypted.

As with all free services on the web, priorities change and services die. FeedBurner hasn’t had an update in many years and if you listen closely, you can hear the rumblings about Google shuttering it soon. That’s fine. Long ago, I moved my entire podcast infrastructure to the Blubrry PowerPress plugin for WordPress, and it puts out an iTunes friendly feed by itself. So you would think that it’s just a matter of telling iTunes you have a new feed, right? Well, um…no.

The Fix

The problem is that all three of these giants, Google, Twitter, and Apple, have influence over my little podcast. After hours of searching for a solution, and switching to a paid cert simply because someone suggested that Apple didn’t like Let’s Encrypt (not true), I finally decided to try an unencrypted feed. I turned off the automatic redirect to https that I had in my Apache configuration file for my podcast site. This almost worked.

I resubmitted the regular HTTP URL.  I could see from tailing my log files that iTunes accepted it and actually read it this time. I was elated for about half a second until Apple came back with an error message telling me that the podcast image couldn’t be encrypted either. <sad trombone />

Making Progress

Ok, I was making progress. This was further than I had gotten in several weeks of research. I jumped to the conclusion that Apple didn’t want to see any https inside the feed. This presented an interesting challenge because of the way that WordPress and thus PowerPress handle feeds.

The feed is generated in feed-podcast.php inside the PowerPress plugin directory. Looking it over, this is just a big while loop that pulls in the necessary information and then echoes out XML, one post at a time. Since it uses WordPress’s native functions to get urls and such, everything says https. I didn’t want to change WordPress because then my entire site would be back to unencrypted. I also didn’t want to check each echo statement for https and remove the s if it was found. This could get messy in a hurry.

It became obvious that I needed to grep the entire feed and replace https with http. I’m a programmer, this should be easy. Except that there is no point in the WordPress flow where I can intercept the entire feed before it is sent out. WordPress has a complex system of hooks and filters but none of them were “RSS_FEED_BEFORE_IT_IS_SENT_OUT”. <sad trombone />

Since WordPress doesn’t gather the entire feed into a variable and then spit it out, a different solution is needed. WordPress treats the feed like any other page: it has a template that is executed and directly outputs XML. This was the absolute worst case scenario. Since PowerPress controlled the feed for my podcast, it looked like I was going to have to hack the core of the PowerPress plugin itself.

Old-School PHP to the rescue

Digging around for solutions, I came across a snippet of code that suggested using PHP’s Output buffer. Something I’ve not done in a very long time. The code in the snippet was not helpful, but the idea it sparked was what worked.

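// Callback for ob_start(): receives everything that was buffered and
// returns the text that is actually sent out.
function my_callback($buffer)
{
    // Downgrade every https:// URL in the feed to http://.
    return str_replace('https://', 'http://', $buffer);
}

// Start buffering before the feed template outputs any XML.
ob_start("my_callback");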

The heart of the solution is the function my_callback. In it, I simply replace https:// with http://. Not terribly difficult to do in PHP. If I wanted to get fancy, I could probably have used an anonymous function in the ob_start() call, but this solution is easier to read.

Next, we put in the ob_start(). In it we put the optional callback parameter and specify the name of the function we created just above it. For those not familiar with the callback parameter, here is an excerpt from the manual.

An optional output_callback function may be specified. This function takes a string as a parameter and should return a string. The function will be called when the output buffer is flushed (sent) or cleaned (with ob_flush(), ob_clean() or similar function) or when the output buffer is flushed to the browser at the end of the request. When output_callback is called, it will receive the contents of the output buffer as its parameter and is expected to return a new output buffer as a result, which will be sent to the browser. If the output_callback is not a callable function, this function will return FALSE.

tl;dr my_callback is called and passed everything that output buffering collected. I can make any modifications to it and whatever I return is what is actually sent out. That is exactly what I did.

At the bottom of the feed template, I simply put

<?php ob_end_flush(); ?>

This triggers the callback that is the secret sauce to the solution.

That’s all it took. I was able to do exactly what I needed by going old-school on its butt and using output buffering. This gathered everything into a single variable that I was able to wash before outputting.
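If you want to see what the wash does without touching WordPress, the same substitution can be sketched in shell. This is an analogy to the PHP callback, not the code running on my site. downgrade_scheme, count_https, and sample_feed are hypothetical names, and the example.com URLs are made up.

```shell
#!/bin/sh
# downgrade_scheme (hypothetical): the shell equivalent of my_callback,
# rewriting every https:// in the text on stdin to http://.
downgrade_scheme() {
    sed 's#https://#http://#g'
}

# count_https (hypothetical): count the lines on stdin that still contain https://.
count_https() {
    grep -c 'https://'
}

# A made-up fragment standing in for the generated feed.
sample_feed() {
    cat <<'EOF'
<link>https://example.com/feed/podcast/</link>
<enclosure url="https://example.com/audio/episode-1.mp3" type="audio/mpeg" />
EOF
}

sample_feed | downgrade_scheme
```

Piping a downloaded copy of the real feed through count_https is a quick way to confirm that nothing encrypted is left before resubmitting it.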

Once this was in place, I resubmitted the newly cleansed feed to Apple and not only did it accept it, the change was completed and visible in iTunes within about 30 minutes. (See the pretty new logo that the lovely and talented Kathy did for us two years ago? Until yesterday we still had our old logo there)

 

 

fClose()

This blog post is not about the horrors of using a free service like FeedBurner. FeedBurner served me well for many years. It was my fault to begin with for submitting the FeedBurner url itself and not one hosted on my site as a 301 Redirect. The only lesson there is to make sure you own all the important pieces of your project, like feed URLs. :)

Here are the takeaways though.

  • iTunes does not hate Let’s Encrypt, iTunes hates encryption.
  • My site now runs both HTTPS and HTTP but I only advertise HTTPS
  • My feed is now exclusively HTTP, everything else on the site is HTTPS
  • I had to hack the core, which sucks but sometimes is necessary.

With regard to that last point, I believe that because PowerPress is awesome, there is a way to do this by specifying my own feed template. Reading the code, it looks like it is possible. I’ve written them to get clarification and will update this post when they get back to me.

Honestly, I don’t know how muggles deal with this. I was able to solve this because I am a programmer.

Until next time,
I <3 |<
=C=

Seven Words You Can Never Say on Television…But Can Apparently Say In Code

Dear Reader,

PROFANITY WARNING!

(Mom, don’t read this one)

The late great George Carlin had many awesome comedy skits. One of them – possibly his most famous – is “Seven Words You Can Never Say on Television” from the comedy album “Class Clown”. In it he gives his list of  seven words that – at the time – were inappropriate for over the air broadcast in the United States.

I thought it would be fun – if for no other reason than clickbait – to run the 7 dirty words against GitHub to see who is using what, and where. I took screenshots so that you can see each word and which languages use it the most. I also list PHP’s rating for each word out of the top 10 languages.

So, without further ado, here are Carlin’s “Seven Words You Can Never Say on Television” that you can apparently use in code with impunity.

The List

  • Shit (PHP – 9/10)
  • Piss (PHP – 5/10)
  • Fuck (PHP – 8/10)
  • Cunt (PHP – 3/10) :facepalm:
  • Cocksucker (PHP – 5/10)
  • Motherfucker (PHP – 4/10)
  • Tits (PHP – 4/10)

fClose()

The only conclusion that can be drawn is that we have lost the ability to communicate without profanity. :)

Until next time,
I <3 |<
=C=