Skip to content

Silly WordPress Tricks, Part I: Exporting Blog Posts as HTML

Dear Reader,

I’m making some changes to my blog. Specifically, I am moving most of my Public Speaking blog posts to my mailing list. I’ve got more of those than I realized. Being a program, my thought process was of course “Why spend an hour copying and pasting all of these posts when I can spend two hours and write a script?” :) So I wrote a simple script using WP-CLI to gather the information. So I hammered out some bash to get the job done. I am proud that other than wp-cli, I did not have to resort to any additional PHP code to do the job. It was tempting at times, but I did it.

Yes, I am aware that wp-cli will export to a WXR file. I wanted something simpler.

No, this is not a complete solution. It doesn’t deal with attachments, comments, or metadata. I don’t need those for this project.

Purpose

This bash script will export all of the blog posts in a given WordPress category into individual HTML files.  There is no templating to control how they are output, it is not that smart. It takes no parameters, everything is hard coded.

Assumptions

  1. You have wp-cli installed on your machine, it is named wp, and it is in your path.
  2. You have a WordPress blog

Here is the script. Below, I will break it down line-by-line in case it’s not clear.

 

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
#!/bin/bash
HOME_DIR=~
WP_DIR=/path/to/public_html/
CATEGORY=speaking
mkdir -p $HOME_DIR/blog/$CATEGORY
cd $WP_DIR
 
for LINE in $(wp post list --category_name=$CATEGORY --fields=ID,post_name --format=csv| tail -n +2); do
        ID=$(echo $LINE | cut -f1 -d,)
        SLUG=$(echo $LINE | cut -f2 -d,)
        TITLE=$(wp post get $ID --field=post_title)
        POST_DATE=$(date -d "$(wp post get $ID --field=post_date)" +"%Y-%m-%d")
        AUTHOR=$(wp user get $(wp post get $ID --field=author) --field=display_name)
        echo "Processing $TITLE"
        echo "<h1>$TITLE</h1>" > $HOME_DIR/blog/$CATEGORY/$SLUG.blogpost.txt
        echo "<strong>Author: </strong>$AUTHOR</storng><br />"  >> $HOME_DIR/blog/$CATEGORY/$SLUG.blogpost.txt
        echo "<strong>Date Published : </strong>$POST_DATE<br />" >> $HOME_DIR/blog/$CATEGORY/$SLUG.blogpost.txt
        wp post get $ID --field=post_content >> $HOME_DIR/blog/$CATEGORY/$SLUG.blogpost.txt
done

2: This is the home directory. A directory named blog/CATEGORY will be created under this directory. It is set to the user’s home directory.

3: This is the root of your WordPress installation.

4: This is the category that you want to export.

5: Create the directory to hold the posts

8: Get a list of the post IDs for the given category. Execute lines 9-18 once for each post. The wp command in the for loop will return a csv list of ID and post names. (the slug)

9: Get the post ID from the CSV line using cut.

10: Get the slug.

11: Use wp to get the title of the post.

12: Use wp to get the post date. Use date and a format of YYYY-MM-DD to strip off the time.

13: Use wp to get the author id and feed that to wp to get the author’s display name.

14: Let the user know what we are currently processing.

15: Output the Title of the post as an H1

16: Output the By-line.

17: Output the post date

18: Use wp to gather the content of the actual post and output it.

Lather, rinse, repeat.

 

:Conclusion

This is one of those “gets the job done” scripts. It is brittle and it is fragile. There are a lot of ways to break it and there is zero error-handling in it. All that having been said, it gets the job done. More importantly, it illustrates one of the cool things about wp-cli, scriptability of WordPress. I live in the command line, wp-cli has quickly become one of my most used tools. This, however, is the first time I’ve used it as part of a larger script. I think that’s cool. :)

Until next time,
I <3 |<
=C=