org2blog hack: Display section numbers

In Emacs `org-mode’, if you put “#+OPTIONS: num:t” at the top of a file, the
exported document will contain section numbers. However, with the same
setting, posts generated by `org2blog’ (version: punchagan commit
387217f792) do not get section numbers.

I have opened an issue on GitHub, but there has been no reply, which is quite
common for free software projects. Therefore, I hacked `org2blog’ to solve
the problem myself.

The effect I want is for `org-mode’ text like this:

#+BEGIN_SRC org
* Hello
** World
** Emacs
#+END_SRC

to become this once exported:

1 Hello
1.1 World
1.2 Emacs

2 Locate the bug

I don’t have much experience in Elisp beyond configuring Emacs, so it took me
some time to locate the bug.

This is how I found it: I followed the code flow and printed out the values
of variables. In the end, the bug turned out to have nothing to do with
`org2blog’; it was in `org-mode’ instead.

Function `org2blog/wp-post-buffer’ calls function `org2blog/wp-parse-entry’ to
translate `org-mode’ contents into HTML for publishing.

In turn, `org2blog/wp-parse-entry’ calls `org-export-as-html’ to do the
actual translation.

I found that everything was fine before the call to `org-export-as-html’.
However, once `org-export-as-html’ had processed the contents, the section
numbers had disappeared. So the bug is clearly in `org-mode’
(`org-export-as-html’ is an `org-mode’ function).

Then I typed “C-h f org-export-as-html RET” to read the documentation of
`org-export-as-html':

org-export-as-html is an interactive autoloaded compiled Lisp function in
`org-html.el'.

(org-export-as-html ARG &optional EXT-PLIST TO-BUFFER BODY-ONLY PUB-DIR)

Export the outline as a pretty HTML file.
...
...
...
When BODY-ONLY is set, don't produce the file header and footer, simply return
the content of <body>...</body>, without even the body tags themselves.  When
PUB-DIR is set, use this as the publishing directory.

When `org2blog’ calls `org-export-as-html’, the argument `BODY-ONLY’ it passes
in is `t’, so it will only return the contents between <body> and </body>,
but not the whole HTML page.

By trial and error, I found that if I pass `nil’ as the value of `BODY-ONLY’
to function `org-export-as-html’, the returned HTML page will contain section
numbers, but if I pass `t’ to it, no section numbers are generated.
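
The comparison can be scripted roughly as follows. This is only a sketch: it
assumes the five-parameter `org-export-as-html’ signature quoted above, and
it assumes that the exporter wraps section numbers in a span whose class
starts with “section-number-”. The names `my/html-numbered-p’ and
`my/compare-body-only’ are made up for this example and are not part of
`org2blog’ or `org-mode’.

```elisp
;; Sketch only. `my/html-numbered-p' and `my/compare-body-only' are
;; hypothetical names, not part of org2blog or org-mode.

(defun my/html-numbered-p (html)
  "Return t if HTML contains a section-number span, else nil.
Assumes the exporter marks numbers with class=\"section-number-N\"."
  (and (string-match "class=\"section-number-[0-9]+\"" html) t))

(defun my/compare-body-only ()
  "Export the current buffer with BODY-ONLY nil and t, then report.
Assumes the five-parameter `org-export-as-html' signature quoted above."
  (interactive)
  (let ((full (org-export-as-html nil nil 'string nil))  ; whole page
        (body (org-export-as-html nil nil 'string t)))   ; body only
    (message "BODY-ONLY nil: %s; BODY-ONLY t: %s"
             (if (my/html-numbered-p full) "numbered" "no numbers")
             (if (my/html-numbered-p body) "numbered" "no numbers"))))
```

Run “M-x my/compare-body-only” in an `org-mode’ buffer that has “num:t” set
to see both results in the echo area.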

Note: by going through this process, I became familiar with edebug – the
source-level debugger for Elisp programs.

3 Solve the problem

`org-mode’ is much bigger than `org2blog’, and in my experience the code of
`org-mode’ is dirty, complicated and sparsely commented. Therefore, I decided
to make some modifications to `org2blog’ to solve this problem.

My idea is to export the `org-mode’ file to a whole HTML page and then
retrieve the part I need, since `org-export-as-html’ works well when `nil’ is
passed in as `BODY-ONLY’.

Write a function to get the contents between <body> and </body>:

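(defun org2blog/wp-retrieve-html-body (html-text)
  "Retrieve the content between <body> and </body> from HTML-TEXT,
the HTML string generated by org-export-as-html.
Return HTML-TEXT unchanged if either tag is missing."
  ;; Start after the opening tag so that the tag itself is not included.
  (let* ((start (and (string-match "<body[^>]*>" html-text) (match-end 0)))
         (end (and start (string-match "</body>" html-text start))))
    (if end
        (substring html-text start end)
      html-text)))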

Then modify `org2blog/wp-parse-entry’ to make it call the function above.

(setq html-text
      ;;Starting with org-mode 7.9.3, org-export-as-html
      ;;takes 4 optional args instead of 5.
      (condition-case nil
          (org2blog/wp-retrieve-html-body
           (org-export-as-html nil nil nil 'string nil nil))
        (wrong-number-of-arguments
         (org2blog/wp-retrieve-html-body
          (org-export-as-html nil nil 'string nil nil)))))
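
The `condition-case’ fallback can be tried in isolation. The sketch below
uses a made-up stand-in, `my/fake-export’, that has the same parameter list
as the newer `org-export-as-html’ but only echoes two of its arguments; it is
not the real exporter, and `my/export-with-fallback’ is likewise a
hypothetical name.

```elisp
;; `my/fake-export' is a hypothetical stand-in with the same parameter
;; list as the newer `org-export-as-html'; it only echoes two arguments.
(defun my/fake-export (arg &optional ext-plist to-buffer body-only pub-dir)
  (list to-buffer body-only))

(defun my/export-with-fallback ()
  "Try the six-argument call first, then fall back to five arguments."
  (condition-case nil
      (my/fake-export nil nil nil 'string nil nil) ; one argument too many
    (wrong-number-of-arguments
     (my/fake-export nil nil 'string nil nil))))

(my/export-with-fallback) ; => (string nil)
```

The six-argument call signals `wrong-number-of-arguments’, so the handler
runs the five-argument call, in which 'string lands in TO-BUFFER and nil in
BODY-ONLY.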

Evaluate `org2blog/wp-retrieve-html-body’ and `org2blog/wp-parse-entry’ by
pressing “M-x eval-defun RET” inside each definition. `org-mode’ files with
“num:t” can now be exported to blogs with section numbers.

4 (possible) Better approaches

There are a few problems with my solution:

  • This problem is caused by a bug in `org-mode’, so it should be fixed in
    `org-mode’ rather than in the innocent `org2blog’.
  • Using `string-match’ here is risky: a generated post might be cut short
    if its `org-mode’ file contains </body>. A safer way is to parse the
    whole HTML page before extracting its contents.
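
One possible mitigation for the second point, short of a real HTML parse, is
to cut at the last </body> instead of the first. The sketch below is only an
illustration under that assumption; `my/html-body-contents’ is a made-up name
and is not what my fork uses.

```elisp
;; Sketch only: `my/html-body-contents' is a hypothetical helper.
(defun my/html-body-contents (html-text)
  "Return the text between the first <body ...> tag and the last </body>.
Return nil if either tag is missing from HTML-TEXT."
  (with-temp-buffer
    (insert html-text)
    (goto-char (point-min))
    ;; Point ends up just after the opening tag.
    (when (re-search-forward "<body[^>]*>" nil t)
      (let ((start (point)))
        ;; Search backward from the end so that a literal </body> inside
        ;; the post does not cut it short.
        (goto-char (point-max))
        (when (search-backward "</body>" start t)
          (buffer-substring-no-properties start (point)))))))

(my/html-body-contents
 "<html><body><p>Type </body> to close.</p></body></html>")
;; => "<p>Type </body> to close.</p>"
```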

Actually, the best way is to report this bug to the `org-mode’ development
mailing list, or to fix the bug and send them a patch. But I’m too lazy to do
so, seeing as my dirty solution works.

5 Get my code

You can clone from my `org2blog’ fork:

$ git clone git@github.com:RenWenshan/org2blog.git

Alternatively, you can add my fork as a new remote to your existing git
repository and pull from it. Please see the git manual for details :).

You can also download my `org2blog.el’ and replace your own with mine:
https://raw.github.com/RenWenshan/org2blog/master/org2blog.el

Happy Hacking!

Note: English is not my first language, so please feel free to point out any
mistakes you might find.