The Excerpt Reloaded: The Root Cause and Fix For Creating Validated XHTML

Rob June 2nd, 2007

Update 7/17/2007: New version R1.4: Fixed the plugin URI and author URI so that the links on the plugin administration page will work.

Update 7/10/2007: I incorporated a fix that Hillary Melville described in a comment (see comment section below) into the downloadable plugin, which is now at version R1.3.

Update 6/6/2007: I found an additional cause of unclosed tags and found a fix. See this article for details. This second fix is incorporated in the plugin you can download from this page.

A WordPress plugin called the-excerpt-reloaded generates excerpts from the first N words of a WordPress post. The problem with this plugin is that it can produce XHTML that doesn’t validate, because some tags are left unclosed. I determined the root cause of this problem and found a fix. You may download the updated plugin here.

The Problem

The XHTML is invalid because a <p> tag is inserted at the start of the excerpt without a matching closing </p> tag. I found that many others were having this same issue (see comments here). The plugin has a parameter $fix_tags that is supposed to fix issues like this, but even with this parameter set to “true”, the problem still occurs.

Why It Happens

The problem is introduced when the function get_the_excerpt_reloaded() calls the apply_filters() function. apply_filters() runs several “filter” operations on the excerpt text to generate the final XHTML from a block of text. I believe that this is mainly for processing WordPress posts that are created with the WYSIWYG editor, to produce things like smilies or paragraph tags. One of the filter functions that it calls for the “the_excerpt” filter is wpautop(), which adds the paragraph tags. This is where the problem arises.

After the part of wpautop() where <p> and </p> pairs are inserted around text paragraphs, it does a few more search-and-replace passes on the text before it returns the final output. Here are two of them:

  1. If any tag in a set of tags (call this set $allblocks) is directly after a <p> tag, with only whitespace character(s) (and no non-whitespace characters) between <p> and that opening tag, then the <p> is removed.
  2. If any tag in the $allblocks set of tags is directly before a </p> tag, with only whitespace character(s) (and no non-whitespace characters) between that tag and </p>, then the </p> is removed.

One of the tags in the $allblocks set is <div> (or </div>). By default, the_excerpt_reloaded puts a <div class="more-link"> and </div> at the end of the excerpt containing the $more_link_text (the text that you designate that says something like “Continue Reading…”). Presumably, this is done so that you can define special formatting for the “more-link” class in your CSS.

The </div> will appear at the end of the excerpt. After the intermediate step where wpautop() places the </p> at the end of your post, it performs check #2, finds a </div> tag right before a </p> tag, and removes the </p>. wpautop() doesn’t remove the opening <p> because it is not directly next to another tag, and is most likely next to the first word of your post. This is why an opening <p> remains, while the </p> is removed!
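The effect can be reproduced outside WordPress with a small sketch. The function below is only a simplified stand-in for the two clean-up rules described above: its name, the reduced $allblocks set and the sample excerpt are assumptions made for illustration, and the real wpautop() handles many more cases.

```php
<?php
// Simplified stand-in for the two wpautop() clean-up rules described above.
// The function name and the reduced $allblocks set are assumptions.
function strip_p_next_to_blocks($text) {
    $allblocks = '(?:div|ul|ol|table|blockquote)';
    // Rule 1: drop a <p> followed only by whitespace and then a block tag.
    $text = preg_replace('!<p>\s*(</?' . $allblocks . '[^>]*>)!', '$1', $text);
    // Rule 2: drop a </p> preceded only by a block tag and whitespace.
    $text = preg_replace('!(</?' . $allblocks . '[^>]*>)\s*</p>!', '$1', $text);
    return $text;
}

// Excerpt after the paragraph-wrapping step: the more-link <div> sits
// directly before the closing </p>.
$wrapped = '<p>First words of the post <div class="more-link">Continue Reading</div></p>';
$result  = strip_p_next_to_blocks($wrapped);

echo $result, "\n";
// prints: <p>First words of the post <div class="more-link">Continue Reading</div>
```

Rule 1 never fires because the opening <p> is followed by a word, while rule 2 matches the </div></p> pair, so only the closing tag disappears.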

The Fix

To fix this problem, we move the call to apply_filters() to be before adding the final <div></div> pair that encloses your $more_link_text.

First, at the very end of the function, cut the apply_filters() line from here:

$output = apply_filters($filter_type, $output);
return $output;

Next, paste the apply_filters() call at the end of this block, so that it becomes the last line:

$output = rtrim($output, "\s\n\t\r\0\x0B");
$output = ($fix_tags) ? $output : balanceTags($output);
$output .= ($showdots && $ellipsis) ? '…' : '';
$output = apply_filters($filter_type, $output);

This fix is included in the updated the_excerpt_reloaded.php file attached to this post.
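To see why the order matters, here is a minimal sketch that runs without WordPress. fake_excerpt_filter() is a hypothetical stand-in for apply_filters() on the “the_excerpt” filter: it only wraps the text in a paragraph and then removes a </p> that directly follows a </div>, mimicking the behaviour described in this post. The sample strings are made up as well.

```php
<?php
// Hypothetical stand-in for apply_filters('the_excerpt', ...). It is not
// WordPress code; it only imitates the paragraph wrapping and the </p>
// removal described in this post.
function fake_excerpt_filter($text) {
    $text = '<p>' . $text . '</p>';
    return preg_replace('!(</div>)\s*</p>!', '$1', $text);
}

$excerpt   = 'First words of the post';
$more_link = '<div class="more-link">Continue Reading</div>';

// Original order: append the more-link <div> first, then filter.
$before_fix = fake_excerpt_filter($excerpt . ' ' . $more_link);

// Fixed order: filter the excerpt text first, then append the <div>.
$after_fix = fake_excerpt_filter($excerpt) . $more_link;

echo $before_fix, "\n"; // the <p> is never closed
echo $after_fix, "\n";  // <p>...</p> is balanced and the <div> follows it
```

With the fixed order the filter never sees the </div>, so it has no reason to remove the closing </p>.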

The $fix_tags bug

Another bug that I (and others) noticed is that the $fix_tags parameter is not passed from the_excerpt_reloaded() to get_the_excerpt_reloaded(). The fix for this is also included in the updated the_excerpt_reloaded.php file attached to this post. To apply it by hand, insert $fix_tags in this line:

echo get_the_excerpt_reloaded($excerpt_length, $allowedtags, $filter_type, $use_more_link, $more_link_text, $force_more, $fakeit, $no_more, $more_tag, $more_link_title, $showdots);

to make it look like this:

echo get_the_excerpt_reloaded($excerpt_length, $allowedtags, $filter_type, $use_more_link, $more_link_text, $force_more, $fakeit, $fix_tags, $no_more, $more_tag, $more_link_title, $showdots);

Also, change this line:

function get_the_excerpt_reloaded($excerpt_length, $allowedtags, $filter_type, $use_more_link, $more_link_text, $force_more, $fakeit, $no_more, $more_tag, $more_link_title, $showdots) {

to this:

function get_the_excerpt_reloaded($excerpt_length, $allowedtags, $filter_type, $use_more_link, $more_link_text, $force_more, $fakeit, $fix_tags, $no_more, $more_tag, $more_link_title, $showdots) {
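The reason both lines have to change is that PHP functions have their own variable scope: a variable that exists in the caller is not visible in the callee unless it is passed in. The sketch below uses hypothetical, shortened function names (not the plugin's real signatures) to show the difference.

```php
<?php
// Hypothetical, shortened versions of the two plugin functions.

// Before the fix: $fix_tags is not in the parameter list, so it is simply
// undefined inside this function.
function get_excerpt_without_param($text) {
    return isset($fix_tags) ? 'fix_tags is set' : 'fix_tags is missing';
}

// After the fix: $fix_tags is received as a parameter.
function get_excerpt_with_param($text, $fix_tags) {
    return isset($fix_tags) ? 'fix_tags is set' : 'fix_tags is missing';
}

// The outer function knows $fix_tags, but only the second call forwards it.
function excerpt_wrapper($text, $fix_tags = true) {
    return array(
        get_excerpt_without_param($text),
        get_excerpt_with_param($text, $fix_tags),
    );
}

list($buggy, $fixed) = excerpt_wrapper('some text');
echo $buggy, "\n"; // prints: fix_tags is missing
echo $fixed, "\n"; // prints: fix_tags is set
```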

Possible Other Issues and Other Features

Even with this fix, I can envision some other problems that the_excerpt_reloaded() may cause.

Even with $fix_tags set to true, a tag that was opened before the cut-off point gets closed early, so the element will not contain all of the text that was intended to go between its opening and closing tags. This may or may not be an issue. It might be cleaner to drop the opening tag, along with all text after it, from the excerpt.
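One way that idea could look is sketched below. This is not part of the plugin: cut_at_unclosed_tag() is a hypothetical helper, and it assumes simple, well-nested markup without void elements such as <br /> or <img />.

```php
<?php
// Hypothetical helper, not part of the plugin: cut a truncated excerpt at
// the outermost tag that is still open at the end of the text.
function cut_at_unclosed_tag($html) {
    $open = array(); // stack of array(tag name, byte offset of the tag)
    preg_match_all('!<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>!', $html, $tags,
        PREG_SET_ORDER | PREG_OFFSET_CAPTURE);
    foreach ($tags as $tag) {
        $is_close = ($tag[1][0] === '/');
        $name     = strtolower($tag[2][0]);
        if (!$is_close) {
            $open[] = array($name, $tag[0][1]);
        } elseif ($open && $open[count($open) - 1][0] === $name) {
            array_pop($open);
        }
    }
    if (!$open) {
        return $html; // every tag was closed, nothing to cut
    }
    // Keep only the text before the outermost unclosed tag.
    return rtrim(substr($html, 0, $open[0][1]));
}

echo cut_at_unclosed_tag('Some text <em>emphasised wor'), "\n"; // prints: Some text
echo cut_at_unclosed_tag('A <b>bold</b> word'), "\n";           // prints: A <b>bold</b> word
```

The trade-off is that the excerpt can end up noticeably shorter than N words when the unclosed tag was opened early in the text.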

Another issue is that when counting N words, the plugin replaces any whitespace between words (including newlines) with a single space. I’m not sure whether this is always a problem, but it may cause paragraphs to be merged.
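The sketch below shows how this kind of word counting can merge paragraphs. It assumes the text is split on any whitespace and rejoined with single spaces; first_n_words() is a made-up helper for illustration, not the plugin's actual code.

```php
<?php
// Made-up helper: take the first $n words, assuming words are split on any
// run of whitespace and rejoined with single spaces.
function first_n_words($text, $n) {
    $words = preg_split('/\s+/', trim($text));
    return implode(' ', array_slice($words, 0, $n));
}

$post    = "First paragraph here.\n\nSecond paragraph starts.";
$excerpt = first_n_words($post, 5);

echo $excerpt, "\n";
// prints: First paragraph here. Second paragraph
```

The blank line that separated the two paragraphs is gone, so a later paragraph filter has no way to tell where the second paragraph began.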

One feature that I would like to add is to make the $more_link_text appear as part of the last paragraph (at the end) instead of separated as its own paragraph.

When I have some time, I’ll see if I can make these improvements to the_excerpt_reloaded(). It was fun doing the detective work to determine the cause of the invalid XHTML problem. I hope that this fix is useful for you.

24 Comments »

  1. will on 04 Jun 2007 at 1:03 am

    Thanks for the update!

  2. […] June 5, 2007: Rob from Rob’s Notebook posted a comment, which you can see below, about his mod of the Excerpt Reloaded plugin. There is a problem with the original plugin where very often the closing paragraph tag </p> […]

  3. Steve on 05 Jun 2007 at 3:28 am

    Cheers for the update Rob.

  4. […] –more– pseudo-tag - the body of the code comes courtesy of the-excerpt-reloaded with modifications - for which, many […]

  5. […] the_excerpt Reloaded […]

  6. Hillary Melville on 09 Jul 2007 at 11:51 pm

    Good work! I wish I had found this about 3hours earlier :)

    You missed this one:

    if('all' != $allowed_tags) {
    $output = strip_tags($output, $allowedtags);
    }

    if('all' != $allowedtags) {
    $output = strip_tags($output, $allowedtags);
    }

  7. Rob on 10 Jul 2007 at 8:40 pm

    Good find Hillary! Thank you for pointing it out.

    I updated the downloadable plugin to have this fix. It is now at version R1.3.

  8. Alisha on 15 Jul 2007 at 5:43 pm

    Thank you so much for this fix! I spent quite a long time trying to figure out why it wasn’t XHTML valid!

  9. Plugins de Wordpress at Habitaquo on 05 Oct 2007 at 7:15 pm

    […] for the automatic excerpts (“continue reading”), I ended up picking “the_except_reloaded” (already modified by someone else) so that I could cut by characters rather than by words, respecting the […]

  10. […] The Excerpt Reloaded This plugin does what I always wanted to be done: instead of your post being truncated without any option and shown plain (and boring) by the original “the_excerpt()” function of Wordpress, this plugin - that was originally written by Kaf Oseo - gives you control over the length and the format of your excerpt. While truncating the post, this plugin tries to prevent tags from not being closed - it works in most cases. Sometimes, properly closing the tags does not work, which will break XHTML validity - e.g. if a link is created at the very position where the post is truncated. Whatever, it works in most cases, and I am sure they find a way of further improving the plugin. […]

  11. ITExperience.net on 21 Nov 2007 at 4:58 pm

    Thank you for this interesting article! It has really helped me building my own website!

  12. […] this is understood, and the tip to use the plugin the excerpt Reloaded led me to the download page with the necessary notes. Now it is on to […]

  13. Trisha on 21 Jan 2008 at 6:20 pm

    Using this plugin (your version) with WP 2.3.2 - is there a way to make it take the excerpt from the most recent Post, but not Pages? I use this in the sidebar and it works fine on my homepage (showing excerpt from most recent post), but when I go to a Page, this section of the sidebar now shows an excerpt from that particular Page, which is not what I want. Is there a modification I can make to stop it from including Pages? Thanks for any advice you can offer….

  14. Crystal on 23 Feb 2008 at 6:31 pm

    I using the last plugin - Thank you. I still get a 404 error when I click continue to view the entire post. Please, direct me to right place to get an answer. I’ve been working for hours on this :[

  15. Gilbert on 05 Mar 2008 at 5:05 pm

    I wanted to ask if that fix goes into the plugin itself or the where the file is applied into the loop?

  16. Gilbert on 05 Mar 2008 at 5:08 pm

    I downloaded the fixed plugin and uploaded it to my blog, and still the paragraphs in my main page ignore the . what can still be wrong in my version? Could someone help me?

  17. Gilbert on 05 Mar 2008 at 5:16 pm

    This is what I currently have in my code:


    ‘, ‘none’, true, ‘Keep Reading…’, false, 1,0,”,”,1); ?>

    It still doesnt add the paragraph breaks.

  18. jsherk on 29 Mar 2008 at 1:43 pm

    I upgraded my sandbox to WordPress 2.5.rc2 (release candidate 2) and the_excerpt_reloaded R1.4 appears to be function fine without any glitches.

    Just wanted to let you know!!

  19. […] little plugin for Wordpress that allows me to format my excerpts all I like. It’s called The Excerpt Reloaded, and it let’s you set a ton of options like excerpt length, the name of your “read […]

  20. Meerblickzimmer on 10 Apr 2008 at 3:18 am

    Hei!

    Thanks for your work, but it doesn´t work for me. I use your fixed Plugin and this code:

    the_excerpt_reloaded(150, ‘

    ‘, TRUE, ‘les mer »’, FALSE, 2);

    Whats wrong? Can you help me?
    Thanks a lot!
    Greetings.
    M

  21. Meerblickzimmer on 10 Apr 2008 at 3:22 am

    I´m sorry, but the my code i not right. The right code is her: http://www.mein.meerblickzimmer.de/theexcerpt.txt

    Maybe can you help me.
    Thanks a lot! M

  22. David on 24 Apr 2008 at 6:57 pm

    Hello

    1.
    I am using this MOD, GREATTT ! the blank admin page and the “Wordpress requires Cookies but your browser does not support them or they are blocked” issue both are solved by this MOD. I hope others can benefit from this. Since I was wandering and the issue was of Excerpt_reloaded, other than anything.

    2. I have another problem - when I type exceprt, It does not show the typed excerpt. + earlier version it showed typed excerpt but did not give me the more link.

    3. So you need to provide solution for, showing the typed excerpt, with the customized more link.

    PLEASEEEE

  23. NSpeaks on 27 Apr 2008 at 1:51 am

    Does this works with Wordpress 2.5?

  24. More notes on v6 - blog - coda.coza on 05 May 2008 at 8:25 pm

    […] the_excerpt Reloaded (more customisable than WP’s default excerpt function, it allows you to exclude elements such as images, links, etc.) […]
