Highlighting a search string in HTML text
Posted on April 5th, 2004 in Code Repository
Several hundred user notes on the str_replace page later, we have a function that provides a variety of text highlighting options for plaintext or HTML strings.
For example, this function lets you parse an HTML document and wrap certain words in HTML tags to direct the user’s attention.
/**
* Perform a simple text replace
* This should be used when the string does not contain HTML
* (off by default)
*/
define('STR_HIGHLIGHT_SIMPLE', 1);
/**
* Only match whole words in the string
* (off by default)
*/
define('STR_HIGHLIGHT_WHOLEWD', 2);
/**
* Case sensitive matching
* (off by default)
*/
define('STR_HIGHLIGHT_CASESENS', 4);
/**
* Overwrite links if matched
* This should be used when the replacement string is a link
* (off by default)
*/
define('STR_HIGHLIGHT_STRIPLINKS', 8);
/**
* Highlight a string in text without corrupting HTML tags
*
* @author Aidan Lister <aidan@php.net>
* @version 3.1.1
* @link http://aidanlister.com/2004/04/highlighting-a-search-string-in-html-text/
* @param string $text Haystack - The text to search
* @param array|string $needle Needle - The string(s) to highlight
* @param int $options Bitwise set of options
* @param string $highlight Replacement string
* @return string Text with needle highlighted
*/
function str_highlight($text, $needle, $options = null, $highlight = null)
{
    // Default highlighting
    if ($highlight === null) {
        $highlight = '<strong>\1</strong>';
    }
    // Select pattern to use
    if ($options & STR_HIGHLIGHT_SIMPLE) {
        $pattern = '#(%s)#';
        $sl_pattern = '#(%s)#';
    } else {
        $pattern = '#(?!<.*?)(%s)(?![^<>]*?>)#';
        $sl_pattern = '#<a\s(?:.*?)>(%s)</a>#';
    }
    // Case sensitivity
    if (!($options & STR_HIGHLIGHT_CASESENS)) {
        $pattern .= 'i';
        $sl_pattern .= 'i';
    }
    $needle = (array) $needle;
    foreach ($needle as $needle_s) {
        // Escape needle, including the '#' delimiter used by the patterns
        $needle_s = preg_quote($needle_s, '#');
        // Optional whole word check
        if ($options & STR_HIGHLIGHT_WHOLEWD) {
            $needle_s = '\b' . $needle_s . '\b';
        }
        // Strip links
        if ($options & STR_HIGHLIGHT_STRIPLINKS) {
            $sl_regex = sprintf($sl_pattern, $needle_s);
            $text = preg_replace($sl_regex, '\1', $text);
        }
        $regex = sprintf($pattern, $needle_s);
        $text = preg_replace($regex, $highlight, $text);
    }
    return $text;
}
Let’s do a quick example:
// Simple Example
$string = 'This is a site about PHP and SQL';
$search = array('php', 'sql');
echo str_highlight($string, $search);
echo "\n";
// With HTML in the text
$string = 'Link to <a href="php">php</a>';
$search = 'php';
echo htmlspecialchars(str_highlight($string, $search));
echo "\n";
// Matching whole words only
$string = 'I like to eat bananas with my nana!';
$search = 'Nana';
echo str_highlight($string, $search, STR_HIGHLIGHT_SIMPLE|STR_HIGHLIGHT_WHOLEWD);
echo "\n";
// With custom highlighting
$string = 'With custom highlighting!';
$search = 'custom';
$highlight = '<span style="text-decoration: underline;">\1</span>';
echo str_highlight($string, $search, STR_HIGHLIGHT_SIMPLE, $highlight);
echo "\n";
// With links
$string = 'I am a <a href="http://www.php.net">link</a>';
$search = 'link';
$highlight = '<a href="http://www.google.com/">\1</a>';
echo htmlspecialchars(str_highlight($string, $search, STR_HIGHLIGHT_STRIPLINKS, $highlight));
This code would produce the following output:
This is a site about <strong>PHP</strong> and <strong>SQL</strong>
Link to <a href="php"><strong>php</strong></a>
I like to eat bananas with my <strong>nana</strong>!
With <span style="text-decoration: underline;">custom</span> highlighting!
I am a <a href="http://www.google.com/">link</a>
40 Responses
I tried to solve a similar problem but when I saw your function I immediately realized I could re-use your idea. Elegant solution! Thanks a lot!
I love this solution. Quick and easy. However, I’m such a novice at regular expressions… How do you make it so that it only highlights “whole” words?
Thanks!
[Editor's Note: I've added this in as an option]
this looks great and a good solution for me.
Thanks a lot! I was about to code a similar function, but yours does the job with style!
Muchas graciassss, desde Argentina. Thanks!
Thanks for your str_highlight! Useful for all our search results at http://www.3fragezeichen.de!
Matthias
Thanks a lot, very useful! I’ve got the same problem as muscottyb: I’m a novice at regular expressions and want to highlight a string with case-sensitive matching. How do I do this?
Thanks!
[Editor's Note: Set fifth param to true]
Hi, Aidan’s wonderful!
Thank you very much for sharing your work. I am a real beginner and this helped me out in my current project. But, more importantly it helped me learn.
Due to the number of options people have kindly requested, I’ve had to change the function prototype.
You may now use bitwise constants to set each option in the 3rd parameter.
I’ve added support for highlighting with links with the STR_HIGHLIGHT_STRIPLINKS constant. This will simply remove the existing link if a match is found in the text, ostensibly to be replaced by the replacement link.
Note: If you don’t like the length of the constants, you can use the values instead. For example, STR_HIGHLIGHT_SIMPLE|STR_HIGHLIGHT_WHOLEWD is the same as 1|2 or 3.
Also note that the highlight parameter no longer takes an array, but a string which is fed directly to preg_replace.
Thanks for all the feedback guys, let me know how you like the new changes.
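To make the note about bitwise constants concrete, here is a minimal sketch of how the option values combine and how a single flag can be tested. The constant values are the ones defined in the article; `has_option()` is a hypothetical helper introduced only for this illustration and is not part of str_highlight().

```php
<?php
// Same values as the constants defined in the article; the guards avoid
// redefining them if they already exist in the running script.
defined('STR_HIGHLIGHT_SIMPLE')     || define('STR_HIGHLIGHT_SIMPLE', 1);
defined('STR_HIGHLIGHT_WHOLEWD')    || define('STR_HIGHLIGHT_WHOLEWD', 2);
defined('STR_HIGHLIGHT_CASESENS')   || define('STR_HIGHLIGHT_CASESENS', 4);
defined('STR_HIGHLIGHT_STRIPLINKS') || define('STR_HIGHLIGHT_STRIPLINKS', 8);

/**
 * Hypothetical helper: report whether $flag is set in the option mask.
 */
function has_option($options, $flag)
{
    return ($options & $flag) === $flag;
}

// Each constant occupies its own bit, so OR-ing them adds the values.
$options = STR_HIGHLIGHT_SIMPLE | STR_HIGHLIGHT_WHOLEWD;

var_dump($options);                                      // int(3)
var_dump(has_option($options, STR_HIGHLIGHT_WHOLEWD));   // bool(true)
var_dump(has_option($options, STR_HIGHLIGHT_CASESENS));  // bool(false)
```

Because every option is a distinct power of two, any combination can be passed as a single integer in the third parameter.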
WOW !
That’s what I was searching for, for a long while!
THX !!!
Awesome little bit of code!
If the search comes up with a page that contains a large amount of text how can I show just, say, 50 words from the text WHILE ALSO including the 50-word section that contains at least one of the searched-for words?
By the way, I added $needle = explode(' ', $needle); to the top of the function to convert my search words into an array, which works really well at finding all occurrences of searched words regardless of whether or not they are adjacent to each other.
Thanks again and I look forward to some ideas of how I can summarise my search results.
– Galen
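One possible way to approach both points in this comment, as a rough sketch only: split the query into separate words, then cut a window of words around the first match. Both `split_search_terms()` and `excerpt_around()` are hypothetical helpers, not part of str_highlight(), and the sketch assumes single-word needles and strips HTML before building the excerpt.

```php
<?php
/**
 * Hypothetical helper: split a raw query into unique single-word needles.
 */
function split_search_terms($query)
{
    // Split on runs of whitespace and drop empty pieces.
    $terms = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);
    // Remove duplicates; array_values() reindexes from 0.
    return array_values(array_unique($terms));
}

/**
 * Hypothetical helper: return up to $maxWords words around the first word
 * that contains one of the needles (case-insensitive).
 */
function excerpt_around($text, array $needles, $maxWords = 50)
{
    // Work on plain text so the excerpt cannot cut an HTML tag in half.
    $words = preg_split('/\s+/', trim(strip_tags($text)), -1, PREG_SPLIT_NO_EMPTY);
    $start = 0;
    foreach ($words as $i => $word) {
        foreach ($needles as $needle) {
            if ($needle !== '' && stripos($word, $needle) !== false) {
                // Start the window half a window before the first match.
                $start = max(0, $i - intdiv($maxWords, 2));
                break 2;
            }
        }
    }
    // If nothing matched, this is simply the first $maxWords words.
    return implode(' ', array_slice($words, $start, $maxWords));
}

$needles = split_search_terms('  php   sql php ');
print_r($needles); // two entries: "php" and "sql"

echo excerpt_around('one two three four five six seven eight nine ten', array('seven'), 4);
// five six seven eight
```

The excerpt could then be passed to str_highlight() together with the same needles, using STR_HIGHLIGHT_SIMPLE since the tags have already been stripped.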
I like it
Great, tenx a lot. I’ve searched all day for something like that and found only crabs. Really tenx. Great!!!
Can you make it so that it doesn’t replace already made links containing the word? I just want it to skip the replacing if the highlight word is in a link (i.e. between <a..>… </a>).
if you search for ' ' . $needle . ' ' it should avoid any html links
great job! nice tool!
question: is it possible to highlight the string with a special color?
Jacky,
Look at the example “with custom highlighting”. Something like: style="color: red;" would work fine.
When a Title is entered into the link, and the title contains a keyword it creates a link inside of a link. Can you fix this???
[Editor's Note: Not really, the regex becomes too complicated. It may be easier to start lexing the HTML instead.]
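As a rough illustration of what "lexing the HTML" could look like, the sketch below splits the input into tags and text, tracks whether the current text sits inside a link, and only highlights text outside links. `highlight_outside_links()` is a hypothetical helper and not part of str_highlight(); it is not a full HTML parser and assumes that attribute values never contain a `>` character.

```php
<?php
/**
 * Hypothetical helper: highlight $needle in text pieces only,
 * skipping any text that sits inside <a>...</a>.
 */
function highlight_outside_links($html, $needle, $highlight = '<strong>\1</strong>')
{
    // Split into alternating text and tag pieces. Because empty pieces
    // are kept, tags always land at odd indexes.
    $parts = preg_split('#(<[^>]*>)#', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $regex = '#(' . preg_quote($needle, '#') . ')#i';
    $linkDepth = 0;

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // A tag: never modified, only used to track link nesting.
            if (preg_match('#^<a[\s>]#i', $part)) {
                $linkDepth++;
            } elseif (preg_match('#^</a\s*>#i', $part)) {
                $linkDepth = max(0, $linkDepth - 1);
            }
        } elseif ($linkDepth === 0 && $part !== '') {
            // Plain text outside any link: safe to highlight.
            $parts[$i] = preg_replace($regex, $highlight, $part);
        }
    }
    return implode('', $parts);
}

echo highlight_outside_links('See <a href="php.html" title="php docs">php</a> and PHP', 'php');
// See <a href="php.html" title="php docs">php</a> and <strong>PHP</strong>
```

Since tags are never passed to preg_replace, a keyword inside a title attribute is left alone, and link text is skipped, so a link used as the highlight cannot end up nested inside an existing link.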
Hi, good work. Enjoy it.
Great job ! I reused your script in my website ! Thanks a lot.
Aidan… You’re my hero
Thanks for a great function! Keep up the good work…
Just found that the word boundary flag (\b) makes the pattern fail if it is next to a high ASCII character.
Example: \b?t?\b
\W doesn’t have this limitation, but it doesn’t mean the same thing.
Any idea how I could work around this?
Otherwise, this library is really cool!
[Editor's Note: Unfortunately this is a problem with the regex library and probably won't be fixed until PHP6]
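One possible workaround, sketched here under the assumption that both the needle and the text are valid UTF-8: use the `u` pattern modifier and replace `\b` with explicit lookarounds built from Unicode character properties. `whole_word_regex()` is a hypothetical helper, not part of str_highlight(), and the French sample word is only an illustration.

```php
<?php
/**
 * Hypothetical helper: build a whole-word pattern that treats any
 * Unicode letter, digit or underscore as a word character.
 */
function whole_word_regex($needle, $caseSensitive = false)
{
    $quoted = preg_quote($needle, '#');
    // Lookarounds stand in for \b: there must be no letter, digit or
    // underscore directly before or after the match.
    $pattern = '#(?<![\p{L}\p{N}_])(' . $quoted . ')(?![\p{L}\p{N}_])#u';
    return $caseSensitive ? $pattern : $pattern . 'i';
}

$regex = whole_word_regex('été');
echo preg_replace($regex, '<strong>\1</strong>', "L'été est là, répété");
// L'<strong>été</strong> est là, répété
```

The second occurrence is left alone because it is preceded by the letter "p", which is the behaviour a whole-word match is expected to have.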
Exactly what I was looking for !!!
Many Thanks
David
Awesome! I stole your regex for something completely different, but I had long been looking for such a thing.
Hello Aidan, thanks for the script. There is only one problem I have come across and that is …
<?php
$text = "My name is toseef and I want to see the world";
$search = array("toseef", "see");
?>
The output becomes
My name is <b>to<b>see</b>f</b> and I want to <b>see</b> the world
Notice how toseef isn’t right?
[Editor's Note: Yes, if the search strings overlap, the word will be highlighted twice. Unfortunately, without drastically increasing the complexity of the function, this cannot be solved. On the positive side, the user should rarely notice.]
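For plain text, one way to sidestep the double highlighting is to combine all needles into a single alternation and replace in one pass, so text that has already been matched is never scanned again. The sketch below is a separate hypothetical helper, `highlight_all()`, not a change to str_highlight(), and it does not attempt the HTML-aware matching of the main function.

```php
<?php
/**
 * Hypothetical helper: highlight several needles in plain text
 * with a single regex pass.
 */
function highlight_all($text, array $needles, $highlight = '<strong>\1</strong>')
{
    // Longest needles first, so that when two needles start at the same
    // position (e.g. "seed" and "see") the longer one is tried first.
    usort($needles, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $quoted = array();
    foreach ($needles as $needle) {
        if ($needle !== '') {
            $quoted[] = preg_quote($needle, '#');
        }
    }
    if (!$quoted) {
        return $text;
    }

    // One alternation, one pass: matches cannot overlap or nest.
    $regex = '#(' . implode('|', $quoted) . ')#i';
    return preg_replace($regex, $highlight, $text);
}

$text = 'My name is toseef and I want to see the world';
echo highlight_all($text, array('toseef', 'see'), '<b>\1</b>');
// My name is <b>toseef</b> and I want to <b>see</b> the world
```

The trade-off is that all needles share one set of options, which is simpler than the per-needle loop but less flexible.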
Thank You very much! Very simple and useful!
Fantastic stuff!
Really useful function!
Thank you very much!
Hi! Nice work, thanks for this example, I was going crazy to do something similar
Thanks a lot!
Hello,
I have recently found your function to be so very helpful.
I did have some trouble trying to operate it on whole word searches.
I then found out that the '\b' boundary assertion does not work for Unicode characters, as I am working with Hebrew.
After researching a bit, I found a way to solve this and fixed the function to fully support Unicode as well.
I will be happy to send you the new version and you can update your site with it.
Thanks a lot!!
Oran
Gr8
got what i was looking for….
Thanks
Very nice site!
Great script Aidan, Thanks heaps!
Really useful function! Thank you very much!
Great, useful function! Thank you very much!
Thx this makes our search script much better. Thx for sharing
nice function, very useful. thanks.
Thanks for sharing! It’s very useful.