Replace text between HTML tags

Hello everyone,
My name is Roi and I need your help.
I hope you will be able to help me.
I wrote a script that takes a variable containing HTML code and replaces the exact word "php" (regardless of capitalization) with the word "asp".
For example, if the variable contains:

<a href='https://forums.devshed.com/archive/index.php/php.com'>myphp</a> best <b>php</b> website <h1>PhP!!!</h1> php and myphp or phpme - <u>php!</u>!

the result will be:

<a href='https://forums.devshed.com/archive/index.php/asp.com'>myphp</a> best <b>asp</b> website <h1>asp!!!</h1> asp and myphp or phpme - <u>asp!</u>!

The problem is that it also replaces the letters inside the HTML tags, and because of that the links are changed.
Here is my code:

function keepcase($word, $replace) {
$replace[0] = (ctype_upper($word[0]) ? strtoupper($replace[0]) : $replace[0]);

return $replace;
}

$text = strtolower(file_get_contents($folder.$file));
$replace = "asp";
$word = "php";
$output = preg_replace('/\b' . preg_quote($word) . '\b/ei', "keepcase('\\0', '$replace')", $text);

echo $output;

What should I change if I want the replacements to apply only to the text between the HTML tags?

Thank you in advance,
Roi.

This is a very difficult problem that cannot easily be solved using a regular expression. Your best bet is something like:

preg_replace("/>([^<]*)(" . preg_quote($word) . ")([^<]*)</ie", "'>\\1' . keepcase('\\2', '{$replace}') . '\\3<'", $text);

That still won't get all the examples, but it's closer.

-Dan
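
Note for readers on current PHP versions: the /e modifier used above was deprecated in PHP 5.5 and removed in PHP 7.0, so the call as posted no longer runs there. A minimal sketch of the same idea with preg_replace_callback() follows. The sample $text is made up for illustration, and keepcase() is a guarded variant of the function from the question, not the original poster's exact code.

```php
<?php
// Variant of the thread's keepcase(): copy the case of the first letter
// of $word onto $replace (guarded against empty strings).
function keepcase(string $word, string $replace): string
{
    if ($word !== '' && $replace !== '' && ctype_upper($word[0])) {
        $replace[0] = strtoupper($replace[0]);
    }
    return $replace;
}

$word    = 'php';
$replace = 'asp';
// Illustrative input only; not taken from the thread.
$text    = "<a href='index.php'>link</a> best <b>PhP</b> site";

// Same pattern as in the post, minus the /e modifier. The second
// argument to preg_quote() also escapes the "/" delimiter.
$pattern = '/>([^<]*)(' . preg_quote($word, '/') . ')([^<]*)</i';

$output = preg_replace_callback(
    $pattern,
    function (array $m) use ($replace): string {
        // $m[1] and $m[3] are the text around the match; $m[2] is the word.
        return '>' . $m[1] . keepcase($m[2], $replace) . $m[3] . '<';
    },
    $text
);

echo $output, "\n";
```

The callback receives the captured groups as an array, so nothing is evaluated as PHP code, which also avoids the quoting problems that /e had with input containing quotes.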

Hi ManiacDan,
Thanks for your code!

I tried it, and as you said, it's good, but it still has some problems. It no longer replaces the code inside the <a> tag, but it gets two things wrong:
a. It is supposed to ignore words that are not exactly the keyword. For example, it should ignore the word "phpme", but instead it changes it to "aspme".
b. It is supposed to replace the word "php" even when it contains capital letters, for example "pHP".

My input was:

<a href='https://forums.devshed.com/archive/index.php/php.com'>myphp</a> best <b>php</b> website <h1>PhP!!!</h1> php AND myphp or phpme - <u>php!</u>! <img src="" alt="great Php" /><a href='https://forums.devshed.com/archive/index.php/php.com'>php</a>

and the output was:

<a href='https://forums.devshed.com/archive/index.php/php.com'>myasp</a> best <b>asp</b> website <h1>Asp!!!</h1> php AND myphp or aspme - <u>asp!</u>! <img src="" alt="great Php" /><a href='https://forums.devshed.com/archive/index.php/php.com'>asp</a>

The output is supposed to be:

<a href='https://forums.devshed.com/archive/index.php/php.com'>myphp</a> best <b>asp</b> website <h1>Asp!!!</h1> asp AND myphp or phpme - <u>asp!</u>! <img src="" alt="great Php" /><a href='https://forums.devshed.com/archive/index.php/php.com'>asp</a>

Any chance you'll take a look at this one more time?

Added \b:


function keepcase($a, $b) { return $b; }
$text = " <a href='https://forums.devshed.com/archive/index.php/php.com'>myphp</a> best <b>php</b> website <h1>PhP!!!</h1> php AND myphp or phpme - <u>php!</u>! <img src=\"\" alt=\"great Php\" /><a href='https://forums.devshed.com/archive/index.php/php.com'>php</a> ";
$word = 'php';
$replace = 'asp';
echo htmlentities(preg_replace("/>([^<]*)(\b" . preg_quote($word) . "\b)([^<]*)</ie", "'>\\1' . keepcase('\\2', '{$replace}') . '\\3<'", $text));
die();

-Dan

I think it's the best result we've gotten for this in the past 3 days... :)

I ran a test and added one more "php" to the text (after the <b>):

$text = "<a href='https://forums.devshed.com/archive/index.php/php.com'>php</a> best <b>php phP MYPHP</b> website <h1>ever!!!</h1> php and myphp or phpme - <u>php!</u>!";
The script was supposed to change it to "asp", but it didn't.

"That still won't get all the examples"

You'll have to write another replacement for content at the "edge" of the string, where there are no tags on either side. You can either write two more replacements (beginning and end) or you can "cheat" by doing:

function keepcase($a, $b) { return $b; }
$text = " <a href='https://forums.devshed.com/archive/index.php/php.com'>myphp</a> best <b>php</b> website <h1>PhP!!!</h1> php AND myphp or phpme - <u>php!</u>! <img src=\"\" alt=\"great Php\" /><a href='https://forums.devshed.com/archive/index.php/php.com'>php</a> php! ";
$word = 'php';
$replace = 'asp';
echo htmlentities(trim(preg_replace("/>([^<]*)(\b" . preg_quote($word) . "\b)([^<]*)</ie", "'>\\1' . keepcase('\\2', '{$replace}') . '\\3<'", '>' . $text . '<'), '<>'));
die();

-Dan
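
A possible alternative, not from the thread: the single pattern replaces at most one occurrence per run of text between tags, because the greedy [^<]* groups consume the rest of that run. That is likely why the extra "php" in the <b>php phP MYPHP</b> test was left alone. The sketch below instead splits the input into tags and text with preg_split() and runs a word-boundary replacement on the text pieces only. The function name replaceOutsideTags() and the sample input are hypothetical, and the simple <[^>]*> tag pattern assumes no ">" inside attribute values, comments, or scripts.

```php
<?php
// Variant of the thread's keepcase(): copy the case of the first letter
// of $word onto $replace (guarded against empty strings).
function keepcase(string $word, string $replace): string
{
    if ($word !== '' && $replace !== '' && ctype_upper($word[0])) {
        $replace[0] = strtoupper($replace[0]);
    }
    return $replace;
}

// Hypothetical helper: replace whole-word matches of $word, but only in
// the text outside of tags.
function replaceOutsideTags(string $html, string $word, string $replace): string
{
    // With PREG_SPLIT_DELIM_CAPTURE and one capture group, the result
    // alternates: even indexes are text, odd indexes are tags.
    $parts   = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            continue; // a tag: leave attributes such as href and alt alone
        }
        // Replaces every whole-word occurrence in this text piece.
        $parts[$i] = preg_replace_callback(
            $pattern,
            function (array $m) use ($replace): string {
                return keepcase($m[0], $replace);
            },
            $part
        );
    }
    return implode('', $parts);
}

// Illustrative input only, shortened from the examples in the thread.
$html   = "<a href='index.php'>php</a> <b>php PhP MYPHP</b> php and phpme - <u>php!</u>!";
$result = replaceOutsideTags($html, 'php', 'asp');
echo $result, "\n";
```

Because text at the start or end of the string is just another text piece, this version needs no special handling for the "edge" case. For HTML that is not this regular, a real parser such as PHP's DOM extension is probably the safer route.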

Thank you!
I'll try it...