Straight out of the box, PHP provides us with a wealth of pre-built text processing functions. Whilst there are many different string functions available, such as strtolower, str_contains, trim, and addslashes, similar_text is one that's often overlooked. The similar_text function calculates the similarity of two strings, which becomes very useful when working with databases and searching for similar text based on user input. But how does the similar_text function work, and what does comparing two strings for similarity look like in practice?
The similar_text function in PHP accepts two string parameters and an optional float percentage variable (passed by reference). The idea is to pass two different strings to the function, and similar_text returns an integer: the number of matching characters in both strings. If you provide the float reference, it is set to the similarity as a percentage, calculated by dividing that integer by the average of the lengths of the two strings and multiplying by 100. The algorithm is based on the book Programming Classics: Implementing the World's Best Algorithms by Ian Oliver, with one adjustment: the implementation uses recursive calls instead of a stack.
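To make that description more concrete, here is a rough sketch of the recursive idea in plain PHP. The function name sketch_similar is made up for illustration, and the real implementation lives in PHP's C source, so treat this as an approximation of the approach rather than the actual code.

```php
<?php
// Hypothetical helper (not part of PHP): a plain-PHP sketch of the
// recursive approach described above.
function sketch_similar(string $a, string $b): int
{
    $lenA = strlen($a);
    $lenB = strlen($b);
    if ($lenA === 0 || $lenB === 0) {
        return 0;
    }

    // Find the first longest common substring of $a and $b.
    $max = 0;
    $posA = 0;
    $posB = 0;
    for ($i = 0; $i < $lenA; $i++) {
        for ($j = 0; $j < $lenB; $j++) {
            $k = 0;
            while ($i + $k < $lenA && $j + $k < $lenB && $a[$i + $k] === $b[$j + $k]) {
                $k++;
            }
            if ($k > $max) {
                $max = $k;
                $posA = $i;
                $posB = $j;
            }
        }
    }
    if ($max === 0) {
        return 0;
    }

    // Count the match, then recurse into the text to its left and right.
    return $max
        + sketch_similar(substr($a, 0, $posA), substr($b, 0, $posB))
        + sketch_similar(substr($a, $posA + $max), substr($b, $posB + $max));
}

$a = 'test';
$b = 'testing';
$sim = sketch_similar($a, $b);

// Percentage: matching characters divided by the average length, times 100.
$percent = $sim * 2 / (strlen($a) + strlen($b)) * 100;

printf("%d matching characters, %.2f%%\n", $sim, $percent);
```

Because the search always keeps the first longest match it finds and only then looks to either side of it, the order of the two arguments can change which match is picked.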
Let's take a look at similar_text in action with a handful of examples to show how it works.
PHP similar_text Example
Below is an example of the similar_text PHP function where, to start, we pass two identical strings. As the strings match, the expected result is four: out of a total of 4 characters, all match. If we run the same again, this time passing the "$percent" variable as a reference, outputting it should give 100, which is a float meaning 100%.
# Outputs: 4
echo similar_text('test', 'test');
# Same, but passing variable $percent as reference
similar_text('test', 'test', $percent);
# Outputs: float(100)
var_dump($percent);
Now let's compare some similar but not identical strings. Here we are passing 'test' and 'testing', and we should expect around 72.7% matching characters.
similar_text('test', 'testing', $percent);
# Outputs: float(72.72727272727273)
var_dump($percent);
Next, let's look at something closer to a real search. Below we compare two product names, "Apple iPhone 15" and "Android iPhone 15" (which is made up), against the search term "Apple iPhone". As we can see, "Apple iPhone 15" of course has a higher match than "Android iPhone 15".
similar_text('Apple iPhone 15', 'Apple iPhone', $percent);
# Outputs: float(88.88888888888889)
var_dump($percent);
# Compared to
similar_text('Android iPhone 15', 'Apple iPhone', $percent);
# Outputs: float(55.172413793103445)
var_dump($percent);
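As a rough illustration of how this could feed a search feature, the sketch below ranks a list of names against a search term. The helper rank_by_similarity and the product list are made up for this example, with the array standing in for rows you might fetch from a database.

```php
<?php
// Hypothetical helper: rank candidate strings by their similar_text
// percentage against a search term.
function rank_by_similarity(string $search, array $candidates): array
{
    $scores = [];
    foreach ($candidates as $candidate) {
        similar_text($candidate, $search, $percent);
        $scores[$candidate] = $percent;
    }
    arsort($scores); // highest percentage first, keys preserved
    return $scores;
}

// Made-up product names standing in for database rows.
$products = ['Android iPhone 15', 'Apple iPhone 15', 'Apple Watch'];
$ranked = rank_by_similarity('Apple iPhone', $products);

echo array_key_first($ranked), "\n"; // Apple iPhone 15
```

Note that array_key_first needs PHP 7.3 or later; on older versions, reset($ranked) followed by key($ranked) would do the same job.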
Drawbacks of similar_text
Whilst the similar_text function does what it says it should do, it does have its drawbacks. The algorithm is pretty expensive, with a time complexity of O(max(n,m)**3), where n and m are the lengths of the two strings. In Big O terms, that means the function gets dramatically slower as the input grows. Another drawback is that it's possible to get different results simply by swapping the parameters around, as the example below shows.
similar_text('this is a random test', 'strings more strings', $percent);
# Outputs: float(34.146341463414636)
var_dump($percent);
# Compared to
similar_text('strings more strings', 'this is a random test', $percent);
# Outputs: float(9.75609756097561)
var_dump($percent);
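If the argument order matters for your use case, one possible workaround (an idea for illustration, not something PHP provides) is to score both orders and keep the higher percentage. The helper name symmetric_similarity is made up for this sketch.

```php
<?php
// Hypothetical helper (not a PHP built-in): score both argument orders
// and keep the higher percentage.
function symmetric_similarity(string $a, string $b): float
{
    similar_text($a, $b, $forward);
    similar_text($b, $a, $backward);
    return max($forward, $backward);
}

$a = 'this is a random test';
$b = 'strings more strings';

// The raw return value is the number of matching characters.
echo similar_text($a, $b), "\n"; // 7
echo similar_text($b, $a), "\n"; // 2

// Same result whichever way round the arguments are passed.
var_dump(symmetric_similarity($a, $b) === symmetric_similarity($b, $a)); // bool(true)
```

This doubles the work per comparison, so it is only worth considering on short strings.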
similar_text vs Levenshtein
An alternative to similar_text is PHP's levenshtein function. Whilst not the same, levenshtein can be used to find misspelled words by calculating the edit distance between two strings: the minimum number of single-character insertions, replacements, and deletions needed to turn one string into the other. Combined with a word dictionary, this can be very powerful. The levenshtein function is a little more complex to use, so similar_text might be the easier option in your situation. It is normally best to use levenshtein rather than similar_text on larger data sets.
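As a minimal sketch of the misspelling use case, the example below picks the closest word from a small list. The helper closest_word and the word list are made up for illustration; a real application would use a much larger dictionary.

```php
<?php
// Hypothetical helper: return the dictionary word with the smallest
// edit distance to $input, or null if the dictionary is empty.
function closest_word(string $input, array $dictionary): ?string
{
    $closest = null;
    $shortest = PHP_INT_MAX;
    foreach ($dictionary as $word) {
        $distance = levenshtein($input, $word);
        if ($distance < $shortest) {
            $shortest = $distance;
            $closest = $word;
        }
    }
    return $closest;
}

// Made-up word list.
$dictionary = ['apple', 'banana', 'orange', 'grape'];

echo closest_word('aple', $dictionary), "\n"; // apple
```

Unlike similar_text, a lower number is better here: a distance of 0 means the strings are identical.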
How to Make PHP's similar_text Case Insensitive
When working with similar_text it becomes apparent that the comparison is case-sensitive. Take the example below. Even though we're comparing the same word, one is uppercase and the other lowercase, so no characters match and the percentage comes back as 0%. To overcome this issue, it's a good idea to force all characters to lowercase before comparing the strings, and to do this you can use the strtolower function in PHP.
similar_text('PHP', 'php', $percent);
# Outputs: float(0)
var_dump($percent);
# Lowercase both strings first for case-insensitive matching
similar_text(strtolower('PHP'), strtolower('php'), $percent);
# Outputs: float(100)
var_dump($percent);
Conclusion
The similar_text function is a great way to use PHP's built-in string-matching algorithm. Whilst it is not perfect and is computationally expensive on large data sets, it's a good starting point for turning your PHP web application into a better string-matching machine.
- Use strtolower to enable case-insensitive matching
- Best avoided on very long strings due to the time complexity of the algorithm
- Use PHP's Levenshtein function on larger data sets
- Swapping the strings around, between parameters 1 and 2, may yield different results