Mastering Algorithms: Essential Questions to Understand the Gist of String matching Algorithm

String algorithms are an essential component of computer science, with applications in domains as diverse as natural language processing, bioinformatics, and cryptography. A string is just a collection of characters, and string algorithms are used to manipulate and analyze these collections in an effective manner. This article will examine some of the most popular string algorithms, such as string compression, regular expression, edit distance, string matching, string manipulation, and compression techniques.

You can refer to the searching and sorting algorithm

So let's get started with String algorithms

String matching algorithms

String matching algorithms are used to find a pattern within a larger string. The most basic algorithm is the naive pattern-searching algorithm, which simply checks every possible substring of the larger string to see if it matches the pattern. More advanced algorithms, such as the Knuth-Morris-Pratt and Boyer-Moore algorithms, are much faster and more efficient.



Naive pattern searching

Naive pattern searching is a simple algorithm used to find all the matching occurrences of a specified pattern in a given string. 

It is a brute-force approach that compares the first character of the pattern with the searchable text. If a match is found, pointers in both strings are advanced. If a match is not found, the pointer of the text is incremented, and the pointer of the pattern is reset. This process is repeated until the end of the text. 

Naive pattern searching is useful for small texts and does not require any pre-processing. It is also known as the "brute force" approach to pattern searching. 



Time complexity 
The worst-case time complexity of the algorithm is O(m*(n-m+1)), where m is the length of the pattern and n is the length of the text.


Leetcode questions

Easy: Implement strStr() - Implement the strStr() function to find the first occurrence of a substring in a given string using the naive algorithm. (https://leetcode.com/problems/implement-strstr/)


Easy: Repeated Substring Pattern - Given a string, check if it can be constructed by repeating a substring. (https://leetcode.com/problems/repeated-substring-pattern/)


Medium: Longest Duplicate Substring - Given a string, find the longest substring that occurs at least twice in the string. (https://leetcode.com/problems/longest-duplicate-substring/)


Hard: Shortest Palindrome - Given a string, add the minimum number of characters to it to make it a palindrome. (https://leetcode.com/problems/shortest-palindrome/)


Hard: Longest Chunked Palindrome Decomposition - Given a string, find the maximum number of non-overlapping substrings that can be grouped into a palindrome. (https://leetcode.com/problems/longest-chunked-palindrome-decomposition/)


Rabin-Karp algorithm

The Rabin-Karp algorithm is a string-searching algorithm that uses hashing to find patterns in strings. It is used to find all occurrences of a pattern in a given text.

Preprocessing: The Rabin-Karp algorithm first computes the hash value of the pattern and the first substring of the text that is the same length as the pattern. This is done using a rolling hash function that updates the hash value of the substring in constant time as it slides across the text.

Searching: The algorithm then compares the hash value of the pattern to the hash value of each substring of the text that is the same length as the pattern. If the hash values match, the algorithm compares the pattern to the substring character by character to confirm the match.

Rolling: If the hash values do not match, the algorithm slides the substring one position to the right and recomputes its hash value using the rolling hash function. This process is repeated until a match is found or the end of the text is reached.





Time complexity 
The average and best-case time complexity of the algorithm is O(m + n), where m is the length of the pattern and n is the length of the text

The worst-case time complexity of the algorithm is O(nm), which would occur with an extremely bad hash function that resulted in a false positive at each step

Leetcode questions

Easy: Repeated Substring Pattern -Given a string, check if it can be constructed by repeating a substring. (https://leetcode.com/problems/repeated-substring-pattern/)


Medium: Longest Substring with At Least K Repeating Characters - Given a string and an integer k, find the length of the longest substring that contains at least k repeating characters using the Rabin-Karp algorithm. (https://leetcode.com/problems/longest-substring-with-at-least-k-repeating-characters/)


Hard: Longest Duplicate Substring - Given a string, find the longest substring that occurs at least twice in the string using the Rabin-Karp algorithm. (https://leetcode.com/problems/longest-duplicate-substring/)


Hard: Rolling Hash - Implement a rolling hash algorithm to efficiently compute the hash value of a sliding window of a given string. (https://leetcode.com/problems/rolling-hash/)


Hard: Concatenated Words - Given a list of words, find all the concatenated words in the list using the Rabin-Karp algorithm. (https://leetcode.com/problems/concatenated-words/)

Knuth-Morris-Pratt algorithm

The Knuth-Morris-Pratt (KMP) algorithm is a string-searching algorithm that searches for occurrences of a pattern in a given text. It works by pre-processing the pattern to create a prefix function that encapsulates knowledge about how the pattern matches itself.

The algorithm compares successive characters of the pattern to "parallel" characters of the text, moving from one to the next by incrementing the index if they match. If a mismatch occurs, the prefix function is used to determine where to resume the comparison. 




Time complexity 
The KMP algorithm has a worst-case time complexity of O(m+n), where m is the length of the pattern and n is the length of the text.

Leetcode questions

Easy: Implement strStr() - Implement the strStr() function to find the first occurrence of a substring in a given string using the KMP algorithm. (https://leetcode.com/problems/implement-strstr/)

Medium: Shortest Palindrome - Given a string, add the minimum number of characters to it to make it a palindrome using the KMP algorithm. (https://leetcode.com/problems/shortest-palindrome/)

Hard: Regular Expression Matching - Given a string and a pattern, implement regular expression matching with support for '.' and '*', using the KMP algorithm. (https://leetcode.com/problems/regular-expression-matching/)


Hard: Shortest Supersequence - Given two strings str1 and str2, find the shortest string that has both str1 and str2 as subsequences, using the KMP algorithm. (https://leetcode.com/problems/shortest-common-supersequence/)

Hard: Find All Anagrams in a String - Given a string s and a non-empty string p, find all the start indices of p's anagrams in s, using the KMP algorithm. (https://leetcode.com/problems/find-all-anagrams-in-a-string/)

Boyer-Moore algorithm

The Boyer-Moore algorithm is an efficient string-searching algorithm that is used to find all occurrences of a pattern in a given text. It works by preprocessing the pattern to create two arrays: the bad character array and the good suffix array

The bad character array stores the rightmost position of each character in the pattern, and the good suffix array stores the length of the longest suffix of the pattern that matches a prefix of the pattern. The algorithm then compares the pattern to the text from right to left, starting at the end of the pattern. If a mismatch occurs, the algorithm uses the bad character array and the good suffix array to determine how far to shift the pattern to the right.

The algorithm then continues to compare the pattern to the text from right to left until a match is found or the end of the text is reached. The Boyer-Moore algorithm has a worst-case time complexity of O(mn), where m is the length of the pattern and n is the length of the text, but in practice, it is much faster than other string-searching algorithms




Time complexity 
The worst-case time complexity of the algorithm is O(m*n), where m is the length of the pattern and n is the length of the text.

Leetcode questions

Easy: Implement strStr() - Implement the strStr() function to find the first occurrence of a substring in a given string using the Boyer-Moore algorithm. (https://leetcode.com/problems/implement-strstr/)


Medium: Minimum Window Substring - Given two strings s and t, find the minimum window in s which will contain all the characters in t in complexity O(n) using the Boyer-Moore algorithm. (https://leetcode.com/problems/minimum-window-substring/)


Hard: Regular Expression Matching - Given a string and a pattern, implement regular expression matching with support for '.' and '*', using the Boyer-Moore algorithm. (https://leetcode.com/problems/regular-expression-matching/)


Hard: Longest Repeating Substring - Given a string, find the longest substring that occurs at least twice in the string using the Boyer-Moore algorithm. (https://leetcode.com/problems/longest-repeating-substring/)


Hard: Online Stock Span - Write a class StockSpanner which collects daily price quotes for some stock, and returns the span of that stock's price for the current day using the Boyer-Moore algorithm. (https://leetcode.com/problems/online-stock-span/)

We will discuss a lot more data structure in the coming blog, follow and stay tuned and receive emails for updates :)

To get a gist of data structures can refer There are a couple of books written that can help you more. Refer to Best Books for Data Structures and Algorithms 2023

Let's discuss and grow.
Can also let me know what article you want to discuss in the next blog.

Post a Comment (0)
Previous Post Next Post