Introduction

This tutorial is part of a collection tutorials on basic data structures and algorithms that are created using Kotlin. This project is useful if you are trying to get more fluency in Kotlin or need a refresher to do interview prep for software engineering roles.

How to run this project

You can get the code for this and all the other tutorials in this collection from this github repo. Here’s a screen capture of project in this repo in action.

Once you’ve cloned the repo, type gradle run in order to build and run this project on the command line.

Importing this project into Jetbrains IntelliJ IDEA

  • This project was created using Jetbrains Idea as a Gradle and Kotlin project (more info).
    • When you import this project into Idea as a Gradle project, make sure not to check “Offline work” (which if checked, won’t allow the gradle dependencies to be downloaded).
    • As of Jun 24 2018, Java 10 doesn’t work w/ this gradle distribution (v4.4.x), so you can use Java 9 or 8, or upgrade to a newer version of gradle (4.8+).

It is a very common problem to search for the presence of a substring inside of a string. Let’s say that:

  • the string is called str (with length n)
  • the substring is substr (with length m).

O(n * m) - Brute force approach

The brute force approach to string searches is done by simply sliding the pattern along the string until you find a match. Every time you slide down to the next character in str, this algorithm doesn’t really remember what it already knows about the string (since it iterates thru the entire length of the string at every iteration). It essentially forgets about what it knows about the string (already) for every n-1 attempts that it makes to match the substring.

/**
 * O(m * n), where m = str.size, and n = substr.size
 *
 * This is an inefficient brute force algorithm which has quadratic complexity.
 */
fun substring(str: CharArray, substr: CharArray): Any {
    // substr can't be longer than str
    if (substr.size > str.size) return false

    // Iterate str using cursor1 and for each index look ahead
    // to see if matches exist for substr
    var occurrences = 0
    for (cursor1 in 0 until str.size) {
        var matchCount = 0
        for (cursor2 in 0 until substr.size) {
            val index = cursor1 + cursor2
            // If index exceeds the size of str that means substr wasn't found
            if (index > str.size - 1) break
            // If there's a match at index between the str and substr
            // then remember it
            if (str[index] == substr[cursor2]) matchCount++
        }
        // Found a match
        if (matchCount == substr.size) occurrences++
    }

    return object {
        val numberOfMatches = occurrences
        val matchFound = occurrences > 0
        override fun toString(): String = StringBuilder().apply {
            append("{match found = $matchFound")
            append(", # matches = $numberOfMatches}")
        }.toString().brightBlue()
    }
}

Please note that if you want to turn a Kotlin String into a CharArray you can use something like "Hello world".toCharArray(). Here’s more info about this on stackoverflow.

O(n + m) - Using a state machine

By using a state machine that is built from the substring pattern we can come up with a much better algorithm to match these patterns inside our string. The idea here is not to forget what we have already seen about the string as we iterate over each character in it.

This is a streaming algorithm where we pass one character at a time (as we iterate thru the entire string) to a state maachine which matches the pattern in the substring. For every iteration:

  • Each character is compared with the character at a cursor (which represents the state) in the substring.
  • If there’s a match, this cursor is incremented, and at the next iteration, the next character in the pattern will be matched, and so on.
  • When there’s a mismatch the cursor is reset to 0.
  • And we know a match has been found when the cursor equals the length of the substring.

This approach is based on the idea of Deterministic Finite Automaton.

/**
 * O(m + n), where m = str.size, and n = substr.size
 *
 * This function uses a deterministic finite automation (DFA) method
 * which entails the use of a state machine to keep track of progress
 * in a game.
 */
fun substring_optimized(str: CharArray, substr: CharArray): Any {

    class StateMachine(val pattern: CharArray) {
        var cursor = 0
        fun add(character: Char) {
            if (pattern[cursor] == character) cursor++
            else cursor = 0
        }

        fun isMatch() = cursor == pattern.size
        fun reset() {
            cursor = 0
        }
    }

    val stateMachine = StateMachine(substr)
    var numberOfOccurrences = 0

    for (cursor in 0 until str.size) {
        stateMachine.add(str[cursor])
        if (stateMachine.isMatch()) {
            stateMachine.reset()
            numberOfOccurrences++
        }
    }

    return object {
        val occurrences = numberOfOccurrences
        val matchFound = numberOfOccurrences > 0
        override fun toString(): String = StringBuilder().apply {
            append("{occurrences = $occurrences")
            append(", matchFound = $matchFound}")
        }.toString().brightBlue()
    }

}

Resources

CS Fundamentals

Data Structures

Math

Big-O Notation

Kotlin

Markdown utilities

Related Posts