Introduction
Counting the number of words in a string is a common task in text processing. There are multiple ways to accomplish this in Java, each with its own advantages and use cases. In this blog post, we'll explore different methods to count the number of words in a given string.
Table of Contents
- Using split() Method
- Using StringTokenizer
- Using Regular Expressions
- Using Apache Commons Lang Library
- Complete Example Program
- Conclusion
1. Using split() Method
The split() method in Java is one of the simplest ways to count the number of words in a string. This method splits the string around matches of a given regular expression and returns an array of substrings.
Example:
public class WordCountUsingSplit {
    public static void main(String[] args) {
        String input = "Java is great and Java is fun.";
        // Split on runs of one or more whitespace characters
        String[] words = input.split("\\s+");
        System.out.println("Number of words using split(): " + words.length);
    }
}
Explanation:
- \\s+ is a regular expression that matches one or more whitespace characters.
- The split() method splits the string into an array of words based on the given regular expression.
Output:
Number of words using split(): 7
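The example input is non-empty and has no leading whitespace, so words.length happens to be the right answer. For other inputs, split() has two quirks: an empty string yields a one-element array, and leading whitespace produces an empty first element. The following is a minimal sketch of one way to guard against both; the class name SplitWordCounter and the countWords helper are hypothetical names chosen for illustration.

```java
class SplitWordCounter {

    // Hypothetical helper: counts whitespace-separated words and treats
    // null, empty, and whitespace-only input as zero words.
    static int countWords(String input) {
        if (input == null) {
            return 0;
        }
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        // Plain split(): the quirks show up in the array length
        System.out.println("".split("\\s+").length);              // prints 1
        System.out.println("  Java is fun".split("\\s+").length); // prints 4

        // Guarded helper
        System.out.println(countWords(""));                       // prints 0
        System.out.println(countWords("  Java is fun"));          // prints 3
    }
}
```

Trailing whitespace needs no special handling, because split() already drops trailing empty strings; trimming simply covers both ends in one step.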
2. Using StringTokenizer
The StringTokenizer class is a legacy class that provides a way to break a string into tokens. It is simple to use and does not require regular expressions.
Example:
import java.util.StringTokenizer;

public class WordCountUsingStringTokenizer {
    public static void main(String[] args) {
        String input = "Java is great and Java is fun.";
        // No delimiter argument: the default whitespace delimiters are used
        StringTokenizer tokenizer = new StringTokenizer(input);
        System.out.println("Number of words using StringTokenizer: " + tokenizer.countTokens());
    }
}
Explanation:
- StringTokenizer splits the string on its default delimiters: space, tab, newline, carriage return, and form feed.
- The countTokens() method returns the number of tokens remaining in the tokenizer.
Output:
Number of words using StringTokenizer: 7
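StringTokenizer also has a two-argument constructor that takes the delimiter characters as a string, which can help when words are separated by punctuation rather than spaces. The sketch below assumes a small, hand-picked delimiter set; the class name TokenizerWordCounter, the DELIMITERS constant, and the countWords helper are hypothetical.

```java
import java.util.StringTokenizer;

class TokenizerWordCounter {

    // Assumed delimiter set: the default whitespace characters plus some punctuation.
    static final String DELIMITERS = " \t\n\r\f,.;:!?";

    // Hypothetical helper: counts tokens separated by any character in DELIMITERS.
    static int countWords(String input) {
        if (input == null) {
            return 0; // the StringTokenizer constructor does not accept null
        }
        return new StringTokenizer(input, DELIMITERS).countTokens();
    }

    public static void main(String[] args) {
        String input = "Java,is great;and fun.";

        // Default delimiters: "Java,is", "great;and", "fun."
        System.out.println(new StringTokenizer(input).countTokens()); // prints 3

        // Custom delimiters: "Java", "is", "great", "and", "fun"
        System.out.println(countWords(input));                        // prints 5
    }
}
```

Each character in the delimiter string is treated as a separate delimiter, so the order of characters in DELIMITERS does not matter.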
3. Using Regular Expressions
You can use regular expressions with the Pattern and Matcher classes to count the number of words in a string.
Example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCountUsingRegex {
    public static void main(String[] args) {
        String input = "Java is great and Java is fun.";
        Pattern pattern = Pattern.compile("\\b\\w+\\b");
        Matcher matcher = pattern.matcher(input);
        // Each successful find() is one more word
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        System.out.println("Number of words using regex: " + count);
    }
}
Explanation:
- \\b\\w+\\b is a regular expression that matches a run of word characters (letters, digits, and underscores) between two word boundaries.
- The Matcher class is used to find successive matches of the pattern in the string.
Output:
Number of words using regex: 7
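Because \w does not include the apostrophe, the pattern above counts a contraction such as "Don't" as two words, and by default \w covers only ASCII letters, digits, and the underscore. The following sketch compares it with one possible alternative pattern. The alternative is an assumption made for illustration, not the only reasonable definition of a word, and the class name RegexWordCounter and the count helper are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexWordCounter {

    // The pattern used in the example above.
    static final Pattern SIMPLE = Pattern.compile("\\b\\w+\\b");

    // Assumed alternative: runs of Unicode letters or digits, optionally
    // joined by single apostrophes, so "Don't" is one word.
    static final Pattern WITH_APOSTROPHES =
            Pattern.compile("[\\p{L}\\p{N}]+(?:'[\\p{L}\\p{N}]+)*");

    // Hypothetical helper: counts non-overlapping matches of the pattern.
    static int count(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "Don't stop, it's 2024!";

        // Matches: Don, t, stop, it, s, 2024
        System.out.println(count(SIMPLE, input));           // prints 6

        // Matches: Don't, stop, it's, 2024
        System.out.println(count(WITH_APOSTROPHES, input)); // prints 4
    }
}
```

Compiling the patterns once as constants avoids recompiling them on every call, which matters if the count runs over many strings.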
4. Using Apache Commons Lang Library
The Apache Commons Lang library provides a utility class, StringUtils, whose split() method can be used to count the number of words in a string.
Maven Dependency:
Add the following dependency to your pom.xml file:
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.12.0</version>
</dependency>
Example:
import org.apache.commons.lang3.StringUtils;

public class WordCountUsingStringUtils {
    public static void main(String[] args) {
        String input = "Java is great and Java is fun.";
        // Split on the space character and count the resulting pieces
        int count = StringUtils.split(input, ' ').length;
        System.out.println("Number of words using StringUtils: " + count);
    }
}
Explanation:
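- StringUtils.split() splits the string on the given delimiter (here, a single space) and returns an array of substrings. Adjacent delimiters are treated as one, so repeated spaces do not produce empty entries.
- To split on any whitespace rather than only the space character, call StringUtils.split(input) without a delimiter argument.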
Output:
Number of words using StringUtils: 7
5. Complete Example Program
Here is a complete program that demonstrates all the methods discussed above to count the number of words in a string.
Example Code:
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.lang3.StringUtils;

public class WordCountExample {
    public static void main(String[] args) {
        String input = "Java is great and Java is fun.";

        // Using split() method
        String[] wordsUsingSplit = input.split("\\s+");
        System.out.println("Number of words using split(): " + wordsUsingSplit.length);

        // Using StringTokenizer
        StringTokenizer tokenizer = new StringTokenizer(input);
        System.out.println("Number of words using StringTokenizer: " + tokenizer.countTokens());

        // Using Regular Expressions
        Pattern pattern = Pattern.compile("\\b\\w+\\b");
        Matcher matcher = pattern.matcher(input);
        int countUsingRegex = 0;
        while (matcher.find()) {
            countUsingRegex++;
        }
        System.out.println("Number of words using regex: " + countUsingRegex);

        // Using Apache Commons Lang StringUtils
        int countUsingStringUtils = StringUtils.split(input, ' ').length;
        System.out.println("Number of words using StringUtils: " + countUsingStringUtils);
    }
}
Output:
Number of words using split(): 7
Number of words using StringTokenizer: 7
Number of words using regex: 7
Number of words using StringUtils: 7
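All four counts agree here because the input uses single spaces and has no punctuation inside words. On less tidy input the methods can disagree. The sketch below runs the three standard-library approaches on a string with surrounding spaces and a hyphenated word; the class name WordCountComparison and its helper methods are hypothetical, and the Apache Commons variant is left out so the example runs without extra dependencies.

```java
import java.util.StringTokenizer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordCountComparison {

    // Same call as the split() example, with no trimming.
    static int bySplit(String input) {
        return input.split("\\s+").length;
    }

    // Same call as the StringTokenizer example, with default delimiters.
    static int byTokenizer(String input) {
        return new StringTokenizer(input).countTokens();
    }

    // Same pattern as the regex example.
    static int byRegex(String input) {
        Matcher matcher = Pattern.compile("\\b\\w+\\b").matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "  state-of-the-art Java  ";

        // Leading whitespace adds an empty first element: "", "state-of-the-art", "Java"
        System.out.println("split(): " + bySplit(input));             // prints split(): 3

        // Tokens: "state-of-the-art", "Java"
        System.out.println("StringTokenizer: " + byTokenizer(input)); // prints StringTokenizer: 2

        // Matches: state, of, the, art, Java
        System.out.println("regex: " + byRegex(input));               // prints regex: 5
    }
}
```

Which count is correct depends on what the application treats as a word, so it is worth testing the chosen method against representative input.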
6. Conclusion
Counting the number of words in a string can be accomplished in multiple ways in Java. The split() method, StringTokenizer, regular expressions, and the Apache Commons Lang library are all effective methods, each with its own advantages. By understanding and using these different methods, you can choose the most appropriate one for your specific use case.
Happy coding!