Introduction
Counting the number of duplicate words in a string is a common task in text processing. Whether you're analyzing text data, cleaning up user input, or performing any other kind of text manipulation, knowing how to identify and count duplicate words is useful. In this blog post, we will walk you through the steps to create a Java program that finds the duplicate words in a given string and counts how many times each one occurs.
Steps to Solve the Problem
- Normalize the String: Convert the string to lowercase to ensure the comparison is case-insensitive.
- Split the String: Use a regular expression to split the string into words.
- Use a Map: Use a HashMap to store each word and its count.
- Count Duplicates: Iterate through the map to find and display the words whose count is greater than 1.
Example Program
Here is a complete Java program that finds the duplicate words in a string and prints how many times each one occurs.
Example Code:
import java.util.HashMap;
import java.util.Map;

public class DuplicateWordCounter {
    public static void main(String[] args) {
        String input = "Java is great and Java is fun. Programming in Java is great.";

        // Normalize the string by converting it to lower case
        String normalizedInput = input.toLowerCase();

        // Split the string into words using a regular expression
        String[] words = normalizedInput.split("\\W+");

        // Use a HashMap to store each word and its count
        Map<String, Integer> wordCountMap = new HashMap<>();

        // Count the occurrences of each word
        for (String word : words) {
            // split() returns an empty first token when the input starts
            // with a non-word character, so skip empty strings
            if (word.isEmpty()) {
                continue;
            }
            if (wordCountMap.containsKey(word)) {
                wordCountMap.put(word, wordCountMap.get(word) + 1);
            } else {
                wordCountMap.put(word, 1);
            }
        }

        // Display the duplicate words and their counts
        System.out.println("Duplicate words in the string:");
        for (Map.Entry<String, Integer> entry : wordCountMap.entrySet()) {
            if (entry.getValue() > 1) {
                System.out.println(entry.getKey() + ": " + entry.getValue());
            }
        }
    }
}
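The containsKey()/get()/put() sequence in the counting loop can be collapsed into a single Map.merge() call (available since Java 8). Below is a minimal sketch under the same lowercase-and-split assumptions as the program above; the class name MergeWordCounter and the method countWords are illustrative and not part of the original program.

```java
import java.util.HashMap;
import java.util.Map;

class MergeWordCounter {

    // Counts every word in the input. Map.merge stores 1 for a new word,
    // or adds 1 to the existing count via Integer::sum.
    static Map<String, Integer> countWords(String input) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : input.toLowerCase().split("\\W+")) {
            if (word.isEmpty()) {
                continue; // a leading non-word character yields an empty first token
            }
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
                countWords("Java is great and Java is fun. Programming in Java is great.");
        // Print only the words that occur more than once
        counts.forEach((word, count) -> {
            if (count > 1) {
                System.out.println(word + ": " + count);
            }
        });
    }
}
```

merge(word, 1, Integer::sum) stores 1 when the word is not yet in the map and otherwise combines the old count with 1 using Integer.sum, so each word is handled with one method call instead of a containsKey/get/put branch.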
Output (HashMap does not guarantee an iteration order, so the lines may be printed in a different order):
Duplicate words in the string:
java: 3
is: 3
great: 2
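If the output needs to appear in a predictable order, one option is to count into a TreeMap, which iterates over its keys in sorted (here alphabetical) order; a LinkedHashMap would instead keep the order in which words first appear. A minimal sketch, with the hypothetical names SortedDuplicateWords and duplicates:

```java
import java.util.Map;
import java.util.TreeMap;

class SortedDuplicateWords {

    // Counts words as in the program above, but a TreeMap keeps the keys
    // in alphabetical order, so the printed output is deterministic.
    static Map<String, Integer> duplicates(String input) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : input.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        // Keep only the words that occur more than once
        counts.values().removeIf(count -> count < 2);
        return counts;
    }

    public static void main(String[] args) {
        duplicates("Java is great and Java is fun. Programming in Java is great.")
                .forEach((word, count) -> System.out.println(word + ": " + count));
        // prints:
        // great: 2
        // is: 3
        // java: 3
    }
}
```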
Explanation
- Normalize the String: The input string is converted to lower case using toLowerCase() so that the comparison is case-insensitive; for example, "Java" and "java" are counted as the same word.
- Split the String: The string is split into words using the regular expression "\\W+" (the Java string literal for \W+), which matches one or more consecutive non-word characters, that is, any character other than a letter, digit, or underscore. Spaces and punctuation therefore act as separators and are not included in the resulting words.
- Use a HashMap: A HashMap stores each word and its count. The containsKey() method checks whether a word is already in the map; if it is, the program increments its count, otherwise it adds the word to the map with a count of 1.
- Count Duplicates: The program iterates through the map entries using an enhanced for loop. It checks whether the count of a word is greater than 1 and, if so, prints the word and its count.
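Two details of the normalization and splitting steps are easy to miss: String.split() returns an empty first element when the input begins with a delimiter, and \W+ treats apostrophes and accented letters as separators because \w only covers ASCII letters, digits, and the underscore by default. The sketch below prints what the original regex produces for such an input and compares it with a hypothetical tokenize() helper that splits on anything other than Unicode letters, digits, or apostrophes. It also uses toLowerCase(Locale.ROOT), because the no-argument toLowerCase() depends on the default locale (in a Turkish locale, for example, "I" lowercases to a dotless i). Keeping apostrophes inside words is an assumption; adjust the character class to your needs.

```java
import java.util.Arrays;
import java.util.Locale;

class SplitBehaviour {

    // Hypothetical tokenizer: splits on runs of characters that are not
    // Unicode letters (\p{L}), Unicode digits (\p{N}) or apostrophes,
    // then drops the empty token produced by a leading delimiter.
    static String[] tokenize(String input) {
        String normalized = input.toLowerCase(Locale.ROOT);
        return Arrays.stream(normalized.split("[^\\p{L}\\p{N}']+"))
                .filter(word -> !word.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String text = "...Don't stop, caf\u00e9!";

        // Original approach: \W+ treats the apostrophe and the accented
        // letter as separators, and the leading "..." yields an empty token
        String[] original = text.toLowerCase(Locale.ROOT).split("\\W+");
        System.out.println(Arrays.toString(original)); // [, don, t, stop, caf]

        // Unicode-aware split keeps "don't" and the accented word intact
        System.out.println(Arrays.toString(tokenize(text)));
    }
}
```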
Conclusion
This Java program counts the duplicate words in a string by storing each word and its count in a HashMap. Because the input is lowercased and split on non-word characters, differences in case and surrounding punctuation do not affect the counts. Each word is processed once and HashMap lookups take constant time on average, so the work grows roughly linearly with the length of the input. The same approach can be adapted and extended for other text processing needs, such as building a full word-frequency table or finding the most common word.
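As one example of adapting the program, the same counting can be written with the Stream API using Collectors.groupingBy() and Collectors.counting(). The sketch below, built around the hypothetical class StreamDuplicateWords, also reports how many distinct words are duplicated, which is one way to read "the number of duplicate words"; another reading would sum count - 1 over the duplicates to get the number of repeated occurrences.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamDuplicateWords {

    // Groups identical words and counts them with the Stream API; the TreeMap
    // factory keeps keys sorted, then words seen only once are removed.
    static Map<String, Long> duplicates(String input) {
        TreeMap<String, Long> counts = Arrays.stream(input.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
        counts.values().removeIf(count -> count < 2);
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> dups =
                duplicates("Java is great and Java is fun. Programming in Java is great.");
        System.out.println(dups); // {great=2, is=3, java=3}
        // Number of distinct words that occur more than once
        System.out.println("Distinct duplicated words: " + dups.size()); // 3
    }
}
```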
By understanding and implementing this program, you can handle text data more effectively, making your applications more robust and user-friendly. Happy coding!