Java String codePointAt() Method

The String.codePointAt() method in Java returns the Unicode code point that starts at the specified index of a string.

Table of Contents

  1. Introduction
  2. codePointAt() Method Syntax
  3. Related codePointAt() Methods
    • String.codePointAt(int index)
    • Character.codePointAt(char[] a, int index)
  4. Examples
    • Getting Code Point from a String
    • Getting Code Point from a Character Array
    • Handling Supplementary Characters
  5. Real-World Use Case
  6. Conclusion

Introduction

The String.codePointAt() method returns the Unicode code point that starts at the specified char index of a string. It is particularly useful for characters outside the Basic Multilingual Plane (BMP), which a Java String stores as a surrogate pair of two char values.
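The difference between char values and code points can be seen in a short sketch. The class and method names below are illustrative only; the calls used (length(), codePointCount(), codePointAt()) are standard String methods.

```java
class CodeUnitsVersusCodePoints {
    // Number of UTF-16 code units (char values) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String clef = "\uD834\uDD1E"; // U+1D11E, a character outside the BMP

        System.out.println(codeUnits(clef));              // 2
        System.out.println(codePoints(clef));             // 1
        System.out.println(clef.codePointAt(0) > 0xFFFF); // true
    }
}
```

A single supplementary character therefore occupies two positions in the string, which is why indexing by char alone can land in the middle of a character.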

codePointAt() Method Syntax

The syntax for the codePointAt() method is as follows:

codePointAt(int index)

public int codePointAt(int index)
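  • index: The index of the char value (UTF-16 code unit) at which the code point starts.

The method returns the Unicode code point at the specified index. If the char at index is a high surrogate and the next char is a low surrogate, the result is the supplementary code point encoded by that pair; otherwise the result is the value of the char itself. An IndexOutOfBoundsException is thrown if index is negative or not less than the length of the string.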
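The possible outcomes of a call can be sketched as follows. The helper codePointOrMinusOne and its -1 sentinel are assumptions made for this illustration and are not part of the Java API.

```java
class CodePointAtCases {
    // Returns the code point at index, or -1 if index is out of range.
    // The -1 sentinel is a choice made for this sketch only.
    static int codePointOrMinusOne(String s, int index) {
        try {
            return s.codePointAt(index);
        } catch (IndexOutOfBoundsException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        String paired = "A\uD834\uDD1E";
        String unpaired = "\uD834A"; // high surrogate with no low surrogate after it

        System.out.println(codePointOrMinusOne(paired, 0));   // 65 (BMP character)
        System.out.println(codePointOrMinusOne(paired, 1));   // 119070 (full pair)
        System.out.println(codePointOrMinusOne(paired, 2));   // 56606 (low surrogate only)
        System.out.println(codePointOrMinusOne(paired, 3));   // -1 (index out of range)
        System.out.println(codePointOrMinusOne(unpaired, 0)); // 55348 (unpaired high surrogate)
    }
}
```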

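Character.codePointAt(char[] a, int index)

public static int codePointAt(char[] a, int index)
  • a: The character array.
  • index: The index of the char value in the array at which the code point starts.

This static method belongs to the Character class, not to String. It returns the Unicode code point at the specified index of the character array, combining a high surrogate at index with a following low surrogate into a single supplementary code point.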
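The Character class also provides a variant that takes a limit and one that accepts any CharSequence. The sketch below wraps these standard methods in illustrative helpers; the class and helper names are hypothetical.

```java
class CharacterCodePointAtVariants {
    // Reads a code point from a char array.
    static int fromArray(char[] a, int index) {
        return Character.codePointAt(a, index);
    }

    // Reads a code point but only looks at array elements before limit.
    static int fromArrayWithLimit(char[] a, int index, int limit) {
        return Character.codePointAt(a, index, limit);
    }

    // Reads a code point from any CharSequence, such as a StringBuilder.
    static int fromSequence(CharSequence seq, int index) {
        return Character.codePointAt(seq, index);
    }

    public static void main(String[] args) {
        char[] chars = { 'A', '\uD834', '\uDD1E' };

        System.out.println(fromArray(chars, 1));             // 119070
        // With limit 2 the low surrogate at index 2 is not visible,
        // so only the high surrogate value is returned.
        System.out.println(fromArrayWithLimit(chars, 1, 2)); // 55348
        System.out.println(fromSequence(new StringBuilder("A\uD834\uDD1E"), 1)); // 119070
    }
}
```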

Examples

Getting Code Point from a String

The codePointAt(int index) method can be used to get the Unicode code point value of the character at the specified index in a string.

Example

public class CodePointAtExample {
    public static void main(String[] args) {
        String str = "Hello\uD834\uDD1EWorld";

        int codePoint = str.codePointAt(5);

        System.out.println("Code point at index 5: " + codePoint);
    }
}

Output:

Code point at index 5: 119070

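In this example, the char at index 5 is the high surrogate \uD834. Together with the low surrogate \uDD1E at index 6 it encodes U+1D11E, a musical symbol, so codePointAt(5) returns 119070 rather than the value of the surrogate alone.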
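The decimal value 119070 is easier to recognize in the usual U+ hexadecimal notation. The following sketch, with hypothetical helper names, formats a code point that way and converts it back to text using Character.toChars().

```java
class CodePointFormatting {
    // Formats a code point in U+XXXX notation, padded to at least four hex digits.
    static String toUPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    // Converts a code point back into a String (one or two char values).
    static String toText(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int codePoint = "Hello\uD834\uDD1EWorld".codePointAt(5);

        System.out.println(toUPlus(codePoint));         // U+1D11E
        System.out.println(toText(codePoint).length()); // 2
    }
}
```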

Getting Code Point from a Character Array

The static Character.codePointAt(char[] a, int index) method returns the Unicode code point at the specified index in a character array.

Example

public class CodePointAtCharArrayExample {
    public static void main(String[] args) {
        char[] charArray = { 'H', 'e', 'l', 'l', 'o', '\uD834', '\uDD1E', 'W', 'o', 'r', 'l', 'd' };

        int codePoint = Character.codePointAt(charArray, 5);

        System.out.println("Code point at index 5: " + codePoint);
    }
}

Output:

Code point at index 5: 119070

In this example, the chars at indices 5 and 6 form a surrogate pair representing a musical symbol, so the method returns the single code point 119070.

Handling Supplementary Characters

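When working with supplementary characters (characters with code points above U+FFFF), it's important to handle surrogate pairs correctly: codePointAt() returns the full code point only when the index points at the high surrogate of a pair.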

Example

public class SupplementaryCharacterExample {
    public static void main(String[] args) {
        String str = "A\uD834\uDD1EB";

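        int codePoint1 = str.codePointAt(1); // Index of the high surrogate: returns the full code point
        int codePoint2 = str.codePointAt(2); // Index of the low surrogate: returns the surrogate value only
        int fullCodePoint = str.codePointAt(str.offsetByCodePoints(0, 1)); // Char index of the second code point

        System.out.println("Code point at index 1: " + codePoint1);
        System.out.println("Code point at index 2: " + codePoint2);
        System.out.println("Full code point at index 1: " + fullCodePoint);
    }
}

Output:

Code point at index 1: 119070
Code point at index 2: 56606
Full code point at index 1: 119070

In this example, codePointAt(1) points at the high surrogate, so it returns the full supplementary code point 119070. codePointAt(2) points at the low surrogate in the middle of the pair, so it returns only the surrogate value 56606 (0xDD1E). offsetByCodePoints(0, 1) converts a position counted in code points into a char index, which here is also 1.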
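To look up a character by its position counted in code points rather than in char values, offsetByCodePoints() and codePointAt() can be combined. This is a minimal sketch; the helper name codePointAtPosition is hypothetical.

```java
class NthCodePoint {
    // Returns the code point at the given 0-based position,
    // where positions count whole code points rather than char values.
    static int codePointAtPosition(String s, int position) {
        int charIndex = s.offsetByCodePoints(0, position);
        return s.codePointAt(charIndex);
    }

    public static void main(String[] args) {
        String str = "A\uD834\uDD1EB";

        System.out.println(codePointAtPosition(str, 0)); // 65
        System.out.println(codePointAtPosition(str, 1)); // 119070
        System.out.println(codePointAtPosition(str, 2)); // 66
    }
}
```

Note that offsetByCodePoints() scans the string from the starting index, so calling this helper repeatedly in a loop is slower than advancing a char index once per code point.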

Real-World Use Case

Processing Unicode Text

In a real-world application, you might need to process text containing characters from various languages and scripts, including supplementary characters. The codePointAt() method allows you to correctly handle these characters.

Example

public class UnicodeProcessingExample {
    public static void main(String[] args) {
        String text = "Emoji: \uD83D\uDE00";

        for (int i = 0; i < text.length(); i++) {
            int codePoint = text.codePointAt(i);
            if (Character.isSupplementaryCodePoint(codePoint)) {
                System.out.printf("Supplementary character at index %d: %d%n", i, codePoint);
                i++; // Skip the next index as it is part of the surrogate pair
            } else {
                System.out.printf("Character at index %d: %d%n", i, codePoint);
            }
        }
    }
}

Output:

Character at index 0: 69
Character at index 1: 109
Character at index 2: 111
Character at index 3: 106
Character at index 4: 105
Character at index 5: 58
Character at index 6: 32
Supplementary character at index 7: 128512

In this example, the code processes the text and correctly identifies the supplementary character (an emoji).
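The same traversal can be written without special-casing supplementary characters. The sketch below shows two common alternatives: advancing by Character.charCount(), and the String.codePoints() stream available since Java 8. The class and method names are illustrative.

```java
class CodePointIteration {
    // Walks the string by code point, advancing one or two chars at a time.
    static int[] withCharCount(String text) {
        int[] result = new int[text.codePointCount(0, text.length())];
        int n = 0;
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            result[n++] = codePoint;
            i += Character.charCount(codePoint); // 1 for BMP, 2 for supplementary
        }
        return result;
    }

    // Collects the code points using the IntStream returned by codePoints().
    static int[] withStream(String text) {
        return text.codePoints().toArray();
    }

    public static void main(String[] args) {
        String text = "Hi \uD83D\uDE00";

        for (int codePoint : withCharCount(text)) {
            System.out.println(codePoint); // 72, 105, 32, 128512 on separate lines
        }
        System.out.println(withStream(text).length); // 4
    }
}
```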

Conclusion

The String.codePointAt() method returns the Unicode code point that starts at a given char index, combining a surrogate pair into a single supplementary code point when the index points at the high surrogate. Together with the static Character.codePointAt() methods for character arrays, it lets you process text that contains characters outside the BMP without splitting surrogate pairs.
