web.dev

The Intl.Segmenter object is now part of Baseline


Table of Contents

  1. Introduction
  2. Key Features
  3. Usage
  4. Compatibility
  5. Conclusion

Introduction

The Intl.Segmenter object is now part of Baseline, providing native support for locale-sensitive text segmentation in JavaScript. This enables developers to accurately break text into segments based on linguistic boundaries, such as sentences, words, or grapheme clusters.
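Grapheme segmentation matters because a string's length property counts UTF-16 code units, not the characters a reader sees. The sketch below compares the two. The helper name countGraphemes is hypothetical and chosen only for this example.

```javascript
// Count user-perceived characters (extended grapheme clusters).
// countGraphemes is a hypothetical helper name used for illustration.
const graphemeSegmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });

function countGraphemes(text) {
  // Iterating the result of segment() yields one entry per grapheme cluster.
  return Array.from(graphemeSegmenter.segment(text)).length;
}

const samples = [
  'abc',                                   // plain ASCII
  'e\u0301',                               // "e" followed by a combining acute accent
  '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}', // family emoji joined with ZWJ
];

for (const s of samples) {
  // .length counts UTF-16 code units; countGraphemes counts visible characters.
  console.log(s, 'length:', s.length, 'graphemes:', countGraphemes(s));
}
```

For the family emoji, length reports 8 while the segmenter reports a single grapheme; the accented "e" has a length of 2 but is one grapheme.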

Key Features

  • Locale-sensitive text segmentation, with boundary rules and language data supplied by the JavaScript engine
  • A granularity option that selects grapheme, word, or sentence segmentation
  • Improved accuracy in identifying boundaries of sentences, words, and grapheme clusters
  • Built into the language, so no third-party segmentation library is required
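With granularity set to 'word', each segment also carries an isWordLike flag that is false for runs of spaces or punctuation, which makes word counting straightforward. The sketch below uses a hypothetical countWords helper and contrasts it with naive whitespace splitting, which fails for languages such as Japanese that do not separate words with spaces. The exact Japanese word boundaries depend on the dictionary data bundled with the engine, so only the English count is shown.

```javascript
// Count words with word-granularity segmentation.
// countWords is a hypothetical helper name used for illustration.
function countWords(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  let count = 0;
  for (const { isWordLike } of segmenter.segment(text)) {
    // Spaces and punctuation produce segments with isWordLike === false.
    if (isWordLike) count++;
  }
  return count;
}

console.log(countWords('Hello, world! How are you?')); // 5
console.log(countWords('吾輩は猫である', 'ja'));    // several words, engine-dependent
console.log('吾輩は猫である'.split(' ').length);    // 1: no spaces to split on
```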

Usage

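To use Intl.Segmenter, create an instance with the desired locale and a granularity option: 'grapheme' (the default), 'word', or 'sentence'. Then call its segment() method on a string. It returns an iterable Segments object; each item it yields describes one segment through its segment (the text), index (the starting offset in the input), and input (the original string) properties. With word granularity, each item also has an isWordLike property.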

const segmenter = new Intl.Segmenter('en', { granularity: 'sentence' });
const text = 'This is a sample sentence. Another one follows.';
const segments = segmenter.segment(text);

// Each item holds the segment text and its start offset in the input.
for (const { segment, index } of segments) {
  console.log(index, segment);
}
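Besides iteration, the Segments object returned by segment() has a containing(index) method that returns the segment covering a given UTF-16 offset, or undefined when the offset is out of range. A sketch of looking up the word under a caret position follows; the helper name wordAt is hypothetical.

```javascript
// Find the word that contains a given UTF-16 offset (e.g. a caret position).
// wordAt is a hypothetical helper name used for illustration.
const wordSegmenter = new Intl.Segmenter('en', { granularity: 'word' });

function wordAt(text, offset) {
  // containing() returns the segment data covering offset,
  // or undefined when offset lies outside the string.
  const data = wordSegmenter.segment(text).containing(offset);
  // Ignore spaces and punctuation, which are not word-like.
  return data && data.isWordLike ? data.segment : undefined;
}

const sentence = 'The quick brown fox';
console.log(wordAt(sentence, 5));  // "quick"
console.log(wordAt(sentence, 3));  // undefined: offset 3 is a space
console.log(wordAt(sentence, 99)); // undefined: out of range
```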

Compatibility

Being part of Baseline means Intl.Segmenter works in the current versions of all major browsers (Chrome, Edge, Firefox, and Safari), and it is also available in Node.js. Developers can use it in web applications, server-side scripts, and other JavaScript projects without adding an external segmentation library.
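Projects that must also run on older browsers or Node.js versions can feature-detect Intl.Segmenter before using it. The sketch below assumes a hypothetical splitGraphemes helper and an approximate fallback that splits by Unicode code point; the fallback is an assumption, not part of the API, and it separates combining marks and ZWJ emoji sequences that the segmenter keeps together.

```javascript
// Split a string into graphemes, preferring Intl.Segmenter when available.
// splitGraphemes and its fallback are hypothetical, for illustration only.
function splitGraphemes(text, locale = 'en') {
  if (typeof Intl !== 'undefined' && typeof Intl.Segmenter === 'function') {
    const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
    // Map each segment data object to its text.
    return Array.from(segmenter.segment(text), (s) => s.segment);
  }
  // Approximate fallback: Array.from splits by code point, which keeps
  // surrogate pairs together but not full grapheme clusters.
  return Array.from(text);
}

console.log(splitGraphemes('ne\u0301e').length); // 3 with Intl.Segmenter, 4 with the fallback
```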

Conclusion

With Intl.Segmenter in Baseline, developers have a standardized, interoperable way to perform locale-sensitive text segmentation in JavaScript. Using it in place of whitespace- or code-unit-based splitting gives more accurate text processing and better support for the many languages and writing systems those simpler approaches handle poorly.