From 468b7af67e5ea2f7308581115d4c4eb6d4d45556 Mon Sep 17 00:00:00 2001
From: Alan Wu
Date: Tue, 24 Dec 2024 15:40:57 -0500
Subject: [PATCH] [DOC] RegExp: The Graph property includes some control characters

The behavior of this is a carry-over from Oniguruma:
https://github.com/kkos/oniguruma/blob/5eaee9f5f8f674aff4875c2b35db00758fa349d6/doc/RE#L246

The previous phrasing was inaccurate since it's fair to construe e.g.
Zero Width Joiner as a control character.

Reported-by: https://github.com/ruby/ruby/pull/12294
---
 doc/_regexp.rdoc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/doc/_regexp.rdoc b/doc/_regexp.rdoc
index ae784e5adf..a2196382df 100644
--- a/doc/_regexp.rdoc
+++ b/doc/_regexp.rdoc
@@ -836,8 +836,9 @@ Some commonly-used properties correspond to POSIX bracket expressions:
 These are also commonly used:
 
 - /\p{Emoji}/: Unicode emoji.
-- /\p{Graph}/: Non-blank character
-  (excludes spaces, control characters, and similar).
+- /\p{Graph}/: Characters excluding /\p{Cntrl}/ and /\p{Space}/.
+  Note that invisible characters under the Unicode
+  {"Format"}[https://www.compart.com/en/unicode/category/Cf] category are included.
 - /\p{Word}/: A member in one of these Unicode character categories
   (see below) or having one of these Unicode properties:
 