370 Commits

Author SHA1 Message Date
naruse
3c6969ec11 * string.c, parse.y, re.c: use rb_ascii8bit_encoding.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15292 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-28 09:03:09 +00:00
akr
fc208c1bd5 * include/ruby/oniguruma.h: precise mbclen API redesigned to avoid
inline functions.
  (onigenc_mbclen_charfound): removed.
  (onigenc_mbclen_needmore): removed.
  (onigenc_mbclen_recover): removed.
  (ONIGENC_MBCLEN_CHARFOUND): removed.
  (ONIGENC_MBCLEN_CHARFOUND_P): defined.
  (ONIGENC_MBCLEN_CHARFOUND_LEN): defined.
  (ONIGENC_MBCLEN_INVALID): removed.
  (ONIGENC_MBCLEN_INVALID_P): defined.
  (ONIGENC_MBCLEN_NEEDMORE): removed.
  (ONIGENC_MBCLEN_NEEDMORE_P): defined.
  (ONIGENC_MBCLEN_NEEDMORE_LEN): defined.
  (ONIGENC_MBC_ENC_LEN): use onigenc_mbclen_approximate.

* regenc.c (onigenc_mbclen_approximate): defined.

* include/ruby/encoding.h (MBCLEN_CHARFOUND): removed.
  (MBCLEN_INVALID): removed.
  (MBCLEN_NEEDMORE): removed.
  (MBCLEN_CHARFOUND_P): defined.
  (MBCLEN_INVALID_P): defined.
  (MBCLEN_NEEDMORE_P): defined.
  (MBCLEN_CHARFOUND_LEN): defined.
  (MBCLEN_NEEDMORE_LEN): defined.

* encoding.c: use new API.

* re.c: ditto.

* string.c: ditto.

* parse.y: ditto.



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15280 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-27 14:27:07 +00:00
naruse
f3fe101d55 * re.c (rb_reg_source): set encoding as regexp encoding.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15265 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-27 07:26:51 +00:00
akr
b9c18bdcdd * re.c (rb_reg_preprocess): force fixed encoding when ASCII
incompatible source string.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15260 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-26 21:01:52 +00:00
akr
1e41069754 * include/ruby/intern.h (rb_str_buf_cat_ascii): declared.
* string.c (rb_str_buf_cat_ascii): defined.

* re.c (rb_reg_s_union): use rb_str_buf_cat_ascii to support ASCII
  incompatible encoding.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15232 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-25 07:35:27 +00:00
usa
b1257d4d20 * re.c (rb_reg_fixed_encoding_p): no need to treat ASCII-8BIT specially.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15213 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-24 09:15:03 +00:00
usa
fbe52683e6 * re.c (rb_reg_initialize): 7bit clean regexp should be US-ASCII.
[ruby-dev:33346]



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15212 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-24 07:56:12 +00:00
akr
3766eac339 * re.c (rb_reg_prepare_re): fix SEGV by
/a/ =~ "aa".force_encoding("utf-16be").


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15178 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-23 04:40:43 +00:00
usa
61fd7dbf6d * re.c (rb_char_to_option_kcode): Regexp switch `s' should mean
Windows-31J, as wells as `-Ks'.



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15101 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-18 00:44:15 +00:00
nobu
a0029e3adc * re.c (rb_char_to_option_kcode): fixed typo.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15085 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-17 12:48:23 +00:00
matz
d9ff499bf3 * re.c (rb_char_to_option_kcode): use rb_enc_find_index() instead
of using fixed index value.

* enc/Makefile.in (encsrcdir): make US-ASCII built-in.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15047 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-14 13:49:29 +00:00
akr
a31e2da12c * re.c (rb_reg_prepare_re): initialize error message buffer.
(rb_reg_search): ditto.
  (rb_reg_check_preprocess): ditto.
  (rb_reg_new_str): ditto.
  (rb_enc_reg_new): ditto.
  (rb_reg_compile): ditto.
  (rb_reg_initialize_m): ditto.
  (rb_reg_s_union_m): ditto.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@15034 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-14 04:51:10 +00:00
akr
238c59842c * re.c (rb_reg_preprocess): fix fixed_enc condition.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14924 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-07 04:55:26 +00:00
akr
063beac343 * encoding.c (rb_enc_internal_get_index): extracted from
rb_enc_get_index.
  (rb_enc_internal_set_index): extracted from rb_enc_associate_index

* include/ruby/encoding.h (ENCODING_SET): work over ENCODING_INLINE_MAX.
  (ENCODING_GET): ditto.
  (ENCODING_IS_ASCII8BIT): defined.
  (ENCODING_CODERANGE_SET): defined.

* re.c (rb_reg_fixed_encoding_p): use ENCODING_IS_ASCII8BIT.

* string.c (rb_enc_str_buf_cat): use ENCODING_IS_ASCII8BIT.

* parse.y (reg_fragment_setenc_gen): use ENCODING_IS_ASCII8BIT.

* marshal.c (has_ivars): use ENCODING_IS_ASCII8BIT.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14922 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-07 02:49:01 +00:00
akr
f38cc001a7 * re.c (rb_reg_initialize_str): forbid raw non ASCII character
for ASCII-8BIT regexp in non ASCII-8BIT script.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14911 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-06 12:15:48 +00:00
akr
8987b97ca9 * include/ruby/encoding.h (rb_enc_str_buf_cat): declared.
* string.c (coderange_scan): extracted from rb_enc_str_coderange.
  (rb_enc_str_coderange): use coderange_scan.
  (rb_str_shared_replace): copy encoding and coderange.
  (rb_enc_str_buf_cat): new function for linear complexity string
  accumulation with encoding.
  (rb_str_sub_bang): don't conflict substituted part and replacement.
  (str_gsub): use rb_enc_str_buf_cat.
  (rb_str_clear): clear coderange.

* re.c (rb_reg_regsub): use rb_enc_str_buf_cat.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14910 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-06 09:25:09 +00:00
akr
da42c102c1 * re.c (rb_reg_initialize_str): /\x80/n is not an error even if script
encoding is EUC-JP.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14899 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-05 16:39:38 +00:00
nobu
8638ee26e7 * include/ruby/intern.h, re.c (rb_reg_new): keep interface same as
1.8.  [ruby-core:14583]

* include/ruby/intern.h, re.c (rb_reg_new_str): renamed, and defines
  HAVE_RB_REG_NEW_STR macro to tell if it is available.

* include/ruby/encoding.h (rb_enc_reg_new): added.

* insns.def (toregexp), marshal.c (r_object0): use rb_reg_new_str().

* re.c (rb_reg_regcomp, rb_reg_s_union): ditto.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14884 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-04 16:30:33 +00:00
akr
f780cdec75 * re.c (rb_reg_prepare_re): check string encoding. Oniguruma doesn't
support invalid encoding.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14880 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-04 05:01:58 +00:00
akr
7d98c90ef2 unused variable removed.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14879 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-04 03:13:53 +00:00
matz
22e7258275 * re.c (rb_reg_search): avoid inner loop for reverse search.
* regexec.c: unset USE_MATCH_RANGE_MUST_BE_INSIDE_OF_SPECIFIED_RANGE
  which is turned on since oniguruma 5.9.1.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14878 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-04 01:24:12 +00:00
akr
52f9c1d2e1 * re.c (rb_reg_search): iterate onig_match for reverse mode.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14876 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2008-01-03 17:48:06 +00:00
akr
e21907e0f8 fix typos.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14810 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-31 05:52:59 +00:00
nobu
5ee7f4b0b5 * re.c (rb_reg_regsub): returns the given string itself if nothing
changed.

* string.c (rb_str_sub_bang): keeps code-range as possible.

* string.c (str_gsub): adjusts code-range.  [ruby-core:14566]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14782 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-29 13:44:32 +00:00
akr
fd640aec82 * re.c (rb_reg_s_union): show encodings in error message.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14734 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-27 07:38:23 +00:00
akr
b910bb7761 * re.c (rb_reg_prepare_re): show regexp encoding in the error message.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14597 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-24 09:38:20 +00:00
akr
5b809a28f8 * include/ruby/encoding.h, encoding.c, re.c, io.c, parse.y, numeric.c,
ruby.c, transcode.c: rename rb_ascii_encoding. to
  rb_ascii8bit_encoding.  rb_ascii_encoding is ambiguous with 
  ASCII-8BIT and US-ASCII.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14504 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-22 23:47:18 +00:00
akr
fa3d06c738 refine error message.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14475 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-22 07:14:07 +00:00
matz
d7cc14d436 * encoding.c (rb_ascii_encoding): renamed from previous
rb_default_encoding().

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14443 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-21 18:55:30 +00:00
matz
b36c642a85 * re.c (rb_reg_prepare_re): stop ENCODING_NONE warning if the
encoding of the str is ASCII-8BIT.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14442 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-21 18:21:41 +00:00
akr
b82a05989e * re.c (ARG_ENCODING_NONE): defined for /.../n option.
(REG_ENCODING_NONE): ditto.
  (rb_char_to_option_kcode): return ARG_ENCODING_NONE for n.
  (rb_reg_prepare_re): warn /ascii/n =~ "non-ascii".
  (rb_reg_initialize): set REG_ENCODING_NONE from ARG_ENCODING_NONE.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14438 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-21 16:39:36 +00:00
akr
e667720bd4 * re.c (append_utf8): use rb_utf8_encoding() instead of
rb_enc_find("utf-8").


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14412 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-21 07:07:21 +00:00
matz
668bd7d992 * test/ruby/test_system.rb (TestSystem::valid_syntax): apply
ASCII-8BIT encoding explicitly.

* re.c (rb_reg_prepare_re): add encoding name in the message.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14402 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-21 05:03:14 +00:00
akr
59dca19910 * re.c: change "character encodings differ" error messages.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14401 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-21 04:54:54 +00:00
matz
77629d2cbe * string.c (rb_str_rindex_m): too much adjustment.
* re.c (reg_match_pos): pos adjustment should be based on
  characters.

* test/ruby/test_m17n.rb (TestM17N::test_str_insert): test updated
  to check negative offset behavior.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14340 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-19 17:02:29 +00:00
nobu
474a88f041 * re.c (rb_reg_regsub): should set checked encoding.
* string.c (rb_str_sub_bang): applied r14212 too.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14333 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-19 12:42:19 +00:00
akr
2d01290cfd * parse.y (arg tMATCH arg): call reg_named_capture_assign_gen if regexp
literal is used.
  (reg_named_capture_assign_gen): assign the result of named capture
  into local variables.
  [ruby-dev:32588]

* re.c: document the assignment by named captures.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14297 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-18 11:26:24 +00:00
matz
ebfcc5d933 * re.c (rb_reg_initialize): raise error if non-Unicode fixed
encoding option is specified for regexp literals with \u{}
  escapes.

* string.c (rb_str_squeeze_bang): should squeeze multibyte
  characters as well.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14275 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-17 16:06:21 +00:00
matz
d6a70c4bb7 * string.c (scan_once): need no encoding compatibility check.
it's done inside of re_reg_seach().

* string.c (rb_str_split_m): ditto.

* re.c (rb_reg_regsub): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14269 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-17 09:44:06 +00:00
matz
7bb3ea6afa * re.c (rb_reg_initialize): embedded string may override encoding
of the regular expression.

* re.c (rb_reg_initialize): fix encoding of regular expression if
  embedded string has its own encoding specified.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14218 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-13 16:09:53 +00:00
matz
a648fc802b * encoding.c (rb_enc_compatible): encoding should never fall back
to ASCII-8BIT unless both encodings are ASCII-8BIT.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14217 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-13 13:44:02 +00:00
akr
b92cee1ddb * re.c, regerror.c, string.c, parse.y, ruby.c, file.c:
use capital letter for \xHH notation.  [ruby-dev:32511]



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14202 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-12 14:30:54 +00:00
nobu
ad72efa269 * re.c (rb_reg_regsub): should copy encoding.
* string.c (rb_str_sub_bang, str_gsub): should check and copy encoding
  to be replaced.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-12 03:11:44 +00:00
akr
646f27b822 * encoding.c (rb_enc_ascget): renamed from rb_enc_get_ascii.
* include/ruby/encoding.h: follow the renaming.

* re.c: ditto. 



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14195 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-11 07:39:16 +00:00
akr
5802768b40 * encoding.c (rb_enc_get_ascii): add an argument to provide the
length of the returned character.

* include/ruby/encoding.h (rb_enc_get_ascii): add the argument.

* re.c (rb_reg_expr_str): modify rb_enc_get_ascii call.
  (rb_reg_quote): ditto.
  (rb_reg_regsub): ditto.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14190 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-11 03:08:50 +00:00
matz
f6a9c859be * re.c (rb_reg_match): should calculate offset by converted
operand.  [ruby-cvs:21416]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14180 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-10 10:03:48 +00:00
nobu
38a24d73c8 * re.c (rb_reg_search): return byte offset. [ruby-dev:32452]
* re.c (rb_reg_match, rb_reg_match2, rb_reg_match_m): convert byte
  offset to char index.

* string.c (rb_str_index): return byte offset.  [ruby-dev:32472]

* string.c (rb_str_split_m): calculate in byte offset.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14171 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-10 04:50:35 +00:00
akr
f4592d7bb0 * re.c (rb_reg_expr_str): use \xHH instead of \OOO.
* regerror.c (to_ascii): ditto.
  (onig_snprintf_with_pattern): ditto.
  (onig_snprintf_with_pattern): ditto.

* string.c (rb_str_inspect): ditto.
  (rb_str_dump): ditto.

* parse.y (parser_yylex): ditto.

* ruby.c (proc_options): ditto.

* file.c (rb_f_test): ditto.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14164 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-09 21:48:05 +00:00
akr
08eb58d3dd * re.c (rb_reg_names): new method Regexp#names.
(rb_reg_named_captures): new method Regexp#named_captures
  (match_regexp): new method MatchData#regexp.
  (match_names): new method MatchData#names.

* lib/pp.rb (MatchData#pretty_print): show names of named captures.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14163 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-09 21:44:19 +00:00
akr
e56e8c758d * re.c (rb_reg_s_last_match): accept named capture's name.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14161 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-09 13:35:38 +00:00