1823 Commits

Author SHA1 Message Date
matz
66bae8ad6d * string.c (str_gsub): should copy encoding to the result.
* sprintf.c (rb_str_format): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14212 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-13 09:05:49 +00:00
matz
fb14b7eb05 * string.c (rb_str_split_m): need not to check encoding if regexp
is empty.

* string.c (rb_str_justify): associate encoding of original to the
  result. 

* string.c (rb_str_chomp_bang): need to check encoding of record
  separator.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14211 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-13 08:28:40 +00:00
akr
b92cee1ddb * re.c, regerror.c, string.c, parse.y, ruby.c, file.c:
use capital letter for \xHH notation.  [ruby-dev:32511]



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14202 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-12 14:30:54 +00:00
nobu
ad72efa269 * re.c (rb_reg_regsub): should copy encoding.
* string.c (rb_str_sub_bang, str_gsub): should check and copy encoding
  to be replaced.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14197 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-12 03:11:44 +00:00
nobu
3a3bda73dd * string.c (rb_str_tmp_new): creates hidden temporary buffer.
* transcode.c (transcoding): added a pointer to function to flush.

* transcode.c (transcode_loop): do not use string internal.
  [ruby-dev:32512]

* transcode.c (str_transcode): allow Encoding objects.

* transcode_data.h (BYTE_LOOKUP): use actual struct name.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14176 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-10 08:46:06 +00:00
nobu
c26e32ec16 * string.c (rb_str_insert): should not add length in bytes to index in
chars.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14174 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-10 07:52:19 +00:00
matz
7ded13f54b * transcode.c: new file to provide encoding conversion features.
code contributed by Martin Duerst.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14172 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-10 05:01:47 +00:00
nobu
38a24d73c8 * re.c (rb_reg_search): return byte offset. [ruby-dev:32452]
* re.c (rb_reg_match, rb_reg_match2, rb_reg_match_m): convert byte
  offset to char index.

* string.c (rb_str_index): return byte offset.  [ruby-dev:32472]

* string.c (rb_str_split_m): calculate in byte offset.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14171 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-10 04:50:35 +00:00
akr
f4592d7bb0 * re.c (rb_reg_expr_str): use \xHH instead of \OOO.
* regerror.c (to_ascii): ditto.
  (onig_snprintf_with_pattern): ditto.
  (onig_snprintf_with_pattern): ditto.

* string.c (rb_str_inspect): ditto.
  (rb_str_dump): ditto.

* parse.y (parser_yylex): ditto.

* ruby.c (proc_options): ditto.

* file.c (rb_f_test): ditto.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14164 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-09 21:48:05 +00:00
nobu
45fa4a4b63 * string.c (tr_find): returns true if no characters to be removed is
specified.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14151 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-09 03:12:25 +00:00
nobu
54c146d2c3 * string.c (tr_trans): get rid of segfaults when has mulitbytes but
source sets have no mulitbytes.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14148 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-09 02:29:24 +00:00
akr
f1b7e60cb9 * encoding.c (rb_enc_mbclen): make it never fail.
(rb_enc_nth): don't check the return value of rb_enc_mbclen.
  (rb_enc_strlen): ditto.
  (rb_enc_precise_mbclen): return needmore(1) if e <= p.
  (rb_enc_get_ascii): new function for extracting ASCII character.

* include/ruby/encoding.h (rb_enc_get_ascii): declared.

* include/ruby/regex.h (ismbchar): removed.

* re.c (rb_reg_expr_str): use rb_enc_get_ascii.
  (unescape_escaped_nonascii): use rb_enc_precise_mbclen to determine
  the termination of escaped non-ASCII character.
  (unescape_nonascii): use rb_enc_precise_mbclen.
  (rb_reg_quote): use rb_enc_get_ascii.
  (rb_reg_regsub): use rb_enc_get_ascii.

* string.c (rb_str_reverse) don't check the return value of
  rb_enc_mbclen.
  (rb_str_split_m): don't call rb_enc_mbclen with e <= p.

* parse.y (is_identchar): use ISASCII.
  (parser_ismbchar): removed.
  (parser_precise_mbclen): new macro.
  (parser_isascii): new macro.
  (parser_tokadd_mbchar): use parser_precise_mbclen to check invalid
  character precisely.
  (parser_tokadd_string): use parser_isascii.
  (parser_yylex): ditto.
  (is_special_global_name): don't call is_identchar with e <= p.
  (rb_enc_symname_p): ditto.

  [ruby-dev:32455]

* ext/tk/sample/tkextlib/vu/canvSticker2.rb: remove coding cookie
  because the encoding is not UTF-8.  [ruby-dev:32475]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14131 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-08 02:50:43 +00:00
akr
69406aad50 * encoding.c (rb_enc_precise_mbclen): new function for mbclen with
validation.

* include/ruby/encoding.h (rb_enc_precise_mbclen): declared.
  (MBCLEN_CHARFOUND): new macro.
  (MBCLEN_INVALID): new macro.
  (MBCLEN_NEEDMORE): new macro.

* include/ruby/oniguruma.h (OnigEncodingTypeST): replace mbc_enc_len
  by precise_mbc_enc_len.
  (ONIGENC_PRECISE_MBC_ENC_LEN): new macro.
  (ONIGENC_CONSTRUCT_MBCLEN_CHARFOUND): new macro.
  (ONIGENC_CONSTRUCT_MBCLEN_INVALID): new macro.
  (ONIGENC_CONSTRUCT_MBCLEN_NEEDMORE): new macro.
  (ONIGENC_MBCLEN_CHARFOUND): new macro.
  (ONIGENC_MBCLEN_INVALID): new macro.
  (ONIGENC_MBCLEN_NEEDMORE): new macro.
  (ONIGENC_MBC_ENC_LEN): use ONIGENC_PRECISE_MBC_ENC_LEN.

* enc/euc_jp.c: validation implemented.

* enc/sjis.c: ditto.

* enc/utf8.c: ditto.

* string.c (rb_str_inspect): use rb_enc_precise_mbclen for invalid
  encoding.
  (rb_str_valid_encoding_p): new method String#valid_encoding?.

* io.c (rb_io_getc): use rb_enc_precise_mbclen.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14119 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-12-06 09:28:26 +00:00
akr
f5ee0fd521 * include/ruby/encoding.h, encoding.c, re.c, string.c, parse.y:
rename ENC_CODERANGE_SINGLE to ENC_CODERANGE_7BIT.
  rename ENC_CODERANGE_MULTI to ENC_CODERANGE_8BIT.
  Because single byte 8bit character, such as Shift_JIS 1byte katakana,
  is represented by ENC_CODERANGE_MULTI even if it is not multi byte.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14027 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-11-27 02:21:17 +00:00
akr
b2e60b2ce7 * include/ruby/encoding.h (rb_enc_str_asciionly_p): declared.
(rb_enc_str_asciicompat_p): defined.

* re.c (rb_reg_initialize_str): use rb_enc_str_asciionly_p.
  (rb_reg_quote): return ascii-8bit string if the argument is
  ascii-only to generate encoding generic regexp if possible.
  (rb_reg_s_union): fix encoding handling.  [ruby-dev:32094]

* string.c (rb_enc_str_asciionly_p): defined.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@14013 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-11-25 13:25:34 +00:00
ko1
25c0cb981a * include/ruby/ruby.h: introduce 2 macros:
RFLOAT_VALUE(v), DOUBLE2NUM(dbl).
  Rename RFloat#value -> RFloat#double_value.
  Do not touch RFloat#double_value directly.
* bignum.c, insns.def, marshal.c, math.c, numeric.c, object.c,
  pack.c, parse.y, process.c, random.c, sprintf.c, string.c,
  time.c: apply above changes.
* ext/dl/mkcallback.rb, ext/json/ext/generator/generator.c:
  ditto.



git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13913 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-11-13 16:00:53 +00:00
akr
098a563935 * string.c (tr_trans): cast to unsigned char after dereference
a pointer to a char to avoid SEGV with "\377".tr("a", "b").
  on FreeBSD/amd64.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13872 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-11-10 14:53:26 +00:00
nobu
03d68e3ba4 * string.c (rb_str_squeeze_bang): initialize squeezing table if no
arguments given.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13851 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-11-09 06:20:15 +00:00
davidflanagan
f6fb4b3e8e * string.c (tr_setup_table, tr_trans): fix test failures in test/ruby/test_string.rb
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13834 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-11-07 22:07:13 +00:00
matz
19c4d26c51 * string.c (tr_setup_table): use C array for characters that fit
in a byte to gain performance.

* string.c (rb_str_delete_bang): ditto.

* string.c (rb_str_squeeze_bang): ditto.

* string.c (rb_str_count): ditto.

* string.c (tr_trans): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13812 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-11-03 19:04:53 +00:00
nobu
2ae60f1634 * string.c (rb_str_substr): perfomance improvement. [ruby-dev:31806]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13791 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-10-29 08:58:17 +00:00
nobu
b1d53d10e2 * string.c (rb_str_ord): use encoding.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13726 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-10-16 17:50:51 +00:00
nobu
eb6e9c15bd * string.c (rb_str_new4): should copy encoding. a patch from NARUSE,
Yui <naruse AT airemix.com>.  [ruby-dev:32076]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13714 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-10-16 04:25:47 +00:00
nobu
9c24fed339 * encoding.c (rb_cEncoding): new Encoding class.
* encoding.c (rb_to_encoding, rb_to_encoding_index): helper functions.

* encoding.c (rb_obj_encoding): return Encoding object now.

* gc.c (garbage_collect): mark Encoding objects.

* string.c (rb_str_force_encoding): accept Encoding object as well as
  encoding name.

* include/ruby/encoding.h (rb_to_encoding_index, rb_to_encoding):
  prototypes.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13692 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-10-13 16:32:40 +00:00
nobu
8c19ecbeeb * string.c (rb_enc_str_coderange): fixed checkfor non-ascii.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13669 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-10-10 05:31:08 +00:00
matz
026568c915 * string.c (rb_str_to_i): update RDoc since base can be any value
between 2 and 36.  [ruby-talk:272879]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13645 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-10-06 14:17:35 +00:00
nobu
597974c21f * encoding.c (rb_enc_register): returns new index or -1 if failed.
* encoding.c (rb_enc_alias): check if original name is registered.

* encoding.c (rb_enc_init): register in same order as kcode options in
  re.c.  added new aliases.

* string.c (rb_str_force_encoding): check if valid encoding name.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13643 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-10-06 05:56:09 +00:00
nobu
6845578c92 * insns.def (opt_eq): get rid of gcc bug.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13641 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-10-06 05:32:37 +00:00
matz
c953283d7e revert rb_memcmp() change to pacify GCC optimizer
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13623 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-10-04 09:54:53 +00:00
matz
1677425e9d * re.c (rb_memcmp): no longer useful without ruby_ignorecase.
* re.c (rb_reg_prepare_re): revert recompile condition.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13622 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-10-04 09:24:00 +00:00
matz
dbcc539602 * re.c (ignorecase_setter): change warning message.
* re.c (ignorecase_getter): now gives warning.

* string.c (rb_str_cmp_m): update RDoc document.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13620 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-10-04 08:09:06 +00:00
nobu
19dee8af57 * encoding.c (rb_obj_encoding): returns encoding of the given object.
* re.c (Init_Regexp): new method Regexp#encoding.

* string.c (str_encoding): moved to encoding.c


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13613 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-10-04 06:57:19 +00:00
nobu
eb64f5de06 * string.c (rb_str_append): always set encoding, and coderange
cache bits.

* include/ruby/encoding.h (ENC_CODERANGE_SET): fixed a bug not to
  set chache bits.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13578 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-09-30 08:13:28 +00:00
matz
43c4d80930 * array.c (rb_ary_combination): new method to give all combination
of elements from an array.  [ruby-list:42671]

* array.c (rb_ary_product): a new method to get all combinations
  of elements from two arrays.  can be extended to combinations of
  n-arrays, e.g. a.product(b,c,d).  anyone volunteer?

* array.c (rb_ary_permutation): empty function body to calculate
  permutations of array elements.  need volunteer.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13568 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-09-29 08:43:59 +00:00
nobu
c351afc372 * encoding.c (rb_enc_alias): allow encodings multiple aliases.
* encoding.c (rb_enc_find_index): search the encoding which has the
  given name and return its index if found, or -1.

* st.c (type_strcasehash): case-insensitive string hash type.

* string.c (rb_str_force_encoding): force encoding of self.  this name
  comes from [ruby-dev:31894] by Martin Duerst.  [ruby-dev:31744]

* include/ruby/encoding.h (rb_enc_find_index, rb_enc_associate_index):
  prototyped.

* include/ruby/encoding.h (rb_enc_isctype): direct interface to ctype.

* include/ruby/st.h (st_init_strcasetable): prototyped.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13556 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-09-28 19:27:10 +00:00
matz
335fe1ee7b * string.c (rb_str_comparable): need not to check asciicompat here.
* encoding.c (rb_enc_check): ditto.

* string.c (rb_enc_str_coderange): tuned a bit; no broken check.

* encoding.c (rb_enc_check): new encoding comparison criteria.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13547 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-09-28 09:07:02 +00:00
nobu
19d8793ef4 * string.c (rb_str_associate_encoding): commit miss.
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13530 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-09-26 19:51:00 +00:00
nobu
e41b84895e * encoding.c (rb_enc_associate_index): deal with ASCII compatible
flags.

* encoding.c (rb_enc_check): allow ASCII compatible strings.

* parse.y (rb_intern_str): use ASCII encoding for ASCII string.

* string.c (rb_enc_str_coderange): check for code-range.

* string.c (rb_str_modify): clear code-range flags.

* string.c (rb_str_hash, rb_str_eql): ASCII compatible strings are
  comparable.

* include/ruby/encoding.h: added code-range flags.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13529 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-09-26 19:46:58 +00:00
nobu
94a0db11e7 * encoding.c (rb_enc_check): check for ASCII-compatibilities.
* parse.y (parser_tokadd_string, parser_parse_string,
  parser_here_document, parser_yylex): set encoding to US-ASCII.

* parse.y (rb_enc_symname_p): check if valid with encoding.

* parse.y (rb_intern3): let symbols have encoding.

* string.c (rb_str_hash): add encoding index.

* string.c (rb_str_comparable, rb_str_equal, rb_str_eql): check if
  compatible encoding.

* string.c (sym_inspect): made encoding aware.

* insns.def (opt_eq): compare with encoding.

* include/ruby/encoding.h (rb_enc_asciicompat): check if ASCII
  compatible.

* include/ruby/encoding.h (rb_enc_get_index): added prototype.

* include/ruby/intern.h (rb_str_comparable, rb_str_equal): ditto.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13518 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-09-26 09:39:08 +00:00
matz
5376745fb6 * re.c (rb_reg_match_m): evaluate a block if match. it would make
condition statement much shorter, if no else clause is needed.

* string.c (rb_str_match_m): ditto.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13475 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-09-20 17:14:01 +00:00
kou
740e1b66c8 * string.c (rb_str_rstrip_bang): fixed too much rstrip. [ruby-dev:31786]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13449 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-09-15 09:26:06 +00:00
nobu
26adfc185f * encoding.c (rb_enc_associate_index, rb_enc_get_index): check if
object is encoding capable.  [ruby-dev:31780]

* string.c (rb_str_subpat_set): check for if the argument is a String.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13447 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-09-15 08:04:10 +00:00
matz
edd7c787ad * array.c (rb_ary_cycle): typo in rdoc. a patch from Yugui
<yugui@yugui.sakura.ne.jp>.  [ruby-dev:31748]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13348 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-09-06 12:33:45 +00:00
nobu
629b1e4324 * string.c (rb_str_succ, rb_str_chop_bang, rb_str_chop): m17n support.
[ruby-dev:31734]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13347 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-09-06 03:42:12 +00:00
matz
c349959778 * string.c (rb_str_splice): integer overflow for length.
[ruby-dev:31739]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13342 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-09-05 13:06:01 +00:00
nobu
b58f080349 * string.c (tr_trans, rb_str_squeeze_bang, rb_str_split_m): suppress
warnings.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13315 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-08-30 02:44:20 +00:00
matz
3d7f8c2320 * string.c (str_gsub): should not use mbclen2() which has broken API.
* re.c: remove rb_reg_mbclen2().

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13308 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-08-29 19:16:02 +00:00
matz
51b4cc11d1 * string.c (rb_str_subseq): retrieve substring based on byte offset.
* string.c (rb_str_rindex_m): was confusing character offset and
  byte offset.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13295 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-08-28 06:45:32 +00:00
nobu
53635dea12 * string.c (rb_str_splice_0): should check to modify. [ruby-dev:31665]
git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13293 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-08-28 05:23:33 +00:00
matz
51fb5511e0 * string.c (rb_str_each_line): should swallow sequence of newlines
if rs (optional argument) is an empty string.  [ruby-dev:31652]

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@13289 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
2007-08-27 14:32:15 +00:00