Fix UTF-8 bug in NSString_RegEx

This class used the location information provided by
regex(3) as the range for a substring. However, the offsets
regex(3) returns are byte-based, while NSString works on characters.

This causes a problem when the string contains multi-byte UTF-8
characters, as the wrong substring is returned.

This is fixed by taking the UTF-8 byte sequence and extracting the
substring from that, rather than using NSString's own substring method.
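
The mismatch described above can be reproduced with plain regex(3), outside of NSString. The following is a minimal sketch, not code from this commit: the helper names first_match_byte_offset and utf8_chars_in_prefix are hypothetical and exist only for illustration. It assumes the input is valid UTF-8 and counts code points, which is only an approximation of how NSString indexes its contents.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: number of UTF-8 characters (code points) that
 * start within the first n bytes of s. Continuation bytes have the
 * bit pattern 10xxxxxx and are skipped. Assumes valid UTF-8 input. */
static size_t utf8_chars_in_prefix(const char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}

/* Hypothetical helper: byte offset of the first match of pattern in
 * text, as reported by regexec() in rm_so, or -1 if the pattern does
 * not compile or does not match. */
static long first_match_byte_offset(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m[1];

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    int rc = regexec(&re, text, 1, m, 0);
    regfree(&re);
    return rc == 0 ? (long)m[0].rm_so : -1;
}
```

For the text "\xC3\xA9 foo" ("é foo", where é occupies two bytes) and the pattern "foo", first_match_byte_offset returns 3, while utf8_chars_in_prefix(text, 3) returns 2. Passing the byte offset 3 to a character-indexed substring method would therefore start one character too late.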
Pieter de Bie
2009-09-14 13:02:36 +02:00
parent 4544816ac8
commit 3324591e6c
+3 -1
@@ -57,7 +57,9 @@
break;
NSRange range = NSMakeRange(pmatch[i].rm_so, pmatch[i].rm_eo - pmatch[i].rm_so);
-NSString * substring = [self substringWithRange:range];
+NSString * substring = [[[NSString alloc] initWithBytes:[self UTF8String] + range.location
+length:range.length
+encoding:NSUTF8StringEncoding] autorelease];
[outMatches addObject:substring];
if (ranges)