Overview
Search for a Japanese word in a line of Shift_JIS data. This lets you search without first converting the Japanese text into UTF-8 or EUC-JP.
Flow
Sample code
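use strict;
use warnings;
# Save this file in Shift_JIS and do not "use utf8"; the code works on raw bytes.
# Search word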
my $search_word = 'パターン';
my $search_word_org = $search_word;
# String to be searched
my $string = '検索される文字列とエスケープ処理された検索文字列をパターンマッチする処理する';
$search_word =~ s/([\W])/sprintf("%%%02X", ord($1))/eg;
$search_word =~ s/%5[BCDE]/%5c$&/gi;
$search_word =~ s/%2[489B]/%5c$&/gi;
$search_word =~ s/%3F/%5c$&/gi;
$search_word =~ s/%7[BCD]/%5c$&/gi;
# '.' and '*' were already encoded as %2E and %2A by the first substitution.
$search_word =~ s/%2[AE]/%5c$&/gi;
$search_word =~ s/%([A-Fa-f0-9][A-Fa-f0-9])/pack("C", hex($1))/eg;
my $hit;
("$string" =~ /$search_word/) && ($hit = 1);
if ($hit) {
print "$search_word_org was found.";
} else {
print "$search_word_org was not found.";
}
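For reference, the whole sample can also be packaged as a reusable subroutine. The sketch below is not the author's code: the names escape_sjis_pattern and sjis_contains are hypothetical, the five escape substitutions are merged into one alternation (including the %2A and %2E cases for * and .), and chr(hex(...)) stands in for pack("C", hex(...)). Writing the bytes as \x escapes keeps the example independent of the encoding the file is saved in.

```perl
use strict;
use warnings;

# escape_sjis_pattern is a hypothetical name. It takes a raw Shift_JIS
# byte string (the file must NOT "use utf8") and returns it with a
# backslash in front of every byte that is a regex metacharacter.
sub escape_sjis_pattern {
    my ($word) = @_;
    # Step 1: encode every non-word byte as %XX (uppercase hex).
    $word =~ s/(\W)/sprintf("%%%02X", ord($1))/eg;
    # Step 2: put an encoded backslash (%5c) before the reserved
    # characters [ \ ] ^   $ ( ) * + .   ?   { | }
    $word =~ s/%(?:5[BCDE]|2[489ABE]|3F|7[BCD])/%5c$&/g;
    # Step 3: decode every %XX back into a single byte.
    $word =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return $word;
}

# sjis_contains is also hypothetical: returns 1 if $word occurs in
# $string (both Shift_JIS byte strings), 0 otherwise.
sub sjis_contains {
    my ($string, $word) = @_;
    my $pattern = escape_sjis_pattern($word);
    return $string =~ /$pattern/ ? 1 : 0;
}

# "パターン" written as Shift_JIS byte escapes.
my $search_word_org = "\x83\x70\x83\x5E\x81\x5B\x83\x93";
my $string = "abc" . $search_word_org . "xyz";

if (sjis_contains($string, $search_word_org)) {
    print "found\n";
} else {
    print "not found\n";
}
```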
Description of the code
my $search_word = 'パターン';
my $search_word_org = $search_word;
Put the search word into a variable, and keep a copy of the original in $search_word_org so it can be displayed with the result.
my $string = '検索される文字列とエスケープ処理された検索文字列をパターンマッチする処理する';
This is the sentence to be searched; it contains パターン.
$search_word =~ s/([\W])/sprintf("%%%02X", ord($1))/eg;
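Encode (unpack) the search word: every non-word byte is replaced by a percent sign followed by its two-digit hexadecimal value. Under Perl's default byte semantics, every byte of 0x80 or above counts as a non-word byte, so the Japanese characters are broken into %XX codes, while ASCII letters and digits are left as they are (the second byte of パ is 0x70, which is the letter p). At this point, "パターン" looks as follows:
%83p%83%5E%81%5B%83%93
This shows why escaping is needed: the second byte of タ is 0x5E (^) and the second byte of ー is 0x5B ([). If the raw bytes were used as a regular expression, these bytes would be treated as metacharacters.
$search_word =~ s/%5[BCDE]/%5c$&/gi;
$search_word =~ s/%2[489B]/%5c$&/gi;
$search_word =~ s/%3F/%5c$&/gi;
$search_word =~ s/%7[BCD]/%5c$&/gi;
$search_word =~ s/%2[AE]/%5c$&/gi;
Escape the characters reserved by regular expressions. These lines insert %5c, the encoded form of a backslash, right before each encoded reserved character. The last line handles . and *, which the first substitution has already turned into %2E and %2A. Some examples are shown below.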
\=%5C, (=%28, )=%29, [=%5B, ]=%5D, |=%7C
?=%3F, +=%2B, ^=%5E, $=%24, {=%7B, }=%7D, *=%2A, .=%2E
After escaping, "パターン" looks as follows:
%83p%83%5c%5E%81%5c%5B%83%93
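To check the intermediate strings quoted above, the encoding and escaping steps can be run on their own. This minimal sketch assumes a hypothetical helper, trace_escape, that returns both forms; it uses the same substitutions as the sample, including the %2[AE] line for . and *.

```perl
use strict;
use warnings;

# trace_escape is a hypothetical name used only for this illustration.
# It returns the word after the encoding step and after the escaping step.
sub trace_escape {
    my ($word) = @_;
    # Encoding step: every non-word byte becomes %XX.
    (my $encoded = $word) =~ s/(\W)/sprintf("%%%02X", ord($1))/eg;
    # Escaping step: prefix %5c to each encoded reserved character.
    my $escaped = $encoded;
    $escaped =~ s/%5[BCDE]/%5c$&/gi;
    $escaped =~ s/%2[489B]/%5c$&/gi;
    $escaped =~ s/%3F/%5c$&/gi;
    $escaped =~ s/%7[BCD]/%5c$&/gi;
    $escaped =~ s/%2[AE]/%5c$&/gi;
    return ($encoded, $escaped);
}

# "パターン" as Shift_JIS bytes.
my ($encoded, $escaped) = trace_escape("\x83\x70\x83\x5E\x81\x5B\x83\x93");
print "$encoded\n";   # %83p%83%5E%81%5B%83%93
print "$escaped\n";   # %83p%83%5c%5E%81%5c%5B%83%93
```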
$search_word =~ s/%([A-Fa-f0-9][A-Fa-f0-9])/pack("C", hex($1))/eg;
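Decode (pack) the escaped word back into bytes: each %XX becomes the byte it stands for, and each %5c becomes a backslash. "パターン" is now the byte sequence 83 70 83 5C 5E 81 5C 5B 83 93 (hexadecimal), in which every reserved byte is preceded by a backslash (0x5C), so the regular expression matches it literally.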
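The following sketch illustrates why the escaping matters, assuming raw byte strings and two hypothetical helpers (compiles, matches). The raw Shift_JIS bytes of "パターン" fail to compile as a pattern because the 0x5B byte opens a character class that is never closed, while the escaped and packed bytes compile and match the word literally.

```perl
use strict;
use warnings;

# compiles is a hypothetical helper: returns 1 if $pattern compiles as a
# regex, 0 if Perl rejects it (the error is caught by eval).
sub compiles {
    my ($pattern) = @_;
    return eval { my $re = qr/$pattern/; 1 } ? 1 : 0;
}

# matches is a hypothetical helper: returns 1 if $string matches $pattern.
sub matches {
    my ($string, $pattern) = @_;
    return $string =~ /$pattern/ ? 1 : 0;
}

# "パターン" as raw Shift_JIS bytes. The second bytes of タ (0x5E, "^")
# and ー (0x5B, "[") are regex metacharacters.
my $raw = "\x83\x70\x83\x5E\x81\x5B\x83\x93";

# The same word after escaping and packing: a 0x5C ("\") now sits in
# front of the ^ and [ bytes.
my $escaped = "\x83\x70\x83\x5C\x5E\x81\x5C\x5B\x83\x93";

my $string = "abc" . $raw . "xyz";

print "raw compiles: ",     compiles($raw),             "\n";   # 0 (unmatched [)
print "escaped compiles: ", compiles($escaped),         "\n";   # 1
print "escaped matches: ",  matches($string, $escaped), "\n";   # 1
```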
my $hit;
("$string" =~ /$search_word/) && ($hit = 1);
If the search word matches the sentence, $hit is set to 1.
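Perl also has built-in ways to get the same result, shown here as swapped-in alternatives rather than the method of this article. quotemeta, the function behind \Q...\E, puts a backslash in front of every non-word character; in a byte string without the UTF-8 flag (and outside use feature 'unicode_strings') that includes the bytes 0x80 to 0xFF, and a backslash before such a byte still matches the byte itself. index avoids regular expressions altogether. The subroutine names below are hypothetical.

```perl
use strict;
use warnings;

# Alternative 1 (hypothetical name, not the article's method):
# \Q...\E applies quotemeta, escaping every non-word character.
sub contains_quotemeta {
    my ($string, $word) = @_;
    return $string =~ /\Q$word\E/ ? 1 : 0;
}

# Alternative 2 (hypothetical name, not the article's method):
# index does a plain substring search, so no escaping is needed.
sub contains_index {
    my ($string, $word) = @_;
    return index($string, $word) >= 0 ? 1 : 0;
}

# "パターン" as Shift_JIS bytes, and a string that contains it.
my $search_word_org = "\x83\x70\x83\x5E\x81\x5B\x83\x93";
my $string = "abc" . $search_word_org . "xyz";

my $by_regex = contains_quotemeta($string, $search_word_org);
my $by_index = contains_index($string, $search_word_org);
print "quotemeta: $by_regex, index: $by_index\n";   # quotemeta: 1, index: 1
```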
if ($hit) {
print "$search_word_org was found.";
} else {
print "$search_word_org was not found.";
}
Display the result based on the value of $hit. The original word, $search_word_org, is used here because $search_word now holds the escaped form.