Ignoring accented letters in string comparison

bestprogram 2023. 10. 24. 21:28

I need to compare two strings in C# and treat accented letters the same as their unaccented counterparts. For example:

string s1 = "hello";
string s2 = "héllo";

s1.Equals(s2, StringComparison.InvariantCultureIgnoreCase);
s1.Equals(s2, StringComparison.OrdinalIgnoreCase);

These two strings should be considered equal (as far as my application is concerned), but both of these statements evaluate to false. Is there a way to do this in C#?

FWIW, knightfor's answer below should (as of this writing) be the accepted answer.

Here is a function that strips diacritics from a string:

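using System.Globalization;
using System.Text;

static string RemoveDiacritics(string text)
{
  // Decompose each accented character into a base character plus combining marks.
  string formD = text.Normalize(NormalizationForm.FormD);
  StringBuilder sb = new StringBuilder();

  foreach (char ch in formD)
  {
    // Keep everything except the combining (non-spacing) marks.
    UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(ch);
    if (uc != UnicodeCategory.NonSpacingMark)
    {
      sb.Append(ch);
    }
  }

  // Recompose whatever is left into the usual precomposed form.
  return sb.ToString().Normalize(NormalizationForm.FormC);
}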

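More details are on MichKap's blog (RIP...).

The principle is that normalizing to FormD turns 'é' into two successive characters: 'e' followed by a combining acute accent. The function then iterates over the characters and skips the diacritics.

"héllo" becomes "he<acute>llo", which in turn becomes "hello".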

Debug.Assert("hello"==RemoveDiacritics("héllo"));
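To make the decomposition step concrete, here is a small sketch that prints the characters of a string after FormD normalization. The class and method names (DecompositionDemo, Describe) are made up for this illustration; only Normalize and CharUnicodeInfo.GetUnicodeCategory come from the framework.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

static class DecompositionDemo
{
    // Hypothetical helper: formats each UTF-16 char as U+XXXX plus its Unicode category.
    public static string Describe(string text)
    {
        return string.Join(" ", text.Select(ch =>
            string.Format("U+{0:X4}({1})", (int)ch, CharUnicodeInfo.GetUnicodeCategory(ch))));
    }

    static void Main()
    {
        string composed = "h\u00E9llo"; // "héllo" with a precomposed é
        string decomposed = composed.Normalize(NormalizationForm.FormD);

        Console.WriteLine(composed.Length);   // 5
        Console.WriteLine(decomposed.Length); // 6: 'e' and U+0301 are now separate chars
        Console.WriteLine(Describe(decomposed));
    }
}
```

The third character in the output is U+0301 with category NonSpacingMark, which is exactly what RemoveDiacritics filters out.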

Note: here is a more compact, .NET 4+ friendly version of the same function:

// Requires: using System.Globalization; using System.Linq; using System.Text;
static string RemoveDiacritics(string text)
{
  return string.Concat(
      text.Normalize(NormalizationForm.FormD)
      .Where(ch => CharUnicodeInfo.GetUnicodeCategory(ch) !=
                                    UnicodeCategory.NonSpacingMark)
    ).Normalize(NormalizationForm.FormC);
}

If you don't need to convert the string and only want to check for equality, you can use:

string s1 = "hello";
string s2 = "héllo";

if (String.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace) == 0)
{
    // both strings are equal
}

If you also want the comparison to be case-insensitive:

string s1 = "HEllO";
string s2 = "héLLo";

if (String.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase) == 0)
{
    // both strings are equal
}
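The same CompareOptions flags are accepted by other CompareInfo methods, so the idea carries over to substring searches as well. The sketch below is a minimal illustration, not part of the original answer; the helper names (AccentInsensitive, EqualsIgnoreAccents, ContainsIgnoreAccents) are hypothetical, while CompareInfo.Compare and CompareInfo.IndexOf are the framework calls.

```csharp
using System;
using System.Globalization;

static class AccentInsensitive
{
    // Ignore combining marks (accents) and letter case.
    const CompareOptions Options = CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase;

    // Hypothetical helper: equality under the options above.
    public static bool EqualsIgnoreAccents(string a, string b, CultureInfo culture)
    {
        return culture.CompareInfo.Compare(a, b, Options) == 0;
    }

    // Hypothetical helper: substring search under the options above.
    public static bool ContainsIgnoreAccents(string source, string value, CultureInfo culture)
    {
        return culture.CompareInfo.IndexOf(source, value, Options) >= 0;
    }

    static void Main()
    {
        CultureInfo culture = CultureInfo.InvariantCulture;
        Console.WriteLine(EqualsIgnoreAccents("HEllO", "h\u00E9LLo", culture));
        Console.WriteLine(ContainsIgnoreAccents("Cr\u00E8me br\u00FBl\u00E9e", "BRULEE", culture));
    }
}
```

Passing the culture explicitly keeps the choice between CurrentCulture and InvariantCulture visible at the call site.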

I had to do something similar, but with a StartsWith method. Here is a simple solution derived from @Serge - appTranslator.

Here is an extension method:

    public static bool StartsWith(this string str, string value, CultureInfo culture, CompareOptions options)
    {
        if (str.Length >= value.Length)
            return string.Compare(str.Substring(0, value.Length), value, culture, options) == 0;
        else
            return false;            
    }

And for one-liner freaks ;)

    public static bool StartsWith(this string str, string value, CultureInfo culture, CompareOptions options)
    {
        return str.Length >= value.Length && string.Compare(str.Substring(0, value.Length), value, culture, options) == 0;
    }

An accent-insensitive and case-insensitive StartsWith can then be called like this:

value.ToString().StartsWith(str, CultureInfo.InvariantCulture, CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase)
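One caveat with the Substring-based versions above: they compare lengths in chars, which assumes the prefix and the text use the same normalization form. A possible alternative, sketched here under that observation, is to delegate to CompareInfo.IsPrefix, which takes the same CompareOptions. The extension name StartsWithIgnoring is hypothetical and chosen only to avoid clashing with the method above.

```csharp
using System;
using System.Globalization;

static class PrefixExtensions
{
    // Hypothetical extension name; CompareInfo.IsPrefix is the framework call doing the work.
    public static bool StartsWithIgnoring(this string str, string value,
                                          CultureInfo culture, CompareOptions options)
    {
        return culture.CompareInfo.IsPrefix(str, value, options);
    }

    static void Main()
    {
        const CompareOptions options = CompareOptions.IgnoreNonSpace | CompareOptions.IgnoreCase;

        // Plain ASCII prefix against accented text.
        Console.WriteLine("h\u00E9llo".StartsWithIgnoring("HE", CultureInfo.InvariantCulture, options));

        // The prefix here is in decomposed form ('E' + U+0301), so it has more
        // chars than the precomposed text it is meant to match.
        Console.WriteLine("h\u00E9".StartsWithIgnoring("HE\u0301", CultureInfo.InvariantCulture, options));
    }
}
```

With the Substring approach, the second call would be rejected by the length check before any comparison takes place.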

The following method, CompareIgnoreAccents(...), works on your example data. Here is the article where I got my background information: http://www.codeproject.com/KB/cs/EncodingAccents.aspx

private static bool CompareIgnoreAccents(string s1, string s2)
{
    return string.Compare(
        RemoveAccents(s1), RemoveAccents(s2), StringComparison.InvariantCultureIgnoreCase) == 0;
}

private static string RemoveAccents(string s)
{
    Encoding destEncoding = Encoding.GetEncoding("iso-8859-8");

    return destEncoding.GetString(
        Encoding.Convert(Encoding.UTF8, destEncoding, Encoding.UTF8.GetBytes(s)));
}

I think an extension method would be better:

public static string RemoveAccents(this string s)
{
    Encoding destEncoding = Encoding.GetEncoding("iso-8859-8");

    return destEncoding.GetString(
        Encoding.Convert(Encoding.UTF8, destEncoding, Encoding.UTF8.GetBytes(s)));
}

It is then used like this:

if(string.Compare(s1.RemoveAccents(), s2.RemoveAccents(), true) == 0) {
   ...

A simpler way to remove accents:

    Dim source As String = "áéíóúç"
    Dim result As String

    Dim bytes As Byte() = Encoding.GetEncoding("Cyrillic").GetBytes(source)
    result = Encoding.ASCII.GetString(bytes)

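Try this overload of the String.Compare method:

String.Compare Method (String, String, Boolean, CultureInfo)

It produces an int value based on a comparison that takes the culture info into account. The example on the page compares "change" under en-US and cs-CZ. In cs-CZ, "ch" is a single "letter".

Example from the link: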

using System;
using System.Globalization;

class Sample {
    public static void Main() {
        String str1 = "change";
        String str2 = "dollar";
        String relation = null;

        relation = symbol( String.Compare(str1, str2, false, new CultureInfo("en-US")) );
        Console.WriteLine("For en-US: {0} {1} {2}", str1, relation, str2);

        relation = symbol( String.Compare(str1, str2, false, new CultureInfo("cs-CZ")) );
        Console.WriteLine("For cs-CZ: {0} {1} {2}", str1, relation, str2);
    }

    private static String symbol(int r) {
        String s = "=";
        if      (r < 0) s = "<";
        else if (r > 0) s = ">";
        return s;
    }
}
/*
This example produces the following results.
For en-US: change < dollar
For cs-CZ: change > dollar
*/

So for languages with accents, you will need to get the culture and then test the strings based on it.

http://msdn.microsoft.com/en-us/library/hyxc48dt.aspx

Source URL: https://stackoverflow.com/questions/359827/ignoring-accented-letters-in-string-comparison
