๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
Problem Solving/Algorithm, Tips

Trie (Prefix tree)

by ํ–‰๋ฑ 2020. 6. 29.

 

Trie

 

๊ฐœ๋…

n-ary tree์˜ ๋ณ€์ข…์œผ๋กœ ๊ฐ ๋…ธ๋“œ์— ๋ฌธ์ž๋ฅผ ์ €์žฅํ•˜๋Š” ์ž๋ฃŒ๊ตฌ์กฐ์ด๋‹ค.

ํŠธ๋ฆฌ๋ฅผ ์•„๋ž˜์ชฝ์œผ๋กœ ์ˆœํšŒํ•˜๋ฉด ๋‹จ์–ด ํ•˜๋‚˜๊ฐ€ ๋‚˜์˜จ๋‹ค.

 

* ๋…ธ๋“œ

* ๋…ธ๋“œ๋Š” ๋„ ๋…ธ๋“œ๋ผ๊ณ ๋„ ๋ถˆ๋ฆฌ๋ฉฐ ๋‹จ์–ด์˜ ๋์„ ๋‚˜ํƒ€๋‚ธ๋‹ค.

TrieNode๋ฅผ ์ƒ์†ํ•œ TerminatingTrieNode๋กœ ํ‘œํ˜„ํ•  ์ˆ˜๋„ ์žˆ๊ณ 

* ๋…ธ๋“œ์˜ ๋ถ€๋ชจ ๋…ธ๋“œ ์•ˆ์— boolean flag๋ฅผ ์ƒˆ๋กœ ์ •์˜ํ•ด ๋‹จ์–ด์˜ ๋์„ ํ‘œํ˜„ํ•  ์ˆ˜๋„ ์žˆ๋‹ค.

 

์ž์‹ ๋…ธ๋“œ ๊ฐœ์ˆ˜

* ๋…ธ๋“œ๋กœ ๋‹จ์–ด์˜ ๋์„ ํ‘œํ˜„ํ•˜๋ฉด ๊ฐ ๋…ธ๋“œ๋Š” 1๊ฐœ ~ (ALPHABET_SIZE + 1)๊ฐœ๊นŒ์ง€ ์ž์‹์„ ๊ฐ–๊ณ  ์žˆ์„ ์ˆ˜ ์žˆ๋‹ค.

๋‹จ * ๋…ธ๋“œ ๋Œ€์‹  boolean flag๋กœ ํ‘œํ˜„ํ–ˆ๋‹ค๋ฉด 0๊ฐœ ~ ALPHABET_SIZE๊ฐœ๊นŒ์ง€.

 

ํ™œ์šฉ

์ ‘๋‘์‚ฌ๋ฅผ ๋น ๋ฅด๊ฒŒ ์ฐพ์•„๋ณด๊ธฐ ์œ„ํ•ด ๋ชจ๋“  ๋ฌธ์ž์—ด์„ ํŠธ๋ผ์ด์— ์ €์žฅํ•ด๋†“์„ ์ˆ˜ ์žˆ๋‹ค.

ํ•ด์‹œํ…Œ์ด๋ธ”์„ ์ด์šฉํ•˜๋ฉด ์ฃผ์–ด์ง„ ๋ฌธ์ž์—ด์˜ ์œ ํšจ์„ฑ์„ ๋น ๋ฅด๊ฒŒ ํ™•์ธํ•  ์ˆ˜ ์žˆ์ง€๋งŒ,

๊ทธ ๋ฌธ์ž์—ด์ด ์–ด๋–ค ์œ ํšจํ•œ ๋‹จ์–ด์˜ ์ ‘๋‘์‚ฌ์ธ์ง€๋Š” ํ™•์ธํ•  ์ˆ˜ ์—†๋‹ค.

 

Trie๋Š” ๊ธธ์ด๊ฐ€ K์ธ ๋ฌธ์ž์—ด์ด ์ฃผ์–ด์กŒ์„ ๋•Œ O(K) ์‹œ๊ฐ„์— ํ•ด๋‹น ๋ฌธ์ž์—ด์ด ์œ ํšจํ•œ ์ ‘๋‘์‚ฌ์ธ์ง€ ํ™•์ธํ•  ์ˆ˜ ์žˆ๋‹ค.

์ด ์‹œ๊ฐ„์€ ํ•ด์‹œํ…Œ์ด๋ธ”์„ ์‚ฌ์šฉํ–ˆ์„ ๋•Œ์™€ ์ •ํ™•ํžˆ ๊ฐ™์€ ์ˆ˜ํ–‰ ์‹œ๊ฐ„์ด๋‹ค.

(ํ”ํžˆ ํ•ด์‹œํ…Œ์ด๋ธ” ๊ฒ€์ƒ‰์ด O(1) ์‹œ๊ฐ„์— ์ˆ˜ํ–‰๋œ๋‹ค๊ณ  ํ•˜์ง€๋งŒ ์ž…๋ ฅ ๋ฌธ์ž์—ด์€ ์ „๋ถ€ ์ฝ์–ด์•ผ ํ•˜๋ฏ€๋กœ O(K)์ด๋‹ค.)

 

 

Reference: ์ฝ”๋”ฉ ์ธํ„ฐ๋ทฐ ์™„์ „๋ถ„์„ ์ฑ…

 


์™œ Trie ์ธ๊ฐ€?

1) ๊ธธ์ด๊ฐ€ K์ธ ๋ฌธ์ž์—ด์— ๋Œ€ํ•˜์—ฌ insert์™€ search๋ฅผ O(K)์— ์ˆ˜ํ–‰ํ•œ๋‹ค.

↔ BST๋Š” K๊ฐ€ ๋ฌธ์ž์—ด์˜ ๊ธธ์ด, N์ด ์ „์ฒด ๋‹จ์–ด์˜ ๊ฐœ์ˆ˜๋ผ๊ณ  ํ•  ๋•Œ insert์™€ search๋ฅผ O(KlogN)์— ์ˆ˜ํ–‰ํ•œ๋‹ค.

↔ Hashing๊ณผ ๋‹ฌ๋ฆฌ ํ•ด์‹œ ํ•จ์ˆ˜๋ฅผ ๊ณ„์‚ฐํ•  ํ•„์š”๊ฐ€ ์—†๊ณ  Chaning์ด๋‚˜ Open addressing ๊ฐ™์€ Collsion handling๋„ ํ•„์š” ์—†๋‹ค.

 

2) (Hashing๊ณผ ๋‹ฌ๋ฆฌ) ๋ชจ๋“  ๋‹จ์–ด๋ฅผ ์•ŒํŒŒ๋ฒณ ์ˆœ์„œ๋กœ ์ถœ๋ ฅํ•˜๊ธฐ ์‰ฝ๋‹ค.

3) ํšจ์œจ์ ์œผ๋กœ prefix search๋ฅผ ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

 

Trie์˜ ์ด์Šˆ

Trie์˜ ๊ฐ€์žฅ ํฐ ๋‹จ์ ์€ ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ์•„์ฃผ ๋งŽ์ด ํ•„์š”ํ•˜๋‹ค๋Š” ๊ฒƒ์ด๋‹ค.

๊ฐ ๋…ธ๋“œ์— ๋Œ€ํ•˜์—ฌ (์•ŒํŒŒ๋ฒณ ๊ฐœ์ˆ˜๋งŒํผ) ์•„์ฃผ ๋งŽ์€ ๋…ธ๋“œ ํฌ์ธํ„ฐ๋ฅผ ๊ฐ–๊ฒŒ๋œ๋‹ค.

์ฆ‰ K๊ฐ€ ๋ฌธ์ž์—ด์˜ ๊ธธ์ด, N์ด ์ „์ฒด ๋‹จ์–ด์˜ ๊ฐœ์ˆ˜์ผ ๋•Œ ๊ณต๊ฐ„ ๋ณต์žก๋„๊ฐ€ O(ALPHABET_SIZE * K * N) ์ด๋‹ค.

๊ณต๊ฐ„ ๋ฌธ์ œ๊ฐ€ ์ค‘์š”ํ•˜๋‹ค๋ฉด Compressed Trie, Ternary Search Tree ๋“ฑ์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

 

--> ์ •๋ฆฌํ•˜์ž๋ฉด Trie๋Š” ๋งค์šฐ ๋น ๋ฅด์ง€๋งŒ ์—„์ฒญ ํฐ ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ํ•„์š”ํ•˜๋‹ค.

 

 

Reference: https://www.geeksforgeeks.org/advantages-trie-data-structure/,

https://www.geeksforgeeks.org/trie-insert-and-search/

 


๊ตฌํ˜„

struct TrieNode{
    TrieNode* children[ALPHABET_SIZE];
    bool isEndOfWord;
};

๋งจ ์œ„ ๊ทธ๋ฆผ์ฒ˜๋Ÿผ, ์กด์žฌํ•˜๋Š” child node๋งŒ ๊ทธ๋ ค๋„ฃ๋Š” ๊ฒŒ ์•„๋‹ˆ๋ผ

๊ฐ ๋…ธ๋“œ๋งˆ๋‹ค ALPHABET_SIZE๊ฐœ ๋งŒํผ์˜ TrieNode*๋ฅผ ๋ฏธ๋ฆฌ ์ค€๋น„ํ•œ๋‹ค๊ณ  ์ƒ๊ฐํ•œ๋‹ค.

 

children์˜ ์ธ๋ฑ์Šค๋Š” ๋ช‡ ๋ฒˆ์งธ ์•ŒํŒŒ๋ฒณ์ธ์ง€๋ฅผ ์˜๋ฏธํ•œ๋‹ค.

๊ฐ€๋ น 'a'๋ผ๋ฉด children[0], 'b'๋ผ๋ฉด children[1], 'z'๋ผ๋ฉด children[25].

 

์ฒ˜์Œ TrieNode๊ฐ€ ์ƒ์„ฑ๋˜์—ˆ์„ ๋•Œ๋Š” children[i]๊ฐ€ ๋ชจ๋‘ NULL์ด๋ผ๊ณ  ์ƒ๊ฐํ•œ๋‹ค.

ํ•ด๋‹น TrieNode ์•„๋ž˜ 'a'๊ฐ€ ์˜จ๋‹ค๋ฉด children[0]์ด NULL์ด ์•„๋‹Œ ์ƒˆ๋กœ์šด TrieNode๋ฅผ ๊ฐ€๋ฆฌํ‚ค๋„๋ก ํ•˜๋Š” ์‹.

(์ƒˆ๋กœ์šด TrieNode๋ฅผ ๊ฐ€๋ฆฌํ‚ค๋„๋ก ํ•˜๋Š” ๊ฒƒ์€ getNode() ๋ผ๋Š” ํ•จ์ˆ˜๋ฅผ ์ด์šฉํ•œ๋‹ค. - ์•„๋ž˜ ์ฝ”๋“œ๋ฅผ ์ฐธ๊ณ )

 

insert์™€ search๋Š” ๋งค์šฐ ๋น„์Šทํ•˜๋ฉฐ,

insert๋Š” getNode()๋ฅผ ์ด์šฉํ•ด ์ƒˆ๋กœ์šด ๋…ธ๋“œ๋ฅผ ์—ฐ๊ฒฐํ•ด๊ฐ€๋Š” ํ•œํŽธ search๋Š” ๊ทธ๋ƒฅ ์•„๋ž˜๋กœ ์ญ‰ ๋‚ด๋ ค๊ฐ„๋‹ค.

 

search๊ฐ€ ์ข…๋ฃŒ๋˜๋Š” ์กฐ๊ฑด์€ ๋‘ ๊ฐ€์ง€์ด๋‹ค.

1) input string์˜ ๋์— ๋‹ค๋‹ค๋ž๋‹ค.

--> ์ด ๊ฒฝ์šฐ ๋งˆ์ง€๋ง‰ ๋…ธ๋“œ์˜ isEndOfWord๊ฐ€ true์ด๋ฉด search ๊ฒฐ๊ณผ๋„ true์ด๋‹ค.

2) trie์— ํ•ด๋‹นํ•˜๋Š” key๊ฐ€ ์—†๋‹ค.

--> search ๊ฒฐ๊ณผ๋Š” false์ด๋‹ค.

 

์•„๋ž˜๋Š” Reference์˜ ์ฝ”๋“œ์—์„œ ๋ณ€์ˆ˜๋ช…, ์ฃผ์„ ๋“ฑ์„ ์ˆ˜์ •ํ•œ ์ „์ฒด ์ฝ”๋“œ ๋ฒ„์ „.

๊ฐ€๋…์„ฑ์„ ์œ„ํ•ด input string์„ ๋งจ ์œ„ ์‚ฌ์ง„์— ์žˆ๋Š” ๋‹จ์–ด๋“ค๋กœ ํ•˜์˜€๋‹ค.

 

// Ref: https://www.geeksforgeeks.org/trie-insert-and-search/

#include<iostream>
#define ALPHABET_SIZE 26
using namespace std;

struct TrieNode{
    TrieNode* children[ALPHABET_SIZE];
    bool isEndOfWord;
};

TrieNode* getNode()
{
    TrieNode* new_node = new TrieNode;
    new_node->isEndOfWord = false;

    for(int i=0; i<ALPHABET_SIZE; i++){
        new_node->children[i] = NULL;
    }

    return new_node;
}

void insert(TrieNode* root, string key)
{
    TrieNode* temp_node = root;

    for(int i=0; i<key.length(); i++){
        int index = key[i] - 'a';
        if(!temp_node->children[index])
            temp_node->children[index] = getNode();
        temp_node = temp_node->children[index];
    }

    temp_node->isEndOfWord = true;
}

bool search(TrieNode* root, string key)
{
    TrieNode* temp_node = root;

    for(int i=0; i<key.length(); i++){
        int index = key[i] - 'a';
        if(!temp_node->children[index])
            return false;
        temp_node = temp_node->children[index];
    }

    return temp_node && temp_node->isEndOfWord;
}

int main()
{
    string keys[] = {"many", "man", "my", "lie", "a"};
    int n = sizeof(keys) / sizeof(keys[0]);

    TrieNode* root = getNode();
    for(int i=0; i<n; i++)
        insert(root, keys[i]);

    if(search(root, "man")) cout << "Yes" << endl;
    else    cout << "No" << endl;

    if(search(root, "lay")) cout << "Yes" << endl;
    else    cout << "No" << endl;

    return 0;
}

 

Reference: https://www.geeksforgeeks.org/trie-insert-and-search/

๋Œ“๊ธ€