Acyclic deterministic finite automata
Acyclic deterministic finite automata (ADFA) are deterministic finite automata without cycles. As a consequence, they can only represent finite sets of strings. They can be used as a data structure for storing words with very fast lookup: the time to look up a word is proportional to its length. Minimized ADFA can also be very compact. The size of a minimized ADFA does not depend directly on the number of keys stored. In fact, beyond a certain point, storing more words in a minimized ADFA can make it smaller. Its size appears instead to be related to how complex the set of strings is. A trie is a type of ADFA.
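The relationship between the number of stored words and the size of a minimized ADFA can be illustrated with a small sketch. It assumes a trie stored as nested dictionaries, with the empty-string key used as an end-of-word marker, and it minimizes by merging states that have identical outgoing structure. The function names are hypothetical, and this is only one simple way to count minimal states, not a production construction algorithm.

```python
def build_trie(words):
    """Build a trie as nested dicts; the key "" marks an accepting state
    (an assumed convention for this sketch)."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True
    return root


def count_trie_states(node):
    """Count the states of the unminimized trie."""
    return 1 + sum(count_trie_states(child)
                   for label, child in node.items() if label != "")


def count_minimal_states(root):
    """Count states after merging all states that accept the same suffixes."""
    registry = {}  # signature -> state id

    def canonical(node):
        accepting = "" in node
        edges = tuple(sorted((label, canonical(child))
                             for label, child in node.items() if label != ""))
        signature = (accepting, edges)
        # Two states with the same signature are interchangeable.
        return registry.setdefault(signature, len(registry))

    canonical(root)
    return len(registry)


words = ["tap", "taps", "top", "tops"]
print(count_trie_states(build_trie(words)))     # states in the trie
print(count_minimal_states(build_trie(words)))  # states after merging
```

With the words "00", "01" and "10" this sketch counts 4 minimal states; adding "11" brings the count down to 3, because the two second-level states become identical and merge.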
See also
- Trie
- Deterministic finite automata
Deterministic finite automata
In the theory of computation, a deterministic finite state machine or deterministic finite automaton (DFA) is a finite state machine where for each pair of state and input symbol there is one and only one transition to a next state. DFAs recognize the set of regular languages and no other languages.
A DFA takes in a string of input symbols. For each input symbol it transitions to the state given by its transition function. When the last input symbol has been received, it either accepts or rejects the string, depending on whether or not it is in an accepting state.
Formal definition
A DFA is a 5-tuple,
(S, Σ, T, s, A), consisting of
- a finite set of states (S)
- a finite set called the alphabet (Σ)
- a transition function (T : S × Σ → S)
- a start state (s ∈ S)
- a set of accept states (A ⊆ S)
Let M be a DFA such that M = (S, Σ, T, s, A), and let X = x0x1 ... xn-1 be a string of length n over the alphabet Σ. M accepts the string X if a sequence of states,
r0, r1, ..., rn, exists in S with the following conditions:
# r0 = s
# ri+1 = T(ri, xi), for i = 0, ..., n-1
# rn ∈ A.
As shown in the first condition, the machine starts in the start state s.
The second condition says that given each character of string X, the machine will transition from state to state as ruled by the transition function T.
The last condition says that the machine accepts if the last input of X causes the machine to be in one of the accepting states. Otherwise, it is said to reject the string. The set of strings it accepts forms a language, which is the language the DFA recognises.
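The three conditions translate almost directly into code. The following is a minimal sketch, assuming the transition function T is stored as a dictionary keyed by (state, symbol) pairs; the function name and the small example automaton (strings over {a, b} that end in b) are hypothetical and chosen only for illustration.

```python
def accepts(transition, start, accepting, string):
    """Simulate a DFA on `string` and report whether it is accepted."""
    state = start                             # condition 1: r0 = s
    for symbol in string:
        state = transition[(state, symbol)]   # condition 2: r(i+1) = T(ri, xi)
    return state in accepting                 # condition 3: rn is in A


# Hypothetical example: accept strings over {a, b} that end in "b".
T = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q0", ("q1", "b"): "q1",
}

print(accepts(T, "q0", {"q1"}, "aab"))
print(accepts(T, "q0", {"q1"}, "aba"))
```

The loop keeps only the current state, which is why simulation needs constant space beyond the input itself.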
Example
The following example is a DFA M over a binary alphabet that determines whether the input contains an even number of 0s.
M = (S, Σ, T, s, A) where
- Σ = {0, 1},
- S = {S1, S2},
- s = S1,
- A = {S1}, and
- T is defined by the following state transition table:
  State | 0  | 1
  S1    | S2 | S1
  S2    | S1 | S2
State diagram for M: [Image: DFAexample.png]
Simply put, the state S1 represents that there has been an even number of 0s in the input so far, while S2 signifies an odd number. A 1 in the input does not change the state of the automaton. When the input ends, the state will show whether the input contained an even number of 0s or not.
The language of M is the regular language given by this regular expression:
: 1*(01*01*)*
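As a sanity check, the example automaton can be simulated and compared against a regular expression for the same language. The sketch below assumes the transitions implied by the description (a 0 toggles between S1 and S2, a 1 leaves the state unchanged) and assumes 1*(01*01*)* as the expression for "even number of 0s"; the helper names are hypothetical.

```python
import re
from itertools import product

# Transitions assumed from the description of M: 0 toggles, 1 stays.
T = {
    ("S1", "0"): "S2", ("S1", "1"): "S1",
    ("S2", "0"): "S1", ("S2", "1"): "S2",
}


def m_accepts(string):
    """Run M from start state S1; S1 is also the only accept state."""
    state = "S1"
    for symbol in string:
        state = T[(state, symbol)]
    return state == "S1"


EVEN_ZEROS = re.compile(r"1*(01*01*)*")


def agree_up_to(max_len):
    """Compare M with the regular expression on every binary string
    of length at most max_len."""
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            s = "".join(symbols)
            if m_accepts(s) != (EVEN_ZEROS.fullmatch(s) is not None):
                return False
    return True


print(agree_up_to(8))
```

An exhaustive comparison on short strings is not a proof of equivalence, but it catches most mistakes in a hand-written table or expression.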
Advantages and disadvantages
DFAs are one of the most practical models of computation, since there is a trivial linear time, constant-space, online algorithm to simulate a DFA on a stream of input. Given two DFAs there are efficient algorithms to find a DFA recognizing the union, intersection, and complements of the languages they recognize. There are also efficient algorithms to determine whether a DFA accepts any strings, whether a DFA accepts all strings, whether two DFAs recognize the same language, and to find the DFA with a minimum number of states for a particular regular language.
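Several of these algorithms are short enough to sketch. The code below assumes a DFA is represented as a tuple (states, alphabet, transitions, start, accepting) with sets for the state collections and a dictionary keyed by (state, symbol); both automata are assumed to share one alphabet. The function names and the two example automata are hypothetical.

```python
from collections import deque


def complement(dfa):
    """Swap accepting and non-accepting states."""
    states, alphabet, trans, start, accepting = dfa
    return (states, alphabet, trans, start, states - accepting)


def intersection(d1, d2):
    """Product construction: run both automata in lockstep."""
    s1, alphabet, t1, q1, a1 = d1
    s2, _, t2, q2, a2 = d2
    states = {(p, q) for p in s1 for q in s2}
    trans = {((p, q), c): (t1[(p, c)], t2[(q, c)])
             for (p, q) in states for c in alphabet}
    accepting = {(p, q) for p in a1 for q in a2}
    return (states, alphabet, trans, (q1, q2), accepting)


def is_empty(dfa):
    """Breadth-first search for a reachable accepting state."""
    _, alphabet, trans, start, accepting = dfa
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if state in accepting:
            return False
        for c in alphabet:
            nxt = trans[(state, c)]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True


def equivalent(d1, d2):
    """Same language iff neither accepts a string the other rejects."""
    return (is_empty(intersection(d1, complement(d2)))
            and is_empty(intersection(d2, complement(d1))))


EVEN_ZEROS = ({"E", "O"}, {"0", "1"},
              {("E", "0"): "O", ("E", "1"): "E",
               ("O", "0"): "E", ("O", "1"): "O"}, "E", {"E"})
ENDS_IN_ONE = ({"N", "Y"}, {"0", "1"},
               {("N", "0"): "N", ("N", "1"): "Y",
                ("Y", "0"): "N", ("Y", "1"): "Y"}, "N", {"Y"})

print(is_empty(intersection(EVEN_ZEROS, ENDS_IN_ONE)))
print(equivalent(EVEN_ZEROS, EVEN_ZEROS))
```

Union can be obtained from the same product construction by accepting when either component accepts, and the "accepts all strings" test is emptiness of the complement.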
On the other hand, DFAs are strictly limited in the languages they can recognize: many simple languages, including any whose recognition requires more than constant space, cannot be recognized by a DFA.
References
- Michael Sipser. Introduction to the Theory of Computation. PWS, Boston. 1997. ISBN 053494728X. Section 1.1: Finite Automata, pp.31–47. Subsection "Decidable Problems Concerning Regular Languages" of section 4.1: Decidable Languages, pp.152–155.
See also
- Acyclic deterministic finite automata
- Nondeterministic finite state machine
- Turing machine
Trie
In computer science, a trie is an ordered tree data structure that is used to store an associative array where the keys are strings. Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree shows what key it is associated with. All the descendants of any one node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are normally not associated with every node, only with leaves and some inner nodes that happen to correspond to keys of interest.
The term trie comes from "retrieval". Due to its etymology some sources say it should be pronounced as "tree", while others encourage the use of "try" in order to distinguish it from the more general tree.
In the shown example, keys are listed in the nodes and values below them. Each complete English word has an integer value associated with it. A trie can be seen as a deterministic finite automaton, although the symbol on each edge is often implicit in the order of the branches.
Note that it is not necessary for keys to be explicitly stored in nodes. (In the figure, words are shown only to illustrate how the trie works).
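A minimal sketch of such a structure follows. It assumes each node holds a dictionary of children keyed by character plus an optional value; the class and method names are hypothetical, and the words and integers are illustrative only. Note that no node stores its key: the key is implied by the path from the root.

```python
class Trie:
    """A trie node; the root node represents the empty string."""

    def __init__(self):
        self.children = {}      # character -> child Trie node
        self.value = None
        self.has_value = False  # distinguishes keys from mere prefixes

    def insert(self, key, value):
        node = self
        for ch in key:
            if ch not in node.children:
                node.children[ch] = Trie()
            node = node.children[ch]
        node.value = value
        node.has_value = True

    def get(self, key, default=None):
        node = self
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return default  # fell off the trie: key is absent
        return node.value if node.has_value else default


t = Trie()
for word, number in [("to", 7), ("tea", 3), ("ten", 12), ("inn", 9)]:
    t.insert(word, number)

print(t.get("tea"))
print(t.get("te"))
```

Here "te" is a prefix of stored keys but not a key itself, so the lookup reaches a node that carries no value.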
Advantages and drawbacks
The following are the main advantages of tries over binary search trees (BSTs):
- Looking up keys is faster. Looking up a key of length m takes O(m) time in the worst case, which is O(1) with respect to the number of keys when key length is bounded. A balanced BST performs O(log n) key comparisons, because lookups depend on the depth of the tree, which is logarithmic in the number of keys n. Also, the simple operations tries use during lookup, such as array indexing using a character, are fast on real machines.
- Tries can require less space when they contain a large number of short strings, because the keys are not stored explicitly and nodes are shared between keys.
- Tries make longest-prefix matching, where we wish to find the key sharing the longest possible prefix with a given key, efficient. They also allow one to associate a value with an entire group of keys that have a common prefix.
- There is no need to keep a trie balanced, which for BSTs typically involves a great deal of complexity (see self-balancing binary search tree).
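Longest-prefix matching from the list above can be sketched in a few lines. The version below implements one common variant as an assumption: it returns the longest stored key that is a prefix of the query. The trie is a nested dictionary in which the empty-string key holds a node's value; the function names are hypothetical.

```python
def insert(root, key, value):
    """Insert key into a nested-dict trie; "" stores the value at a node."""
    node = root
    for ch in key:
        node = node.setdefault(ch, {})
    node[""] = value


def longest_prefix(root, query):
    """Return (key, value) for the longest stored key that is a prefix
    of query, or None if no stored key is a prefix."""
    node = root
    best = ("", node[""]) if "" in node else None
    for i, ch in enumerate(query):
        node = node.get(ch)
        if node is None:
            break
        if "" in node:
            best = (query[:i + 1], node[""])
    return best


root = {}
insert(root, "a", 1)
insert(root, "ab", 2)
insert(root, "abcd", 4)

print(longest_prefix(root, "abcx"))
print(longest_prefix(root, "xyz"))
```

The search is a single walk down the trie, so its cost is bounded by the length of the query regardless of how many keys are stored.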
The disadvantages of tries are the following:
- Tries can give an ordering of the keys, but it must correspond to some lexicographic ordering.
- Tries can be considerably larger in certain circumstances, such as a trie containing a small number of very long strings (Patricia tries help to deal with this).
- Trie algorithms are more complex than those for simple BSTs.
- It is not always easy to represent data as strings, e.g. complex objects or floating-point numbers.
Although it seems restrictive to say a trie's key type must be a string, many common data types can be seen as strings; for example, an integer can be seen as a string of bits. Integers with common bit prefixes occur as map keys in many applications such as routing tables and address translation tables.
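The idea of treating an integer as a string of bits can be sketched with a binary trie. The example below assumes fixed-width unsigned integers and a routing-table-like use in which a payload is attached to a bit prefix; the function names, the "payload" key, the 8-bit width and the route labels are all hypothetical.

```python
def bits(value, width):
    """Render an unsigned integer as a fixed-width string of '0'/'1'."""
    return format(value, "0{}b".format(width))


def insert_prefix(root, value, prefix_len, width, payload):
    """Attach payload to the first prefix_len bits of value."""
    node = root
    for b in bits(value, width)[:prefix_len]:
        node = node.setdefault(b, {})
    node["payload"] = payload


def lookup(root, value, width):
    """Return the payload of the longest stored bit prefix of value."""
    node = root
    best = node.get("payload")
    for b in bits(value, width):
        node = node.get(b)
        if node is None:
            break
        best = node.get("payload", best)
    return best


table = {}
insert_prefix(table, 0b10100000, 3, 8, "route A")  # prefix 101
insert_prefix(table, 0b10110000, 4, 8, "route B")  # prefix 1011

print(lookup(table, 0b10111111, 8))
print(lookup(table, 0b10101111, 8))
```

Keys that share leading bits share the corresponding path in the trie, which is what makes common bit prefixes cheap to store and to match.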
Tries are most useful when the keys are of varying lengths and we expect some key lookups to fail, because the key is not present. If we have fixed-length keys, and expect all lookups to succeed, then we can improve key lookup by combining every node with a single child (such as "i" and "in" above) with its child, producing a Patricia trie. This saves space in maps where long paths down the trie do not have branches fanning out, for example in maps where many keys have a long common prefix or where long portions of keys are shared with no other key.
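The node-combining step can be sketched as a transformation on a nested-dictionary trie. This is a simplified illustration, not a full Patricia trie implementation: it assumes the empty-string key marks the end of a stored word, and it merges a node into its parent edge only when that node has exactly one child and is not itself the end of a word. The function names are hypothetical.

```python
def build_trie(words):
    """Build a trie as nested dicts; "" marks the end of a stored word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = True
    return root


def compress(node):
    """Return a copy in which chains of single-child nodes are merged,
    so that edges are labelled with strings instead of single characters."""
    out = {}
    for label, child in node.items():
        if label == "":
            out[""] = child
            continue
        # Absorb the child while it has one outgoing edge and is not a key.
        while len(child) == 1 and "" not in child:
            (next_label, next_child), = child.items()
            label += next_label
            child = next_child
        out[label] = compress(child)
    return out


print(compress(build_trie(["romane", "romanus", "romulus"])))
```

In this example the shared prefix "rom" and the unshared tail "ulus" each collapse to a single edge, which is where the space saving comes from.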
Clarification about performance
It is common to treat trie search time as O(1). However, this is not entirely correct, because it assumes that the length of the keys is constant. Given N distinct keys, the longest key has length at least log_k N, where k is the size of the alphabet. It can therefore be demonstrated that trie search time is O(log N) strictly speaking, which would appear to be the same as that of BST search.
This observation does not take away from the benefits of tries, because their real advantage is that they make each comparison cheaper: in a BST we perform string comparisons, which take time proportional to the key length in the worst case, while in a trie we compare single characters in constant time. This is not merely a theoretical difference, because as we descend toward the leaves of the BST, the strings we compare will often have long common prefixes, making string comparisons slow in practice. Since the longest key has length at least log_k N, BST and binary search time is therefore actually O((log N)^2).
A similar argument applies to radix sort, which sorts bitstrings of length k in O(kn) time; because sorting is applied more often to small values than large values, this factor of k is often neglected.
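The difference in per-comparison cost can be made concrete by counting character comparisons. The sketch below is an illustration under an assumed cost model (one unit per pair of characters examined); it performs binary search over sorted keys that share a long prefix and compares the count with the key length, which bounds the number of steps a trie lookup takes. The helper names and the generated keys are hypothetical.

```python
def compare_counting(a, b, counter):
    """Three-way string comparison that counts characters examined."""
    for x, y in zip(a, b):
        counter[0] += 1
        if x != y:
            return -1 if x < y else 1
    return (len(a) > len(b)) - (len(a) < len(b))


def binary_search_cost(sorted_keys, target):
    """Return the number of character comparisons binary search makes."""
    counter = [0]
    lo, hi = 0, len(sorted_keys)
    while lo < hi:
        mid = (lo + hi) // 2
        c = compare_counting(target, sorted_keys[mid], counter)
        if c == 0:
            break
        if c < 0:
            hi = mid
        else:
            lo = mid + 1
    return counter[0]


# 256 keys that all share a 30-character prefix (already in sorted order).
keys = ["prefix" * 5 + format(i, "03d") for i in range(256)]
target = keys[200]

print(binary_search_cost(keys, target))  # character comparisons in binary search
print(len(target))                       # upper bound on trie lookup steps
```

Every probe of the binary search re-reads the shared prefix before it reaches a distinguishing character, whereas a trie walks the prefix once.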
Applications
As replacement of other data structures
As mentioned, a trie has a number of advantages over binary search trees. A trie can also be used to replace a hash table, over which it has the following advantages:
- Average lookup speed is theoretically the same, but a trie is faster in practice.
- Worst-case lookup speed in hash tables is O(N).
- There are no key collisions.
- Buckets are only necessary if a single key is mapped to more than one value.
- There is no need to provide a hash function.
- A trie can provide an alphabetical ordering of the entries by key.
Tries do have some drawbacks as well:
- It is not easy to represent all keys as strings.
- They are frequently less space-efficient than hash tables.
- Unlike hash tables, they are generally not already available in programming language toolkits.
Dictionary representation
A seemingly obvious application of tries is that of storing dictionary words. Tries allow for very fast word lookup, insertion and deletion. Words also share nodes, so one might expect some space savings (which is not the case in practice due to space required by nodes). They are also well suited for the implementation of approximate matching algorithms in spell checking software, for example. If only the dictionary words need to be stored, however, without any additional information required to be stored along with each word, minimal acyclic deterministic finite automata are much more compact than tries.
Sorting
Lexicographic sorting of a set of keys can be accomplished with a simple trie-based algorithm as follows:
- Insert all keys in a trie.
- Output all keys in the trie by means of pre-order traversal.
Theoretically this algorithm is just as fast as radix sort, but in practice it is likely slower due to the need to allocate tree nodes.
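The two steps can be sketched as follows, assuming a nested-dictionary trie in which the empty-string key holds a count of how many times a key was inserted, so that duplicates survive. The function name is hypothetical. Children are visited in sorted order here; an array indexed by character would avoid that per-node sort.

```python
def trie_sort(keys):
    """Sort strings lexicographically by inserting them into a trie
    and reading them back with a pre-order traversal."""
    root = {}
    for key in keys:
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node[""] = node.get("", 0) + 1  # count occurrences of this key

    out = []

    def visit(node, prefix):
        # Pre-order: emit the node's own key before descending.
        out.extend([prefix] * node.get("", 0))
        for ch in sorted(k for k in node if k != ""):
            visit(node[ch], prefix + ch)

    visit(root, "")
    return out


print(trie_sort(["tea", "to", "ten", "a", "inn", "to"]))
```

Emitting a node before its children is what places a key ahead of every longer key that extends it.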
A parallel algorithm for sorting N keys based on tries is O(1) if there are N processors.
Full text search
A special kind of trie, called a suffix tree, can be used to index all suffixes in a text in order to carry out fast full text searches.
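The underlying idea can be shown with a naive suffix trie, which is a simplification assumed here for brevity: a real suffix tree compresses single-child paths and can be built in linear time, whereas this sketch uses space quadratic in the text length. The function names are hypothetical.

```python
def build_suffix_trie(text):
    """Insert every suffix of text into a nested-dict trie."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def contains(root, pattern):
    """A pattern occurs in the text iff it is a prefix of some suffix,
    that is, iff it labels a path starting at the root."""
    node = root
    for ch in pattern:
        node = node.get(ch)
        if node is None:
            return False
    return True


index = build_suffix_trie("banana")
print(contains(index, "nan"))
print(contains(index, "nab"))
```

Once the index is built, each query costs time proportional to the pattern length, independent of the length of the text.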
See also
- Acyclic deterministic finite automata
- Deterministic finite automata
- Judy array
- Search algorithm
External links
- [http://www.nist.gov/dads/HTML/trie.html NIST's Dictionary of Algorithms and Data Structures: Trie]
- [http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/Trie/ Tries] by Lloyd Allison
- [http://linux.thai.net/~thep/datrie/datrie.html An Implementation of Double-Array Trie]
- [http://tom.biodome.org/briandais.html de la Briandais Tree]
References
- Donald Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89685-0. Section 6.3: Digital Searching, pp.492–512.