Technical Report: DCC-2011-03
Study of the average size of Glushkov and Partial Derivative Automata
Sabine Broda
CMUP & DCC-FC, Universidade do Porto
e-mail: sbb@dcc.fc.up.pt
António Machiavelo
CMUP & DM-FC, Universidade do Porto
e-mail: ajmachia@fc.up.pt
Nelma Moreira, Rogério Reis
CMUP & DCC-FC, Universidade do Porto
e-mail: {nam,rvr}@dcc.fc.up.pt
August 2011
Abstract
In this paper, the relation between the Glushkov automaton
(nfaPos) and the partial derivative automaton (nfaPd) of a
given regular expression, in terms of transition complexity, is
studied. The average transition complexity of nfaPos was
proved by Nicaud to be linear in the size of the corresponding
expression. This result was obtained using an upper bound on the
number of transitions of nfaPos. Here we present a new
quadratic construction of nfaPos that leads to a more elegant
and straightforward implementation, and that allows the exact
counting of the number of transitions. Based on that, a better
estimate of the average size is presented. Asymptotically, and as
the alphabet size grows, the number of transitions per state is on
average 2.
Broda et al. computed an upper bound for the ratio of the
number of states of nfaPd to the number of states of
nfaPos, which is about 1/2 for large alphabet
sizes. Here we show how to obtain an upper bound for the number of
transitions in nfaPd, which we then use to get an average case
approximation. Some experimental results are presented that
illustrate the quality of our estimate.
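
To make the two automata concrete, the following is a minimal Python sketch, not the construction proposed in this report. It assumes the textbook First/Last/Follow definition of the position automaton and Antimirov's partial derivatives, and it represents regular expressions as nested tuples. The tuple encoding and the function names glushkov, pderiv and pd_automaton are illustrative assumptions.

```python
from itertools import count

# Hypothetical AST encoding: ("eps",), ("sym", a), ("alt", r, s),
# ("cat", r, s), ("star", r).
EPS = ("eps",)

def glushkov(regex):
    """Position automaton via First/Last/Follow: (number of states, transitions)."""
    counter = count(1)
    letter, follow = {}, {}      # position -> letter, position -> follow set

    def walk(r):                 # returns (nullable, first, last)
        tag = r[0]
        if tag == "eps":
            return True, set(), set()
        if tag == "sym":
            p = next(counter)
            letter[p], follow[p] = r[1], set()
            return False, {p}, {p}
        if tag == "star":
            _, f, l = walk(r[1])
            for p in l:
                follow[p] |= f
            return True, f, l
        n1, f1, l1 = walk(r[1])
        n2, f2, l2 = walk(r[2])
        if tag == "alt":
            return n1 or n2, f1 | f2, l1 | l2
        for p in l1:             # tag == "cat"
            follow[p] |= f2
        return n1 and n2, (f1 | f2 if n1 else f1), (l1 | l2 if n2 else l2)

    _, first, _ = walk(regex)
    trans = {(0, letter[q], q) for q in first}            # 0 is the initial state
    trans |= {(p, letter[q], q) for p in follow for q in follow[p]}
    return len(letter) + 1, trans

def nullable(r):
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "sym":
        return False
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    return nullable(r[1]) and nullable(r[2])              # cat

def cat(r, s):
    return s if r == EPS else ("cat", r, s)               # eps . s = s

def pderiv(r, a):
    """Set of partial derivatives of r with respect to the letter a."""
    tag = r[0]
    if tag == "eps":
        return set()
    if tag == "sym":
        return {EPS} if r[1] == a else set()
    if tag == "alt":
        return pderiv(r[1], a) | pderiv(r[2], a)
    if tag == "cat":
        out = {cat(d, r[2]) for d in pderiv(r[1], a)}
        return out | pderiv(r[2], a) if nullable(r[1]) else out
    return {cat(d, r) for d in pderiv(r[1], a)}           # star

def pd_automaton(regex, alphabet):
    """Partial derivative automaton: (number of states, transitions)."""
    states, todo, trans = {regex}, [regex], set()
    while todo:
        r = todo.pop()
        for a in alphabet:
            for d in pderiv(r, a):
                trans.add((r, a, d))
                if d not in states:
                    states.add(d)
                    todo.append(d)
    return len(states), trans

# Example: (a + b)* a
example = ("cat", ("star", ("alt", ("sym", "a"), ("sym", "b"))), ("sym", "a"))
n_pos, t_pos = glushkov(example)
n_pd, t_pd = pd_automaton(example, "ab")
print(n_pos, len(t_pos), n_pd, len(t_pd))
```

For the expression (a + b)* a, this sketch yields a position automaton with 4 states and 9 transitions and a partial derivative automaton with 2 states and 3 transitions. A single expression only illustrates the kind of size gap that the averages above quantify; it says nothing about the average case itself.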