
Architectural Design of Multi-Agent Systems: Technologies and Techniques

Hong Lin

University of Houston – Downtown, USA

Hershey • New York

Information Science Reference


MAITS

can avoid this limitation of the HMM. Moreover, the HMM is trained using maximum likelihood estimation (MLE). The MLE method assumes the data sets follow a known statistical distribution. However, the statistical distribution of biometric or network data is unknown. To avoid these limitations of the current HMM, we use the temporal Markov modeling (TMM) method presented in a later section. The TMM has an observed Markov chain whose states are cluster indices, representing the dependence of each observation on its predecessor. The hidden Markov chain of the HMM is still employed in the TMM (Tran, 2004b; Tran & Pham, 2005).

We train the TMM using the quasi-likelihood estimation (QLE) method. The reason is that computer network data and biometric data do not follow a known statistical distribution, so the current MLE method is not appropriate. The QLE method requires only assumptions on the mean and variance functions rather than the full likelihood.

The method used to determine the similarity score S is usually formulated as a problem of statistical hypothesis testing. For a given input biometric X and a claimed identity, the problem is to test the null hypothesis H0: X is from the claimed identity, against the alternative hypothesis H1: X is not from the claimed identity. If the probabilities of both hypotheses are known exactly, then according to the Neyman-Pearson Lemma the optimum test to decide between these two hypotheses is a likelihood ratio test. In the current likelihood ratio-based scores, a false rejection of a true claimed user can be caused by the use of the background model set: the likelihood values of the background models are assumed to be equally weighted. However, this assumption is often not true, as the similarity measures between each background model and the claimed model might differ. A likelihood transformation is proposed to overcome this problem. An alternative approach is the use of prior probabilities as mixture weights for the background models.

On the other hand, a false acceptance of an untrue user can arise because of the relativity of the ratio-based values. For example, the two ratios (0.06 / 0.03) and (0.00006 / 0.00003) have the same value. The first ratio can lead to a correct decision, whereas the second is not likely to, because both likelihood values are very low.
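The relativity of ratio-based values can be seen with a few lines of arithmetic; a minimal sketch, where the decision threshold and the likelihood floor are illustrative assumptions rather than values from the chapter:

```python
# The two ratios from the text are numerically identical ...
ratio_a = 0.06 / 0.03          # both likelihoods reasonably high
ratio_b = 0.00006 / 0.00003    # both likelihoods extremely low
assert abs(ratio_a - ratio_b) < 1e-12

# ... so a rule based on the ratio alone treats both claims alike,
theta = 1.5                    # illustrative decision threshold
print(ratio_a > theta, ratio_b > theta)   # True True

# even though in the second case the claimed-model likelihood itself
# is negligible, which is exactly why that acceptance is unreliable.
floor = 1e-3                   # illustrative floor on the raw likelihood
print(0.06 >= floor, 0.00006 >= floor)    # True False
```

This is the situation described above: the ratio accepts both claims, while the raw likelihood in the second case should cast doubt on the decision.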

Estimation of Prior Probabilities

The prior probabilities can be estimated directly from the training data. Let X be the training data set used to train the model set \Lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_M\}. The prior probabilities P(\lambda_i \mid \Lambda) satisfy:

\sum_{i=1}^{M} P(\lambda_i \mid \Lambda) = 1 \qquad (1)

Maximising P(X \mid \Lambda) over the P(\lambda_i \mid \Lambda) using the Lagrangian method, the updated prior probability \hat{P}(\lambda_i \mid \Lambda) is calculated from P(\lambda_i \mid \Lambda) as follows:

\hat{P}(\lambda_i \mid \Lambda) = \frac{P(X \mid \lambda_i, \Lambda)\, P(\lambda_i \mid \Lambda)}{\sum_{k=1}^{M} P(X \mid \lambda_k, \Lambda)\, P(\lambda_k \mid \Lambda)}

or:

\hat{P}(\lambda_i \mid \Lambda) = \frac{1}{T} \sum_{t=1}^{T} \frac{P(x_t \mid \lambda_i, \Lambda)\, P(\lambda_i \mid \Lambda)}{\sum_{k=1}^{M} P(x_t \mid \lambda_k, \Lambda)\, P(\lambda_k \mid \Lambda)} \qquad (2)

The second estimation method in (2) is called frame-level prior estimation to distinguish it from the first estimation.
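The frame-level update in (2) can be sketched directly from per-frame likelihoods; a minimal Python sketch, where the function name, argument layout, and example numbers are assumptions for illustration:

```python
def frame_level_priors(frame_likelihoods, priors):
    """Frame-level prior update of equation (2).

    frame_likelihoods[t][i] is P(x_t | lambda_i, Lambda) for frame t and
    model i; priors[i] is the current P(lambda_i | Lambda). Returns the
    updated priors, which again sum to 1 as required by equation (1).
    """
    T, M = len(frame_likelihoods), len(priors)
    updated = [0.0] * M
    for t in range(T):
        # Posterior responsibility of each model for frame x_t
        # (numerator and denominator of the inner fraction in (2)).
        weighted = [frame_likelihoods[t][i] * priors[i] for i in range(M)]
        total = sum(weighted)
        for i in range(M):
            updated[i] += weighted[i] / total
    # Average the per-frame responsibilities over the T frames.
    return [u / T for u in updated]

# Two frames, two models, uniform initial priors.
likes = [[0.9, 0.1],
         [0.6, 0.4]]
priors = frame_level_priors(likes, [0.5, 0.5])
print(priors)        # [0.75, 0.25]: the better-matching model gains prior mass
print(sum(priors))   # 1.0, the constraint of equation (1)
```

Iterating this update re-weights the background models by how well they actually explain the training frames, which is the role the priors play as mixture weights above.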

Temporal Models

The use of codewords in a codebook as the states of a Markov chain was developed by Dai (1995) for isolated word recognition. The proposed research extends this idea to a general framework, and hence it can apply to HMMs, GMMs, and their fuzzy versions.

Let O = o_1, o_2, \ldots, o_T denote a stochastic process in discrete time. In general, the probability that the t-th variable o_t takes the value w_t depends on the values taken by all the previous variables. Under the Markov assumption, it depends only on the immediately preceding value o_{t-1}, as follows:

P(o_t = w_t \mid o_{t-1} = w_{t-1}, o_{t-2} = w_{t-2}, \ldots, o_1 = w_1) = P(o_t = w_t \mid o_{t-1} = w_{t-1}) \qquad (3)

The stochastic processes based on this assumption are termed Markov processes. Markov chains are Markov processes whose state variables are restricted to a finite number of values, and for which the probability P(o_t = w_t \mid o_{t-1} = w_{t-1}) is assumed to be invariant in time. The sequence W = w_1, w_2, \ldots, w_T represents a sequence of states. In order to apply Markov chain theory to temporal models, the feature vectors are considered outputs of Markov chains. Let X = x_1, x_2, \ldots, x_T be a sequence of feature vectors that represents a spoken or written word. A feature vector x_t can be mapped either to a member of the set of codewords V = \{v_1, v_2, \ldots, v_K\} obtained by vector quantisation (VQ) modelling, or to a member of the set of Gaussian components G = \{g_1, g_2, \ldots, g_K\} obtained by GMM. The state sequence W may be either a codeword sequence w_1 = v_i, w_2 = v_j, \ldots, w_T = v_m or a Gaussian sequence w_1 = g_i, w_2 = g_j, \ldots, w_T = g_m, where 1 \le i, j, m \le K. Therefore, each codeword in V or each Gaussian in G is a specific state of the Markov chain. The state-transition probabilities of the Markov chain are used to represent the dependence between acoustic features.
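Those state-transition probabilities can be estimated by counting codeword-to-codeword transitions along the quantised sequence; a minimal sketch, assuming the feature vectors have already been mapped to codeword indices (the function name and example sequence are illustrative):

```python
def transition_matrix(states, K):
    """Estimate the state-transition probabilities of a Markov chain
    whose states are codeword (or Gaussian component) indices 0..K-1.

    states is the quantised sequence w_1, ..., w_T; entry [i][j] of the
    result estimates P(o_t = v_j | o_{t-1} = v_i) by relative frequency.
    """
    counts = [[0] * K for _ in range(K)]
    for prev, cur in zip(states, states[1:]):
        counts[prev][cur] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States never visited as a predecessor get an all-zero row here;
        # a smoothing scheme could be substituted for real data.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

# Codeword indices for a short utterance, with K = 3 codewords.
A = transition_matrix([0, 1, 1, 2, 0, 1], K=3)
print(A[0])  # [0.0, 1.0, 0.0]: from state 0 the chain always moved to state 1
```

Because the states here are observed (cluster indices), this estimate is a simple count ratio, unlike the hidden-state case where transitions must be inferred.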

It should be noted that in the proposed temporal model each observation in the sequence O is assumed to be statistically dependent on its predecessor, whereas in the HMM the observations in O are assumed to be independent. The temporal model therefore avoids this limitation of the HMM. There are two types of temporal models: the temporal Gaussian mixture model and the temporal hidden Markov model. The latter can be further classified into the discrete temporal hidden Markov model (DTHMM) and the continuous temporal hidden Markov model (CTHMM) (Tran, 2004b).

Methods

Let \lambda_c be the claimed speaker model and \bar{\lambda} be a model representing all other possible speakers, that is, impostors. Let P(X \mid \lambda_c) and P(X \mid \bar{\lambda}) be the likelihood functions of the claimed speaker and the impostors, respectively. For a given input utterance X and a claimed identity, a basic hypothesis test is between the null hypothesis H0: X is from \lambda_c, and the alternative hypothesis H1: X is from \bar{\lambda}. The optimum test to decide between these two hypotheses is a likelihood ratio test given by:

S(X) = \frac{P(X \mid \lambda_c)}{P(X \mid \bar{\lambda})} \;
\begin{cases} > \theta, & \text{accept } H_0 \\ \le \theta, & \text{reject } H_0 \end{cases} \qquad (4)
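In practice the test in (4) is usually applied in the log domain to avoid underflow of very small likelihood values; a minimal sketch, where the function name and the threshold default are illustrative assumptions:

```python
import math

def accept_claim(lik_claimed, lik_impostor, log_theta=0.0):
    """Likelihood ratio test of equation (4) in the log domain:
    accept H0 iff log P(X | lambda_c) - log P(X | lambda_bar) > log theta.
    """
    return (math.log(lik_claimed) - math.log(lik_impostor)) > log_theta

print(accept_claim(0.06, 0.03))        # ratio 2 > 1: accept H0 -> True
print(accept_claim(0.00006, 0.00012))  # ratio 0.5 <= 1: reject H0 -> False
```

Working with log-likelihood differences leaves the decision unchanged (the logarithm is monotone) while keeping the arithmetic stable for long utterances.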

S(X) is regarded as the claimed speaker's similarity score. While the model for H0 can be estimated using the training data from the claimed speaker, the model for the impostors is less well defined. The first approach is to use a subset of impostors' models that is representative of the population close (similar) to the claimed speaker. This subset has been called the cohort set or background speaker set. Depending on the approximation of P(X \mid \bar{\lambda}) in (4) by the likelihood functions of the background model set P(X \mid \lambda_i), i = 1, \ldots, B, we obtain different normalisation methods (Burileanu, Moraru, Bojan, Puchiu, & Stan, 2002). The second approach is to pool speech from several speakers to train a single universal background model (Reynolds et al., 2000). The third approach is the hybrid cohort-


Index

P

patient agent (PA) 320

peer-to-peer (P2P)

client 117

network 116–125

performance

analysis 136

prediction toolkit (PACE) 184

personal

agent 57

data 122

scheduling management 57

pharmacy agent (PhA) 320

phishing 195

polling 66

port scanning 194–195

pre-mortem-based computer forensics agent 192

process planning agent (PPA) 300

producer-consumer problem 32

productivity 1

programming language 3, 27, 107

Q

quality 16

quasi-likelihood estimation (QLE) 197, 203

R

rationality 27

recommendation cycle 254

recommender system 254, 260

registry server agent (RSA) 299

reinforcement learning (RL) 184, 239–252

problem (RLP) 264

reliability 16

repetition 262

rescheduling 302

reusability 16, 108

RiskMan (risk management) 364–384

risk management 364–384

robot control paradigm 344

robust intelligent control 342–363

robustness 346

runtime engine 368

S

scalability 16

scalable fault tolerant agent grooming environment (SAGE) 144–174

scanning 194

scheduling 48, 122, 290–291

agent 298, 300

problem 50

scripted agent 369

security 16, 108, 118, 192–212

seed selection 292

self-healing 342

self-organization 276, 316, 318

semantic language (SL) 13

sense

-act (SA) 344

-plan-act (SPA) 344

Shakey 344

shared memory 30

similarity knowledge 261

SimMaster 369, 376

simple network management protocol (SNMP) 127

Slammer virus 193

software

agent 180

customization 100

spam 195

sportier critique 257

SpyWare 195

standard set 62

stationary agent 320

storage device card 320

story engine 372

structured thinking 4

supplier management agent (SMA) 298–299

supply chain management (SCM) 288–312, 313–341

survivability 198

swarm intelligence approach 274–287

swarms 316

system log monitoring agent 192

T

T-Cham 27

task

-scheduling algorithm 128

allocation 274–287

decomposition 130

manipulation 128

reduction 215–218

temporal

logic 37

properties 37

time slot 49

tourism 274


trainee

agent 369

interface (TI) 376

transmission control protocol/internet protocol

(TCP/IP) 117

tuple space 31

U

uniform resource locator (URL) 123

unit critique 256

unstructured thinking 4

user

feedback 253

knowledge 260

V

vector quantization 197

Venn Diagram 31

veracity 27


virtual

agent cluster 152

organization (VO) 176

reality training 364, 366

virus 193

visual management agent 149

vocabulary knowledge 260

voice authentication 201

W


working agent 192

worms 193

WS-Resource 177
