Package nltk_lite :: Module probability :: Class GoodTuringProbDist

Type GoodTuringProbDist

object --+    
         |    
 ProbDistI --+
             |
            GoodTuringProbDist


The Good-Turing estimate of a probability distribution. This method calculates the probability mass to assign to events with zero or low counts based on the number of events with higher counts. It does so by using the smoothed count c* = (c + 1) N(c + 1) / N(c), where c is the original count and N(i) is the number of event types observed with count i. These smoothed counts are then normalised to yield a probability distribution.
Method Summary
  __init__(self, freqdist, bins)
Creates a Good-Turing probability distribution estimate.
string __repr__(self)
Return a string representation of this ProbDist.
  freqdist(self)
any max(self)
Return the sample with the greatest probability.
float prob(self, sample)
Return the probability for a given sample.
list samples(self)
Return a list of all samples that have nonzero probabilities.
Inherited from ProbDistI: logprob
Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Method Details

__init__(self, freqdist, bins)
(Constructor)

Creates a Good-Turing probability distribution estimate. This method calculates the probability mass to assign to events with zero or low counts based on the number of events with higher counts. It does so by using the smoothed count c*:
  • c* = (c + 1) N(c + 1) / N(c)

where c is the original count and N(i) is the number of event types observed with count i. These smoothed counts are then normalised to yield a probability distribution.

The bins parameter allows N(0), the number of event types with a count of zero, to be estimated.
Parameters:
freqdist - The frequency counts upon which to base the estimation.
           (type=FreqDist)
bins - The number of possible event types. This must be at least as large as the number of bins in the freqdist. If None, then it's taken to be equal to freqdist.B().
           (type=int)
Overrides:
nltk_lite.probability.ProbDistI.__init__
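
The smoothing formula and the role of bins can be sketched in a few lines of standalone Python. This is only an illustration under stated assumptions, not the nltk_lite implementation: the helper name good_turing_probs is hypothetical, a plain dict of counts stands in for a FreqDist, and N(0) is assumed to be bins minus the number of observed event types.

```python
from collections import Counter


def good_turing_probs(counts, bins=None):
    """Return (probs, unseen_prob) from a mapping of sample -> count.

    Hypothetical helper for illustration; not the nltk_lite implementation.
    """
    observed = {s: c for s, c in counts.items() if c > 0}
    if bins is None:
        bins = len(observed)
    if bins < len(observed):
        raise ValueError("bins must be at least the number of observed types")

    # N(i): number of event types observed with count i.
    n = Counter(observed.values())
    # Assumption: N(0) is the number of possible types that were never seen.
    n[0] = bins - len(observed)

    def smoothed(c):
        # c* = (c + 1) N(c + 1) / N(c); a Counter returns 0 for missing counts.
        return (c + 1) * n[c + 1] / n[c]

    # Total smoothed mass over every event type, seen and unseen.
    total = sum(n[c] * smoothed(c) for c in list(n) if n[c] > 0)
    if total == 0:
        raise ValueError("smoothed counts sum to zero; cannot normalise")

    probs = {s: smoothed(c) / total for s, c in observed.items()}
    unseen_prob = smoothed(0) / total if n[0] > 0 else 0.0
    return probs, unseen_prob


counts = {"a": 3, "b": 2, "c": 2, "d": 1, "e": 1, "f": 1}
probs, unseen = good_turing_probs(counts, bins=10)
print(probs, unseen)
```

Note that in this unadjusted form the sample with the highest count ("a" here) receives a smoothed count of zero, because no event type has a count one higher. The sketch makes no attempt to correct for that.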

__repr__(self)
(Representation operator)

Returns:
A string representation of this ProbDist.
           (type=string)
Overrides:
__builtin__.object.__repr__

max(self)

Returns:
the sample with the greatest probability. If two or more samples have the same probability, return one of them; which sample is returned is undefined.
           (type=any)
Overrides:
nltk_lite.probability.ProbDistI.max (inherited documentation)

prob(self, sample)

Parameters:
sample - The sample whose probability should be returned.
           (type=any)
Returns:
the probability for a given sample. Probabilities are always real numbers in the range [0, 1].
           (type=float)
Overrides:
nltk_lite.probability.ProbDistI.prob (inherited documentation)

samples(self)

Returns:
A list of all samples that have nonzero probabilities. Use prob to find the probability of each sample.
           (type=list)
Overrides:
nltk_lite.probability.ProbDistI.samples (inherited documentation)

Generated by Epydoc 2.1 on Tue Sep 5 09:37:20 2006 http://epydoc.sf.net