Class HeldoutProbDist
object --+
         |
 ProbDistI --+
             |
            HeldoutProbDist
The heldout estimate for the probability distribution of the
experiment used to generate two frequency distributions. These two
frequency distributions are called the "heldout frequency
distribution" and the "base frequency distribution." The heldout
estimate uses the heldout frequency distribution to predict the
probability of each sample, given its frequency in the base
frequency distribution.
In particular, the heldout estimate approximates the probability for a
sample that occurs r times in the base distribution
as the average frequency in the heldout distribution of all samples that
occur r times in the base distribution.
This average frequency is Tr[r]/(Nr[r]*N), where:
- Tr[r] is the total count in the heldout distribution for all
  samples that occur r times in the base distribution.
- Nr[r] is the number of samples that occur r times in the base
  distribution.
- N is the number of outcomes recorded by the heldout frequency
  distribution.
To make the prob member function efficient, Tr[r]/(Nr[r]*N) is
precomputed for each value of r when the HeldoutProbDist is created.
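The formula above can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's implementation: it uses `collections.Counter` in place of `FreqDist`, the function name `heldout_estimate` is hypothetical, and it assumes that Nr[0] (the number of unseen samples) is `bins` minus the number of distinct samples in the base data.

```python
from collections import Counter

def heldout_estimate(base, heldout, bins=None):
    """Return a list est where est[r] = Tr[r] / (Nr[r] * N), or None if Nr[r] == 0."""
    if bins is None:
        bins = len(base)  # assumption: default to the number of distinct base samples
    max_r = max(base.values())
    N = sum(heldout.values())  # outcomes recorded by the heldout distribution

    # Nr[r]: number of samples that occur r times in the base distribution.
    Nr = [0] * (max_r + 1)
    for count in base.values():
        Nr[count] += 1
    Nr[0] = bins - len(base)  # assumption: unseen samples fill the remaining bins

    # Tr[r]: total heldout count of the samples that occur r times in base.
    Tr = [0] * (max_r + 1)
    for sample, count in heldout.items():
        Tr[base[sample]] += count

    return [Tr[r] / (Nr[r] * N) if Nr[r] else None for r in range(max_r + 1)]

base = Counter("aabbbc")     # a: 2, b: 3, c: 1
heldout = Counter("abbbcd")  # a: 1, b: 3, c: 1, d: 1
est = heldout_estimate(base, heldout, bins=5)
```

Here `est[3]` is the estimate for `b`, the only sample seen three times in the base data, and `est[0]` is shared by the two samples never seen there.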
|
__init__(self,
base_fdist,
heldout_fdist,
bins=None)
Use the heldout estimate to create a probability distribution for the
experiment used to generate base_fdist and heldout_fdist. |
|
|
list of float
|
_calculate_Tr(self)
Returns:
the list Tr, where Tr[r] is the total count in heldout_fdist for
all samples that occur r times in base_fdist. |
|
|
list of float
|
_calculate_estimate(self,
Tr,
Nr,
N)
Returns:
the list estimate, where estimate[r] is the probability estimate for any
sample that occurs r times in the base frequency
distribution. |
|
|
FreqDist
|
base_fdist(self)
Returns:
The base frequency distribution that this probability distribution is
based on. |
|
|
FreqDist
|
heldout_fdist(self)
Returns:
The heldout frequency distribution that this probability distribution
is based on. |
|
|
float
|
prob(self,
sample)
Returns:
the probability for a given sample. |
|
|
any
|
max(self)
Returns:
the sample with the greatest probability. |
|
|
string
|
__repr__(self)
Returns:
a string representation of this ProbDist. |
|
Inherited from ProbDistI: logprob, samples
Inherited from object: __delattr__, __getattribute__, __hash__,
__new__, __reduce__, __reduce_ex__, __setattr__, __str__
|
list of float
|
_estimate
A list mapping from r, the number of times that a
sample occurs in the base distribution, to the probability estimate
for that sample.
|
int
|
_max_r
The maximum number of times that any sample occurs in the base
distribution.
|
__init__(self,
base_fdist,
heldout_fdist,
bins=None)
(Constructor)
|
Use the heldout estimate to create a probability distribution for the
experiment used to generate base_fdist and heldout_fdist.
- Parameters:
base_fdist (FreqDist) - The base frequency distribution.
heldout_fdist (FreqDist) - The heldout frequency distribution.
bins (int) - The number of sample values that can be generated by the
experiment that is described by the probability distribution.
This value must be correctly set for the probabilities of the
sample values to sum to one. If bins is not
specified, it defaults to base_fdist.B().
- Overrides:
ProbDistI.__init__
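The role of bins can be illustrated with a small check. This is a sketch under stated assumptions, not the library's code: `Counter` stands in for `FreqDist`, the helper `total_probability` and the `universe` argument are hypothetical, and Nr[0] is assumed to be bins minus the number of distinct samples in the base data.

```python
from collections import Counter

def total_probability(base, heldout, bins, universe):
    """Sum the heldout estimates over every sample the experiment can produce."""
    N = sum(heldout.values())
    Nr = Counter(base.values())  # Nr[r]: samples occurring r times in base
    Nr[0] = bins - len(base)     # assumption: unseen samples fill the other bins
    Tr = Counter()               # Tr[r]: heldout count for those samples
    for sample, count in heldout.items():
        Tr[base[sample]] += count
    return sum(Tr[base[s]] / (Nr[base[s]] * N) for s in universe)

base = Counter("aabbbc")
heldout = Counter("abbbcd")
universe = "abcde"  # the experiment has five possible outcomes

correct = total_probability(base, heldout, bins=5, universe=universe)
too_many = total_probability(base, heldout, bins=7, universe=universe)
```

With bins equal to the true number of outcomes the estimates sum to one (up to floating-point rounding); with bins set too high, the mass reserved for unseen samples is spread over bins that do not exist and the total over the real outcomes falls short.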
|
_calculate_Tr(self)
- Returns:
list of float
- the list Tr, where Tr[r] is the total count in heldout_fdist
for all samples that occur r times in base_fdist.
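A minimal sketch of how such a Tr list might be built, assuming `Counter` objects in place of `FreqDist` (the function name `calculate_Tr` and the sample sentences are illustrative only):

```python
from collections import Counter

def calculate_Tr(base, heldout):
    """Tr[r] = total heldout count of samples that occur r times in base."""
    max_r = max(base.values(), default=0)
    Tr = [0.0] * (max_r + 1)
    for sample, count in heldout.items():
        r = base[sample]  # 0 for samples never seen in the base data
        Tr[r] += count
    return Tr

base = Counter("the cat sat on the mat the end".split())
heldout = Counter("the dog sat on the log".split())
Tr = calculate_Tr(base, heldout)
```

Every heldout outcome lands in exactly one Tr[r], so the entries of Tr sum to N, the number of outcomes in the heldout distribution.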
|
_calculate_estimate(self, Tr, Nr, N)
- Parameters:
Tr (list of float) - The list Tr, where Tr[r]
is the total count in the heldout distribution for all samples
that occur r times in the base distribution.
Nr (list of float) - The list Nr, where Nr[r]
is the number of samples that occur r times
in the base distribution.
N (int ) - The total number of outcomes recorded by the heldout frequency
distribution.
- Returns:
list of float
- the list estimate, where estimate[r] is the probability estimate for any
sample that occurs r times in the base
frequency distribution. In particular, estimate[r] is Tr[r]/(Nr[r]*N). In the special case that Nr[r]=0, estimate[r] will
never be used; so we define estimate[r]=None
for those cases.
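The handling of the Nr[r]=0 case might look like the following sketch. The function name and the input values are illustrative assumptions; only the formula and the None convention come from the description above.

```python
def calculate_estimate(Tr, Nr, N):
    """estimate[r] = Tr[r] / (Nr[r] * N), or None when no sample occurs r times."""
    estimate = []
    for r in range(len(Tr)):
        if Nr[r] == 0:
            # No sample has base count r, so this slot is never looked up.
            estimate.append(None)
        else:
            estimate.append(Tr[r] / (Nr[r] * N))
    return estimate

# Illustrative inputs: no sample occurs exactly twice in the base data.
estimate = calculate_estimate(Tr=[2.0, 2.0, 0.0, 2.0], Nr=[4, 5, 0, 1], N=6)
```

Using None rather than 0.0 for the empty slots makes an accidental lookup fail loudly instead of silently returning a zero probability.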
|
base_fdist(self)
- Returns:
FreqDist
- The base frequency distribution that this probability
distribution is based on.
|
heldout_fdist(self)
- Returns:
FreqDist
- The heldout frequency distribution that this probability
distribution is based on.
|
prob(self, sample)
- Returns: float
- the probability for a given sample. Probabilities are always
real numbers in the range [0, 1].
- Overrides:
ProbDistI.prob
- (inherited documentation)
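Because the estimates are precomputed per value of r, a probability lookup can reduce to two index operations. The sketch below assumes a `Counter` for the base distribution and a hand-written `estimate` list; both are stand-ins for illustration, not the class's actual attributes.

```python
from collections import Counter

def prob(sample, base, estimate):
    """Look up the precomputed estimate for the sample's base count."""
    r = base[sample]  # 0 if the sample never occurred in the base data
    return estimate[r]

base = Counter({"the": 3, "cat": 1})
estimate = [0.05, 0.1, None, 0.4]  # illustrative values indexed by r

p_seen = prob("the", base, estimate)
p_unseen = prob("dog", base, estimate)
```

All samples with the same base count share one probability, which is why an unseen sample such as "dog" still receives the nonzero estimate stored at index 0.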
|
max(self)
- Returns: any
- the sample with the greatest probability. If two or more samples
have the same probability, return one of them; which sample is
returned is undefined.
- Overrides:
ProbDistI.max
- (inherited documentation)
|
__repr__(self)
repr(x)
- Returns:
string
- A string representation of this ProbDist.
- Overrides:
object.__repr__
|
_estimate
A list mapping from r, the number of times that a
sample occurs in the base distribution, to the probability estimate for
that sample. _estimate[r] is calculated
by finding the average frequency in the heldout distribution of all
samples that occur r times in the base distribution.
In particular, _estimate[r] = Tr[r]/(Nr[r]*N).
- Type:
list of float
|
_max_r
The maximum number of times that any sample occurs in the base
distribution. _max_r is used to decide how large
_estimate must be.
- Type:
int
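As a small illustration of how _max_r relates to the size of _estimate, the following sketch assumes a `Counter` in place of the base `FreqDist`; the variable names are hypothetical.

```python
from collections import Counter

base = Counter("aabbbc")           # a: 2, b: 3, c: 1
max_r = max(base.values())         # largest count of any sample in base
estimate = [None] * (max_r + 1)    # one slot for each r in 0..max_r
```

The extra slot accounts for r = 0, the estimate shared by samples that never occur in the base distribution.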
|