chainer.functions.crf1d
chainer.functions.crf1d(cost, xs, ys, reduce='mean')
Calculates the negative log-likelihood of a linear-chain CRF.
It takes a transition cost matrix, a sequence of costs, and a sequence of labels. Let \(c_{st}\) be a transition cost from a label \(s\) to a label \(t\), \(x_{it}\) be a cost of a label \(t\) at position \(i\), and \(y_i\) be an expected label at position \(i\). The negative log-likelihood of linear-chain CRF is defined as
\[L = -\left( \sum_{i=1}^l x_{iy_i} + \sum_{i=1}^{l-1} c_{y_i y_{i+1}} - \log(Z) \right),\]
where \(l\) is the length of the input sequence and \(Z\) is the normalizing constant, called the partition function.
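For a very short sequence, the definition above can be checked by hand. The following is a minimal sketch, not Chainer's implementation: `crf_nll_bruteforce` is a hypothetical helper that obtains \(Z\) by summing the exponentiated score of every possible label sequence, which is the standard definition of the partition function. It uses plain Python lists and 0-based positions, and it is only feasible for tiny \(K\) and \(l\) because it visits \(K^l\) paths.

```python
import itertools
import math


def crf_nll_bruteforce(cost, xs, ys):
    """Negative log-likelihood of one label sequence, by enumeration.

    cost[s][t]: transition cost from label s to label t (K x K).
    xs[i][t]:   cost of label t at position i (l x K).
    ys[i]:      expected label at position i (length l).
    """
    n_labels = len(cost)
    length = len(xs)

    def path_score(labels):
        # Per-position terms plus transition terms along the path.
        unary = sum(xs[i][labels[i]] for i in range(length))
        pairwise = sum(cost[labels[i]][labels[i + 1]]
                       for i in range(length - 1))
        return unary + pairwise

    # Z sums exp(score) over all K**l label sequences.
    z = sum(math.exp(path_score(labels))
            for labels in itertools.product(range(n_labels), repeat=length))
    return -(path_score(ys) - math.log(z))


# Two positions, three labels, all costs zero: each of the 9 paths is
# equally likely, so the loss is log(9), about 2.197.
zeros = [[0.0] * 3 for _ in range(3)]
print(crf_nll_bruteforce(zeros, zeros[:2], [0, 0]))
```

Because \(\exp(-L)\) is the probability of the labelled path, summing it over all label sequences should give 1, which is a convenient sanity check. Practical implementations typically replace the enumeration with a dynamic-programming recursion over positions.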
Note
When you want to calculate the negative log-likelihood of sequences which have different lengths, sort the sequences in descending order of length and transpose them. For example, suppose you have three input sequences:
>>> import numpy as np
>>> a1 = a2 = a3 = a4 = np.random.uniform(-1, 1, 3).astype(np.float32)
>>> b1 = b2 = b3 = np.random.uniform(-1, 1, 3).astype(np.float32)
>>> c1 = c2 = np.random.uniform(-1, 1, 3).astype(np.float32)
>>> a = [a1, a2, a3, a4]
>>> b = [b1, b2, b3]
>>> c = [c1, c2]
where a1 and all other variables are arrays with (K,) shape. Make a transpose of the sequences:
>>> x1 = np.stack([a1, b1, c1])
>>> x2 = np.stack([a2, b2, c2])
>>> x3 = np.stack([a3, b3])
>>> x4 = np.stack([a4])
and make a list of the arrays:
>>> xs = [x1, x2, x3, x4]
You need to make label sequences in the same fashion. And then, call the function:
>>> import chainer
>>> import chainer.functions as F
>>> cost = chainer.Variable(
...     np.random.uniform(-1, 1, (3, 3)).astype(np.float32))
>>> ys = [np.zeros(x.shape[0:1], dtype=np.int32) for x in xs]
>>> loss = F.crf1d(cost, xs, ys)
It calculates the mean of the negative log-likelihoods of the three sequences.
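The sort-and-transpose step from the example above can be wrapped in a small function. The sketch below assumes each sequence is stored as a single NumPy array of shape (length, K); `transpose_sequences` is a hypothetical helper for illustration and is not part of Chainer.

```python
import numpy as np


def transpose_sequences(seqs):
    """Turn per-sequence arrays into per-position batches.

    seqs: non-empty list of arrays, each of shape (length_j, K).
    Returns a list whose i-th item stacks position i of every sequence
    that has more than i positions, longest sequence first.
    """
    # Sort longest first so the batch at each position is a prefix of
    # the batch at the previous position.
    ordered = sorted(seqs, key=len, reverse=True)
    max_len = len(ordered[0])
    return [np.stack([s[i] for s in ordered if len(s) > i])
            for i in range(max_len)]


K = 3
rng = np.random.RandomState(0)
a = rng.uniform(-1, 1, (4, K)).astype(np.float32)  # length 4
b = rng.uniform(-1, 1, (3, K)).astype(np.float32)  # length 3
c = rng.uniform(-1, 1, (2, K)).astype(np.float32)  # length 2

xs = transpose_sequences([c, a, b])  # input order does not matter
print([x.shape for x in xs])  # [(3, 3), (3, 3), (2, 3), (1, 3)]
```

The same helper can be applied to label sequences stored as 1-D integer arrays, provided they are passed in the same order as the inputs so that both end up sorted identically.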
The output is a variable whose value depends on the value of the option reduce. If it is 'no', it holds the elementwise loss values. If it is 'mean', it holds the mean of the loss values.
- Parameters
cost (Variable or N-dimensional array) – A \(K \times K\) matrix which holds the transition cost between two labels, where \(K\) is the number of labels.
xs (list of Variable) – Input vector for each label. len(xs) denotes the length of the sequence, and each Variable holds a \(B \times K\) matrix, where \(B\) is the mini-batch size and \(K\) is the number of labels. Note that the \(B\)s of the variables are not necessarily the same, i.e., the function accepts input sequences with different lengths.
ys (list of Variable) – Expected output labels. It needs to have the same length as xs. Each Variable holds an integer vector of length \(B\). When an x in xs has a different \(B\), the corresponding y has the same \(B\). In other words, ys must satisfy ys[i].shape == xs[i].shape[0:1] for all i.
reduce (str) – Reduction option. Its value must be either 'mean' or 'no'. Otherwise, ValueError is raised.
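The shape rules listed above can be checked before calling the function. The following is a rough sketch that works on plain NumPy arrays; `check_crf1d_inputs` is a hypothetical helper, not part of Chainer. The check that \(B\) never grows along the sequence is an assumption that follows from sorting the sequences in descending order of length before transposing them.

```python
import numpy as np


def check_crf1d_inputs(cost, xs, ys):
    """Raise ValueError if the arrays break the documented shape rules."""
    if cost.ndim != 2 or cost.shape[0] != cost.shape[1]:
        raise ValueError('cost must be a K x K matrix')
    n_labels = cost.shape[0]
    if len(xs) != len(ys):
        raise ValueError('xs and ys must have the same length')
    prev_batch = None
    for i, (x, y) in enumerate(zip(xs, ys)):
        if x.ndim != 2 or x.shape[1] != n_labels:
            raise ValueError('xs[%d] must have shape (B, K)' % i)
        if y.shape != x.shape[0:1]:
            raise ValueError(
                'ys[%d].shape must equal xs[%d].shape[0:1]' % (i, i))
        # Assumption: batches come from length-sorted sequences,
        # so the batch size B should never grow along the sequence.
        if prev_batch is not None and x.shape[0] > prev_batch:
            raise ValueError('batch size grew at position %d' % i)
        prev_batch = x.shape[0]


cost = np.zeros((3, 3), dtype=np.float32)
xs = [np.zeros((3, 3), dtype=np.float32), np.zeros((2, 3), dtype=np.float32)]
ys = [np.zeros(x.shape[0:1], dtype=np.int32) for x in xs]
check_crf1d_inputs(cost, xs, ys)  # passes silently
```

Failing early with a message that names the offending position tends to be easier to debug than a shape error raised from inside the loss computation.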
- Returns
A variable holding the average negative log-likelihood of the input sequences when reduce is 'mean', or the elementwise loss values when reduce is 'no'.
- Return type
Note
See the original paper for details: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.