Abstract |
Sequence labeling problems are tasks in which a label is assigned to each hidden variable in a given sequence. In bioinformatics, many problems, such as gene finding and protein secondary structure prediction, are naturally defined as sequence labeling problems. Although hidden Markov models have traditionally been employed for these tasks, and conditional models such as conditional random fields have been used more recently, they cannot efficiently consider arbitrarily wide context in sequences. In this presentation, we introduce a kernel-based approach to labeling problems. One of the virtues of kernel methods is that they can handle features of arbitrary size (e.g., Markov assumptions of arbitrary order, by using string kernels) in polynomial time. We combine a learning algorithm for labeling with marginalized kernel functions over features of arbitrary size, which results in a new sequence labeling algorithm that incorporates such features in polynomial time. |
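
As a rough illustration of how a string kernel can account for features of arbitrary size in polynomial time, the sketch below counts all common contiguous substrings of two sequences, of every length, with a dynamic program that is quadratic in the sequence lengths. It is not the marginalized kernel of this presentation; the function names `all_substrings_kernel` and `brute_force_kernel` are hypothetical and chosen only for this example.

```python
from collections import Counter


def all_substrings_kernel(s, t):
    """Sum over all substrings u of count_s(u) * count_t(u).

    Uses the identity that this sum equals the total, over all start
    positions (i, j), of the length of the longest common prefix of
    s[i:] and t[j:]. Runs in O(len(s) * len(t)) time.
    """
    prev = [0] * (len(t) + 1)  # common-prefix lengths for row i + 1
    total = 0
    for i in range(len(s) - 1, -1, -1):
        cur = [0] * (len(t) + 1)
        for j in range(len(t) - 1, -1, -1):
            if s[i] == t[j]:
                # Extend the common prefix that starts one step later.
                cur[j] = prev[j + 1] + 1
                total += cur[j]
        prev = cur
    return total


def brute_force_kernel(s, t):
    """Reference version that enumerates every substring explicitly."""
    def counts(x):
        return Counter(
            x[i:j] for i in range(len(x)) for j in range(i + 1, len(x) + 1)
        )

    cs, ct = counts(s), counts(t)
    return sum(cs[u] * ct[u] for u in cs if u in ct)


if __name__ == "__main__":
    a, b = "ACGTACGT", "CGTTACG"
    print(all_substrings_kernel(a, b), brute_force_kernel(a, b))
```

The dynamic program never enumerates substrings explicitly, so substrings of every length contribute to the kernel value without an explicit feature vector being built. The brute-force version is included only to check the result on short inputs.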