ゲノム情報科学研究教育機構  アブストラクト
Date July 14, 2003
Speaker Marco Cuturi, Ecole des Mines de Paris
Title A covariance kernel derived from text compression tools
Abstract
  As the amount of DNA data in biological databases keeps growing at an increasing pace, the remote protein homology detection problem remains a major issue in bioinformatics.  We propose in this talk a novel kernel for sequences - with applications to protein homology detection - to be used with usual kernel methods such as SVM.  Borrowing ideas and techniques from the field of data compression, we show that positive definite kernels can be derived from arithmetic coding algorithms, when the coding probability is a Bayesian mixture related to the empirical mutual information between two strings.

  When the coding probability is obtained by averaging finite memory sources, with mixture of Dirichlet distributions as conditional probabilities, the kernel can be written as a triple-mixture of probabilities and be computed with a linear complexity in time and space.  Encouraging results will be shown on the problem of remote protein homology detection.
「セミナー」に戻る      
 ホーム