Corpus:
bake, bakes, baked, baking,
vote, votes, voted, voting,
list, lists, listed, listing,
salt, salts, salted, salting,
wait, waits, waited, waiting,
mark, marks, marked, marking,
pull, pulls, pulled, pulling
| Part | Component | Contents | Formula | Value (bits; `log` is base 2) |
|---|---|---|---|---|
| Length of the model | List lengths | | `lambda(T) + lambda(F) + lambda(Sigma)` | `log 28 = 4.81` |
| | Stems | bake bakes baked baking vote votes voted voting list lists listed listing salt salts salted salting wait waits waited waiting mark marks marked marking pull pulls pulled pulling | `sum_t (log26 * text(length)(t) + log ( [[W]] / [[t]] ) )` | `150 * log 26 + 28 * log 28 = 839.67` |
| | Suffixes | NONE | `sum_(f) (log26 * text(length)(f) + log ( [[W_A]] / [[f]] ))` | `0` |
| | Signatures | {ptrs to all stems} {NONE} | `sum_sigma log ( [[W]] / [[sigma]] ) + sum_sigma [ lambda(t_sigma) + lambda(f_sigma) + sum_t^(t_sigma) log ( [[W]] / [[t]] ) + sum_f^(f_sigma) log ( [[sigma]] / [[f in sigma]] )]` | `29 * log 28 = 139.41` |
| | Total | | | `983.89` |
| Length of the compressed corpus | | | `sum_{w in W} [w] [log([[W]]/[[sigma_w]]) + log([[sigma_w]] / [[t_w]]) + log([[sigma_w]] / [[f_w in sigma_w]])]` | `28 * log 28 = 134.61` |
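The arithmetic for this unanalyzed case can be checked with a short Python sketch. It assumes base-2 logarithms, a 26-letter alphabet, and one occurrence per word, and it takes the table's `29 * log 28` expression for the signature component as given rather than deriving it from the general formula. All variable names are illustrative.

```python
from math import log2

# The 28-word corpus; every word is treated as its own stem (no suffixes).
corpus = """bake bakes baked baking vote votes voted voting
list lists listed listing salt salts salted salting
wait waits waited waiting mark marks marked marking
pull pulls pulled pulling""".split()

n_words = len(corpus)                     # [[W]] = 28
n_letters = sum(len(w) for w in corpus)   # 150 letters in total

# List lengths: the table's value, log 28.
list_lengths = log2(n_words)

# Stems: log 26 per letter plus a pointer of log([[W]] / [[t]]), with [[t]] = 1.
stems = sum(log2(26) * len(w) + log2(n_words / 1) for w in corpus)

# Suffixes: only NONE, which costs nothing.
suffixes = 0.0

# Signatures: the table's expression, 29 * log 28 (taken as given).
signatures = (n_words + 1) * log2(n_words)

model_length = list_lengths + stems + suffixes + signatures

# Compressed corpus: each word costs log 28 when nothing is shared.
compressed_length = n_words * log2(n_words)

print(f"stems             {stems:.2f}")
print(f"signatures        {signatures:.2f}")
print(f"model length      {model_length:.2f}")
print(f"compressed corpus {compressed_length:.2f}")
```

Nearly all of the model length comes from spelling out the 150 letters of the word list, which is the cost a morphological analysis is meant to reduce.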
Corpus:
bake, bakes, baked, baking,
vote, votes, voted, voting,
list, lists, listed, listing,
salt, salts, salted, salting,
wait, waits, waited, waiting,
mark, marks, marked, marking,
pull, pulls, pulled, pulling
| Part | Component | Contents | Formula | Value (bits; `log` is base 2) |
|---|---|---|---|---|
| Length of the model | List lengths | | `lambda(T) + lambda(F) + lambda(Sigma)` | `log 7 + log 6 + log 2 = 6.39` |
| | Stems | bak vot list salt wait mark pull | `sum_t (log26 * text(length)(t) + log ( [[W]] / [[t]] ) )` | `26 * log 26 + 7 * log 7 = 141.86` |
| | Suffixes | NONE e es s ed ing | `sum_(f) (log26 * text(length)(f) + log ( [[W_A]] / [[f]] ))` | `9 * log26 + 2 * log 10.3 + 2 * log 3.4 + log 4.6 = 54.76` |
| | Signatures | {ptrs to bak, vot} {e, es, ed, ing}; {ptrs to list, salt, wait, mark, pull} {NONE, s, ed, ing} | `sum_sigma log ( [[W]] / [[sigma]] ) + sum_sigma [ lambda(t_sigma) + lambda(f_sigma) + sum_t^(t_sigma) log ( [[W]] / [[t]] ) + sum_f^(f_sigma) log ( [[sigma]] / [[f in sigma]] )]` | `log 14 + log 5.6 + log 2 + log 4 + 2 * log 3.5 + 4 * log 4 + log 5 + log 4 + 5 * log 1.4 + 4 * log 4 = 35.66` |
| | Total | | | `238.67` |
| Length of the compressed corpus | | | `sum_{w in W} [w] [log([[W]]/[[sigma_w]]) + log([[sigma_w]] / [[t_w]]) + log([[sigma_w]] / [[f_w in sigma_w]])]` | `8 * (log 3.5 + log 2 + log 4) + 20 * (log 1.4 + log 5 + log 4) = 134.61` |
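The compressed-corpus figure for the analyzed case can be reproduced with the sketch below. The two signatures are an assumption read off the stem and suffix lists above (`bak` and `vot` taking `e, es, ed, ing`; the other five stems taking `NONE, s, ed, ing`), logarithms are base 2, and every word is assumed to occur once. The names `signatures` and `compressed_length` are illustrative only.

```python
from math import log2

# Assumed analysis: suffix tuple -> stems that take it ("" stands for NONE).
signatures = {
    ("e", "es", "ed", "ing"): ["bak", "vot"],
    ("", "s", "ed", "ing"): ["list", "salt", "wait", "mark", "pull"],
}

# Rebuild the words to confirm the analysis covers the 28-word corpus.
words = [stem + suffix
         for suffixes, stems in signatures.items()
         for stem in stems
         for suffix in suffixes]
n_words = len(words)  # [[W]] = 28

def compressed_length(signatures, n_words):
    """Sum, over words, of the three pointer costs in the table's formula."""
    total = 0.0
    for suffixes, stems in signatures.items():
        sig_count = len(suffixes) * len(stems)  # [[sigma]]: words in this signature
        stem_count = len(suffixes)              # [[t]]: one word per suffix
        suffix_count = len(stems)               # [[f in sigma]]: one word per stem
        per_word = (log2(n_words / sig_count)
                    + log2(sig_count / stem_count)
                    + log2(sig_count / suffix_count))
        total += sig_count * per_word
    return total

# Stem-list cost: log 26 per letter plus log([[W]] / [[t]]), with [[t]] = 4.
all_stems = [s for stems in signatures.values() for s in stems]
stem_cost = sum(log2(26) * len(s) + log2(n_words / 4) for s in all_stems)

print(f"stem list         {stem_cost:.2f}")
print(f"compressed corpus {compressed_length(signatures, n_words):.2f}")
```

The three ratios multiply out to 28 for every word, so each word still costs `log 28` and the compressed corpus has the same length as in the unanalyzed case. Under these assumptions the saving comes entirely from the shorter model.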