
Do I understand correctly that, after the successful email conversation with Dan, @rjnw is now un-stuck on both the NB and the LDA benchmarks?

I think LDA was working, not sure about NB

There are two email threads; we only worked on LDA. He replied to the NB thread too, but I don’t know what’s going on there.

On LDA, I believe you are about to produce plots. On NB, I think you should apply the same dtype debugging as LDA to NB (whether 1D or 2D), and see if AugurV2 stops classifying every test-set document as newsgroup # 19.

(reminder: MALLET for LDA is also worth comparing to)

Hmm, I will see if I can do MALLET; right now I am working on Augur and Hakaru.

Is there a way to normalize the likelihood?

we get very small numbers in logspace

like -30613812.14543229 in logspace

right now I am just trying to subtract the smallest value across both augur and hakaru
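The subtraction described here can be sketched as follows. This is a minimal illustration, not the actual benchmark code; the trace values and variable names are hypothetical, and the key point is that both traces must be shifted by the same constant so their relative positions on the plot are preserved:

```python
# Hypothetical (time_in_seconds, log_likelihood) traces for the two systems.
augur = [(1.0, -30613812.1), (2.0, -30500000.0)]
hakaru = [(5.0, -30700000.0), (10.0, -30550000.0)]

# Use a single baseline across BOTH runs: subtracting a shared constant in
# log space rescales both curves identically, so the comparison is unchanged.
baseline = min(ll for _, ll in augur + hakaru)

augur_shifted = [(t, ll - baseline) for t, ll in augur]
hakaru_shifted = [(t, ll - baseline) for t, ll in hakaru]
```

After the shift, all plotted values are nonnegative and the smallest is exactly 0; whether this is worth doing at all is addressed in the next reply.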

Why bother normalizing? Just plot. There’s no need to show 0 on the vertical axis.

hmm okay

The fact that the numbers are so small just means it is extremely unlikely for a random text generator to generate exactly the 20-newsgroups corpus. https://commons.wikimedia.org/wiki/File:Chimpanzee_seated_at_typewriter.jpg


I ran it twice, but the plot displays only one trial.

I am going to compare the LLVM backend with the Haskell backend to see if the compilation is correct.

Oh, the x-axis is time in seconds and the y-axis is likelihood.

I guess by “compare” you mean comparing the probabilities computed for a single update, because a whole sweep would take the Haskell backend too long.
How many sweeps does that llvm curve represent?