Do I understand correctly that, after the successful email conversation with Dan, @rjnw is now un-stuck on both the NB and the LDA benchmarks?
I think LDA was working, not sure about NB
There are two email threads; we only worked on LDA. He replied to the NB thread too, but I don’t know what’s going on there.
On LDA, I believe you are about to produce plots. On NB, I think you should apply the same dtype debugging as LDA to NB (whether 1D or 2D), and see if AugurV2 stops classifying every test-set document as newsgroup # 19.
(reminder: MALLET for LDA is also worth comparing to)
hmm I will see if I can do MALLET; right now I am working on Augur and Hakaru.
Is there a way to normalize the likelihood?
the numbers come out hugely negative in log space, like -30613812.14543229
right now I am just trying to subtract the smallest value across both augur and hakaru
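Something like this is what I mean by subtracting the smallest value (a minimal sketch; the arrays here are made-up stand-ins for the real log-likelihood traces):

```python
import numpy as np

# Hypothetical log-likelihood traces from the two systems (made-up values;
# the real ones come from the Augur and Hakaru runs).
augur_ll = np.array([-30613900.0, -30613812.1, -30613700.5])
hakaru_ll = np.array([-30613950.0, -30613850.3, -30613780.9])

# Shift both traces by the SAME baseline: the smallest value across both.
# This changes only the axis labels, not the shape of either curve,
# so the comparison between the two systems is unaffected.
baseline = min(augur_ll.min(), hakaru_ll.min())
augur_shifted = augur_ll - baseline
hakaru_shifted = hakaru_ll - baseline
```

The key point is to use one shared baseline for both systems; shifting each trace by its own minimum would distort the comparison.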
Why bother normalizing? Just plot. There’s no need to show 0 on the vertical axis.
hmm okay
The fact that the numbers are so small just means it is extremely unlikely for a random text generator to generate exactly the 20-newsgroups corpus. https://commons.wikimedia.org/wiki/File:Chimpanzee_seated_at_typewriter.jpg
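A back-of-envelope check shows why log-likelihoods in the tens of millions are expected (the vocabulary and token counts below are made-up round numbers, not the actual 20-newsgroups statistics):

```python
import math

# A "random text generator" that picks tokens uniformly from a vocabulary
# of V words assigns each token probability 1/V, so over N tokens the
# log-likelihood is N * log(1/V) -- hugely negative.
V = 50_000      # hypothetical vocabulary size
N = 2_000_000   # hypothetical total token count for the corpus

log_lik = N * math.log(1.0 / V)
# On the order of -2e7, i.e. the same ballpark as -30613812.
```

So a value around -3e7 is not a bug; it is simply what the log-probability of an entire corpus looks like.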
I ran it twice, but the plot shows only one trial.
I am going to compare the LLVM backend with the Haskell backend to see if the compilation is correct.
Oh, the x-axis is time in seconds and the y-axis is log-likelihood.
I guess by “compare” you mean comparing the probabilities computed for a single update, because a whole sweep would take the Haskell backend too long.
How many sweeps does that llvm curve represent?