
While making more graphs for GmmGibbs: the configurations smaller than 50 classes, like 25 classes, are fine, but when I try 80 and 100 classes the accuracy is at 3–4%.


~/w/h/runners > make gmm-hs classes=80 points=10000
./hkbin/gmmGibbs ../input/GmmGibbs/80-10000 ../testcode/jagssrc ../output/
gmmGibbs: Data.Number.LogFloat.(/): argument out of range
CallStack (from HasCallStack):
error, called at src/Data/Number/LogFloat.hs:287:5 in logfloat-0.13.3.3-F8BPp8N0fTjGacJi2HZfLH:Data.Number.LogFloat
make: *** [Makefile:9: gmm-hs] Error 1

I am getting this from the Haskell backend.

Is this with testcode/hssrc/GmmGibbs.hs, or some other program? Has it changed recently?


I believe it’s the same; I copied it some time ago.

Is there some way to debug it?

Oh ok. The error message at http://hackage.haskell.org/package/logfloat-0.13.3.3/docs/src/Data-Number-LogFloat.html says we’re dividing infinity by infinity
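For reference, here is a minimal repro of how that error can arise; this is just a sketch against the logfloat API, not our actual benchmark code:

import Data.Number.LogFloat (LogFloat, logFloat)

main :: IO ()
main = do
  -- an infinite weight in log space; dividing it by itself means
  -- subtracting log-infinity from log-infinity, which is NaN,
  -- so LogFloat's (/) raises "argument out of range"
  let inf = logFloat (1 / 0 :: Double)
  print (inf / inf :: LogFloat)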

hmm

Then probably the accuracy we are getting from the Racket backend is just the random starting z.

One place where infinity may be divided by infinity is if the input to categorical is an array that contains infinity.
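To picture it, here is roughly what a categorical sampler does when it normalizes its weight array (a hypothetical sketch, not the actual Hakaru runtime code):

-- If one weight is infinite, the total is infinite too, so every
-- division is either finite/infinity or infinity/infinity;
-- the latter is exactly what LogFloat's (/) rejects.
normalize :: [Double] -> [Double]
normalize ws = map (/ total) ws
  where total = sum ws

-- e.g. normalize [1, 1/0, 2] gives [0.0, NaN, 0.0] with plain Doubles;
-- with LogFloat the NaN case becomes the "argument out of range" error.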

I just pushed an input file which gives this error if you want to take a look.

Commands: cd runners; make hkbin; make gmm-hs classes=80 points=10000

Preparing to install GHC (tinfo6-nopie) to an isolated location.

While it does that… is it easy for you to try this patch to Prog4.hs?

It’s running with the patch

Is there a change in the Hakaru file? If so, should I also change the LLVM backend?

So I should change how the type checker inserts coercions, to produce the patched code, which avoids overflow.

(This is about how the type checker inserts coercions in the result of hk-maple)
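For comparison, the usual way to avoid the overflow in the first place is to keep weights in log space and normalize with log-sum-exp. A sketch of that idea (not the actual patch, and not the code the type checker emits):

-- shift by the maximum before exponentiating, so exp never overflows to infinity
logSumExp :: [Double] -> Double
logSumExp xs = m + log (sum [exp (x - m) | x <- xs])
  where m = maximum xs

-- normalized probabilities from log-weights, without ever forming exp x directly
normalizeLog :: [Double] -> [Double]
normalizeLog xs = [exp (x - z) | x <- xs]
  where z = logSumExp xs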

Okay, the Haskell code you sent runs, but the accuracy is still ~3%.

Hmm…

(I’m changing the type checker.) It’s not just because the Haskell way is slow, is it?

It’s taking forever for me to test my change to the type checker, possibly because of the Maple communication issue. Would you please try rerunning hk-maple GmmGibbs.hs with hk-maple built from the commit(s) I just pushed to the hakaru repository?

Current status:
• @rjnw has made edits to section 5, which @samth plans to go over
• GMM works properly in all systems
• LDA works properly on all systems with the smaller dataset (kos-50)
• NB works on all systems, but there are issues with Augurv2’s accuracy; see the long email thread with Dan Huang
• LDA on bigger data sets requires improvements to Hakaru to be competitive
• GMM with more classes seems to run into Hakaru bugs
Based on this, @ccshan and I decided that we should just keep GMM/LDA/NB on the smaller data sets, where everything works. To address the issues with Augur accuracy, @rjnw should compute log-likelihood plots for Hk/Jags/Augur on NB. Besides that, @rjnw, you should focus on getting plots that are ready to go in the paper for all three benchmarks (we have the data already).

I think that’s a reasonable summary.

Right now I am finishing up LDA for kos-50 and kos-100. kos-100 also shows a similar relation between Hakaru and Augur as kos-50. As for the log-likelihood plots for naive Bayes, I don’t have the likelihood calculation.

Also, @samth @rjnw shall we meet tomorrow?

I will come in the morning.

Yes

I think the weight removed by unsample is again the likelihood.
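If that’s right, the naive-Bayes log-likelihood plot might just be the sum of the logs of those weights per sweep. A hypothetical helper, assuming we can get at the weights each sweep strips off (this depends on what unsample actually exposes):

-- per-sweep log-likelihood as the sum of log-weights (hypothetical)
sweepLogLikelihood :: [Double] -> Double
sweepLogLikelihood = sum . map log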