
While making more graphs for GmmGibbs: the configurations smaller than 50 classes, like 25 classes, are fine, but when I try 80 and 100 classes the accuracy is at 3–4%.


~/w/h/runners > make gmm-hs classes=80 points=10000
./hkbin/gmmGibbs ../input/GmmGibbs/80-10000 ../testcode/jagssrc ../output/
gmmGibbs: Data.Number.LogFloat.(/): argument out of range
CallStack (from HasCallStack):
error, called at src/Data/Number/LogFloat.hs:287:5 in logfloat-0.13.3.3-F8BPp8N0fTjGacJi2HZfLH:Data.Number.LogFloat
make: *** [Makefile:9: gmm-hs] Error 1

I am getting this from the Haskell backend.

Is this with testcode/hssrc/GmmGibbs.hs, or some other program? Has it changed recently?


I believe it’s the same; I copied it some time ago.

Is there some way to debug it?

Oh ok. The error message at http://hackage.haskell.org/package/logfloat-0.13.3.3/docs/src/Data-Number-LogFloat.html says we’re dividing infinity by infinity
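For reference, here is a minimal repro of how that error can arise; this is just a sketch against the logfloat API, not our actual benchmark code:

import Data.Number.LogFloat (LogFloat, logFloat)

main :: IO ()
main = do
  -- an infinite weight in log space; dividing it by itself means
  -- subtracting log-infinity from log-infinity, which is NaN,
  -- so LogFloat's (/) raises "argument out of range"
  let inf = logFloat (1 / 0 :: Double)
  print (inf / inf :: LogFloat)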

hmm

Then probably the accuracy we are getting from the Racket backend is just the random starting z.

One place where infinity may be divided by infinity is if the input to categorical is an array that contains infinity.
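To picture it, here is roughly what a categorical sampler does when it normalizes its weight array (a hypothetical sketch, not the actual Hakaru runtime code):

-- If one weight is infinite, the total is infinite too, so every
-- division is either finite/infinity or infinity/infinity;
-- the latter is exactly what LogFloat's (/) rejects.
normalize :: [Double] -> [Double]
normalize ws = map (/ total) ws
  where total = sum ws

-- e.g. normalize [1, 1/0, 2] gives [0.0, NaN, 0.0] with plain Doubles;
-- with LogFloat the NaN case becomes the "argument out of range" error.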

I just pushed an input file which gives this error if you want to take a look.

Commands: cd runners; make hkbin; make gmm-hs classes=80 points=10000

Preparing to install GHC (tinfo6-nopie) to an isolated location.

While it does that… is it easy for you to try this patch to Prog4.hs?

It’s running with the patch

Is there a change in the Hakaru file? If so, should I also change the LLVM backend?

So I should change how the type checker inserts coercions, to produce the patched code, which avoids overflow.

(This is about how the type checker inserts coercions in the result of hk-maple)
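For comparison, the usual way to avoid the overflow in the first place is to keep weights in log space and normalize with log-sum-exp. A sketch of that idea (not the actual patch, and not the code the type checker emits):

-- shift by the maximum before exponentiating, so exp never overflows to infinity
logSumExp :: [Double] -> Double
logSumExp xs = m + log (sum [exp (x - m) | x <- xs])
  where m = maximum xs

-- normalized probabilities from log-weights, without ever forming exp x directly
normalizeLog :: [Double] -> [Double]
normalizeLog xs = [exp (x - z) | x <- xs]
  where z = logSumExp xs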

Okay, the Haskell code you sent runs, but the accuracy is still ~3%.

Hmm…

(I’m changing the type checker.) It’s not just because the Haskell way is slow, is it?

It’s taking forever for me to test my change to the type checker, possibly because of the Maple communication issue. Would you please try rerunning hk-maple GmmGibbs.hs with hk-maple built from the commit(s) I just pushed to the hakaru repository?

Current status:
• @rjnw has made edits to section 5, which @samth plans to go over
• GMM works properly in all systems
• LDA works properly on all systems with the smaller dataset (kos-50)
• NB works on all systems, but there are issues with Augurv2’s accuracy; see the long email thread with Dan Huang
• LDA on bigger data sets requires improvements to Hakaru to be competitive
• GMM with more classes seems to run into Hakaru bugs
Based on this, @ccshan and I decided that we should just keep GMM/LDA/NB on the smaller data sets, where everything works. To address the issues with Augur accuracy, @rjnw should compute log-likelihood plots for Hk/Jags/Augur on NB. Besides that, @rjnw, you should focus on getting plots that are ready to go in the paper for all three benchmarks (we have the data already).

I think that’s a reasonable summary.

Right now I am finishing up LDA for kos-50 and kos-100. kos-100 also shows a similar relation between Hakaru and Augur as kos-50. As for the log-likelihood plots for naive Bayes, I don’t have the likelihood calculation.

Also, @samth @rjnw shall we meet tomorrow?

I will come in the morning.

Yes

I think the weight removed by unsample is again the likelihood.
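If that’s right, the naive-Bayes log-likelihood plot might just be the sum of the logs of those weights per sweep. A hypothetical helper, assuming we can get at the weights each sweep strips off (this depends on what unsample actually exposes):

-- per-sweep log-likelihood as the sum of log-weights (hypothetical)
sweepLogLikelihood :: [Double] -> Double
sweepLogLikelihood = sum . map log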