
There’s a difference between the model used for JAGS inference and the model used for hs/rkt inference: in JAGS, dnorm(0, 14) means precision 14 rather than standard deviation 14, so it should be dnorm(0, 1/14^2) instead. But I don’t know whether that’s what makes the JAGS accuracy higher. It also shouldn’t be hard to manually check a couple of the computed accuracies.
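As a quick sanity check of that conversion (plain Python arithmetic, not code from the repo): JAGS’s dnorm takes a precision tau = 1/sd^2, so a Normal prior with standard deviation 14 corresponds to tau = 1/196, about 0.0051.

```python
def sd_to_precision(sd):
    """Convert a standard deviation to the precision (tau) that JAGS's dnorm expects."""
    return 1.0 / sd ** 2

# A Normal(0, sd=14) prior in the hs/rkt model should be dnorm(0, 1/14^2) in JAGS,
# not dnorm(0, 14) (which would mean sd ~= 0.267).
tau = sd_to_precision(14)
print(tau)  # ~0.0051
```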

Also, based on the graphs @pravnar produced, I think something is wrong in the hs code, and maybe also the rkt code, since accuracy shouldn’t decline over time.

I am trying to run naive_bayes without categorical in Haskell, but it’s getting killed by my kernel because it is allocating a lot of virtual memory:

~/w/hakaru-rktjit > ./test/hs/nb_simpbucket "./test/input/news/words" "./test/input/news/docs" "./test/input/news/topics"
["./test/input/news/words","./test/input/news/docs","./test/input/news/topics"]
"starting main"
fish: “./test/hs/nb_simpbucket "./test…” terminated by signal SIGKILL (Forced quit)

Here is the dmesg:

[74878.864601] [14839] 1000 14839 268442028 2721615 5333 14 0 0 nb_simpbucket
[74878.864602] Out of memory: Kill process 14839 (nb_simpbucket) score 528 or sacrifice child
[74878.864606] Killed process 14839 (nb_simpbucket) total-vm:1073768112kB, anon-rss:10886456kB, file-rss:4kB, shmem-rss:0kB

Did anyone get a similar error? I was trying to run naivebayes with the full news dataset to compare timings with sham.
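For scale, converting the numbers in that dmesg line (simple unit arithmetic, nothing project-specific): the process had about a terabyte of virtual address space mapped, with roughly 10 GB actually resident when the OOM killer fired.

```python
# Figures from the "Killed process" dmesg line above (values in kB).
total_vm_kb = 1073768112   # total-vm
anon_rss_kb = 10886456     # anon-rss

total_vm_tb = total_vm_kb / 1024 ** 3   # kB -> TB
anon_rss_gb = anon_rss_kb / 1024 ** 2   # kB -> GB

print(round(total_vm_tb, 2))  # 1.0 (about a terabyte of virtual memory)
print(round(anon_rss_gb, 1))  # 10.4 (GB resident)
```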

Compiling with -O2:

~/w/h/hakaru > stack ghc -- -O2 "../../hakaru-rktjit/hs/nb_simpbucket.hs" -o "../../hakaru-rktjit/test/hs/nb_simpbucket"
[1 of 1] Compiling Main ( ../../hakaru-rktjit/hs/nb_simpbucket.hs, ../../hakaru-rktjit/hs/nb_simpbucket.o )
Linking ../../hakaru-rktjit/test/hs/nb_simpbucket ...

Testing with smaller inputs (1000 docs, 200 words, 20 topics) seems to run fine.

@samth Well, accuracy may sometimes decrease because each sampler is trying to generate a stream of classification guesses such that more likely guesses occur more often, while less likely guesses still occur, just proportionately less often.
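A toy illustration of that point (a made-up Bernoulli stream, not the actual Hakaru sampler): if the correct label has posterior probability 0.7 and the sampler emits guesses in proportion, the running accuracy hovers near 0.7 and dips over some windows instead of climbing monotonically.

```python
import random

random.seed(0)

# Hypothetical proportional sampler: emits the correct label with probability 0.7,
# mirroring "more likely guesses occur more often, less likely ones less often".
p_correct = 0.7
guesses = [1 if random.random() < p_correct else 0 for _ in range(2000)]

# Running accuracy after each guess.
running = []
hits = 0
for i, g in enumerate(guesses, start=1):
    hits += g
    running.append(hits / i)

# The long-run accuracy settles near 0.7, but the curve is not monotone:
# it declines every time a wrong guess comes in.
declines = sum(1 for a, b in zip(running, running[1:]) if b < a)
print(round(running[-1], 2), declines > 0)
```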

@rjnw Well it would be nice to have a sequential log of memory allocations, at least showing the size of each allocation.

@ccshan sure, but doesn’t the jags plot look a lot more sensible than the others?

Yes, but just a tad more sensible…

@ccshan here is a file with RTS option -S https://gist.github.com/rjnw/e95bf6331cc142ade11bc12ec0b18f55

@rjnw Oh sorry, I didn’t notice it was the hs that was using too much memory. I’m having trouble reproducing the problem, simply because I don’t have your test directory. Would you please point me at your nb_simpbucket.hs?