ccshan
2017-11-20 19:56:41

There’s a difference between the model used for jags inference and the model used for hs/rkt inference: in jags, dnorm(0,14) means precision 14 rather than standard deviation 14, so it should be dnorm(0,1/14^2) instead. But I don’t know if that’s what makes jags accuracy higher. Also I guess it’s not hard to manually check a couple of computed accuracies.
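A minimal sketch of the reparameterization, written in Haskell just to show the arithmetic (the helper name precisionFromStdDev is illustrative, not something from the actual models):

-- JAGS's dnorm takes (mean, precision), while the hs/rkt models take
-- (mean, standard deviation). To match a standard deviation of 14,
-- the JAGS call needs precision 1/14^2.
precisionFromStdDev :: Double -> Double
precisionFromStdDev sd = 1 / (sd * sd)

main :: IO ()
main = print (precisionFromStdDev 14)  -- ~0.0051, i.e. dnorm(0, 1/14^2) in JAGS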


samth
2017-11-20 22:01:34

Also, based on the graphs @pravnar produced, I think something is wrong in the hs code and maybe also the rkt code


samth
2017-11-20 22:01:41

since accuracy shouldn’t decline over time


rjnw
2017-11-20 22:58:25

I am trying to run naive_bayes without categorical in Haskell, but it's getting killed by my kernel because it is allocating a lot of virtual memory:

~/w/hakaru-rktjit > ./test/hs/nb_simpbucket "./test/input/news/words" "./test/input/news/docs" "./test/input/news/topics"
["./test/input/news/words","./test/input/news/docs","./test/input/news/topics"]
"starting main"
fish: “./test/hs/nb_simpbucket "./test…” terminated by signal SIGKILL (Forced quit)

Here is the dmesg:

[74878.864601] [14839] 1000 14839 268442028 2721615 5333 14 0 0 nb_simpbucket
[74878.864602] Out of memory: Kill process 14839 (nb_simpbucket) score 528 or sacrifice child
[74878.864606] Killed process 14839 (nb_simpbucket) total-vm:1073768112kB, anon-rss:10886456kB, file-rss:4kB, shmem-rss:0kB

Has anyone seen a similar error? I was trying to run naivebayes with the full news dataset to compare timings with sham.


rjnw
2017-11-20 22:59:46

Compiling using -O2:

~/w/h/hakaru > stack ghc -- -O2 "../../hakaru-rktjit/hs/nb_simpbucket.hs" -o "../../hakaru-rktjit/test/hs/nb_simpbucket"
[1 of 1] Compiling Main             ( ../../hakaru-rktjit/hs/nb_simpbucket.hs, ../../hakaru-rktjit/hs/nb_simpbucket.o )
Linking ../../hakaru-rktjit/test/hs/nb_simpbucket ...


rjnw
2017-11-21 00:10:19

Testing with smaller inputs, like 1000 docs and 200 words with 20 topics, seems to run fine.


ccshan
2017-11-21 02:48:29

@samth Well, accuracy may sometimes decrease because each sampler is trying to generate a stream of classification guesses such that more likely guesses occur more often—but less likely guesses still occur, just proportionately less often.
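A rough Haskell sketch of what I mean (the labels and weights are made up, and this is not the actual sampler code):

import System.Random (randomRIO)

-- Draw a label with probability proportional to its weight: more likely
-- guesses come out more often, but less likely guesses still come out
-- sometimes, so accuracy measured along the stream can dip as well as rise.
sampleLabel :: [(String, Double)] -> IO String
sampleLabel weighted = do
  let total = sum (map snd weighted)
  r <- randomRIO (0, total)
  let go _   [(l, _)]    = l
      go acc ((l, w) : rest)
        | acc + w >= r   = l
        | otherwise      = go (acc + w) rest
      go _   []          = error "sampleLabel: empty list"
  return (go 0 weighted)

main :: IO ()
main = mapM_ (\_ -> sampleLabel [("topicA", 0.7), ("topicB", 0.3)] >>= putStrLn)
             [1 :: Int .. 10]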


ccshan
2017-11-21 02:49:03

@rjnw Well it would be nice to have a sequential log of memory allocations, at least showing the size of each allocation.


samth
2017-11-21 02:50:22

@ccshan sure, but doesn’t the jags plot look a lot more sensible than the others?


ccshan
2017-11-21 02:51:22

Yes but just a tad more sensible…


rjnw
2017-11-21 03:35:17

@ccshan here is a file with RTS option -S https://gist.github.com/rjnw/e95bf6331cc142ade11bc12ec0b18f55


ccshan
2017-11-21 05:19:15

@rjnw Oh sorry I didn’t notice it was the hs that was using too much memory. I’m having trouble reproducing the problem simply because I don’t have your test directory. Would you please point me at your nb_simpbucket.hs?