rjnw
2018-6-21 01:27:52

@ccshan I managed to get Naive Bayes running in Augur, but I don’t know how to update only 10% of the documents. Can you look at it when you have some time? https://github.com/rjnw/hakaru-benchmarks/blob/master/runners/augur/nb.py


ccshan
2018-6-21 02:14:44

Thanks! Can you or I take a look at the C code to see if there’s an outer loop there that sweeps through updating all the elements of z?


ccshan
2018-6-21 02:17:20

If so, then maybe the easiest way to update only 10% of the documents is to split each of the z, doc, and w arrays in two (training and test). Or maybe it’s easier to dive into the C code and change the loop so that it doesn’t sweep through all of z.
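
The array split described above could look roughly like the sketch below. It assumes the 1D representation, where `doc[n]` is the document index of word `w[n]`, and it renumbers document indices so that each part starts from 0. The function name `split_corpus` and the `is_test` mask are hypothetical and do not come from nb.py.

```python
def split_corpus(z, doc, w, is_test):
    """Split labels z and per-word arrays doc, w into training and test parts.

    is_test[d] is True when document d's label should be resampled.
    Document indices are renumbered so that each part is indexed from 0.
    (Hypothetical helper, not part of nb.py.)
    """
    new_index = {}  # old document index -> index within its own part
    z1, z2 = [], []
    for d, label in enumerate(z):
        part = z2 if is_test[d] else z1
        new_index[d] = len(part)
        part.append(label)

    doc1, w1, doc2, w2 = [], [], [], []
    for d, word in zip(doc, w):
        if is_test[d]:
            doc2.append(new_index[d])
            w2.append(word)
        else:
            doc1.append(new_index[d])
            w1.append(word)
    return (z1, doc1, w1), (z2, doc2, w2)
```

With this split, D1 and D2 are the lengths of `z1` and `z2`, and N1 and N2 are the lengths of `w1` and `w2`.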


rjnw
2018-6-21 04:39:39

I looked at the C code and found the loop for updating z. It’s not too complicated to change it to only do 10%. I am going to be in the office tomorrow afternoon, so we can take a look then.
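
To illustrate what restricting the z loop to a subset means, here is a minimal Python sketch. It is not the C code that AugurV2 generates. It assumes theta and phi are held fixed during the update and that `update` lists the documents whose labels should be resampled. All names are hypothetical.

```python
import math
import random


def resample_labels(z, update, theta, phi, doc, w, rng):
    """Resample z[d] only for documents d in `update`; other labels stay fixed.

    Hypothetical sketch of one restricted sweep for Naive Bayes:
    p(z[d] = k | rest) is proportional to theta[k] * prod_n phi[k][w[n]],
    where n ranges over the words of document d.
    """
    # Collect the words of each document that will be updated.
    words_of = {d: [] for d in update}
    for d, word in zip(doc, w):
        if d in words_of:
            words_of[d].append(word)

    topics = range(len(theta))
    for d in update:
        # Unnormalized log posterior of each topic for document d.
        logp = [
            math.log(theta[k]) + sum(math.log(phi[k][word]) for word in words_of[d])
            for k in topics
        ]
        m = max(logp)  # subtract the maximum for numerical stability
        weights = [math.exp(lp - m) for lp in logp]
        z[d] = rng.choices(topics, weights=weights)[0]
    return z
```

For example, `resample_labels(z, [3], theta, phi, doc, w, random.Random(0))` would touch only `z[3]` and leave every other entry of `z` as it was.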


ccshan
2018-6-21 04:41:38

OK, but I’ll only be in the office in the morning. Again, maybe you’d find it easier to change the AugurV2 code. Let me take a stab at it:


ccshan
2018-6-21 04:43:19

(Also, by the way, I noticed that you switched to the 1D way we represent documents’ words, as opposed to the 2D way with an array of arrays of words. I’m curious why, though it’s not so important.)


ccshan
2018-6-21 04:45:07

Something like this:

augur_nb = '''(K : Int, D1 : Int, D2 : Int, N1 : Int, N2 : Int, topic_prior : Vec Real, word_prior : Vec Real, doc1 : Vec Int, doc2 : Vec Int) => {
  param theta ~ Dirichlet(topic_prior);
  param phi[k] ~ Dirichlet(word_prior) for k <- 0 until K ;
  data z1[d] ~ Categorical(theta) for d <- 0 until D1 ;
  param z2[d] ~ Categorical(theta) for d <- 0 until D2 ;
  data w1[n] ~ Categorical(phi[z1[doc1[n]]]) for n <- 0 until N1;
  data w2[n] ~ Categorical(phi[z2[doc2[n]]]) for n <- 0 until N2;
}
'''