
@ccshan I managed to get naive Bayes running in Augur, but I don’t know how to update only 10% of the documents. Can you look at it when you have some time? https://github.com/rjnw/hakaru-benchmarks/blob/master/runners/augur/nb.py

Thanks! Can you or I take a look at the C code to see if there’s an outer loop there that sweeps through updating all the elements of z?

If so, then maybe the easiest way to only update 10% of the documents is to split the z, doc, and w arrays into two each (training and test). Or maybe it’s easier to dive into the C code and change the loop to sweep through not all of z.

I looked at the C code and found the loop for updating z. It’s not too complicated to change it to only do 10%. I’m going to be in the office tomorrow afternoon, so we can take a look then.

Ok, but I’m only going to be in the office in the morning. Again, maybe you’d find it easier to change the AugurV2 code. Let me take a stab at it:

(Also, by the way, I noticed that you switched to the 1D way we represent documents’ words, as opposed to the 2D way with an array of arrays of words. I’m wondering why. Not so important.)

Something like this:
augur_nb = '''(K : Int, D1 : Int, D2 : Int, N1 : Int, N2 : Int, topic_prior : Vec Real, word_prior : Vec Real, doc1 : Vec Int, doc2 : Vec Int) => {
param theta ~ Dirichlet(topic_prior);
param phi[k] ~ Dirichlet(word_prior)
for k <- 0 until K ;
data z1[d] ~ Categorical(theta)
for d <- 0 until D1 ;
param z2[d] ~ Categorical(theta)
for d <- 0 until D2 ;
data w1[n] ~ Categorical(phi[z1[doc1[n]]])
for n <- 0 until N1;
data w2[n] ~ Categorical(phi[z2[doc2[n]]])
for n <- 0 until N2;
}
'''
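
To feed a model like this, the original z, doc, and w arrays need to be split by document, with doc re-indexed so that doc1 points into z1 and doc2 points into z2. Here is a rough NumPy sketch of that split. The helper name `split_docs` and its `test_frac` and `seed` parameters are made up for illustration and are not in nb.py, and how the resulting arrays get passed to AugurV2 is not shown.

```python
import numpy as np

def split_docs(z, doc, w, test_frac=0.1, seed=0):
    """Hypothetical helper: split 1D naive Bayes arrays into a labeled
    part (z1, doc1, w1) and a held-out part (z2, doc2, w2).

    z   : topic label per document, length D
    doc : document index per word token, length N
    w   : word id per word token, length N
    """
    z, doc, w = np.asarray(z), np.asarray(doc), np.asarray(w)
    D = len(z)
    D2 = max(1, int(round(test_frac * D)))

    # Pick the held-out documents at random.
    rng = np.random.default_rng(seed)
    is_test = np.zeros(D, dtype=bool)
    is_test[rng.choice(D, size=D2, replace=False)] = True

    # Map each old document index to its position within its own part.
    new_index = np.empty(D, dtype=int)
    new_index[~is_test] = np.arange(D - D2)
    new_index[is_test] = np.arange(D2)

    # A token belongs to the held-out part if its document does.
    tok_test = is_test[doc]
    part1 = dict(z=z[~is_test], doc=new_index[doc[~tok_test]], w=w[~tok_test])
    part2 = dict(z=z[is_test], doc=new_index[doc[tok_test]], w=w[tok_test])
    return part1, part2

# Toy data: 10 documents, 13 word tokens.
z = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
doc = [0, 0, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 9]
w = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9]

train, test = split_docs(z, doc, w)
D1, D2 = len(train["z"]), len(test["z"])
N1, N2 = len(train["w"]), len(test["w"])
```

Since z2 is a param in the model, test["z"] would not be given to the sampler. It is only useful afterwards for scoring the sampled z2 against the true labels.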