ccshan
2018-6-27 17:14:28

Basically Daniel’s email made me worry about memory bytes being interpreted incorrectly


rjnw
2018-6-27 17:32:18

I will put some printf’s to see the values of those arrays


rjnw
2018-6-27 17:33:31

I have another issue. LdaLikelihood is always returning 0. https://github.com/rjnw/hakaru-benchmarks/blob/master/runners/hk/LdaGibbs/Likelihood.hs


ccshan
2018-6-27 17:46:55

Don’t use fromProb, which causes underflow (so the 0 you get is actually promising). Use log to get the log likelihood, without underflow.


ccshan
2018-6-27 17:47:50

(Of course I mean the log from our LogFloatPrelude)
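
A minimal sketch of the underflow, in plain Python rather than Haskell. It does not use Hakaru's fromProb or LogFloatPrelude, and the per-word probability and the word count are made-up numbers for illustration. Multiplying many small probabilities as ordinary doubles collapses to 0, while summing their logs stays finite.

```python
import math

# Hypothetical numbers: 5000 words, each with likelihood 1e-3 under the model.
word_probs = [1e-3] * 5000

# Linear domain: the running product underflows to exactly 0.0 in double precision.
likelihood = 1.0
for p in word_probs:
    likelihood *= p

# Log domain: the sum of logs is an ordinary finite negative number.
log_likelihood = sum(math.log(p) for p in word_probs)

print(likelihood)       # 0.0
print(log_likelihood)   # finite and negative
```

So a likelihood of exactly 0 is what one would expect from a correct computation that leaves the log domain too early.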


rjnw
2018-6-27 22:06:53

@ccshan is this augur lda code correct?
(ndocs : Int, ntopics : Int, nwords : Int, topics_prior : Vec Real, word_prior : Vec Real, doc : Vec Int) => {
  param theta[d] ~ Dirichlet(topics_prior) for d <- 0 until ndocs ;
  param phi[k] ~ Dirichlet(word_prior) for k <- 0 until ntopics ;
  param z[d] ~ Categorical(theta[doc[d]]) for d <- 0 until nwords ;
  data w[d] ~ Categorical(phi[z[d]]) for d <- 0 until nwords ;
}


rjnw
2018-6-27 22:07:16

I had to change lda to 1D as well


rjnw
2018-6-27 22:09:18

It’s giving me this error: NameError: Error: [CgConj] | Product, could not match Pi(t10 <- 0 until ndocs) { Dirichlet(theta[t10] ; topics_prior) } with Pi(t12 <- 0 until nwords) { let t14 = doc[t12] in let t15 = theta[t14] in Categorical(z[t12] ; t15) }


ccshan
2018-6-27 22:31:32

I found where this error comes from. It doesn’t work better with 2D LDA?


rjnw
2018-6-27 22:45:45

I still haven’t figured out how to do irregular arrays. I will try that again then.


ccshan
2018-6-27 22:59:31

My guess is that, in order for 1D LDA to work, “the normalization rule…where z is a Categorical variable” in the AugurV2 paper needs to be generalized to where z is not necessarily a Categorical variable (such as z[d]) but possibly a bounded Int variable (such as doc[d]). Maybe this rule is implemented under -- == Mixture factoring in RwCore.hs and can be fixed, but I don’t understand that code yet. Meanwhile, if you could show what goes wrong when you try irregular arrays in 2D LDA, we can ask Daniel Huang about that.


rjnw
2018-6-27 23:25:07

Is this correct for 2D LDA?
(ntopics : Int, ndocs : Int, w_shape : Vec Int, topics_prior : Vec Real, words_prior : Vec Real) => {
  param theta[d] ~ Dirichlet(topics_prior) for d <- 0 until ndocs ;
  param phi[k] ~ Dirichlet(words_prior) for k <- 0 until ntopics ;
  param z[d, n] ~ Categorical(theta[d]) for d <- 0 until ndocs, n <- 0 until w_shape[d] ;
  data w[d, n] ~ Categorical(phi[z[d, n]]) for d <- 0 until ndocs, n <- 0 until w_shape[d] ;
}


rjnw
2018-6-27 23:25:28

This gets terminated by segfault, address boundary error


rjnw
2018-6-27 23:28:09

w_shape[i] is the number of words in the document at index i; ndocs is the length of w_shape
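
To make the two encodings concrete, here is a small sketch in plain Python. The toy corpus and the helper names ragged_to_shape and ragged_to_flat are hypothetical, not part of AugurV2 or the benchmark runners. From the same ragged corpus it derives w_shape for the 2D model and the flat w and doc vectors that the 1D model takes.

```python
from typing import List, Tuple

def ragged_to_shape(docs: List[List[int]]) -> List[int]:
    """w_shape[d] is the number of words in document d."""
    return [len(words) for words in docs]

def ragged_to_flat(docs: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Flatten to 1D: w[i] is the i-th word overall, doc[i] is its document index."""
    w, doc = [], []
    for d, words in enumerate(docs):
        w.extend(words)
        doc.extend([d] * len(words))
    return w, doc

# Toy corpus: three documents of different lengths; words are vocabulary indices.
docs = [[0, 2, 1], [3], [1, 1]]

w_shape = ragged_to_shape(docs)
w, doc = ragged_to_flat(docs)

ndocs = len(w_shape)   # ndocs is the length of w_shape
nwords = len(w)        # total word count, the loop bound in the 1D model

print(w_shape)  # [3, 1, 2]
print(w)        # [0, 2, 1, 3, 1, 1]
print(doc)      # [0, 0, 0, 1, 2, 2]
```

Checking that sum(w_shape) equals len(w) is a cheap consistency test before handing either encoding to generated code.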


ccshan
2018-6-27 23:42:28

Well, I’m just going by augurv2/examples/lda.py but that seems right. Does that example (2D LDA with all documents equal length) work for you? What if you take exactly that code but remove one word from one document? Can you add printf to your generated code to make sure that memory bytes are interpreted as intended, or trace where the segfault happens?
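
One way memory bytes can be misread is an integer-width mismatch between the code that fills a buffer and the code that reads it. The sketch below uses only Python's standard struct module and made-up values. It illustrates that failure mode; it is not a diagnosis of this segfault or a claim about how AugurV2 lays out its arrays.

```python
import struct

# Hypothetical word indices, written as little-endian 64-bit integers.
values = [1, 2, 3]
buf = struct.pack("<3q", *values)

# A reader that agrees on the width recovers the values.
as_int64 = struct.unpack("<3q", buf)

# A reader that assumes 32-bit integers sees twice as many elements,
# with a zero interleaved after every real value.
as_int32 = struct.unpack("<6i", buf)

print(as_int64)  # (1, 2, 3)
print(as_int32)  # (1, 0, 2, 0, 3, 0)
```

Printing the first few elements on both sides of the boundary and looking for a pattern like the interleaved zeros above is the kind of check printf would give.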


ccshan
2018-6-27 23:44:25

Feel free to send your segfaulting Python code to Daniel Huang and cc me…


rjnw
2018-6-27 23:50:23

okay let me try


rjnw
2018-6-28 00:27:23

this time it’s stuck for 20 minutes with a smaller data set of 200 documents. :confused:


rjnw
2018-6-28 00:38:05

When I run it with their small data from examples it works, but doesn’t work with 20newsgroup.


rjnw
2018-6-28 00:50:37

sent an email to dan