Basically Daniel’s email made me worry about memory bytes being interpreted incorrectly
I will put in some printfs to see the values of those arrays
I have another issue. LdaLikelihood is always returning 0. https://github.com/rjnw/hakaru-benchmarks/blob/master/runners/hk/LdaGibbs/Likelihood.hs
Don’t use fromProb, which causes underflow (so the 0 you get is actually promising). Use log to get the log likelihood, without underflow.
(Of course I mean the log from our LogFloatPrelude)
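The underflow being described can be reproduced with plain floating point; a minimal Python sketch (illustrative values only, not the Hakaru code) of why the 0 is promising and why log space fixes it:

```python
import math

# A document's likelihood is a product of many small per-word
# probabilities; with a large vocabulary each one is tiny.
probs = [1e-4] * 500  # 500 words, illustrative probability values

# The naive product underflows to 0.0 in double precision --
# the same symptom as converting out of log space with fromProb.
likelihood = 1.0
for p in probs:
    likelihood *= p
print(likelihood)      # 0.0

# Summing logs stays representable: this is the log likelihood.
log_likelihood = sum(math.log(p) for p in probs)
print(log_likelihood)  # about -4605.17
```

LogFloat-style representations keep the number in log space the whole time, so the product never has to be materialized as a raw double.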
@ccshan is this augur lda code correct?

(ndocs : Int, ntopics : Int, nwords : Int, topics_prior : Vec Real, word_prior : Vec Real, doc : Vec Int) => {
  param theta[d] ~ Dirichlet(topics_prior)
    for d <- 0 until ndocs ;
  param phi[k] ~ Dirichlet(word_prior)
    for k <- 0 until ntopics ;
  param z[d] ~ Categorical(theta[doc[d]])
    for d <- 0 until nwords ;
  data w[d] ~ Categorical(phi[z[d]])
    for d <- 0 until nwords ;
}
I had to change lda to 1D as well
It’s giving me this error:

NameError: Error: [CgConj] | Product, could not match
Pi(t10 <- 0 until ndocs) {
  Dirichlet(theta[t10] ; topics_prior)
} with Pi(t12 <- 0 until nwords) {
  let t14 = doc[t12] in
  let t15 = theta[t14] in
  Categorical(z[t12] ; t15)
}
I found where this error comes from. Does it work better with 2D LDA?
I still haven’t figured out how to do irregular arrays. I will try that again then.
My guess is that, in order for 1D LDA to work, “the normalization rule…where z is a Categorical variable” in the AugurV2 paper needs to be generalized so that z is not necessarily a Categorical variable (such as z[d]) but possibly a bounded Int variable (such as doc[d]). Maybe this rule is implemented under -- == Mixture factoring in RwCore.hs and can be fixed, but I don’t understand that code yet. Meanwhile, if you could show what goes wrong when you try irregular arrays in 2D LDA, we can ask Daniel Huang about that.
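For reference, the conjugacy that match is trying to recognize is ordinary Dirichlet–Categorical counting; a minimal Python sketch (names are illustrative, nothing here is from the AugurV2 internals):

```python
from collections import Counter

def dirichlet_posterior(prior, draws):
    """Posterior Dirichlet parameters after observing Categorical draws.

    Dirichlet(prior) is conjugate to Categorical: the posterior is
    Dirichlet(prior + counts), where counts[k] is how many draws came
    out as k. This is the product-of-factors pattern [CgConj] has to
    match; in the 1D model the draws for theta[j] are scattered across
    the word loop via doc[d] == j, which is where the matcher gives up.
    """
    counts = Counter(draws)
    return [a + counts.get(k, 0) for k, a in enumerate(prior)]

prior = [1.0, 1.0, 1.0]        # symmetric Dirichlet prior
z = [0, 2, 2, 1, 2]            # categorical draws for one topic vector
print(dirichlet_posterior(prior, z))  # [2.0, 2.0, 4.0]
```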
Is this correct for 2D LDA?

(ntopics : Int, ndocs : Int, w_shape : Vec Int, topics_prior : Vec Real, words_prior : Vec Real) => {
  param theta[d] ~ Dirichlet(topics_prior)
    for d <- 0 until ndocs ;
  param phi[k] ~ Dirichlet(words_prior)
    for k <- 0 until ntopics ;
  param z[d, n] ~ Categorical(theta[d])
    for d <- 0 until ndocs, n <- 0 until w_shape[d] ;
  data w[d, n] ~ Categorical(phi[z[d, n]])
    for d <- 0 until ndocs, n <- 0 until w_shape[d] ;
}
This gets terminated by a segfault (address boundary error). w_shape[i] is the number of words in the document at index i, and ndocs is the length of w_shape.
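The doc array the 1D model needs can be derived from w_shape mechanically; a minimal Python sketch of that flattening (illustrative, assuming w_shape holds per-document word counts as just described):

```python
# Flatten a ragged corpus: each word position gets the index of its
# document, so theta[doc[d]] in the 1D model plays the role of
# theta[d] in the 2D model.
def flatten(w_shape):
    doc = []
    for d, n_words in enumerate(w_shape):
        doc.extend([d] * n_words)
    return doc

w_shape = [3, 1, 2]   # 3 documents of unequal length
doc = flatten(w_shape)
print(doc)            # [0, 0, 0, 1, 2, 2]
print(len(doc))       # nwords == sum(w_shape) == 6
```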
Well, I’m just going by augurv2/examples/lda.py but that seems right. Does that example (2D LDA with all documents equal length) work for you? What if you take exactly that code but remove one word from one document? Can you add printf to your generated code to make sure that memory bytes are interpreted as intended, or trace where the segfault happens?
Feel free to send your segfaulting Python code to Daniel Huang and cc me…
okay let me try
this time it’s stuck for 20 minutes with a smaller data set of 200 documents. :confused:
When I run it with the small data from their examples it works, but it doesn’t work with 20newsgroups.
Sent an email to Dan.