hakaru has 5 sweeps, augur ~190
and that’s just snapshotting per sweep for hk, right?
no, a snapshot every hundred updates plus one every sweep
just surprised that we have such a flat line
our accuracy is 0.82xx (identical to the second decimal place). Time (s) and accuracy per snapshot:
time (s)   accuracy
41.885     0.822325
83.790     0.823525
125.682    0.823000
167.594    0.823050
209.452    0.822550
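For reference, a minimal sketch of how such (elapsed-time, accuracy) pairs per sweep could be logged; `sampler_step` and `evaluate_accuracy` are hypothetical callables standing in for the actual Hakaru/AugurV2 update and scoring code, not the benchmark harness itself:

```python
import time

def run_with_snapshots(sampler_step, evaluate_accuracy, n_sweeps=5):
    """Run a sampler, recording (elapsed seconds, held-out accuracy)
    after each sweep.

    sampler_step and evaluate_accuracy are placeholders for the real
    update and scoring functions.
    """
    log = []
    start = time.time()
    for _ in range(n_sweeps):
        sampler_step()                    # one full Gibbs sweep
        log.append((time.time() - start,  # cumulative wall-clock time
                    evaluate_accuracy())) # accuracy at this snapshot
    return log
```

With 5 sweeps this yields exactly five (time, accuracy) rows, matching the Hakaru output above.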
I will verify whether we are measuring the accuracy correctly. I made some changes this time to compare with Augur.
Hmm, weird that AugurV2 is not as accurate as JAGS?
@rjnw if you have the AugurV2 generated code handy, would you please share it with me so I can take a second look?
NaiveBayes? The one in benchmark repository is the one I am using for evaluation. https://github.com/rjnw/hakaru-benchmarks/blob/04b8c466581f9d776efed2ef77a076e68a5338a3/runners/augur/nb.py
I mean the generated C code for NB
Oh, yeah.
I am also in office if you want to discuss in person.
Thanks, and would you please double-check that the sample logging and accuracy computation handle the z1/z2 split correctly?
I’m coming in soon
Yeah I am looking into it.
Sorry, that was old C code. I changed the working directory recently. Here is the correct one.
deleting the older post.
Ah ok
The accuracy we calculate for Augur is correct. It is around 0.49; I verified it in Python itself, where we split z into z1 and z2.
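A minimal sketch of that check, under the assumption (from the model description later in this thread) that z1 holds the supervised labels and z2 the sampled held-out ones; the function name and data loading are hypothetical:

```python
import numpy as np

def heldout_accuracy(z_sampled, z_true, n_supervised):
    """Accuracy on the held-out block (z2) only.

    z_sampled: labels for all documents; the first n_supervised
    entries (z1) are clamped to training labels, the remainder
    (z2) are the sampled held-out labels we actually score.
    """
    z2_pred = np.asarray(z_sampled)[n_supervised:]
    z2_true = np.asarray(z_true)[n_supervised:]
    return float(np.mean(z2_pred == z2_true))
```

For the full 20-newsgroups setup discussed below (19997 documents, 2000 held out), `n_supervised` would be 17997.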
@carette Were you able to replicate the maple connection issue in the docker image I sent some time ago?
@rjnw I have not had a chance to try yet. Working on a paper with a Friday deadline.
I guess this is also becoming urgent?
@carette well, the POPL deadline is on the 11th
so yes, fixing this soon is necessary for a fix to be useful
Also it would be nice to have some correct numbers for simplification time.
Ok, I’ll try.
@rjnw In Dan’s reply an hour ago, I feel he asked about several issues that are unlikely to explain the big accuracy gap. But before I reply, can you confirm that you’re holding out 10% in AugurV2 (rather than 1%) and still see accuracy of 48–49%?
And you should set JAGS n.adapt to 0, and re-measure JAGS accuracy and startup times if necessary. This doesn’t explain the big accuracy gap that remains after a couple hundred AugurV2 sweeps (which is what you ran with 10% hold-out, right?). Initialization differences might.
We checked 10% in your office, when the total number of documents was 2000
I can re-measure JAGS with n.adapt = 0
Oh! But the ~80% accuracy was achieved by JAGS and Hakaru using the full 20-newsgroups data set, with 19997 documents. It’s quite possible that using the full 20-newsgroups data set and holding out the labels of 10% of its documents, like with JAGS and Hakaru, would let AugurV2 achieve ~80% accuracy!
Yes, please remeasure jags with n.adapt=0
What do you mean by using the full 20-newsgroups data set? In Augur’s model there are two document arrays: z1 is supervised, with 19997 − 2000 documents, and z2 is sampled for the 2000 held-out ones.
Ok, that sounds correct. I misunderstood when you said “the total number of documents was 2000”.
Oh sorry I meant “total number of heldout documents was 2000”
Jags just finished running, here is what I have:
sweep time accuracy
1 21546s 0.7798
2 22055s 0.8109
This is similar to what we had before, but now with n.adapt = 0.