hakaru has 5 sweeps, augur ~190
and that’s just snapshotting per sweep for hk, right?
no, a snapshot every hundred updates plus one every sweep
just surprised that we have such a flat line
our accuracy is 0.82xx (identical to the second decimal place). Time (s) and accuracy per snapshot:
time (s)   accuracy
41.885     0.822325
83.790     0.823525
125.682    0.823000
167.594    0.823050
209.452    0.822550
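For reference, a minimal sketch of how such (elapsed-time, accuracy) pairs per sweep could be logged; `sampler_step` and `evaluate_accuracy` are hypothetical callables standing in for the actual Hakaru/AugurV2 update and scoring code, not the benchmark harness itself:

```python
import time

def run_with_snapshots(sampler_step, evaluate_accuracy, n_sweeps=5):
    """Run a sampler, recording (elapsed seconds, held-out accuracy)
    after each sweep.

    sampler_step and evaluate_accuracy are placeholders for the real
    update and scoring functions.
    """
    log = []
    start = time.time()
    for _ in range(n_sweeps):
        sampler_step()                    # one full Gibbs sweep
        log.append((time.time() - start,  # cumulative wall-clock time
                    evaluate_accuracy())) # accuracy at this snapshot
    return log
```

With 5 sweeps this yields exactly five (time, accuracy) rows, matching the Hakaru output above.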
I will verify whether we are measuring the accuracy correctly. I made some changes this time to compare with Augur.
Hmm, weird that AugurV2 is not as accurate as JAGS?
@rjnw if you have the AugurV2 generated code handy, would you please share it with me so I can take a second look?
NaiveBayes? The one in benchmark repository is the one I am using for evaluation. https://github.com/rjnw/hakaru-benchmarks/blob/04b8c466581f9d776efed2ef77a076e68a5338a3/runners/augur/nb.py
I mean the generated C code for NB
Oh, yeah.
I am also in office if you want to discuss in person.
Thanks, and would you please double-check that the sample logging and accuracy computation handle the z1/z2 split correctly?
I’m coming in soon
Yeah I am looking into it.
Sorry, that was old C code. I changed the working directory recently. Here is the correct one.
deleting the older post.
Ah ok
The accuracy we calculate for Augur is correct. It is around 0.49; I verified it in Python itself, where we split z into z1 and z2.
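A minimal sketch of that check, under the assumption (from the model description later in this thread) that z1 holds the supervised labels and z2 the sampled held-out ones; the function name and data loading are hypothetical:

```python
import numpy as np

def heldout_accuracy(z_sampled, z_true, n_supervised):
    """Accuracy on the held-out block (z2) only.

    z_sampled: labels for all documents; the first n_supervised
    entries (z1) are clamped to training labels, the remainder
    (z2) are the sampled held-out labels we actually score.
    """
    z2_pred = np.asarray(z_sampled)[n_supervised:]
    z2_true = np.asarray(z_true)[n_supervised:]
    return float(np.mean(z2_pred == z2_true))
```

For the full 20-newsgroups setup discussed below (19997 documents, 2000 held out), `n_supervised` would be 17997.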
@carette Were you able to replicate the maple connection issue in the docker image I sent some time ago?
@rjnw I have not had a chance to try yet. Working on a paper with a Friday deadline.
I guess this is also becoming urgent?
@carette well, the POPL deadline is on the 11th
so yes, fixing this soon is necessary for a fix to be useful
Also it would be nice to have some correct numbers for simplification time.
Ok, I’ll try.
@rjnw In Dan’s reply an hour ago, I feel he asked about several issues that are unlikely to explain the big accuracy gap. But before I reply, can you confirm that you’re holding out 10% in AugurV2 (rather than 1%) and still see accuracy of 48–49%?
And you should set JAGS n.adapt to 0, and re-measure JAGS accuracy and startup times if necessary. This doesn’t explain the big accuracy gap that remains after a couple hundred AugurV2 sweeps (which is what you ran with 10% hold-out, right?). Initialization differences might.
We checked 10% in your office, when the total number of documents was 2000
I can re-measure JAGS with n.adapt = 0
Oh! But the ~80% accuracy was achieved by JAGS and Hakaru using the full 20-newsgroups data set, with 19997 documents. It’s quite possible that using the full 20-newsgroups data set and holding out the labels of 10% of its documents, like with JAGS and Hakaru, would let AugurV2 achieve ~80% accuracy!
Yes, please remeasure jags with n.adapt=0
What do you mean by using the full 20-newsgroups data set? In Augur’s model there are two document arrays: z1 is supervised, with 19997 − 2000 documents, and z2 is sampled for the 2000 held-out ones.
Ok, that sounds correct. I misunderstood when you said “the total number of documents was 2000”.
Oh sorry I meant “total number of heldout documents was 2000”
Jags just finished running, here is what I have:
sweep time accuracy
1 21546s 0.7798
2 22055s 0.8109
This is similar to what we had before, but now with n.adapt = 0.