This is a very interesting question. A good answer could have big social benefits.
In machine learning, a common meta-learning strategy is to combine multiple learning algorithms by voting. It is well known that this works best with a pool of diverse base learners. A natural measure of diversity among learning algorithms is conditional information:
http://en.wikipedia.org/wiki/Conditional_information
This suggests that perhaps each vote in, say, Digg should be weighted by its conditional information. You would need to keep a history of each voter’s voting pattern to calculate this.
You may get some other useful ideas by searching through the machine learning literature on voting as a meta-learning strategy.
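To make the weighting idea above concrete, here is a minimal sketch. The vote histories are made up, and the choice to aggregate with a minimum over the other voters is one illustrative option among several — the comment itself doesn't pin down an aggregation rule:

```python
from collections import Counter
import math

def entropy(xs):
    """Shannon entropy (in bits) of the empirical distribution of xs."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def conditional_entropy(xs, ys):
    """H(X | Y) = H(X, Y) - H(Y): how much X still tells us once Y is known."""
    return entropy(list(zip(ys, xs))) - entropy(ys)

# Hypothetical voting histories over the same eight items: 1 = upvoted.
histories = {
    "alice": [1, 1, 0, 1, 0, 1, 0, 0],
    "bob":   [1, 1, 0, 1, 0, 1, 0, 0],  # mirrors alice exactly
    "carol": [0, 1, 1, 0, 1, 1, 0, 1],
}

# Weight each voter by the information their votes add beyond what the
# most predictive other voter already provides.
weights = {
    name: min(conditional_entropy(votes, other)
              for other_name, other in histories.items() if other_name != name)
    for name, votes in histories.items()
}

# alice and bob each get weight 0 (each is fully predictable from the
# other); carol keeps a positive weight.
print(weights)
```

A redundant voter contributes nothing new, so their vote is discounted, which is exactly the diversity intuition from the ensemble-learning literature.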
The idea of choosing a diverse subset comes up frequently in the experimental design literature, as a way to cover the parameter space well enough given the number of experiments that can actually be run. Most of the techniques presume some sort of existing distance measure, but that is likely to be available in a social-network setting.
Kevembuangga says:
Most of the techniques presume some sort of existing distance measure…
We use a lot of informal “closeness” criteria in all our thinking, intuitive or not, but it could be that this is a blind alley, because the “obvious” nearest-neighbour metrics don’t scale to high dimensionality.
So, if metrics become useless, how the heck are we going to handle similarities and analogies, hey Peter (wink!)? That would ruin the whole spatial level of abstraction…
Anonymous says:
Interesting idea Daniel. I think it might apply to citation-based recommenders too, although I don’t know how to adapt Conditional Information as a way of measuring “paper diversity”.
Clever comment Peter!
“anonymous” in that last comment was me.