How do ideas move between mailing lists? Are there certain members of the community who are responsible for the spread of ideas?
To answer this question, write an IPython Notebook that analyzes the language used on mailing lists and identifies who is responsible for introducing new terms.
- For an Archive, write a method that converts the Archive's data into a DataFrame indexed by term (individual words or n-grams), with columns showing the identifier of the first person to send an email to the list using that term and the date/time when they sent it. Remove stopwords before processing. Call this a Term Introduction table.
- For two Archives in a related domain (such as ipython-dev and scipy-dev), join their two Term Introduction tables on the term, creating a new table that shows who introduced the term on each list and when. Discard terms that were not used on both lists.
- For each term in the joined introduction table, identify if it was introduced by the same person to both lists, or by different people.
- For those terms that were introduced to the two lists by different people, answer the question: could the person who introduced the term to the second list have first heard it on the other list? To check, see whether the second introducer participated on the first list before the first introduction.
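
The steps above can be sketched roughly as follows. This is a minimal illustration with pandas, not a definitive implementation: it assumes each Archive's data is available as a DataFrame with `From`, `Date`, and `Body` columns, and the function names, the tiny stopword set, and the sample messages are all hypothetical. It handles single words only, not n-grams.

```python
import re
import pandas as pd

# Tiny illustrative stopword set (an assumption); a real notebook would use a fuller list.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "is", "in", "for", "on", "it"}

def term_introductions(messages):
    """Build a Term Introduction table: index = term, columns = introducer, date."""
    rows = []
    for _, msg in messages.sort_values("Date", kind="stable").iterrows():
        terms = set(re.findall(r"[a-z0-9_]+", msg["Body"].lower())) - STOPWORDS
        rows.extend((term, msg["From"], msg["Date"]) for term in terms)
    table = pd.DataFrame(rows, columns=["term", "introducer", "date"])
    # Rows are in date order, so the first row per term is its introduction.
    return table.drop_duplicates("term", keep="first").set_index("term")

def joined_introductions(table_a, table_b):
    """Inner-join two tables on term, keeping only terms used on both lists."""
    joined = table_a.join(table_b, how="inner", lsuffix="_a", rsuffix="_b")
    joined["same_person"] = joined["introducer_a"] == joined["introducer_b"]
    return joined

def could_have_heard(joined, messages_a, messages_b):
    """For terms introduced by different people, flag whether the later
    introducer had posted on the earlier list before the first introduction."""
    different = joined[~joined["same_person"]].copy()
    # Earliest post by each sender on each list.
    first_post = {
        "a": messages_a.groupby("From")["Date"].min(),
        "b": messages_b.groupby("From")["Date"].min(),
    }
    flags = []
    for _, row in different.iterrows():
        early, late = ("a", "b") if row["date_a"] <= row["date_b"] else ("b", "a")
        second_person = row[f"introducer_{late}"]
        joined_at = first_post[early].get(second_person)  # None if never posted
        flags.append(joined_at is not None and joined_at < row[f"date_{early}"])
    different["could_have_heard"] = flags
    return different

# Made-up sample messages standing in for two Archives.
messages_a = pd.DataFrame({
    "From": ["alice", "carol", "bob"],
    "Date": pd.to_datetime(["2010-01-01", "2010-01-02", "2010-01-05"]),
    "Body": ["hello notebook", "hello", "the kernel works"],
})
messages_b = pd.DataFrame({
    "From": ["alice", "carol"],
    "Date": pd.to_datetime(["2010-01-20", "2010-02-01"]),
    "Body": ["notebook", "kernel notebook"],
})

table_a = term_introductions(messages_a)
table_b = term_introductions(messages_b)
joined = joined_introductions(table_a, table_b)
result = could_have_heard(joined, messages_a, messages_b)
print(joined)
print(result)
```

In this toy data, "notebook" is introduced to both lists by the same person, while "kernel" is introduced by different people, and the second introducer had already posted on the first list before the term appeared there. The sketch follows the check as stated in the last step; a stricter variant might also require participation between the two introductions.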
This photo of the whiteboard may or may not be helpful for illustrating what's to be done.
