More Stats for Fedora Ambassadors

Today (after a weekend with low-bandwidth GPRS mobile access), I ran a Python script to collect stats for all entries in the Ambassadors Country List, this time on a high-performance connection. It took more than an hour :-/

Now, you can download:


The script (a wiki regex crawler) may contain bugs (for example, I’m not sure how it handles some stray “dot” entries on the wiki page). But I hope this opens the way to more meaningful stats for the Fedora Project.

For the moment I’ve published the first draft of this script (very “quick and dirty”). Over the next few days I’ll try to package these actions into a Python module, to make this kind of code more reusable and more readable. I hope to end up with something like:

import FedoraStats
foo = FedoraStats.Ambassadors()
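
A minimal sketch of how such a module could look, under loudly stated assumptions: only the names FedoraStats and Ambassadors come from the wish above; the constructor argument, the by_country attribute, the total() method and the sample markup are all hypothetical. It only parses wiki text that has already been fetched, using the same heading and bullet conventions as the draft script.

```python
import re

# Hypothetical contents of a FedoraStats module. Only the class name
# Ambassadors comes from the post; every other name is an assumption.
BULLET = re.compile(r"^ \* ")  # one ambassador per " * ..." line


class Ambassadors(object):
    """Per-country ambassador counts parsed from raw wiki text."""

    def __init__(self, raw_text):
        self.by_country = {}
        country = "Not Defined"  # used for bullets found before any heading
        for line in raw_text.split("\n"):
            if line.startswith("=== "):
                # "=== italy ===" -> "Italy", as in the draft script
                country = line.replace("===", "").strip().title()
                self.by_country.setdefault(country, 0)
            elif BULLET.match(line):
                self.by_country[country] = self.by_country.get(country, 0) + 1

    def total(self):
        """Total number of bullet lines counted across all countries."""
        return sum(self.by_country.values())


# Made-up sample markup, not real wiki content.
sample = "=== italy ===\n * ExampleUserOne\n * ExampleUserTwo\n=== france ===\n"
foo = Ambassadors(sample)
```

With this shape, foo.by_country maps each country to its count, and a country heading with no bullet lines under it stays at 0.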

Here is the current draft:

#!/usr/bin/env python
# copyright 2007    Francesco Crippa <fcrippa>
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
import pycurl
import StringIO
import urllib
import re
import sys
import time
def find_world_region(data):
    # Count bullet lines (" * ...") under each "=== Country ===" heading.
    lines = data.split("\n")
    counts = {}
    country = "Not Defined"
    counts[country] = 0
    for x in lines:
        if re.search("^=== ", x):
            country = x.replace("===", "").strip().title()
            counts[country] = 0
        elif re.search(r"^ \* ", x):
            counts[country] = counts[country] + 1
    return counts
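def print_stats(counts):
    # Write one CSV row per month and one column per country.
    # The header is the sorted union of all countries, so every row has
    # the same columns; a country missing from a month is written as 0.
    countries = set()
    for date in counts:
        # date = 2008-01
        countries.update(counts[date])
    countries = sorted(countries)
    out = open('ambassadors_country_list.csv', 'w')
    out.write("FedoraAmbassadors," + ",".join(countries) + "\n")
    for date in sorted(counts):
        # counts[date][country] = 5
        row = ["%d" % counts[date].get(country, 0) for country in countries]
        out.write("%s,%s\n" % (date, ",".join(row)))
    out.close()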
def find_page_date(page):
    find_date = re.compile("([0-9]+)[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])")
    date = find_date.findall(page)
    return date[0]
print "Welcome to Stats generator for Fedora Ambassadors"
if __name__ == "__main__":
    page = pycurl.Curl()
    vars = {"action": "raw"}
    url = ""
    print "    * Start configuration"
    page.setopt(pycurl.URL, url)
    page.setopt(pycurl.FOLLOWLOCATION, 1)
    page.setopt(pycurl.USERAGENT, "Mozilla/4.0")
    page.setopt(pycurl.MAXREDIRS, 5)
    page.setopt(pycurl.VERBOSE, 0)
    print "    * Downloading pages"
    counts = {}
    k = 0
    for i in range(1,663):
        #k = k + 1
        #if k == 50:
        #    print "        - I'm waiting some minutes..."
        #    time.sleep(100)
        #    k = 0
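        # Fetch the rendered page for this revision to read its date.
        # NOTE: the draft as posted never calls perform(); the calls below
        # are added on the assumption that they were simply left out.
        content = StringIO.StringIO()
        page.setopt(pycurl.WRITEFUNCTION, content.write)
        params = {}
        params["rev"] = i
        print "        - Revision %d" % params["rev"],
        page.setopt(pycurl.POSTFIELDS, urllib.urlencode(params))
        page.perform()
        year, month, day = find_page_date(content.getvalue())
        print "(Modified on %s-%s-%s)" % (year, month, day)
        # Fetch the raw wiki text into a fresh buffer and count it.
        content = StringIO.StringIO()
        page.setopt(pycurl.WRITEFUNCTION, content.write)
        params["action"] = "raw"
        page.setopt(pycurl.POSTFIELDS, urllib.urlencode(params))
        page.perform()
        counts[year+"-"+month] = find_world_region(content.getvalue())
    print "    * Writing CSV"
    print_stats(counts)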
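
The date lookup is probably the most fragile step in the draft: find_page_date takes the first date-like string anywhere in the page and raises IndexError when there is none. Here is a small sketch of a more defensive variant that uses the same pattern; the name find_page_date_or_none and the sample strings are made up for illustration.

```python
import re

# Same pattern as find_page_date in the draft: year, month (01-12) and
# day (01-31), separated by "-", " ", "/" or ".".
DATE = re.compile(r"([0-9]+)[- /.](0[1-9]|1[012])[- /.](0[1-9]|[12][0-9]|3[01])")


def find_page_date_or_none(page):
    """Return (year, month, day) strings for the first date-like match, or None."""
    match = DATE.search(page)
    if match is None:
        return None
    return match.groups()


# Made-up sample strings, not real wiki output.
found = find_page_date_or_none("last edited 2008-02-17 21:05:11 by ExampleUser")
missing = find_page_date_or_none("no date on this page")
```

A caller could then skip a revision for which this returns None instead of aborting the whole run.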

3 Comments so far »

  1. Greg DeK said,

    Wrote on February 18, 2008 @ 5:20 pm

    These numbers are freaking awesome!

    The next set of questions involve activity. How can we determine whether a particular ambassador is “active”? How many of our 350 ambassadors are attending events, working on email lists, and so forth?

    There’s no simple answer, of course, but something to think about as you work on these metrics.

    Great, great work. Brilliant, actually.

  2. Fabian said,

    Wrote on February 20, 2008 @ 7:00 pm

    There are some issues with your script. As far as I can see, it counts the lines of every area, but there are regions with no ambassadors where it still counts 1, because the country list contains some country names with no assigned ambassadors. Perhaps it’s possible to count the wiki names instead.

    But from now on it will be much easier to collect the data and build statistics.

    We better discuss this during FOSDEM while drinking some beer ;-)

  3. fcrippa said,

    Wrote on February 21, 2008 @ 12:43 am

    Yep, I think so too! ;-)
