
def Hellanzb::NZBLeecher::NZBSegmentQueue::RetryQueue::createQueues(self)

Create the retry PriorityQueues for all known serverPools

This is a hairy way to do it. It's not likely to scale beyond roughly 4-5
serverPools, but it is functionally ideal for a reasonable number of
serverPools.

The idea is that you want your downloaders to always be busy. Without the
RetryQueue, they would simply always pull the next available segment out of the
main NZBSegmentQueue; once the NZBSegmentQueue was empty, all downloaders knew
they were done.

Now that we want the ability to requeue a segment that failed on a particular
serverPool, the downloaders need to exclude the segments they've previously failed
to download when pulling segments out of the NZBSegmentQueue.

If we kept all queued (and now requeued) segments in the same queue, the
potentially many downloaders could easily end up walking the entire queue
looking for a segment they haven't already tried. That is unacceptable when our
queues commonly hold over 60K items.

The best way I can currently see to let the downloaders quickly look up the
'actual' next segment they want to download is to have multiple queues, indexed
by which serverPool(s) have previously failed on those segments.

If we have 3 serverPools (1, 2, and 3) we end up with a dict looking like:

not1     -> q
not2     -> q
not3     -> q
not1not2 -> q
not1not3 -> q
not2not3 -> q
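
For illustration only (this helper is not part of hellanzb), the same set of
queue names can be generated as every non-empty, proper subset of the pool
indices:

    from itertools import combinations

    def retry_queue_names(n):
        # One queue per non-empty, proper subset of the n serverPools;
        # the full set is excluded (a segment that failed on every pool
        # cannot be retried anywhere)
        names = []
        for size in range(1, n):
            for combo in combinations(range(1, n + 1), size):
                names.append(''.join('not%d' % i for i in combo))
        return names

    print(retry_queue_names(3))
    # ['not1', 'not2', 'not3', 'not1not2', 'not1not3', 'not2not3']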

The exact equation for the number of Queues with respect to the number of
serverPools works out to 2^n - 2 for n serverPools (one queue per non-empty,
proper subset of the pools), and each pool only consults 2^(n-1) - 1 of them.
That grows exponentially, roughly doubling with each added serverPool.

Every serverPool avoids certain queues. In the previous example, serverPool 1 only
needs to look at all the Queues that are not tagged as having already failed on 1
(not2, not3, and not2not3) -- only half of the queues
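
A quick, illustrative check of that claim:

    queues = ['not1', 'not2', 'not3', 'not1not2', 'not1not3', 'not2not3']
    # serverPool 1 skips any queue tagged 'not1'
    pool1_scans = [q for q in queues if 'not1' not in q]
    print(pool1_scans)
    # ['not2', 'not3', 'not2not3']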

The numbers:

serverPools    totalQueues    onlyQueues

2              2              1
3              6              3
4              14             7
5              30             15
6              62             31
7              126            63
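
The table follows directly from counting subsets; a quick check (illustrative,
not from the source):

    for n in range(2, 8):
        total = 2 ** n - 2          # non-empty proper subsets of n pools
        only = 2 ** (n - 1) - 1     # queues any single pool must scan
        print('%d %d %d' % (n, total, only))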

The RetryQueue.get() algorithm simply checks all queues for emptiness until it
finds one with items in it. Anything above 5 serverPools is worrisome: for 6
serverPools, the worst case scenario (which could be very common in normal use)
is to make 31 array len() calls. With a segment size of 340KB, downloading at
1360KB/s (and multiple connections), we could be doing those 31 len() calls on
average 4 times a second. And with multiple connections, this could easily
spurt to near your max connection count per second (4, 10, even 30 connections?)

Luckily len() calls are as quick as can be, and who the hell uses 6 different
usenet providers anyway? =]
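
As a hedged sketch of that scan (the names and queue interface here are
assumptions for illustration, not hellanzb's actual RetryQueue.get()):

    def retry_get(nameIndex, poolQueues, serverPoolName):
        # Scan only the queues this pool hasn't already failed on, in
        # order, until one is non-empty. Assumes each queue object
        # supports len() and get()
        for notName in nameIndex[serverPoolName]:
            queue = poolQueues[notName]
            if len(queue):          # the len() calls discussed above
                return queue.get()
        return None                 # every valid retry queue was empty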

Definition at line 112 of file NZBSegmentQueue.py.

    def createQueues(self):
        """ Create the retry PriorityQueues for all known serverPools

        This is a hairy way to do this. It's not likely to scale for more than probably
        4-5 serverPools. However it is functionally ideal for a reasonable number of
        serverPools

        The idea is you want your downloaders to always be busy. Without the RetryQueue,
        they would simply always pull the next available segment out of the main
        NZBSegmentQueue. Once the NZBSegmentQueue was empty, all downloaders knew they
        were done

        Now that we desire the ability to requeue a segment that failed on a particular
        serverPool, the downloaders need to exclude the segments they've previously failed
        to download, when pulling segments out of the NZBSegmentQueue

        If we continue keeping all queued (and now requeued) segments in the same queue,
        the potentially many downloaders could easily end up going through the entire
        queue seeking a segment they haven't already tried. This is unacceptable when our
        queues commonly hold over 60K items

        The best way I can currently see to support the downloaders being able to quickly
        lookup the 'actual' next segment they want to download is to have multiple queues,
        indexed by what serverPool(s) have previously failed on those segments

        If we have 3 serverPools (1, 2, and 3) we end up with a dict looking like:

        not1     -> q
        not2     -> q
        not3     -> q
        not1not2 -> q
        not1not3 -> q
        not2not3 -> q

        I didn't quite figure out the exact equation to gather the number of Queues in
        regard to the number of serverPools, but (if my math is right) it seems to grow
        pretty quickly (is quadratic)

        Every serverPool avoids certain queues. In the previous example, serverPool 1 only
        needs to look at all the Queues that are not tagged as having already failed on 1
        (not2, not3, and not2not3) -- only half of the queues

        The numbers:

        serverPools    totalQueues    onlyQueues

        2              2              1
        3              6              3
        4              14             7
        5              30             15
        6              62             31
        7              126            63

        The RetryQueue.get() algorithim simply checks all queues for emptyness until it
        finds one with items in it. The > 5 is worrysome. That means for 6 serverPools,
        the worst case scenario (which could be very common in normal use) would be to
        make 31 array len() calls. With a segment size of 340KB, downloading at 1360KB/s,
        (and multiple connections) we could be doing those 31 len() calls on average of 4
        times a second. And with multiple connections, this could easily spurt to near
        your max connection count, per second (4, 10, even 30 connections?)

        Luckily len() calls are as quick as can be and who the hell uses 6 different
        usenet providers anyway? =]
        """
        # Go through all the serverPools and create the initial 'not1' 'not2'
        # queues
        # FIXME: could probably let the recursive function take care of this
        for i in range(len(self.serverPoolNames)):
            notName = 'not' + str(i + 1)
            self.poolQueues[notName] = PriorityQueue()

            self._recurseCreateQueues([i], i, len(self.serverPoolNames))

        # Finished creating all the pools. Now index every pool's list of valid retry
        # queues it needs to check. (Using the above docstring, serverPool 1 would have
        # a list of 'not2', 'not3', and 'not2not3' in its nameIndex)
        for i, name in enumerate(self.serverPoolNames):
            # Queue names are 1-indexed: 'not1' corresponds to the first pool
            notStr = 'not' + str(i + 1)

            valids = []
            for notName in self.poolQueues.keys():
                # Skip queues tagged as having already failed on this pool.
                # NOTE: the substring test assumes fewer than 10 serverPools
                # ('not1' would also match 'not10')
                if notStr in notName:
                    continue
                valids.append(notName)
            self.nameIndex[name] = valids

    def _recurseCreateQueues(self, currentList, currentIndex, totalCount):
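
The listing breaks off at this signature. Purely as a hypothetical sketch (the
real body is not shown on this page), a recursion with this shape could build
the combined 'notXnotY' keys like so:

    def recurse_create_queues(poolQueues, make_queue, currentList,
                              currentIndex, totalCount):
        # Hypothetical stand-in for _recurseCreateQueues; poolQueues maps
        # names to queues, make_queue stands in for PriorityQueue.
        # Never build the subset of every serverPool: a segment that
        # failed on all pools cannot be retried anywhere
        if len(currentList) >= totalCount - 1:
            return
        for j in range(currentIndex + 1, totalCount):
            newList = currentList + [j]
            name = ''.join('not%d' % (k + 1) for k in newList)
            poolQueues[name] = make_queue()
            recurse_create_queues(poolQueues, make_queue, newList, j,
                                  totalCount)

    # For 3 pools this adds 'not1not2', 'not1not3' and 'not2not3' on top
    # of the singles created by the caller:
    queues = {}
    for i in range(3):
        queues['not%d' % (i + 1)] = list()   # stand-in queue object
        recurse_create_queues(queues, list, [i], i, 3)
    print(sorted(queues))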

