nway merge

More fun with old code, woo!! Here is a snippet for an nway merge that abuses std::set by providing a weak order comparator where a strict weak ordering comparator is "required". The same could be done properly w/ std::multiset but my original intent was to abuse the poor sets in to allowing duplicates.

struct comparator{  
    bool operator()(list<int>* a, list<int>* b){
        return a->front() <= b->front();    //note: not a strict weak ordering, duplicates allowed
    }
};

// Roughly O(nlgn), where n = |{elems of all lists}|
list<int> nwaymerge(const std::vector<list<int>*>& mergeLists){  
    std::set<list<int>*,comparator> mergeSet;
    std::list<int> ret;
    for_each(mergeLists.begin(), mergeLists.end(), [&](std::list<int> *x){
        mergeSet.insert(x);
    });

    while(!mergeSet.empty()){   
        list<int>* v = *mergeSet.begin();
        mergeSet.erase(mergeSet.begin());       

        ret.push_back(v->front());          
        v->pop_front();                     

        // Use mergeSet.end() as insertion hint to get < O(lgn) insertion
        if(!v->empty())mergeSet.insert(mergeSet.end(),v);   
    }
    return ret;
}

void usage(){  
    list<int> a = {-2,1,1,2,3,4,5,5,6,10,11,12,13};
    list<int> b = {-2,0,1,1,1,3,4};
    list<int> c = {-4,-1,3,4,5,5,6,9,18,19};

    //  For comparison, use std::list's native sort()
    //  Do this before calling nwaymerge, nwaymerge is
    //  destructive to the input arrays.
    list<int> actual;
    {   //verify result using std::sort
        actual.insert(abc_actual.end(),a.begin(), a.end());
        actual.insert(abc_actual.end(),b.begin(), b.end());
        actual.insert(abc_actual.end(),c.begin(), c.end());
        actual.sort();
    }

    vector<list<int>*> abcs;
    abcs.push_back(&a);
    abcs.push_back(&b);
    abcs.push_back(&c);

    list<int> abc_nway = nwaymerge(abcs);    
}

std::set is being used as a priority queue that sorts all std::lists in mergeLists using the first element of each list (ie, the smallest element). The algorithm is very simple and very common, you'll find many similar versions on StackOverflow and random blogs.

One line worth noting is the following:

if(!v->empty())mergeSet.insert(mergeSet.end(),v);  

According to the std::set::insert documentation, insert has amortized complexity iff an optimal hint is given for the position of an inserted object:

If a single element is inserted, [complexity is] logarithmic in size in general, but amortized constant if a hint is given and the position given is the optimal.

nwaymerge already requires that the individual lists be sorted so one hint could be to reinsert each popped list at the end of mergeSet. If we can assume uniform distribution of values in each individual list the statistically the head element of the inserted list is likely to be <= the lowest value of the remaining lists. On average, sorting several lists of 2,000,000 values each took 0.05->0.1 milliseconds less time using nwaymerge vs. the built-in std::list::sort.

View Comments