Getting process info: Win32 and COM with python and C++

 

SUMMARY

Python is a rich scripting language offering a lot of the power of C++ while retaining the ease of use of VBscript. With it's simplified C++ style, win32 access, and ability to make COM servers, it's a natural rapid development environment for the developer.

When working with an unfamiliar API, python is great for helping you understand how to solve the problem without getting in the way. Even if you have to supply a C++ COM object, it is often easier to first figure out the details with python and then compose the C++ solution. Python is very similar to C++ pseudo-code, so you can follow it as an outline for the C++. In this case, we're going to talk about how to use both python and C++ to expose a list of processes and their corresponding ids with a COM object.

Introduction

To get process information for both NT and W2K (but not the 9x family) you can use the Performance Data Helper library(PDH) available in the SDK at microsoft's ftp site. It provides a convenient interface to performance information stored fundamentally in the registry. The basic process of using the PDH encompasses the following:

  1. Get a list of all the objects you want
  2. Get a list of the object's instances and data available for each instance: called 'items' or 'counters'
  3. Get a group of performance data for each counter

In our case the object we want is the process object, the object's instances are it's list of processes, and the counter we want for the processes is 'ID Process'.

The specific set of API called are the following:

  1. PdhEnumObjectItems -- get the list of instances (processes) for the process object
  2. PdhOpenQuery -- initialize query handle which will contain all the counters (in our case only: ID Process)
  3. PdhMakeCounterPath -- Each counter has a specific path, kind of like like a url
  4. PdhAddCounter -- convert the path to a handle and group the counters paths(in this case only one) together for the query
  5. PdhCollectQueryData -- gathers the actual data
  6. PdhGetFormattedCounterValue -- call for each counter to convert the data to a typical format
  7. PdhCloseQuery -- close the counters and the query handle

We'll cover these points now in more depth.

Getting the process list: PdhEnumObjectItems

The MSDN describes the call as the following:

 
PDH_STATUS PdhEnumObjectItems(
  LPCTSTR szDataSource,  
  LPCTSTR szMachineName, 
  LPCTSTR szObjectName,  
  LPTSTR mszCounterList, 
  LPDWORD pcchCounterListLength,
  LPTSTR mszInstanceList,  
  LPDWORD pcchInstanceListLength,
  DWORD dwDetailLevel,  
  DWORD dwFlags       
);

The python call is similar though simpler. For example, you do not need to bother with the list length -- it takes care of that for you. Both the python and C++ examples are taken from their COM components shown later.To call make with python would look like the following

    def proclist(self):
        try:
            junk, instances = win32pdh.EnumObjectItems(None,None,self.object, win32pdh.PERF_DETAIL_WIZARD)
            return instances
        except:
            raise COMException("Problem getting process list")

The variable instances contains the list of processes, and you'll find the item or counter than we want 'ID Process' present in the list of items. Since you can have multiple processes with the same name, in python it is convenient to use a dictionary to store a list of processes and how many you found for each type.

    for instance in instances:
        if proc_dict.has_key(instance):
            proc_dict[instance] = proc_dict[instance] + 1
        else:
            proc_dict[instance]=0

The C++ call though essentially the same in spirit is much more involved. To help, I use map(like a python dictionary) and string from Standard C++. The additional things you need to manage are:

HRESULT getinst (map < string,int > & m_inst) {
	
	map<string, int>::iterator iter;
	USES_CONVERSION;
	LPTSTR      szCountersBuf     = NULL;
	DWORD       dwCountersSize       = 0;
	LPTSTR      szInstancesBuf    = NULL;
	DWORD       dwInstancesSize      = 0;
	LPTSTR      szTemp          = NULL;
	
	PDH_STATUS status;
	
	std::string str_obj="Process";
	
	status = PdhEnumObjectItems(
		NULL,
		NULL,
		A2CT(str_obj.c_str()),
		NULL,
		&dwCountersSize,
		szInstancesBuf,
		&dwInstancesSize,
		PERF_DETAIL_WIZARD,
		0 );
	
	if ( ERROR_SUCCESS != status )
		return E_FAIL;
	
	
	if (dwCountersSize) {
		szCountersBuf = (LPTSTR)malloc (dwCountersSize * sizeof (TCHAR));
		if (szCountersBuf==NULL) {
			return E_FAIL;
		}
	} else
		szCountersBuf=NULL;
	
	if (dwInstancesSize) {
		szInstancesBuf = (LPTSTR)malloc (dwInstancesSize * sizeof (TCHAR));
		if (szInstancesBuf==NULL) {
			free(szCountersBuf);
			return E_FAIL;
		}
	} else
		szInstancesBuf = NULL;
	
	status = PdhEnumObjectItems(
		NULL,                   
		NULL,                   
		A2CT(str_obj.c_str()),  
		szCountersBuf,    
		&dwCountersSize,     
		szInstancesBuf,   
		&dwInstancesSize,    
		PERF_DETAIL_WIZARD,     
		0); 
	
	if ( ERROR_SUCCESS != status )		
		return E_FAIL;
	
	//it's a series of contingous NULL terminated strings, ending w/zero length string
	
	
	if (szInstancesBuf){
		for (szTemp = szInstancesBuf;*szTemp != 0;szTemp += lstrlen(szTemp) + 1) {
			m_inst[T2A(szTemp)]++; //increment instance counter
			//default value is zero for arith element
		}
	}
	
	return S_OK;		
	
}

Getting the process ids: Several Pdh Calls

A whole sequence of calls are necessary once you get the process list. To refresh your memory, you need:

  1. PdhOpenQuery -- initialize query handle which will contain all the counters (in our case only: ID Process)
  2. PdhMakeCounterPath -- Each counter has a specific path, kind of like like a url
  3. PdhAddCounter -- convert the path to a handle and group the counters paths(in this case only one) together for the query
  4. PdhCollectQueryData -- gathers the actual data
  5. PdhGetFormattedCounterValue -- call for each counter to convert the data to a typical format
  6. PdhCloseQuery -- close the counters and the query handle
As usual the python code matches this very cleanly (like pseudocode that actually runs) -- you can figure out the basic meanings and sequences of the win32 calls without having to deal with other details.

        for instance, max_instances in proc_dict.items():
            for inum in xrange(max_instances+1):
                try:
                    hq = win32pdh.OpenQuery() # initializes the query handle 
                    path = win32pdh.MakeCounterPath( (None,self.object,instance, None, inum, self.item) )
                    counter_handle=win32pdh.AddCounter(hq, path) #convert counter path to counter handle
                    win32pdh.CollectQueryData(hq) #collects data for the counter 
                    type, val = win32pdh.GetFormattedCounterValue(counter_handle, win32pdh.PDH_FMT_LONG)
                    proc_ids.append(instance+'\t'+str(val))
                    win32pdh.CloseQuery(hq) 
                except:
                    raise COMException("Problem getting process id")

Again, the C++ code is more involved and makes use of Standand C++, vector, map, and string. It converts a map of process names and the number for each name, to a vector of strings each which has tab-delimited process id entry. Also, the process id info is returned in the format of a double, which is converted to a string.

HRESULT getprocid (map<string,int>& m_inst, vector<string> &v_ids) {
	

	USES_CONVERSION;
	PDH_STATUS  status   = 0;
	HQUERY      hQuery      = NULL;
	HCOUNTER    hCounter    = NULL;
	DWORD       dwType      = 0;

	map<string,int> m_idinst;

	std::string objname="Process";
	std::string counter="ID Process";	
	char *buffer;int junk,junk2;
	
	// initialize the query handle
	
	map < string, int>::iterator iter;
	for (iter=m_inst.begin();iter != m_inst.end();++iter) {
		
		for (int i=0;i<= iter->second;++i) {
			status = PdhOpenQuery( NULL, 0, &hQuery );
			if ( status != ERROR_SUCCESS )
				return status;
			
			TCHAR       szCounterPath[2048];
			DWORD       dwPathSize  = 2048;
			PDH_COUNTER_PATH_ELEMENTS pdh_elm;
			
			pdh_elm.szMachineName     = NULL;
			pdh_elm.szObjectName      = A2T(objname.c_str());
			pdh_elm.szInstanceName    = A2T(iter->first.c_str());
			pdh_elm.szParentInstance  = NULL;
			pdh_elm.dwInstanceIndex   = i;
			pdh_elm.szCounterName     = A2T(counter.c_str());
			
			status = PdhMakeCounterPath( &pdh_elm, szCounterPath, &dwPathSize, 0 );
			if ( status != ERROR_SUCCESS ) { return E_FAIL; }
			
			
			// Add the counter to the query
			//PdhAddCounter converts each counter path into a counter handle 
			status = PdhAddCounter( hQuery, szCounterPath, 0, &hCounter );
			if ( status != ERROR_SUCCESS ) { return E_FAIL; }
			
			//PdhCollectQueryData gets raw data for the counters
			status = PdhCollectQueryData(hQuery);
			
			if ( status != ERROR_SUCCESS ) { return E_FAIL; }
			
			//PdhGetFormattedCounterValue  formats counter values for display
			DWORD           dwFormat = PDH_FMT_DOUBLE; 
			PDH_FMT_COUNTERVALUE   fmtValue;
			status = PdhGetFormattedCounterValue (hCounter,
				dwFormat, 
				(LPDWORD)NULL,
				&fmtValue);
			if (status == ERROR_SUCCESS) {
				buffer=_fcvt( fmtValue.doubleValue, 0, &junk,&junk2 );
				string id=buffer;
				v_ids.push_back(iter->first+'\t'+id);				
			} 
			
		}		
		status = PdhCloseQuery (hQuery);		
		//PdhCloseQuery closes the query handle and it's counters 
	}
	return S_OK;
}

The COM client.

The Python COM client which will call the C++ COM object and Python COM object does the following:

     import win32com.client
     a=win32com.client.Dispatch('NtPerf.process') #C++ com object
     print a.procids()

     b=win32com.client.Dispatch('PyPerf.process') #python com object
     print b.procids()

As far as it is concerned, there is no difference between the 2 objects. Both returns a list of processes and their respective id's seperated by tab.

Making the COM objects.

From a 1000 mile perspective, ATL C++ and python offer a class based COM object approach. In both approaches, the methods of the com object are simply methods of a class. However, creating a COM object in python is much easier than C++, again allowing you to focus on the problem first while still retaining the C++ feel.

Much of the details with python COM objects are exposed through a default policy which leverages IDispatch. You simply add a few attributes your python class, to expose your methods, prog id, add a line to register your class, and you are done. The policy knows what to do. Thus, it's easy enough to take a simple win32 class you wrote and add a few attributes and convert it to a COM object. Creating and developing python COM objects is simple, all that is needed is notepad. You don't need a full blown IDE nor do you have to go back and find the source code that created the object, since the object is the source code. It lends itself to very rapid development.

ATL provides wizards and a lot of the basic implementation goo (like COM interfaces for IUnknown and IDispatch), and wrappers for data types. However, there still a big difference between a simple console based app and an ATL COM object. The COM world of Variants, SafeArrays, and BSTR's is (as we'll see below) unfriendly to C++.

Python behind the scenes converts back and forth between it's native types and BSTR, Variants, SafeArrays, etc. When you change your COM object, you don't have to worry about changing IDL and constructing new Variant structures, it is managed for you.

In C++, it is more messy. First of all: strings. Since the concept of what text is isn't consistent, COM standardizes with it's own OLECHAR. Also, for non-COM text, because of issues between using ansi and unicode, you need to use the TCHAR data type which is a generic type that maps at compile time to what is necessary. And, with regard to BSTRs (length-prefixed strings), you need to convert the OLECHAR to BSTR's with SysAllocString. You notice the COM object below uses the Standard C++ string (which I prefer to use). When necessary, it converts the string it to the necessary COM type leveraging ATL conversion macros and SysAllocString (another option is to use the CComBSTR class).

Secondly: arrays. Since we are returning a list of processes and their ids, to be friendly with all languages, everything needs to be converted to SafeArrays of Variants housing BSTRs. Python does the conversion for you. With C++, you'll need to make various Variant calls to create the necessary structure. In the C++ COM object, I've encapsulated all the necessary code in a single function that converts any vector of strings into a 1 dimensional safe array of variants.

Now for some code:

The Python COM object.

The Python COM object has 2 methods proclist and procids. Proclist is trivial, simply returning the list of processes from EnumObjectItems. Procids calls proclist, constructs a dictionary to count the number of processes with the same name, and then makes the necessary calls to their their ids. Each function simple returns the python list (which is then converted for you). If you later decide to only return a single string, simple change what you return, and python again will convert for you. Each method also uses the very cool function COMException which returns errors back to the client. In addition to the methods, there are 4 basic attributes, I set to define the COM object and a line to register it.

import win32pdh, string, win32api
from win32com.server.exception import COMException
import win32com.server.util
import win32com.client.dynamic

#to generate guids use:
#import pythoncom
#print pythoncom.CreateGuid()

class pyperf:
    # COM attributes.
    _reg_clsid_ = '{763AE791-1D6B-11D4-A38B-00902798B22B}'
               #guid for your class in registry
    _reg_desc_ = "get process list and ids"
    _reg_progid_ = "PyPerf.process" #The progid for this class

    _public_methods_ = ['procids','proclist' ]  #names of callable methods
    def __init__(self):
        self.object='process'
        self.item='ID Process'
    def proclist(self):
        try:
            junk, instances = win32pdh.EnumObjectItems(None,None,self.object, win32pdh.PERF_DETAIL_WIZARD)
            return instances
        except:
            raise COMException("Problem getting process list")
    def procids(self):
        #each instance is a process, you can have multiple processes w/same name
        instances=self.proclist()
        proc_ids=[]
        proc_dict={}
        for instance in instances:
            if proc_dict.has_key(instance):
                proc_dict[instance] = proc_dict[instance] + 1
            else:
                proc_dict[instance]=0
        for instance, max_instances in proc_dict.items():
            for inum in xrange(max_instances+1):
                try:
                    hq = win32pdh.OpenQuery() # initializes the query handle 
                    path = win32pdh.MakeCounterPath( (None,self.object,instance, None, inum, self.item) )
                    counter_handle=win32pdh.AddCounter(hq, path) #convert counter path to counter handle
                    win32pdh.CollectQueryData(hq) #collects data for the counter 
                    type, val = win32pdh.GetFormattedCounterValue(counter_handle, win32pdh.PDH_FMT_LONG)
                    proc_ids.append(instance+'\t'+str(val))
                    win32pdh.CloseQuery(hq) 
                except:
                    raise COMException("Problem getting process id")

        proc_ids.sort()
        return proc_ids

if __name__=='__main__':
    import win32com.server.register
    win32com.server.register.UseCommandLine(pyperf)

The C++ COM object.

As you notice from the idl, C++ COM object also exposes 2 methods, proclist and procids. Proclist calls the getinst function returns returns a map of processes, converts that to a vector of strings, and the calls make_safe to convert that to a Safe array of Variants. Procids does much the same except that after calling getinst, it then calls getprocid, which returns a vector of strings containing the processes and their ids. The vector of strings is then converted to a SafeArray with make_safe. Unlike python, you don't actually return the SafeArray(since every COM method has to return an HRESULT). Instead, you store the values in a Variant pointer.


Here is the relevant excerpt from the IDL:
	
interface Iprocess : IDispatch
	{
		[id(1), helpstring("lists current processes")] HRESULT proclist([out, retval] VARIANT *plist);
		[id(2), helpstring("method procids")] HRESULT procids([out, retval] VARIANT *pids);
	}

Here is the source for the cpp file
#include "stdafx.h"
#include "Ntperf.h"
#include "process.h"


//MS stuff
#include "pdh.h"
#include "pdhmsg.h"

// fix problem with different versions of pdh.dll 
#undef PdhOpenQuery      //          PdhOpenQueryA
extern "C" long __stdcall
PdhOpenQuery (
			  IN      LPCSTR      szDataSource,
			  IN      DWORD       dwUserData,
			  IN      HQUERY      *phQuery
			  ); 

//STD C++ stuff
#pragma warning(disable : 4786) //get rid of stl warnings
#include <string>
#include <vector>
#include <map>
using namespace std;

/////////////////////////////////////////////////////////////////////////////
// Cprocess



HRESULT make_safe(vector<string>& v_list, VARIANT *plist) {

	HRESULT hr = S_OK;

	USES_CONVERSION;

	VariantInit(plist);	
	plist->vt = VT_ARRAY | VT_VARIANT; //set type of plist to variant array
     
	//now create the 1 dimensional safearray of variants
	LPSAFEARRAY psa;
	SAFEARRAYBOUND rgsabound[]  = { v_list.size(), 0 }; // size elements, 0-based
	psa = SafeArrayCreate(VT_VARIANT, 1, rgsabound);
        if (!psa) { return E_OUTOFMEMORY; }

	VARIANT * VarArray; 
    //Increment lock count and get pointer to the array data
	if (FAILED(hr = SafeArrayAccessData(psa,(void **) &VarArray ))) {
		return hr;
	}

	for (int i =0; i<v_list.size();i++) {
		VarArray[i].vt = VT_BSTR; 
		//convert ascii to olestr then bstr
		VarArray[i].bstrVal = SysAllocString(A2OLE(v_list[i].c_str()));
		if (!VarArray[i].bstrVal) {
			VariantClear(VarArray);
			return hr = E_OUTOFMEMORY;
		}

	}

		SafeArrayUnaccessData( psa ); 

        plist->parray = psa; //now set the array in plist to be the created array

	return S_OK;

}



HRESULT getprocid (map<string,int>& m_inst, vector<string> &v_ids) {
	

	USES_CONVERSION;
	PDH_STATUS  status   = 0;
	HQUERY      hQuery      = NULL;
	HCOUNTER    hCounter    = NULL;
	DWORD       dwType      = 0;

	map<string,int> m_idinst;

	std::string objname="Process";
	std::string counter="ID Process";	
	char *buffer;int junk,junk2;
	
	// initialize the query handle
	
	map<string, int>::iterator iter;
	for (iter=m_inst.begin();iter != m_inst.end();++iter) {
		
		for (int i=0;i<= iter->second;++i) {
			status = PdhOpenQuery( NULL, 0, &hQuery );
			if ( status != ERROR_SUCCESS )
				return status;
			
			TCHAR       szCounterPath[2048];
			DWORD       dwPathSize  = 2048;
			PDH_COUNTER_PATH_ELEMENTS pdh_elm;
			
			pdh_elm.szMachineName     = NULL;
			pdh_elm.szObjectName      = A2T(objname.c_str());
			pdh_elm.szInstanceName    = A2T(iter->first.c_str());
			pdh_elm.szParentInstance  = NULL;
			pdh_elm.dwInstanceIndex   = i;
			pdh_elm.szCounterName     = A2T(counter.c_str());
			
			status = PdhMakeCounterPath( &pdh_elm, szCounterPath, &dwPathSize, 0 );
			if ( status != ERROR_SUCCESS ) { return E_FAIL; }
			
			
			// Add the counter to the query
			//PdhAddCounter converts each counter path into a counter handle 
			status = PdhAddCounter( hQuery, szCounterPath, 0, &hCounter );
			if ( status != ERROR_SUCCESS ) { return E_FAIL; }
			
			//PdhCollectQueryData gets raw data for the counters
			status = PdhCollectQueryData(hQuery);
			
			if ( status != ERROR_SUCCESS ) { return E_FAIL; }
			
			//PdhGetFormattedCounterValue  formats counter values for display
			DWORD           dwFormat = PDH_FMT_DOUBLE; 
			PDH_FMT_COUNTERVALUE   fmtValue;
			status = PdhGetFormattedCounterValue (hCounter,
				dwFormat, 
				(LPDWORD)NULL,
				&fmtValue);
			
			
			if (status == ERROR_SUCCESS) {
				buffer=_fcvt( fmtValue.doubleValue, 0, &junk,&junk2 );
				string id=buffer;
				v_ids.push_back(iter->first+'\t'+id);				
			} 
			
		}		
		status = PdhCloseQuery (hQuery);		
		//PdhCloseQuery closes the query handle and it's counters 
	}
	
	
	return S_OK;
}


HRESULT getinst (map<string,int>& m_inst) {
	
	map<string, int>::iterator iter;
	USES_CONVERSION;
	LPTSTR      szCountersBuf     = NULL;
    DWORD       dwCountersSize       = 0;
    LPTSTR      szInstancesBuf    = NULL;
    DWORD       dwInstancesSize      = 0;
    LPTSTR      szTemp          = NULL;
	
	PDH_STATUS status;
	
	std::string str_obj="Process";
	
	status = PdhEnumObjectItems(
		NULL,
		NULL,
		A2CT(str_obj.c_str()),
		NULL,
		&dwCountersSize,
		szInstancesBuf,
		&dwInstancesSize,
		PERF_DETAIL_WIZARD,
		0 );
	
	if ( ERROR_SUCCESS != status )
		return E_FAIL;
	
	
	if (dwCountersSize) {
		szCountersBuf = (LPTSTR)malloc (dwCountersSize * sizeof (TCHAR));
		if (szCountersBuf==NULL) {
			return E_FAIL;
		}
	} else
		szCountersBuf=NULL;
	
	if (dwInstancesSize) {
		szInstancesBuf = (LPTSTR)malloc (dwInstancesSize * sizeof (TCHAR));
		if (szInstancesBuf==NULL) {
			free(szCountersBuf);
			return E_FAIL;
		}
	} else
		szInstancesBuf = NULL;
	
	status = PdhEnumObjectItems(
		NULL,                   
		NULL,                   
		A2CT(str_obj.c_str()),  
		szCountersBuf,    
		&dwCountersSize,  
		szInstancesBuf,   
		&dwInstancesSize, 
		PERF_DETAIL_WIZARD
		0); 
	
	if ( ERROR_SUCCESS != status )		
		return E_FAIL;
	
	//it's a series of contingous NULL terminated strings, ending w/zero length string
	
	
	if (szInstancesBuf){
		for (szTemp = szInstancesBuf;*szTemp != 0;szTemp += lstrlen(szTemp) + 1) {
			m_inst[T2A(szTemp)]++; //increment instance counter
			//default value is zero for arith element
		}
	}
	
		
	
	return S_OK;		
	
}



STDMETHODIMP Cprocess::proclist(VARIANT *plist)
{

	if (!plist) { return E_INVALIDARG;}

	HRESULT hr = NOERROR;

	vector<string> test(50, "hello");

	map<string,int> m_inst;
	map<string, int>::iterator iter;

	hr=getinst(m_inst);
	if FAILED(hr) {return hr;}
	
	vector<string> v_inst;
	for (iter=m_inst.begin();iter != m_inst.end();++iter) {
		//go through index of processes
		for(int i=0;i<iter->second;i++){
			//put onto vector multiple procs w/same name
			v_inst.push_back(iter->first); 
		}
	}
	
	//send a vector of strings  and a variant to make_safe
	hr = make_safe(v_inst, plist);

	//getprocid(m_inst);


	return hr;

}

STDMETHODIMP Cprocess::procids(VARIANT *pids)
{
	// TODO: Add your implementation code here
	if (!pids) { return E_INVALIDARG;}

	HRESULT hr = NOERROR;

	map<string,int> m_inst;
	map<string, int>::iterator iter;

	hr=getinst(m_inst);
	if FAILED(hr) {return hr;}

	vector<string> v_ids;	
	getprocid(m_inst,v_ids);
		
	//send a vector of strings  and a variant to make_safe
	hr = make_safe(v_ids, pids);

	//getprocid(m_inst);

	return hr;


}

In Conclusion

That was a quick tour of Python and C++ in the win32 and COM world. Both languages have their strengths and weaknesses. With C++ you have ultimate granularity and power. It obviously comes at a cost of more details to keep track of. Python's strength is rich productivity. It is fast to write the win32 and COM sever code, yet still have a sophisticated language at your disposal. You lose some of the flexibility of C++, which often does not matter. And, when it does, python can help you understand how to solve the problem, before wading into the details.

Further Info

Author

John Nielsen, jn@who.net
-- Have a great time with programming with python!

 

   Getting process info: Win32 and COM with python and C++