IronPython Class Factory for MSBuild 4.0 Part 4

Example 4: PyProduceHashManifest task, a more concise version of the C# ProduceHashManifest inline task from an earlier article

An earlier article of mine on MSBuild 4.0 discussed a custom task named ProduceHashManifest to illustrate some of the new features in Visual Studio/Team Foundation Server 2010. The idea was to generate an MD5 hash value based on the contents of each file produced by Team Build, and store the results in a per-build hashManifest.txt file. That ProduceHashManifest task used something like 65 lines of C# code, written inline within an MSBuild <Task> element. Let's try to make it shorter in Python:

  <!-- Task: PyProduceHashManifest, use Python to write manifest of MD5 files to log -->
  <UsingTask TaskName="PyProduceHashManifest" TaskFactory="PythonClassFactory"
        AssemblyFile="..\_compiled\PythonClassFactory.dll">
    <ParameterGroup>
      <inFilePath ParameterType="System.String" Required="true" />
      <outHash Output="true" />
    </ParameterGroup>
    <Task>
import sys
import hashlib
outHash = hashlib.md5(open(inFilePath, 'r').read()).hexdigest()
    </Task>
  </UsingTask>

  <ItemGroup>
    <src Include="$(OutputDirectory)\*" />
  </ItemGroup>

  <Target Name="PyTarget4" Outputs="%(src.FullPath)">
    <PyProduceHashManifest inFilePath="@(src)">
      <Output PropertyName="outHash" TaskParameter="outHash" />
    </PyProduceHashManifest>
    <Message Text="FILEPATH: @(src)  MD5: $(outHash)" Importance="High" />
  </Target>

Back to three main elements but now <ItemGroup> has replaced <PropertyGroup>:

1) UsingTask element

One input parameter, representing the file to be hashed, and one output, holding the hash value. The code here uses a slightly different way of accessing the Python library files. In the previous example all of the Python library files from IronPython 2.6, everything in \Lib, needed to be in an expected location before the build took place. The files could have been deployed as part of the build, but as a whole they are large enough to a) slow down the build and b) take up significant space on the build server, with one copy of that folder per build. For PyProduceHashManifest I pulled out the .py files containing the libraries I needed, tested to make sure they didn't in turn depend on additional .py files, and checked them into TFS. Specifically, hashlib.py and md5.py were placed in a TFS project directory named partialPythonLib, located under _targets (making it a sibling of the PythonTargets.targets file that holds all of the MSBuild implementation):

[Image: TFS structure for partialPythonLib]

With that in place the hashlib library can be imported at build runtime and then a single line of code used to open a target file, read it, and compute the MD5 value for the file's content.
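A minimal sketch of that idea, written for a current CPython interpreter rather than IronPython 2.6. The folder path given to sys.path.append is hypothetical and only stands in for wherever partialPythonLib ends up on the build machine. The helper name md5_of_file is also an assumption, and unlike the one-line task body it opens the file in binary mode inside a with-block.

```python
import os
import sys

# Hypothetical location of the trimmed-down library folder; adjust it to
# wherever partialPythonLib is actually deployed for the build.
PARTIAL_LIB_DIR = os.path.join(os.getcwd(), "partialPythonLib")

# Make modules in that folder importable before anything tries to load them.
# On stock CPython hashlib is already importable, so the extra entry is harmless.
if PARTIAL_LIB_DIR not in sys.path:
    sys.path.append(PARTIAL_LIB_DIR)

import hashlib  # imported only after the search path has been extended


def md5_of_file(in_file_path):
    """Return the hex MD5 digest of the file's raw bytes."""
    with open(in_file_path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest()
```

Binary mode matters for files such as .exe and .pdb, where text-mode newline handling could change the bytes being hashed.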

2) ItemGroup element

If an MSBuild PropertyGroup can be thought of as a single-value variable, an ItemGroup can be considered a single-dimensional array of values, the most common use of which is to hold a list of files. When an ItemGroup holds files, certain metadata on those files may be accessed within the MSBuild environment, which happens in the Target element below. The file list being assigned to the 'src' ItemGroup here includes everything in the output directory, the path for which is passed in to the .targets file via the OutputDirectory property. Specifically, the MSBuild_PythonTargets workflow Activity used its CommandLineArguments property to pass the value of 'outputDirectory' (which is defined in the workflow as being more or less equal to the local build directory + "\Binaries") on to the .targets file.
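As a rough analogy only (this is plain Python, not how MSBuild is implemented), the 'src' ItemGroup can be pictured as a list of records, one per matched file, each carrying a few derived fields. The function name expand_items is hypothetical. The dictionary keys mirror four of MSBuild's well-known item metadata names.

```python
import glob
import os


def expand_items(include_pattern):
    """Expand a wildcard into a list of file items with derived metadata."""
    items = []
    for path in sorted(glob.glob(include_pattern)):
        if not os.path.isfile(path):
            continue  # only files become items in this sketch
        full_path = os.path.abspath(path)
        filename, extension = os.path.splitext(os.path.basename(full_path))
        items.append({
            "Identity": path,        # the value as matched by the pattern
            "FullPath": full_path,   # absolute path of the file
            "Filename": filename,    # name without the extension
            "Extension": extension,  # extension including the leading dot
        })
    return items
```

Calling expand_items with a pattern such as an output directory followed by \* would yield one record per file, comparable to what %(src.FullPath) exposes for each item.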

3) Target element

Activation of PyTarget4 is a little tricky: referencing the FullPath metadata of the files in 'src' in the Outputs attribute leads to Target batching (see How To: Batch Targets with Item Metadata for further information). The end result is that the contents of PyTarget4 are processed multiple times, once for every file in the src ItemGroup, meaning once for every file in the build's output directory. The value assigned to the inFilePath parameter and passed on to PyProduceHashManifest for a particular iteration is the file path of the 'current' file. A Message task is included in the target to log the individual file paths and matching MD5 hash values.
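The batching behavior described above can be approximated in plain Python as "group the items by one metadata value, then run the body once per group". This is an illustrative sketch, not MSBuild's actual algorithm. The name run_batched and the sample paths are made up for the example.

```python
from collections import OrderedDict


def run_batched(items, metadata_name, target_body):
    """Call target_body once per distinct value of the chosen metadata."""
    batches = OrderedDict()
    for item in items:
        # Items sharing the same metadata value land in the same batch.
        batches.setdefault(item[metadata_name], []).append(item)
    return [target_body(batch) for batch in batches.values()]


# Hypothetical items; FullPath is unique per file, so every batch holds one item.
src = [{"FullPath": r"C:\out\a.exe"}, {"FullPath": r"C:\out\a.pdb"}]
log = run_batched(
    src,
    "FullPath",
    lambda batch: "FILEPATH: " + ";".join(i["FullPath"] for i in batch),
)
```

Because FullPath differs for every file, the body runs once per file, which is why @(src) expands to a single path inside each pass of PyTarget4.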

Kick off one last build and take note of everything marked by a PyTarget4 header:

   FILEPATH: C:\Builds\1\AllInOne\PythonBuildDef1\Binaries\CSEFModelFirst.exe  
        MD5: a02064d9ffbe3efdf3d3f64c88e3efa5
   FILEPATH: C:\Builds\1\AllInOne\PythonBuildDef1\Binaries\CSEFModelFirst.exe.config  
        MD5: caf5819d624471bd244161f425820ed0
   FILEPATH: C:\Builds\1\AllInOne\PythonBuildDef1\Binaries\CSEFModelFirst.pdb  
        MD5: 1f177e02f7763e21061754b918c1bee5

So four lines of Python vs. almost 70 in C#. That is not a fair comparison, because 1) I made no effort to concisify the C# code in the first place, 2) I did make an effort to make the Python code as short as possible, where I would normally have done the file-open-and-hash on multiple lines to increase readability (and added a file.close()), and 3) I cheated by writing the file paths and hash values to the log file instead of a standalone .txt as in the original C# ProduceHashManifest.
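For comparison, here is a sketch of what a less compressed version might look like if it also wrote a standalone manifest file. It targets current CPython, not the IronPython 2.6 task environment. The function names and the "hash, two spaces, path" line format are assumptions, not the format used by the original C# task.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large binaries are not read in one go."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:  # the with-block closes the file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_hash_manifest(output_directory, manifest_name="hashManifest.txt"):
    """Write one 'hash  path' line per file in output_directory."""
    manifest_path = os.path.join(output_directory, manifest_name)
    lines = []
    for name in sorted(os.listdir(output_directory)):
        path = os.path.join(output_directory, name)
        # Skip subdirectories and any manifest left over from an earlier run.
        if name == manifest_name or not os.path.isfile(path):
            continue
        lines.append("%s  %s" % (md5_of_file(path), path))
    with open(manifest_path, "w") as manifest:
        manifest.write("".join(line + "\n" for line in lines))
    return manifest_path
```

This trades the single-line brevity for readability and explicit file handling, which is roughly the gap the comparison above glosses over.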

I probably could have gotten the Python script here down to two lines by adding hashlib.py and md5.py to TFS in the _targets directory, alongside the .targets file, and getting rid of the two 'sys' lines. A much more theoretical solution would have been to put the two .py files in a folder specifically named 'Lib', which I believe the IronPython executing process looks for automatically. And again, theoretically, that would have meant putting the Lib folder in /CustomTasks/_compiled, where my IronPython.dll lives. What I saw instead was that the 'process' in this scenario appears to be msbuild.exe, so the Lib folder would instead need to be in C:\Windows\Microsoft.NET\Framework\v4.0.30128\, which is not a good idea.


Further ideas

I haven't actually run any related tests but I don't see any reason why multiple Task Factories could not be accessed within the same MSBuild file. No doubt there are many examples in which certain steps, as part of a larger goal, could be done most efficiently (or perhaps be done period) in C# while others would be better written in IronPython.

The PythonClassFactory currently uses a call to CreateScriptSourceFromString on the IronPython ScriptEngine but could presumably be modified to have the option of calling CreateScriptSourceFromFile instead. Then, to follow the structure implemented by Microsoft's original CodeTaskFactory, a Code element would store the location of the .py file in a Source attribute. The presence of that attribute would indicate that CreateScriptSourceFromFile should be used within the Task Factory instead of the default CreateScriptSourceFromString.
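The dispatch being proposed can be sketched in plain Python as an analogy. This is not the PythonClassFactory's code and does not use the IronPython hosting API. The names load_task_source and run_task are hypothetical, and exec stands in for whatever the factory does with the script source.

```python
def load_task_source(inline_code, source_attribute=None):
    """Return the script text, preferring the Source file when one is given."""
    if source_attribute:
        # Analogous to choosing the "from file" path in the factory.
        with open(source_attribute, "r") as handle:
            return handle.read()
    # Default: use the code written inline in the task body.
    return inline_code


def run_task(script_text, inputs, output_names):
    """Run the script with inputs as variables and collect named outputs."""
    scope = dict(inputs)
    exec(compile(script_text, "<task>", "exec"), scope)
    return dict((name, scope.get(name)) for name in output_names)
```

With this shape, a task definition that carries only a Source value and no inline body would still run, since the inline text is ignored whenever a file is supplied.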

See also: Custom task for reporting on recent check-ins to TFS

