Thursday, May 17, 2018

Digging through a lot of files

I have a scenario where I need to scan a large number of XML files from ELMAH logs, roughly 300,000 files in total.

PowerShell was a fun project for this but couldn't really do the job due to performance issues.
When reading the 300,000 files with a script like the example below, the run took 368 minutes.
$scriptPath = $(Split-Path -Parent $MyInvocation.MyCommand.Definition)
$path = "C:\elmahFiles"
$savedObject = "$scriptPath\savedObjectLatestWeeks.csv"
$startTime = Get-Date
Write-Output "Script started at $startTime"
$counter = 0
$collection = Get-ChildItem -Path $path
$object = @()
$endTime = Get-Date
Write-Output "Found $($collection.count) files. Time before starting foreach: $(($endTime - $startTime).TotalSeconds) seconds"
foreach ($item in $collection) {
    #progress marker every 1000 files
    if ($counter % 1000 -eq 0) { Write-Output "Iteration $counter" }
    [xml]$tempXml = Get-Content $item.VersionInfo.FileName
    Remove-Variable tempAllHttp, tempHttpReferer, tempHttpUserAgent, tempScriptName, tempHost, tempIP -ErrorAction SilentlyContinue
    #pick the interesting server variables out of the ELMAH xml
    $tempAllHttp       = ($tempXml.error.serverVariables.ChildNodes | Where-Object { $_.name -like "ALL_HTTP" }).value.string
    $tempHttpReferer   = ($tempXml.error.serverVariables.ChildNodes | Where-Object { $_.name -like "HTTP_REFERER" }).value.string
    $tempHttpUserAgent = ($tempXml.error.serverVariables.ChildNodes | Where-Object { $_.name -like "HTTP_USER_AGENT" }).value.string
    $tempScriptName    = ($tempXml.error.serverVariables.ChildNodes | Where-Object { $_.name -like "SCRIPT_NAME" }).value.string
    $tempHost          = $tempXml.error.host
    $tempIP            = ($tempXml.error.serverVariables.ChildNodes | Where-Object { $_.name -like "HTTP_X_FORWARDED_FOR" }).value.string
    $tempObj = New-Object PSObject -Property @{
        Message     = $tempXml.error.message
        Time        = $tempXml.error.time
        Filepath    = $item.VersionInfo.FileName
        Details     = $tempXml.error.detail
        AllHttp     = $tempAllHttp
        HttpReferer = $tempHttpReferer
        UserAgent   = $tempHttpUserAgent
        ScriptName  = $tempScriptName
        Host        = $tempHost
        Origin      = $tempIP
    }
    $object += $tempObj
    $counter++
}
$endTime = Get-Date
Write-Output "Done collecting in $(($endTime - $startTime).TotalSeconds) seconds ($(($endTime - $startTime).TotalMinutes) minutes)"
$object | Export-Csv -Path $savedObject -Encoding UTF8
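
A likely contributor to the slowdown is the $object += $tempObj pattern: += copies the whole array on every iteration, so the cost grows quadratically with the number of files. A minimal sketch of the same collection step using a generic list instead, trimmed down to a few of the properties (the XML parsing itself still costs a lot, so this alone would not close the gap to Log Parser):

#collect into a Generic List (cheap appends) instead of += on an array
$object = New-Object 'System.Collections.Generic.List[object]'
foreach ($item in $collection) {
    [xml]$tempXml = Get-Content $item.VersionInfo.FileName
    $object.Add((New-Object PSObject -Property @{
        Message  = $tempXml.error.message
        Time     = $tempXml.error.time
        Filepath = $item.VersionInfo.FileName
    }))
}
$object | Export-Csv -Path $savedObject -Encoding UTF8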


I tried running parallel jobs and using .NET to read the files instead, but nothing could compete with MS Log Parser.
So enter Log Parser Studio.
This neat tool managed to comb through the 300,000 files in 47 minutes instead!
It is a bit tricky to formulate the queries, however. Here's an example of getting ELMAH logs where a server variable named HTTP_REFERER contains a key value (the name and string fields correspond to the attributes on the serverVariables items in the ELMAH XML, the same ones the script above reads):

SELECT * FROM '[LOGFILEPATH]' WHERE string LIKE '%http://www.mycompany.com/subsite%' AND name LIKE 'HTTP_REFERER'

So, in conclusion, for the same set of 300,000 files:
PowerShell took 368 minutes.
Log Parser Studio took 47 minutes.
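
Once a query works in Log Parser Studio, it can also be scripted against the Log Parser 2.2 command line, which is handy for scheduled runs. A minimal sketch, assuming LogParser.exe is installed in its default location and writing the result to a hypothetical output file (adjust paths to your environment):

$logParser = "C:\Program Files (x86)\Log Parser 2.2\LogParser.exe"
$query = "SELECT * INTO 'C:\temp\referers.csv' FROM 'C:\elmahFiles\*.xml' WHERE string LIKE '%http://www.mycompany.com/subsite%' AND name LIKE 'HTTP_REFERER'"
#-i:XML selects the XML input format, -o:CSV the CSV output format
& $logParser -i:XML -o:CSV $query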


Thursday, May 03, 2018

Using hashtables to combine values

For a job I needed a good way of combining a primary value with optional sub-values in an XML file.
Below is an example of how to extract attributes from an XML element into a hashtable and then join two hashtables into one. I couldn't find a method for this on the web; folks usually go with subnodes instead of attributes. I'm partial to attributes and had to get creative, but the solution was quite simple. The Attributes collection of an element only contains the attributes actually present in the XML file, which is what makes the default-merging work. It's not obvious when browsing the object that the name property can be lifted out, but here it's used as the key for each new hashtable entry.



Sample script for proof of concept:
$ScriptPath = $(Split-Path -Parent $MyInvocation.MyCommand.Definition)
Import-Module "$($ScriptPath)\Tools.psm1" -Force
[xml]$xml = Get-Content -Path "$($ScriptPath)\websiteNodes.xml"
$nodes = $xml.SelectNodes("//webSites/webSite")
$collection = @()
foreach ($node in $nodes) {
    #defaults that apply when an attribute is missing in the xml
    $propertyDefault = @{url = "test"; name = "test2"; else = "else"}
    $propertyTemp = Convert-XmlAttribToHash -xmlElement $node
    #attributes from the xml win over the defaults
    $newProperty = Join-Hashtable -Master $propertyTemp -Child $propertyDefault
    $object = New-Object PSObject -Property $newProperty
    $collection += $object
}
$collection[0]
Tools.psm1:
Function Join-Hashtable {
    [cmdletbinding()]
    Param (
        [hashtable]$Master,
        [hashtable]$Child
    )
    #create clones of hashtables so originals are not modified
    $Primary = $Master.Clone()
    $Secondary = $Child.Clone()
    #check for any duplicate keys
    $duplicates = $Primary.Keys | Where-Object { $Secondary.ContainsKey($_) }
    if ($duplicates) {
        foreach ($item in $duplicates) {
            #the master value wins; drop the duplicate from the child clone
            $Secondary.Remove($item)
        }
    }
    #join the two hash tables
    $result = $Primary + $Secondary
    return $result
} #end Join-Hashtable

function Convert-XmlAttribToHash {
    param (
        $xmlElement
    )
    $hashTable = @{}
    #each xml attribute becomes a key/value pair
    foreach ($attrib in $xmlElement.Attributes) {
        $hashTable.$($attrib.name) = $attrib.value
    }
    return $hashTable
}
websiteNodes.xml:
<webSites rootUrl="https://portal.mycompany.com/subsites/mastersite" defaultLanguage="1053" useParentTopNav="true" defaultTemplate="STS#0">
    <webSite name="MerSupport" url="/support/mersupport" language="1033" />
    <webSite name="Support" url="/support" language="1033" />
    <webSite name="Support3" url="/support3" />
</webSites>
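
Running the sample script against that file, $collection[0] comes out with the xml attributes taking precedence and the missing else key filled in from the defaults, roughly like this (property order may vary, since a plain hashtable is unordered):

name     : MerSupport
url      : /support/mersupport
language : 1033
else     : else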


This eventually led to createwebsFromStructure.ps1, which uses PowerShell splatting to build commands from the resulting hashtable.
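
If you haven't used splatting before: you put the parameters in a hashtable and pass it with @ instead of $, and PowerShell expands the keys as named parameters. A minimal, generic example (not part of the script below):

$params = @{
    Path   = "C:\elmahFiles"
    Filter = "*.xml"
}
#same as: Get-ChildItem -Path "C:\elmahFiles" -Filter "*.xml"
Get-ChildItem @params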

Add-PSSnapin microsoft.sharepoint.powershell
$scriptPath = $(Split-Path -Parent $MyInvocation.MyCommand.Definition)
Import-Module "$($scriptPath)\Tools.psm1"
$xmlStructureName = "websiteNodes.xml"
$xmlStructurePath = Join-Path -Path $scriptPath -ChildPath $xmlStructureName
[xml]$xmlStructure = Get-Content -Path $xmlStructurePath
$root = $xmlStructure.SelectSingleNode("//webSites")
$rootUrl = $xmlStructure.webSites.rootUrl
$allWebSites = $xmlStructure.SelectNodes("//webSites/webSite")
#tag each node with its url depth first, so parents can be created before children
foreach ($allWebSite in $allWebSites) {
    $depth = $($allWebSite.url.Split("/").Count) - 1
    $allWebSite | Add-Member -MemberType NoteProperty "UrlDepth" -Value $depth -Force
}
$rootProperty = @{language = $root.defaultLanguage; useParentTopNav = $root.useParentTopNav; Template = $root.defaultTemplate}
#create websites
foreach ($website in ($allWebSites | Sort-Object UrlDepth)) {
    $fullUrl = "$($rootUrl)$($website.url)"
    $checkExists = Get-SPWeb $fullUrl -ErrorAction SilentlyContinue
    if ($checkExists -eq $null) {
        Write-Output "Creating site at $fullUrl"
        #join parameters with defaults
        $propertyTemp = @{}
        foreach ($attrib in $website.Attributes) {
            $propertyTemp.$($attrib.name) = $attrib.value
        }
        $propertyTemp.url = $propertyTemp.url.Insert(0, $rootUrl)
        $newProperty = Join-Hashtable -Master $propertyTemp -Child $rootProperty
        #UseParentTopNav is a switch on New-SPWeb, so the string value from the xml can't be splatted as-is
        if ($newProperty.useParentTopNav -eq 'true') {
            Write-Output "Using topparentnav"
            $newProperty.Remove('UseParentTopNav')
            New-SPWeb @newProperty -UseParentTopNav
        }
        else {
            Write-Output "Not using topparentnav"
            $newProperty.Remove('UseParentTopNav')
            New-SPWeb @newProperty
        }
    }
    else { Write-Output "Site already exists. Skipping creation..." }
}
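
As a side note, switches can be splatted too if the value is a real boolean, which would remove the if/else branch above. A sketch of that variation (untested here, and it assumes the attribute is always "true" or "false"):

#convert the string from the xml to a boolean and let splatting handle the switch
$newProperty.useParentTopNav = [System.Convert]::ToBoolean($newProperty.useParentTopNav)
New-SPWeb @newProperty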


References:
https://powershell.org/2013/01/23/join-powershell-hash-tables/ - source for join-hashtable function
https://technet.microsoft.com/en-us/library/gg675931.aspx - source for details on splatting

Powershell and Uptimerobot

Uptimerobot can be quite tedious when you need to update many monitors at once. For example, say you bought the license for Uptimerobot and n...