Thursday, May 17, 2018

Digging through a lot of files

I have a scenario where I need to scan a lot of XML files from ELMAH logs, around 300 000 files in total.

PowerShell was a fun project for this but couldn't really do the job due to performance issues.
Reading all 300 000 files with a PowerShell script took 368 minutes.
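To illustrate the kind of scan involved, here is a minimal sketch. It is not the script behind the timing above. It assumes ELMAH's usual XML layout, where each server variable is stored as an item element with a name attribute and a nested value element carrying a string attribute. The folder path and the $needle variable are hypothetical.

```powershell
# Hypothetical folder; point this at wherever the ELMAH XML files live.
$logPath = 'C:\logs\elmah'
# Text to look for in the referer (same value as the Log Parser query uses).
$needle  = 'http://www.mycompany.com/subsite'

# Assumed ELMAH layout:
#   <error><serverVariables><item name="..."><value string="..." /></item>...
$xpath = "/error/serverVariables/item[@name='HTTP_REFERER']/value[contains(@string, '$needle')]"

# Select-Xml loads each file and emits one result per matching node;
# keep only the distinct file paths that had a match.
Get-ChildItem -Path $logPath -Filter '*.xml' -File |
    Select-Xml -XPath $xpath |
    Select-Object -ExpandProperty Path -Unique
```

Every file is parsed in full, one after another, which is where the time goes once the count reaches hundreds of thousands.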


I tried running parallel jobs and using .NET to read the files instead, but nothing could compete with Microsoft Log Parser.
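For reference, reading the files through .NET from PowerShell could look roughly like the sketch below. This is an assumption about the approach, not the code that was actually tried. It uses a streaming XmlReader and relies on the same assumed ELMAH layout (item elements with a name attribute and a nested value element with a string attribute). The folder path and $needle are hypothetical.

```powershell
# Hypothetical folder and search text.
$logPath = 'C:\logs\elmah'
$needle  = 'http://www.mycompany.com/subsite'

# EnumerateFiles streams file names instead of building the whole list first.
foreach ($file in [System.IO.Directory]::EnumerateFiles($logPath, '*.xml')) {
    $reader = [System.Xml.XmlReader]::Create($file)
    try {
        # Walk forward through <item> elements without loading the full document.
        while ($reader.ReadToFollowing('item')) {
            if ($reader.GetAttribute('name') -eq 'HTTP_REFERER') {
                # The value sits on the nested <value string="..."/> element.
                if ($reader.ReadToDescendant('value')) {
                    $value = $reader.GetAttribute('string')
                    if ($value -and $value.Contains($needle)) {
                        $file   # emit the path of the matching file
                    }
                }
                break   # HTTP_REFERER handled; stop reading this file
            }
        }
    }
    finally {
        $reader.Dispose()
    }
}
```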
So enter Log Parser Studio.
This neat tool managed to comb through 300 000 files in 47 minutes instead!
It is a bit tricky to formulate the queries, however. Here's an example that gets the ELMAH logs where a variable named HTTP_REFERER contains a given value:

SELECT * FROM '[LOGFILEPATH]' WHERE string LIKE '%http://www.mycompany.com/subsite%' AND name LIKE 'HTTP_REFERER'

So in conclusion, for the same set of 300 000 files:
PowerShell took 368 minutes
Log Parser Studio took 47 minutes, roughly 7.8 times faster

