Thursday, March 29, 2012

Advice on Partitioning and Processing

I'd appreciate some advice on the following:

I'm building a cube which is based on data in a SQL 2000 DB. There are around 1.2 billion rows of fact data in 26 tables. Each table uses approx 6 Gig. There are 4 Dimensions, 3 dimensions have less than a 1000 members with the 4th having 65000. For convenience I am creating 26 Cube partitions aligned to the DB tables, each partition will have around 50 million rows. Does this sound like a reasonable proposal? The reason I ask is that I am seeing very poor performance whilst calling ProcessData. I have processed various numbers of partitions in parallel and cannot achieve the same times that I can get on the existing AS2000 rig using the Parallel Processing Utility.

I have tested the data read by writing a .net application to execute the same DB queries in parallel that the partitions use (achieving 200 million rows in 20 minutes) and there are no issues related to network, disk or the DB server.

Processing seems to start well but around a minute in the Network utilisation drops down to 10 percent and CPU utilisation to 30%. It then goes back up to 100% for a short period and then drops again. This is repeated continually until processing is finished. At current speeds it will take 26 hours to complete as opposed to 10 hours on AS2000. The box is a 64-bit 4way hyperthreaded with 32 GB of RAM. AS2005 is Enterprise with SP1.

Any suggestions would be appreciated as I'm running out of ideas!

Rob.

You should check yourself against best practices you would find the project REAL implemetation http://www.microsoft.com/sql/solutions/bi/projectreal.mspx

I would say the partition size is a bit big. You should try and partition some more for and split a single partition in two. But that doesnt look like a major problem.

Question for you: in AS2005 you processing partitions in parallel. How many? Is the pattern the same for any number of partitions?

What OLEDB provider you are using to read data from SQL Server?


The reading rates looks to be slow. See where the bottleneck is. In your test application, the fair comparison not only when you execute SQL query, but you also retrevie all the data returned by the query.

Edward Melomed.
--
This posting is provided "AS IS" with no warranties, and confers no rights.

No comments:

Post a Comment