Tuesday, March 27, 2012

Advice Needed on Large Scale DataBases

We are in the process of developing a web indexing project similar to
http://www.grub.org/ . We will have a database that is going to contain
about 5 billion records. Currently I am torn between MS SQL and MySQL for
this project. One of the issues we are going to have is storage size. Based
on our math, our database will be about 100 GB in size. Is there a file
size limit with MS SQL? (I heard somewhere there is a 2 GB limit, or was
that a 2 TB limit?) A better design would be a split database, but for now
we would like to use a single db.
Jay
Good luck with MySQL and 5 billion rows<g>. SQL Server does not have the
limit you suggested except in its free version (MSDE). There are many
SQL Servers out there with 2 TB or more of data. Even though 100 GB is not
that large these days for SQL Server, I can't stress enough that you do your
homework first and ensure you have a properly designed db schema and
application.
Andrew J. Kelly SQL MVP
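A quick way to confirm both points (which edition you are actually running, and how big the database currently is) is a couple of standard metadata calls; the database name below is only a placeholder:

    -- MSDE is the only edition with the 2 GB database cap
    SELECT SERVERPROPERTY('Edition') AS Edition,
           SERVERPROPERTY('ProductVersion') AS Version

    -- Report current size and space used for the database in question
    USE WebIndex   -- placeholder database name
    EXEC sp_spaceused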
"Jay Janarthanan" <jay@.objectcube.com> wrote in message
news:e8N7AKl2EHA.1452@.TK2MSFTNGP11.phx.gbl...
> We are in the process of developing a web indexing project similar to
> http://www.grub.org/ . We will have a database that is going to contain
> about 5 billion records. Currently I am split between MS SQL and My SQL
> for this project. Some of the issues we are going to have is storage size.
> Based on our math, our data base will be about 100 GB in size, is there a
> file size limit with MS SQL (I heard some where there is a 2GB limit or
> was that 2 TB limit). A better design is a split data base but for now we
> would like to use a single db.
> Jay
>
|||Thanks Andrew. There is only one table that is going to contain the 5 billion
rows; the rest of the tables are small in size.
I am more concerned about the HW requirements.
Jay
"Andrew J. Kelly" <sqlmvpnooospam@.shadhawk.com> wrote in message
news:OwmvZXl2EHA.924@.TK2MSFTNGP14.phx.gbl...
> Good luck with MySQL and 5 billion rows<g>. SQL Server does not have a
> limit that you suggested except in it's free version (MSDE). THere are
> many SQL Servers out there with 2TB or more of data. Even though 100GB is
> not that large these days for SQL Server I can't stress enough that you do
> your homework first and ensure you have a properly designed db schema and
> application.
> --
> Andrew J. Kelly SQL MVP
>
> "Jay Janarthanan" <jay@.objectcube.com> wrote in message
> news:e8N7AKl2EHA.1452@.TK2MSFTNGP11.phx.gbl...
>
|||Hardware is easy: lots of CPU, lots of fast drives, and lots of memory.
Put that large table on its own RAID 10 array if it gets written to a lot,
or on a fast RAID 5 array if it is mostly read-only (batches at night and
small updates during the day are fine), and you should be OK. Use a lot of
drives in the array for best performance and put that array on its own
channel.
Regarding the hardware, you'll need to do some homework about the size of
the table and how frequently it gets accessed to determine whether you'll
need the improved memory allowance of SQL Server 2000 Enterprise Edition.
If you are mostly accessing single rows and have the necessary indexing in
place, you may find that memory is not a big concern and a fast 2-CPU
system may suffice. 100 GB is not that large for SQL Server, as Andrew
mentioned, but given that large table you probably want to make sure it is
isolated from the other data on its own drive subsystem. Drives are cheap,
though: you could throw 10 fast drives into a RAID 5 array for a couple
thousand dollars and then use that array for backups if the database is
idle at night. RAID 10 (1+0) is a little more expensive in terms of drives,
but has the best write and read performance. RAID 5 is the worst for
writing, but fast at reading, and the cheapest redundant solution.
David Gugick
Imceda Software
www.imceda.com
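If the working set does turn out to need the larger memory allowance mentioned above, the relevant settings on SQL Server 2000 Enterprise Edition are AWE and max server memory. A minimal sketch, assuming an 8 GB machine (the 6144 MB cap is an illustrative value, not a recommendation):

    -- Enable AWE so SQL Server 2000 EE can address memory beyond 4 GB.
    -- Requires the Lock Pages in Memory privilege and an instance restart.
    EXEC sp_configure 'show advanced options', 1
    RECONFIGURE
    EXEC sp_configure 'awe enabled', 1
    EXEC sp_configure 'max server memory', 6144   -- MB; leave headroom for the OS
    RECONFIGURE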
|||"Andrew J. Kelly" <sqlmvpnooospam@.shadhawk.com> wrote in message
news:OwmvZXl2EHA.924@.TK2MSFTNGP14.phx.gbl...
> Good luck with MySQL and 5 billion rows<g>.
Well, to be fair, the largest MySQL db I've heard of has 10 billion rows...

|||You will want to isolate that table and its non-clustered indexes across as
many physical disk spindles as possible. Also, consider creating two
separate filegroups, one for the clustered index and one for the
non-clustered indexes. In each filegroup, create as many physical files as
you have CPUs and independent disks. If you have enough disks, keep the
files for the clustered index separate from the files for the non-clustered
indexes, but certainly use multiple files per filegroup, even if you are
limited by the number of physical disks.
Also, create at least two log files of equal size and absolutely place these
on dedicated disks.
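To make that layout concrete, here is a rough sketch of such a database; the database name, drive letters, file names, and sizes are all assumptions for illustration only:

    -- Two data filegroups (clustered vs. non-clustered indexes), multiple
    -- files per filegroup on separate physical disks, plus two log files.
    CREATE DATABASE WebIndex
    ON PRIMARY
        (NAME = WebIndex_sys,  FILENAME = 'E:\data\WebIndex_sys.mdf',  SIZE = 1GB),
    FILEGROUP FG_Clustered
        (NAME = WebIndex_cl1,  FILENAME = 'F:\data\WebIndex_cl1.ndf',  SIZE = 20GB),
        (NAME = WebIndex_cl2,  FILENAME = 'G:\data\WebIndex_cl2.ndf',  SIZE = 20GB),
    FILEGROUP FG_NonClustered
        (NAME = WebIndex_nc1,  FILENAME = 'H:\data\WebIndex_nc1.ndf',  SIZE = 10GB),
        (NAME = WebIndex_nc2,  FILENAME = 'I:\data\WebIndex_nc2.ndf',  SIZE = 10GB)
    LOG ON
        (NAME = WebIndex_log1, FILENAME = 'J:\log\WebIndex_log1.ldf',  SIZE = 4GB),
        (NAME = WebIndex_log2, FILENAME = 'K:\log\WebIndex_log2.ldf',  SIZE = 4GB)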
As far as capacity is concerned, the database size must not be greater than
1,048,516 TB, the number of filegroups must not exceed 256 per database, a
database can have at most 32,767 files, and data and log files cannot be
larger than 32 TB each.
I do not think you will have a size problem; however, the row count will be
sufficient to warrant thoughtful database design, file layout, and indexing
strategy. You will also need to choose an appropriate candidate key for the
clustered index. Absolutely DO NOT accept a surrogate primary key as the
clustered index, as is the default: that is usually the worst possible
candidate, surpassed only by not having a clustered index defined at all.
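As a sketch of how that might apply to the big table, using the filegroups from the example above and hypothetical column names (UrlHash, CrawlDate, and UrlID are assumptions about the crawl schema, not anything from the original post):

    -- Hypothetical 5-billion-row crawl table; the surrogate key stays
    -- NONCLUSTERED and the clustered index goes on a key queries seek on.
    CREATE TABLE dbo.CrawledPage (
        UrlID      bigint     NOT NULL,   -- surrogate identifier
        UrlHash    binary(20) NOT NULL,   -- e.g. hash of the URL
        CrawlDate  datetime   NOT NULL,
        HttpStatus smallint   NOT NULL,
        CONSTRAINT PK_CrawledPage PRIMARY KEY NONCLUSTERED (UrlID)
            ON FG_NonClustered
    ) ON FG_Clustered

    -- Cluster on the access-path key rather than the surrogate.
    CREATE UNIQUE CLUSTERED INDEX IX_CrawledPage_UrlHash
        ON dbo.CrawledPage (UrlHash) ON FG_Clustered

    -- Supporting non-clustered index on its own filegroup.
    CREATE INDEX IX_CrawledPage_CrawlDate
        ON dbo.CrawledPage (CrawlDate) ON FG_NonClustered

Whether UrlHash is actually a good clustering key depends entirely on the query patterns; the point is only that the clustered index is a deliberate choice rather than the default.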
The 2 GB limitation is for MSDE only.
Sincerely,
Anthony Thomas

"Greg D. Moore (Strider)" <mooregr_deleteth1s@.greenms.com> wrote in message
news:2aQsd.37569$1u.35532@.twister.nyroc.rr.com...
"Andrew J. Kelly" <sqlmvpnooospam@.shadhawk.com> wrote in message
news:OwmvZXl2EHA.924@.TK2MSFTNGP14.phx.gbl...
> Good luck with MySQL and 5 billion rows<g>.
Well, to fair, the largest MySQL db I've heard of has 10 billion rows...

>
|||Anthony,
I find your advice about clustered indexes and placement interesting. Can you
point me to any good articles or books that cover this topic?
Thanks,
Saptagiri Tangudu
PNT Marketing Services
"AnthonyThomas" wrote:

> You will want to isolate that table and its non-clustered indexes across as
> many physical disk spindles as possible. Also, consider creating two
> seperate File Groups, one for the Clustered Index and one for the
> non-clustered indexes. In each File Group, create as many physical files as
> you have CPUs and independent disks. If you have enough disk counts, keep
> the files from the Clustered Index seperated from the files of the
> non-clustered indexes, but certainly use multiple files per file group, even
> if you are limited by number of physical disks.
> Also, create at least 2 log files of equal size and absolutely place these
> on dedicated disks.
> As far as capacity is concerned, the database size must be not be greater
> than 1,048,515 TB, number of File Groups must not exceed 256 per database,
> files per databse 32,767, data and log file sizes can not be larger than 32
> TB each.
> I do not think you will have a size problem; however, the row count will be
> sufficient enough to warrant thoughtful database design, file layout, and
> indexing strategy. You will also need to consider an appropriate Candidate
> Key for the Cluster Index, absolutely DO NOT SET any surrogate Primary Key
> as the clustered index, as is the default. This is usually the worst
> possible candidate only superseded by not having a clustered index defined
> at all.
> The 2 GB limitation is for MSDE only.
> Sincerely,
>
> Anthony Thomas
>
> --
> "Greg D. Moore (Strider)" <mooregr_deleteth1s@.greenms.com> wrote in message
> news:2aQsd.37569$1u.35532@.twister.nyroc.rr.com...
> "Andrew J. Kelly" <sqlmvpnooospam@.shadhawk.com> wrote in message
> news:OwmvZXl2EHA.924@.TK2MSFTNGP14.phx.gbl...
>
> Well, to fair, the largest MySQL db I've heard of has 10 billion rows...
>
>
