WebApp Sec
mailing list archives
Re: Controlling access to pdf/doc files (db "better" than filesystem?)
From: Ido Rosen <ido () cs uchicago edu>
Date: Sat, 28 Feb 2004 14:54:57 -0600
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On Sat, 28 Feb 2004 11:13:21 -0800
"David Wall @ Yozons, Inc." <dwall () yozons com> wrote:
> ... that in SQL Server is that all data in SQL Server is split over ~8k
> pages. When you add a BLOB it needs to be split into 8k chunks. When you ...
> But filesystems also store data in pages, often much smaller than 8k
> chunks.
I agree that, for a solution like this, storing the files together with their metadata in a database is better than
storing them as plain files. It's probably also more secure, since the web developer is less likely to botch
permissions, security, or sanity checks, and most database systems already have some sanity checks built in. Your
reasoning in that last sentence is a bit off, though: database systems (such as MySQL, PgSQL, ThinkSQL, and MSSQL) all
ultimately store their data on the filesystem, so their 8k pages may not line up with the filesystem's own blocks, and
the two layouts may be out of phase. This is just a consequence of overlaying one storage paradigm on another, and it
shouldn't cause much trouble speed-wise, although adding a layer on top of the filesystem does increase the likelihood
of inefficiency.
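The "out of phase" effect is easiest to see with a little arithmetic. The sketch below is a simplified model that assumes 8 KB database pages (the "~8k" figure quoted above), 4 KB filesystem blocks, and no per-page headers; real page and block sizes vary by product and filesystem, and the function names are hypothetical. It counts how many pages a BLOB needs and how many filesystem blocks a single database page touches, depending on where that page starts in the data file.

```python
import math

DB_PAGE = 8 * 1024   # assumed database page size (the "~8k" from the thread)
FS_BLOCK = 4 * 1024  # assumed filesystem block size; varies by filesystem


def pages_for_blob(blob_size: int, page_size: int = DB_PAGE) -> int:
    """Database pages needed to hold a BLOB (ignores per-page headers)."""
    return math.ceil(blob_size / page_size)


def fs_blocks_touched(page_offset: int, page_size: int = DB_PAGE,
                      block_size: int = FS_BLOCK) -> int:
    """Filesystem blocks spanned by one database page that starts
    page_offset bytes into the database's data file."""
    first_block = page_offset // block_size
    last_block = (page_offset + page_size - 1) // block_size
    return last_block - first_block + 1


print(pages_for_blob(100_000))  # 13 pages for a ~100 KB file
print(fs_blocks_touched(0))     # 2: page aligned to a block boundary
print(fs_blocks_touched(512))   # 3: misaligned page straddles an extra block
```

Under these assumptions an aligned page maps onto two blocks and a misaligned one onto three, which is the kind of modest overhead that layering one storage scheme over another can introduce.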
That said, there's a counterargument: databases, or at least smart ones, are built to cache data efficiently in memory.
If your database server has enough memory, it may even serve a file faster than reading it straight off the filesystem.
The reasoning is that the filesystem cache (if there is one at all) also holds shared libraries and other files
belonging to running programs, and those take priority over ordinary data caching. In most implementations this cache
is also limited in size, so as not to take up too much precious RAM. Databases, however, are generally built on the
assumption that if you are using a database server for anything that benefits from significant caching, or for major
resource-intensive tasks (like serving hundreds of thousands of users), then the database will be the primary service
on the machine and may therefore claim significant resources (specifically, cache more data in memory). So, in some
situations I'd imagine database file storage would in fact be _faster_ for retrieval than filesystem storage. This rests
on too many assumptions about the database server's design, the operating system underneath it, and the server
hardware, so I don't give it much credit.
Then again, I may be wrong...
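The caching argument can be illustrated with a toy model: a cache with its own memory budget that answers repeat requests from RAM and only falls back to slower storage on a miss. This is a minimal least-recently-used (LRU) sketch, not how any particular database's buffer pool works; `BlobCache`, its byte budget, and the dictionary standing in for disk are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable


class BlobCache:
    """Toy LRU cache for file contents, bounded by total bytes held.

    A hypothetical stand-in for a dedicated in-memory cache; real
    database caches are far more elaborate.
    """

    def __init__(self, capacity_bytes: int, load: Callable[[str], bytes]):
        self.capacity = capacity_bytes
        self.load = load  # fallback to the slower backing store
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.used = 0
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        data = self.load(key)
        if len(data) <= self.capacity:  # skip items larger than the whole budget
            self.entries[key] = data
            self.used += len(data)
            while self.used > self.capacity:  # evict least recently used first
                _, old = self.entries.popitem(last=False)
                self.used -= len(old)
        return data


# Hypothetical backing store: two 600-byte "files", cache budget of 1000 bytes.
store = {"a.pdf": b"x" * 600, "b.doc": b"y" * 600}
cache = BlobCache(1000, store.__getitem__)
cache.get("a.pdf")  # miss: loaded from the store, then cached
cache.get("a.pdf")  # hit: served from memory
cache.get("b.doc")  # miss: caching it exceeds the budget, so a.pdf is evicted
```

The larger the budget a service is allowed to claim, the more requests end up on the hit path, which is the intuition behind the "dedicated database server" point.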
> Our Signed & Secured application stores all files as BLOBs in a database for
> all of the transactional and backup capabilities, but we've never run tests of
> 100+ concurrent web users downloading files to see if the database or the
> filesystem would be faster. In general, speed was less important to us than
> being able to support lots of concurrent requests, because retrieval from the
> db was always assumed to be faster than the data could be streamed back across
> typically slower Internet links. After all, the data has to be sent back to a
> user's web browser, so the speed of the transfer is limited by the slowest
> link between the browser and the web server.
This is the right attitude. Speed where it is useful, administrative efficiency whenever possible.
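For comparison, here is a minimal sketch of storing each file as a BLOB next to its metadata, using Python's built-in `sqlite3` module as a stand-in for a server database. It is not how Signed & Secured works; the `documents` table, its `owner` column, and the helper functions are hypothetical. Two points from the thread show up directly: the file bytes and metadata are written in one transaction, and the access check is part of the query, so there is no separate filesystem permission to get wrong.

```python
import sqlite3
from typing import Optional, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id        INTEGER PRIMARY KEY,
               owner     TEXT NOT NULL,
               filename  TEXT NOT NULL,
               mime_type TEXT NOT NULL,
               content   BLOB NOT NULL
           )"""
    )
    return conn


def add_document(conn: sqlite3.Connection, owner: str, filename: str,
                 mime_type: str, content: bytes) -> int:
    # "with conn" commits on success and rolls back on an exception,
    # so the metadata and the file bytes are stored atomically.
    with conn:
        cur = conn.execute(
            "INSERT INTO documents (owner, filename, mime_type, content) "
            "VALUES (?, ?, ?, ?)",
            (owner, filename, mime_type, content),
        )
    return cur.lastrowid


def fetch_document(conn: sqlite3.Connection, doc_id: int,
                   user: str) -> Optional[Tuple[str, str, bytes]]:
    # The ownership check lives in the WHERE clause: a user who does not
    # own the document gets None, exactly as if it did not exist.
    return conn.execute(
        "SELECT filename, mime_type, content FROM documents "
        "WHERE id = ? AND owner = ?",
        (doc_id, user),
    ).fetchone()


conn = open_store()
doc_id = add_document(conn, "alice", "report.pdf", "application/pdf", b"%PDF-1.4 example")
print(fetch_document(conn, doc_id, "alice"))    # ('report.pdf', 'application/pdf', b'%PDF-1.4 example')
print(fetch_document(conn, doc_id, "mallory"))  # None
```

A web handler built on this would stream `content` to the browser with the stored `mime_type`; as David notes, the network link, not the database read, is likely to set the overall pace.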
Ido
David
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
iD8DBQFAQQAhmhQsAkXAJP0RAtsIAJ0YEU2nqXhbrrEEbjuJ6ENNPnBuGwCgo1gS
z2SccYIaCJwsvmk2bnpgZmw=
=0tLv
-----END PGP SIGNATURE-----