[Infrastructures] Coda, to do or not to do?
Steve Traugott
stevegt@TerraLuna.Org
Sat, 19 Mar 2005 01:15:52 -0800
On Fri, Mar 18, 2005 at 02:37:10PM -0800, Jim Carroll wrote:
> > Just to make it interesting; assume outages can cost as much
> > as $30,000 per seat per minute, so your career and even the
> > company could depend on it; you have 6 months from today
> > before the floor needs to be live, and the machines aren't
> > purchased yet, so you'll actually only have around
> > 2-3 months to set it all up, test, debug, fix, deal with
> > application compatibility, etc. You also have to deal with
> > the normal politics of introducing new technology, which
> > means every little problem is magnified beyond proportion...
>
> I take it this is a real-world example of AFS in use...?
This example is just an average amalgam drawn from my own experience,
regardless of filesystem. I wanted to give Ivan an idea of what some of
us mean by "production", and see where he would rate Coda. For AFS and
OpenAFS, this scenario has been played out over and over, by a bunch of
people -- I'm late to the game because I was being stubborn and waiting
for the single-vendor Transarc model to die, and for things like the
2 GB file size limit to get fixed (OpenAFS 1.3.X).
Something else that's painfully obvious to me is that, even if an AFS
rollout goes well, training can decide whether the deployment sticks.
During or after rollout, you need to think about building and
maintaining an ongoing training program; a few days' orientation for
admins, maybe a day for developers, less for users.
The goal is to get people bootstrapped to the point where they can be
more productive than they would have been with NFS, and then use more
conventional tools like a mailing list and Wiki to keep growing them
after that. It's not just that you need to let people know how to deal
with Kerberos, AFS, and ACLs; you also need to let them know that it's
there, it's everywhere, it's reliable, and it's the corporate data
store, so they can undo bad habits like e-mailing files around or
keeping things on local disk.
A global filesystem with cache consistency also makes things like data
synchronization and low-volume message-passing ridiculously easy for
application developers, and they need to know that it's available to
them.
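
As a rough illustration of the low-volume message-passing point, here is
a minimal Python sketch that treats a shared directory as an inbox. The
function names, the "msg-" file naming, and the JSON payload are all
made up for this example; nothing here is AFS-specific API. It assumes
the writer closes the file before publishing it (AFS pushes data to the
fileserver on close) and that the rename stays within one directory.

```python
import json
import os
import tempfile
import uuid


def send_message(inbox, payload):
    """Drop one message into a shared directory as its own file."""
    os.makedirs(inbox, exist_ok=True)
    # Write under a temporary name first, so readers never pick up a
    # half-written message.
    fd, tmp_path = tempfile.mkstemp(dir=inbox, prefix=".tmp-")
    with os.fdopen(fd, "w") as handle:
        json.dump(payload, handle)
    # The file is closed by now; rename it within the same directory to
    # publish it under a unique (hypothetical) message name.
    final_path = os.path.join(inbox, "msg-%s.json" % uuid.uuid4().hex)
    os.replace(tmp_path, final_path)
    return final_path


def receive_messages(inbox):
    """Read and remove every published message currently in the inbox."""
    messages = []
    for name in sorted(os.listdir(inbox)):
        if not (name.startswith("msg-") and name.endswith(".json")):
            continue  # skip temporary files that are still being written
        path = os.path.join(inbox, name)
        with open(path) as handle:
            messages.append(json.load(handle))
        os.remove(path)
    return messages
```

A producer on one client would call send_message() with a path in the
shared tree, and a single consumer on another client would poll
receive_messages() on the same path. This sketch does no locking, so it
only suits one reader per inbox, and it makes no promise about message
ordering.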
> I have a genuine question or two (not playing devil's advocate here), part
> of which you seem to have answered already.
>
> Assuming you were to start with 10TB+ worth of data files (not DB files) out
> of the gate and want to grow the environment over 1-2 years to possibly
> 100TB, would AFS fit the bill? Are there any limitations (or caveats) to
> the directory structure, or could it easily handle tens of thousands of
> directories, each one containing thousands of files?
The max number of files per directory is 64k, but it can be lower in
practice because longer filenames use up more directory slots -- see
http://lists.openafs.org/pipermail/openafs-info/2002-September/005812.html.
Since this is still a network file system, the practical limit is lower
still, depending on usage patterns -- an 'ls -l' in a directory with
thousands of files will take a long time to return the first time; once
the inode entries are cached on local disk, it will run at near-local
speed.
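
One common way to stay well under the per-directory limit (and to keep
'ls -l' fast) is to spread files across hashed subdirectories. This is a
general technique rather than anything AFS provides, and the sketch
below is only an illustration: the function name, the choice of SHA-1,
and the two-level, two-character layout are all assumptions of mine.

```python
import hashlib
import os


def sharded_path(root, filename, levels=2, width=2):
    """Return a path that places filename under hashed subdirectories."""
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    # Each level uses `width` hex characters of the digest, so one level
    # fans out into at most 16**width subdirectories.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)
```

For example, sharded_path("/afs/yourcorp/data", "report-0001.dat") (a
hypothetical path) returns something of the form
/afs/yourcorp/data/xx/yy/report-0001.dat, where xx and yy are hex digits
taken from the hash. With the defaults, each level holds at most 256
subdirectories, for up to 65,536 leaf directories in total, so even tens
of millions of files average only a few hundred per directory.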
AFS scales well. At worst, you might find yourself creating multiple
cells which know about each other, depending on your DR/BCP plan and/or
internal politics. I know one highly mission-critical enterprise which
has over 50 TB of space online, with over a hundred servers in dozens of
cells globally, serving tens of thousands of users, including home
directories.
There is no limit to total storage. Think of it like this -- right now
(with an AFS client which can see the Internet) you have access to maybe
a petabyte or more of data in /afs/*, just from the 150+ organizations
which expose their fileservers to the Internet. Each of these cells runs
its own Kerberos servers and maintains its own volumes and ACLs, but
otherwise it's just one big global filesystem. You can just 'cd' into a
surprising number of people's home directories, for instance; in most
cases these orgs and users have chosen to leave their ACLs open for the
same reason you'd publish things on a web server. The
difference in this case is that, if I wanted you to work on a project
with me, I could give you a Kerberos ID (or enable cross-realm
authentication) and open up ACLs permitting you write access as well.
Regardless of whether your organization participates in the global /afs
namespace or not, you would organize multiple cells internally the same
way -- everything in one big directory tree under /afs or /yourcorp or
whatever, which looks the same from any client in any cell. See
Campbell's AFS book for more examples.
Steve
-- 
Stephen G. Traugott (KG6HDQ)
UNIX/Linux Infrastructure Architect, TerraLuna LLC
stevegt@TerraLuna.Org
http://www.stevegt.com -- http://Infrastructures.Org