OpenSolaris Storage Developer Wish List

From Genunix

This list was initially compiled from postings to storage-discuss@opensolaris dot org after a request from Tom Haynes. Please contribute to this list or pick something on the list to start working on.

ZFS

Item Description Requester Notes Date
ZFS prioritized writes The ability to have ZFS support prioritized writes such that they hit the fastest disks first, then are later re-written to slower disks "Mike Gerdts" <mgerdts@gmail.com> The intent is that a device like Thumper could then use a couple of solid-state disks as a write cache to speed up small-file or attribute-intensive NFS I/O. Is it sufficient to just put the ZIL on the SSDs?

The goal here is to minimize the time required for bits (avoiding the term data to not confuse data vs. metadata) to be committed to stable storage, thereby speeding up operations that are particularly picky about being sure that writes are committed before moving on. Small file and attribute intensive workloads, particularly over NFS, are key examples of workloads that this targets.

Primary intent for requesting this seems to be addressed by PSARC 2007/171 in build 68.

5/10/07
ZFS quotas that don't charge for snapshots If the admin of a storage server keeps snapshots for backup, compliance, or some other reason, in most cases I don't want this to impact the end user of the storage "Mike Gerdts" <mgerdts@gmail.com> Seems to be addressed by PSARC 2007/555 in build 78 5/10/07
NDMP enabled "zfs send" This seems like the most promising way to do block-level backups of ZFS in a way that cleanly integrates with enterprise backup products. Every serious backup vendor around supports NDMP for file server appliance backups. "Mike Gerdts" <mgerdts@gmail.com> Looks like this one is on the way: PSARC 2007/397, not integrated as of build 78. 5/10/07
It would be nice to be able to defragment ZFS file systems to relayout blocks in an optimal fashion. Matty <matty91@gmail.com> 5/10/07
Prioritized resilvering The ZFS resilvering process tends to be an all-out, as-fast-as-possible process. This can in turn impact the speed at which production data is accessed and can slow the applications using ZFS as they wait for data. It would be useful to still be able to do a resilver, but to rate limit it so as to reduce the impact on normal business operations. Dale Ghent <daleg@elemental.org> How to measure or quantify what entails "rate limiting" or "prioritization" is up for discussion. For example, it could mean that blocks are resilvered only when there are no pending I/O requests, or that if I/O requests reach a certain high watermark, the resilvering thread throttles back to a default or predefined median.

Discuss this here: http://www.opensolaris.org/jive/thread.jspa?threadID=45171

11/14/2007
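The two throttling policies floated in the notes for the prioritized resilvering item can be sketched roughly as follows. This is only an illustration of the idea, not ZFS code: the class, the function, and every tunable name and default value are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThrottlePolicy:
    """Hypothetical tunables; none of these names or values come from ZFS."""
    high_watermark: int = 32   # pending I/Os at which resilvering backs off
    full_rate: int = 64        # blocks per pass when below the watermark
    throttled_rate: int = 4    # blocks per pass once the watermark is reached
    idle_only: bool = False    # if True, resilver only when nothing is pending


def blocks_to_resilver(pending_io: int, policy: ThrottlePolicy) -> int:
    """Return how many blocks the resilver thread may process in this pass."""
    if pending_io < 0:
        raise ValueError("pending_io must be non-negative")
    if policy.idle_only:
        # Strictest reading: yield entirely to application I/O.
        return policy.full_rate if pending_io == 0 else 0
    if pending_io >= policy.high_watermark:
        # Watermark reached: fall back to the predefined reduced rate.
        return policy.throttled_rate
    return policy.full_rate
```

A caller would sample the pending I/O count before each pass and resilver at most the returned number of blocks. How that count would actually be measured is exactly the open question raised in the notes.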

Other File Systems

Item Description Requester Notes Date
more filesystem support XFS, ReiserFS, ext3, JFS, NTFS (full ACL/NT permissions support) "Brian Gupta" <brian.gupta@gmail.com> 5/10/07
Flash-friendly filesystems squashfs and jffs2 (+ future jffs3) "Brian Gupta" <brian.gupta@gmail.com> 5/10/07
Rewrite/refactor SAM/QFS to improve ease of use and simplify installation and management "Brian Gupta" <brian.gupta@gmail.com> 5/10/07
Support more cluster file systems "Brian Gupta" <brian.gupta@gmail.com> 5/10/07
dynamically migrate data between NFS servers pNFS along with integration with ZFS or Availability Suite to dynamically migrate data between NFS servers (for load balancing or NFS server retirement) with no client outage "Mike Gerdts" <mgerdts@gmail.com> I haven't read the pNFS drafts yet, so perhaps my expectations are a bit high. 5/10/07
Block level de-duplication Maintain a database of checksum to block mappings. If checksums match, do a full contents comparison. If the blocks are the same, store multiple references to the same block. This should work across all file systems in a pool. If a large number of home directories (each on their own FS) each have their own copy of the same data ("copy ~cs500/dataset.tar.gz ~, then untar it"), one copy should be stored. Dedup could happen on the fly or as a background re-write that uses available CPU and spindle time. "Mike Gerdts" <mgerdts@gmail.com> The RFE for this work has been filed. 5/10/07
Cluster filesystem support either with ZFS or QFS. SR <sraja97@gmail.com> ZFS cluster file system support would definitely be neat. (matty91<@>gmail.com) 5/11/07
LVM/FS layers supported de-duplication "Brian Gupta" <brian.gupta@gmail.com> 5/14/07
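The scheme described in the block level de-duplication item (checksum lookup, then a full contents comparison, then a shared reference) can be sketched as a toy in-memory store. This is a minimal illustration under stated assumptions, not how ZFS or any other file system implements it: the `DedupStore` class and its methods are hypothetical, and SHA-256 is an arbitrary choice of checksum.

```python
import hashlib


class DedupStore:
    """Toy in-memory block store with checksum-indexed de-duplication."""

    def __init__(self):
        self.blocks = {}       # block id -> block contents
        self.refcount = {}     # block id -> number of references held
        self.by_checksum = {}  # checksum -> ids of blocks with that checksum
        self._next_id = 0

    def write(self, data: bytes) -> int:
        """Store a block and return its id, reusing an identical block if any."""
        digest = hashlib.sha256(data).digest()
        for block_id in self.by_checksum.get(digest, []):
            # Checksums match: confirm with a full contents comparison.
            if self.blocks[block_id] == data:
                self.refcount[block_id] += 1
                return block_id
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.refcount[block_id] = 1
        self.by_checksum.setdefault(digest, []).append(block_id)
        return block_id

    def release(self, block_id: int) -> None:
        """Drop one reference; free the block when no references remain."""
        self.refcount[block_id] -= 1
        if self.refcount[block_id] == 0:
            data = self.blocks.pop(block_id)
            del self.refcount[block_id]
            digest = hashlib.sha256(data).digest()
            ids = self.by_checksum[digest]
            ids.remove(block_id)
            if not ids:
                del self.by_checksum[digest]
```

In the home directory example from the item, every student's copy of the same block would resolve to one stored block with a reference count equal to the number of copies. The sketch covers only the on-the-fly variant; the background re-write variant would apply the same lookup to blocks that are already stored.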

FMA Support

Item Description Requester Notes Date
FMA support for SMART, and a tool to view SMART data (it sounds like the FMA sensor project is going to add basic SMART support, but no generic tool to view SMART data). Matty <matty91@gmail.com> Have you looked at http://www.blastwave.org/packages.php/smartmontools? Not sure if that's anything like what you want to view the data, but it might be a start, at least to see if there's anything there worth seeing.

I like the default smartctl display, so maybe you could use that as a reference? If you haven't used this nifty utility before, you might be interested in the following article: http://prefetch.net/articles/diskdrives.smart.html

5/10/07
FMA hardened SAN and Ethernet drivers (since Ethernet is the most common interface type used with iSCSI, I thought I would add it). Matty <matty91@gmail.com> 5/10/07

Other

Item Description Requester Notes Date
#java io-a-to-z.jre A program like this could give one a clear graphical (or text) view of a system and the complete I/O infrastructure and performance throughout the system. HBAs, LUNs, multiple paths, WWP names and phys paths ... "Louwtjie Burger" <zabermeister@gmail.com> I/Os per second generated, response times in ms from application to wait queue, onto service queue, throughput (MB/s), rise and fall of various buffers, etc. Together with processes generating those I/Os and their impact on CPU... [Perhaps Ortera http://www.ortera.com/ might be close to your needs.]

Yes, I know that using iostat, vmstat, mpstat and DTrace scripts can drill down and give one a clear indication of what is going on, and yes, writing one's own Perl (kstat) application (which I did) can also help to give a "top-like" overview of I/O.

5/14/07
Better error messages in the iscsiadm and iscsitadm utilities. Matty <matty91@gmail.com> There are a number of bugs filed to make the errors a bit more readable.

5/10/07
Better out-of-the-box performance for the iSCSI stack (there are a couple of bugs that deal with this, and it would be nice to get the workarounds incorporated into OpenSolaris). Matty <matty91@gmail.com> Yep. Doing this. 5/10/07
Ability to prioritize the system resources devoted to background scrubbing and defragment operations (assuming defragmentation support is added). Matty <matty91@gmail.com> 5/10/07
Better support of USB/FireWire drives "Brian Gupta" <brian.gupta@gmail.com> (I still have to try this out, so don't flame me if I am out of date) 5/10/07
A standard OpenSolaris/open source SRM suite. (provisioning+) "Brian Gupta" <brian.gupta@gmail.com> 5/10/07
Support for holographic storage "Brian Gupta" <brian.gupta@gmail.com> (Just checking to see if you are still reading) 5/10/07
Scalable open source backup software "Brian Gupta" <brian.gupta@gmail.com> 5/10/07
hardware-accelerated compression Bug the people that are working on hardware-accelerated crypto in Sun's chips to do the same for compression. Use these offload engines for file system compression so that the rest of the CPU is available for other workloads. Maybe this isn't really needed: perhaps the ability to prioritize compression by file system, together with the use of processor sets for compression, would be sufficient. "Mike Gerdts" <mgerdts@gmail.com> (Today it looks like shared servers with ZFS compression can have one workload dominate CPU usage in a way that ignores existing CPU resource controls.) 5/10/07
HBA virtualization (i.e., Crossbow for HBAs). Matty <matty91@gmail.com> Have you looked at NPIV? We are looking at this for the Solaris Fibre Channel stack. See: http://blogs.sun.com/AaronDailey/entry/npiv_and_solaris_fibrechannel

Beyond what NPIV offers, would you also like to see data classification and prioritization here, as in Crossbow? In other words, QoS and virtualization?

QoS and virtualized HBAs would be super useful, since you could tie them to Solaris zones and eventually Xen DomUs.

5/10/07
Integrate the disk failure heuristics from the CMU and Google papers into a disk / IO diagnosis engine. Matty <matty91@gmail.com> The Google paper on disk I/O: http://labs.google.com/papers/disk_failures.pdf

Here is a link to the CMU paper: http://www.cs.cmu.edu/~bianca/fast07.pdf

5/10/07
eSATA port multiplier support Add driver support for port multipliers in eSATA cards, to build really cheap storage appliances Pablo Méndez Hernández <pablomh@gmail.com> 11/16/2007
Device Mapper Generic Device Mapper facility similar to (compatible with?) Linux dm Cyril Plisko <cyril.plisko@mountall.com> The Linux device mapper facility: http://wikipedia.org/wiki/Device_mapper

This project can be found at http://opensolaris.org/os/project/devmapper

11/16/2007
Add MTD (memory technology device) and FTL (flash translation layer) support It would be great to have support for these devices in OpenSolaris Igor Trindade Olibvira/ Jeff Cheeney 2008-Feb-5

Desired F/OSS Packages

Item Description Requester Notes Date
Zetaback ZFS backup and recovery management system

OmniTI Computer Consulting, Inc.

16:24, 11 June 2008 (PDT)
SnapBack SnapBack: The joys of backing up MySQL with ZFS... [1] 16:24, 11 June 2008 (PDT)
Openproj mgerdrs Cheeney 19:03, 23 June 2008 (PDT)
rsync rsync Bob Friesenhahn & hakanson rsync compiled in 64-bit mode and updated to 3.0.x Cheeney 19:03, 23 June 2008 (PDT)
Nagios Nagios is an enterprise-class monitoring solution for hosts, services, and networks released under an Open Source license. mgerdrs Cheeney 19:03, 23 June 2008 (PDT)
OpenNMS OpenNMS is the world's first enterprise grade network management platform developed under the open source model. mgerdrs Cheeney 19:03, 23 June 2008 (PDT)
Cacti Cacti is a complete network graphing solution designed to harness the power of RRDTool's data storage and graphing functionality. mgerdrs Cheeney 19:03, 23 June 2008 (PDT)
lsof Lsof is a Unix-specific diagnostic tool. Its name stands for LiSt Open Files, and it does just that. It lists information about any files that are open by processes currently running on the system. It can also list communications open by each process. mgerdrs The kernel needs to be enhanced to provide a stable interface to global and non-global zones; lsof needs to be enhanced to use that data.

Cheeney 19:03, 23 June 2008 (PDT)