Linode VPS Disk Speed Benchmark

Linode lives up to the praise it gets from so many people in the industry. I benchmarked its disk speed today, and the results are genuinely impressive:

[root@li229-25 ~]# hdparm -tT /dev/xvda

/dev/xvda:
 Timing cached reads:   25536 MB in  1.99 seconds = 12843.60 MB/sec
 Timing buffered disk reads:  340 MB in  3.00 seconds = 113.20 MB/sec

[root@li229-25 ~]# dd if=/dev/xvda of=/root/dump bs=1024k count=1000
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 18.9223 seconds, 55.4 MB/s

The above is the Linode VPS result: dd sustained about 55 MB/s.

Below, for comparison, is my desktop PC, which uses an ordinary Western Digital hard drive:

[root@rh2 ~]# cat /proc/scsi/scsi 
Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: ATA      Model: WDC WD3200AAJS-0 Rev: 01.0
  Type:   Direct-Access                    ANSI SCSI revision: 05

[root@rh2 ~]# hdparm -Tt /dev/sda

/dev/sda:
 Timing cached reads:   9132 MB in  2.00 seconds = 4569.83 MB/sec
 Timing buffered disk reads:  306 MB in  3.01 seconds = 101.72 MB/sec
[root@rh2 ~]# dd if=/dev/sda of=/root/dump bs=1024k count=1000  
1000+0 records in
1000+0 records out
1048576000 bytes (1.0 GB) copied, 22.6009 seconds, 46.4 MB/s

The Linode virtual server's disk is slightly faster than an ordinary PC's, which is respectable for a VPS.
For a web server workload, combined with caching layers such as memcached, I/O performance is generally not the main bottleneck.
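
One caveat: both dd runs above read through the OS page cache, so repeated runs can overstate sustained throughput. A minimal sketch of a less cache-influenced read test, assuming a GNU dd that supports iflag=direct:

# read 1 GB straight from the device, bypassing the page cache
dd if=/dev/xvda of=/dev/null bs=1024k count=1000 iflag=direct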

How Does Google DataWiki Differ from FluidDB?

Google recently launched a DataWiki project on Google Labs. According to Google, DataWiki will be "a wiki for structured data". Its project page says the idea grew out of the Person Finder application developed during the 2010 Haiti earthquake, where Google's developers saw an urgent need for a system for sharing structured data.

At first hearing, the project sounds very similar to FluidDB, which is often described as "a hosted database with a wiki nature". Nicholas H. Tollervey of FluidDB was happy to explain how the two projects differ.

DataWiki is for quickly building simple, special-purpose databases, such as Person Finder, whereas FluidDB aims to provide everything needed to build large-scale databases.

According to Tollervey, the main differences between the two projects are:

  • Structure: every DataWiki page follows a predefined structure, whereas FluidDB does not impose any schema on users, and things are always represented as objects rather than lists.
  • Moderation: DataWiki does not appear to provide any access control mechanism, whereas FluidDB has a permission system that controls which users may use a given tag or namespace.
  • Search: you can only search within a specific DataWiki page, whereas in FluidDB you can search across data sets, permissions allowing.

To learn more about FluidDB, read <FluidDB in a Nutshell>.

Script: Monitoring Memory and Swap Usage to Avoid a Solaris Hang

Applies to:

Solaris SPARC Operating System – Version: 8.0 and later   [Release: 8.0 and later ]
Solaris x64/x86 Operating System – Version: 8 6/00 U1 and later    [Release: 8.0 and later]
Oracle Solaris Express – Version: 2010.11 and later    [Release: 11.0 and later]
Information in this document applies to any platform.

Goal

Shortages of memory and virtual swap can result in slow system performance, hangs, failure to start new processes (fork failures), cluster timeouts, and thus unplanned outages. Monitoring resource usage is critical for system availability.

Solution

Physical Memory Shortages

Memory shortages can be caused by excessive kernel or application memory allocation and leaks. During memory shortages, the page daemon wakes up and starts scanning and stealing pages to bring the value of the freemem kernel global variable back above the lotsfree kernel threshold. Systems with memory shortages slow down because memory pages may have to be read back from the swap disk in order for processes to continue executing.

High kernel memory allocation can be monitored with mdb's ::memstat dcmd, which reports kernel, application, and file system memory usage:

# echo "::memstat" | mdb -k
Page Summary          Pages     MB  %Tot
------------       --------  -----  ----
Kernel                18330    143    7%  < Kernel memory
ZFS File Data             4      0    0%  < ZFS cache (see below)
Anon                  36405    284   14%  < Application memory: heap, stack, COW
Exec and libs          1747     13    1%  < Application libraries
Page cache             3482     27    1%  < File system cache
Free (cachelist)       3241     25    1%  < Free memory with vnode info intact
Free (freelist)      195422   1526   76%  < Free memory

Total                258627   2020
Physical             254812   1990

If the system is running ZFS, the ZFS cache will also be listed. ZFS uses kernel memory to cache file system blocks. You can monitor ZFS cache memory usage using:

# kstat -n arcstats
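
If you only want the current ARC size rather than the full statistics set, a terser form (assuming the usual zfs:0:arcstats kstat path) is:

# print just the ARC size counter, in bytes, in parseable name=value form
kstat -p zfs:0:arcstats:size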

kstat reports kernel memory usage in pages (8 KB on SPARC, 4 KB on x86). It also reports memory in use by the kernel and pages locked by applications:

# kstat -n system_pages
module: unix                        instance: 0
name:   system_pages                class:    pages

freemem         8337355   < available free memory
..
lotsfree         257271   < paging starts when freemem drops below lotsfree
minfree           64317   < swapping starts if freemem drops below minfree
pageslocked     4424860   < pages locked, excluding pp_kernel (kernel pages)
pagestotal     16465378   < total pages configured
physmem        16487075   < total pages usable by Solaris
pp_kernel       4740398   < memory allocated in the kernel
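
Because these counters are in pages, converting them to megabytes takes one more step. A minimal sketch using pagesize(1) and kstat's parseable output:

# report free memory in MB (pagesize prints the system page size in bytes)
PAGESZ=`pagesize`
kstat -p unix:0:system_pages:freemem | awk -v p=$PAGESZ '{printf "free: %d MB\n", $2 * p / 1048576}'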

The ::kmastat dcmd reports memory usage in kernel slab caches. These caches are used by various kernel subsystems and drivers for allocating memory:

# echo "::kmastat" | mdb -k
cache                      buf     buf     buf    memory     alloc  alloc
name                      size  in use   total    in use   succeed   fail
-----------------------  ----  ------  ------  --------  --------  -----
..
kmem_slab_cache             56    2455    2465    139264      2571      0
kmem_bufctl_cache           24    5463    5763    139264      6400      0
kmem_bufctl_audit_cache    128       0       0         0         0      0
kmem_va_8192              8192      74      96    786432        74      0
kmem_va_16384            16384       2      16    262144         2      0
kmem_va_24576            24576       5      10    262144         5      0
kmem_va_32768            32768       1       8    262144         1      0
kmem_va_40960            40960       0       0         0         0      0
kmem_va_49152            49152       0       0         0         0      0
kmem_va_57344            57344       0       0         0         0      0
kmem_va_65536            65536       0       0         0         0      0
kmem_alloc_8                 8   97210   98649    794624   3884007      0
kmem_alloc_16               16   29932   30988    499712   9786629      0
kmem_alloc_24               24   43651   44409   1073152  69596060      0
kmem_alloc_32               32   11512   12954    417792  71088529      0

To isolate issues with high kernel memory allocation and leaks, turn on kernel memory auditing by adding the tunable below to the /etc/system file and rebooting:

set kmem_flags=0x1
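
After the reboot, you can confirm the setting took effect by reading the variable back from the live kernel (0x1 should correspond to the audit flag):

# print the current value of kmem_flags in hex
echo "kmem_flags/X" | mdb -k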

Continue to run kmastat on a regular basis and monitor the growth of the kernel caches. Force a system panic when kernel memory allocation reaches an alarming level, and send the kernel core dump located in the /var/crash directory to Oracle Support for analysis.

To monitor application memory usage, consider using:

$ prstat -s rss -can 100
$ ps -eo 'addr zone user s pri pid ppid pcpu pmem vsz rss stime time nlwp psr args'

To see which memory segments in a process have high memory allocation:

$ pmap -xs <pid>

Continued growth in application memory usage is a sign of a memory leak. You may ask the application vendor to provide debugging tools, or consider linking against libumem(3LIB), which offers a rich set of debugging facilities (see the article on how to use it). You can also monitor application malloc() calls using DTrace scripts.
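
As a sketch of the libumem route, assuming a hypothetical application binary myapp: preload the library with its debugging features enabled, then inspect the running process (or a core dump) with mdb's ::findleaks dcmd:

# run the application under libumem's debugging allocator (myapp is a placeholder)
LD_PRELOAD=libumem.so.1 UMEM_DEBUG=default ./myapp

# later, look for leaked buffers in the live process
echo "::findleaks" | mdb -p <pid>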

Process allocation (via malloc()) requested size distribution plot:

dtrace -n 'pid$target::malloc:entry { @ = quantize(arg0); }' -p PID

Process allocation (via malloc()) by user stack trace and total requested size:

dtrace -n 'pid$target::malloc:entry { @[ustack()] = sum(arg0); }' -p PID
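
Both one-liners attach to a running process via -p; DTrace prints the aggregation when the script is interrupted (for example with Ctrl-C), so let it run long enough to capture a representative allocation pattern.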

Virtual Memory Shortages

Processes use virtual memory. A process' virtual address space is made up of a number of memory segments: text, data, stack, heap, and COW segments. When a process accesses a virtual address, a page fault brings the data into physical memory, and the faulted virtual address is then mapped to physical memory. All pages reside in a memory segment and have a backing store to which pages in the segment can be migrated during memory shortages. Text and data segments are backed by the executable file on the file system. Stack, heap, COW (copy-on-write), and shared memory pages are anonymous (anon) pages, and they are backed by virtual swap.

An ISM segment does not require swap reservation, since all of its pages are locked in memory by the kernel and are not candidates for swapping.

DISM, by contrast, requires swap reservation, since its memory can be locked and unlocked by the process.

A process using DISM grows the SGA selectively by locking address ranges. Failing to lock a DISM region while continuing to use it as the SGA for DB block caching may result in slow Oracle Database performance, because accessing those pages triggers page faults. See Doc 1018855.1.

When a process starts touching pages, anon structures are allocated, but no physical disk swap is allocated yet. Swap allocation in Solaris happens only when memory is short and pages need to be migrated to the swap device to keep up with the workload's memory demand. That is why "swap -l", which reports physical disk swap allocation, shows the same value in the "blocks" and "free" columns under normal conditions.

Solaris can run without physical disk swap thanks to the swapfs abstraction, which acts as if real swap space backs each page. Solaris works with virtual swap, which is composed of physical memory plus physical disk swap. When no physical disk swap is configured, swap reservation happens against physical memory. Reserving swap against memory has a drawback: the system cannot malloc() more than the physical memory configured. The corresponding advantage of running without physical disk swap is that a malicious program cannot perform huge mallocs and thus cannot bring the system to a crawl through memory shortages.

Virtual swap = physical memory + physical disk swap

Available virtual swap is reported by:

  • vmstat: swap
  • swap -s

Disk-backed swap is reported by:

  • swap -l


Per-process virtual swap reservation can be displayed with:

  • pmap -S <pid>

prstat can report a process' virtual memory usage (SIZE); note, however, that SIZE covers all virtual memory used by all memory segments, not just anon memory:

  • prstat -s size -can 100 15
  • prstat -s size -can -p <pidlist> 100 15

You can dump the process address space, showing all segments, using:

  • pmap -xs <pid>

When a process calls malloc()/sbrk(), only virtual swap is reserved. Reservation is made against physical disk swap first; if that is exhausted or not configured, reservation is made against physical memory. If both are exhausted, malloc() fails. To make sure malloc() does not fail for lack of virtual swap, configure a large physical disk swap in the form of a disk device or a swap file. You can monitor swap reservation via "swap -s" and "vmstat:swap", as described above.

On a system with plenty of memory, "swap -l" reports the same value in the "blocks" and "free" columns.

"swap -l" reporting a large value under "free" does not mean that plenty of virtual swap is available and that malloc() will not fail: "swap -l" says nothing about virtual swap usage, only about physical disk swap allocation. It is "swap -s" and "vmstat:swap" that report how much virtual swap is available for reservation.
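
As a rough alerting sketch built on that distinction, assuming swap -s output ends in the usual "...NNNNNNk available" form (the 1 GB threshold is an arbitrary example):

# warn when available virtual swap drops below roughly 1 GB
AVAIL=`swap -s | awk '{print $(NF-1)}' | sed 's/k$//'`    # available virtual swap in KB
[ "$AVAIL" -lt 1048576 ] && echo "WARNING: only ${AVAIL} KB of virtual swap available"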

Script to monitor memory usage:

#!/bin/ksh

# Script monitors kernel and application memory usage

PATH=/bin:/usr/bin:/usr/sbin; export PATH

trap "killall" HUP INT QUIT KILL TERM USR1 USR2

# kill the background stat collectors and exit
killall()
{
    for PID in $PIDLIST
    do
        kill -9 $PID 2>/dev/null
    done
    exit
}

# kill the background stat collectors so the loop can restart them
restart()
{
    for PID in $PIDLIST
    do
        kill -9 $PID 2>/dev/null
    done
}

DIR=DATA.`date +%Y%m%d-%T`
TS=`date +%Y%m%d-%T`

mkdir $DIR
cd $DIR

while true
do
    TS=`date +%Y%m%d-%T`
    echo $TS >> mem.out
    echo "output of ::memstat" >> mem.out
    echo "::memstat" | mdb -k >> mem.out
    echo "output of kstat -n arcstats (ZFS ARC memory usage)" >> mem.out
    kstat -n arcstats >> mem.out
    echo "output of ::kmastat" >> mem.out
    echo "::kmastat" | mdb -k >> mem.out
    echo "output of swap -s and swap -l" >> mem.out
    echo "swap -s" >> mem.out
    swap -s >> mem.out
    echo "swap -l" >> mem.out
    swap -l >> mem.out
    echo "output of ps" >> mem.out
    /usr/bin/ps -eo 'addr zone user s pri pid ppid pcpu pmem vsz rss stime time nlwp psr args' >> mem.out

    #
    # start vmstat, mpstat and prstat in the background
    #
    PIDLIST=""
    echo $TS >> vmstat.out
    vmstat 5 >> vmstat.out &
    PIDLIST="$PIDLIST $!"
    echo $TS >> mpstat.out
    mpstat 5 >> mpstat.out &
    PIDLIST="$PIDLIST $!"
    echo $TS >> prstat.out
    prstat -s rss -can 100 >> prstat.out &
    PIDLIST="$PIDLIST $!"

    sleep 600   # every 10 minutes

    restart
done
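
One reasonable way to run the script (the memmon.ksh file name is just a placeholder) is to launch it immune to hangups and let it accumulate samples in the timestamped DATA.* directory it creates:

# start the monitor in the background so it survives logout
nohup ./memmon.ksh &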

Let the Bullets Fly (《让子弹飞》): Three Old Men Put On a Show

I took my girlfriend to see Let the Bullets Fly tonight; it is arguably the most worthwhile domestic blockbuster of the past few years. Three old men put on quite a show, and all three play roles fairly close to their own personas. Ge You's office-buying county magistrate deserves special mention for keeping the whole film running smoothly. If you haven't made it to the cinema yet, go watch it and have a laugh.

[Repost] The Underlying Technology of Facebook Messages

Facebook's engineers have published a new post on their Notes portal, reproduced below:

We’re launching a new version of Messages today that combines chat, SMS, email, and Messages into a real-time conversation. The product team spent the last year building out a robust, scalable infrastructure. As we launch the product, we wanted to share some details about the technology.

The current Messages infrastructure handles over 350 million users sending over 15 billion person-to-person messages per month. Our chat service supports over 300 million users who send over 120 billion messages per month. By monitoring usage, two general data patterns emerged:

  1. A short set of temporal data that tends to be volatile
  2. An ever-growing set of data that rarely gets accessed

When we started investigating a replacement for the existing Messages infrastructure, we wanted to take an objective approach to storage for these two usage patterns. In 2008 we open-sourced Cassandra, an eventual-consistency key-value store that was already in production serving traffic for Inbox Search. Our Operations and Databases teams have extensive knowledge in managing and running MySQL, so switching off of either technology was a serious concern. We either had to move away from our investment in Cassandra or train our Operations teams to support a new, large system.

We spent a few weeks setting up a test framework to evaluate clusters of MySQL, Apache Cassandra, Apache HBase, and a couple of other systems. We ultimately chose HBase. MySQL proved to not handle the long tail of data well; as indexes and data sets grew large, performance suffered. We found Cassandra’s eventual consistency model to be a difficult pattern to reconcile for our new Messages infrastructure.

HBase comes with very good scalability and performance for this workload and a simpler consistency model than Cassandra. While we’ve done a lot of work on HBase itself over the past year, when we started we also found it to be the most feature rich in terms of our requirements (auto load balancing and failover, compression support, multiple shards per server, etc.). HDFS, the underlying filesystem used by HBase, provides several nice features such as replication, end-to-end checksums, and automatic rebalancing. Additionally, our technical teams already had a lot of development and operational expertise in HDFS from data processing with Hadoop. Since we started working on HBase, we’ve been focused on committing our changes back to HBase itself and working closely with the community. The open source release of HBase is what we’re running today.

Since Messages accepts data from many sources such as email and SMS, we decided to write an application server from scratch instead of using our generic Web infrastructure to handle all decision making for a user’s messages. It interfaces with a large number of other services: we store attachments in Haystack, wrote a user discovery service on top of Apache ZooKeeper, and talk to other infrastructure services for email account verification, friend relationships, privacy decisions, and delivery decisions (for example, should a message be sent over chat or SMS). We spent a lot of time making sure each of these services are reliable, robust, and performant enough to handle a real-time messaging system.

The new Messages will launch over 20 new infrastructure services to ensure you have a great product experience. We hope you enjoy using it.

Kannan is a software engineer at Facebook.

So Facebook chose HBase over MySQL and Cassandra to power its current Messages application. Setting aside the pile of services added since, I wonder how much of the code Zuckerberg wrote himself back in the day is still in use :).

Optimizing Your Site's CSS Stylesheets with the Page Speed Add-on

Google, never one to stick to its knitting, recently released the Page Speed add-on and the Apache 2-only mod_pagespeed page optimization module. The Page Speed add-on is currently Firefox-only and requires the Firebug page debugger to be installed first; you can add Firebug via the Tools->Add-ons->Get Add-ons menu, then install Page Speed from its official page on code.google.com.

Using the Page Speed add-on is straightforward: after opening the page you want to optimize, choose Tools->Firebug->Open Firebug in New Window from the Firefox toolbar. Opening Firebug in a new window on my www.askmac.cn page, for example, brings up the Firebug window with the Page Speed tab.

About This Blog's Feed Subscription

Some readers have reported display problems with the subscription feed; in my own tests with browsers such as Firefox and Opera, it renders fine.

How to Use an MOS-Style Code Background

Many WordPress-based tech bloggers like to use syntax-highlighting plugins to make the code blocks in their articles stand out clearly, and until about a month ago I was one of the many fans of such HighLight Syntax plugins. Today, however, I would argue that highlighting plugins cost too much. Taking my blog (formerly www.askmac.cn) as an example, the multiple per-language JavaScript files the plugin loads add roughly 60-70 KB of extra data per page, which directly hurts page load times (a page with only a few dozen words of text often took about 3 seconds to open). In practice, most tech bloggers use only a handful of the language scripts these plugins bundle; I generally need only Java and SQL, yet once the plugin is enabled, it pulls in the whole family of C#, C++, Perl, Shell, and other language scripts on every page. You may point out that these scripts are cached by the browser after the first load, but what if most visitors only ever view a single page?

To optimize my pages, I decided to adopt the same code display style as MOS (that is, Metalink); if you read documents there as often as I do, you are no doubt very familiar with it:

[Image: MOS-style code sample]

To reproduce this style, we manually edit the style.css stylesheet of your current theme. Because highlighting plugins generally emit code as "<pre class=brush:codetype>CONTENT</pre>", we only need to change the styling of the pre tag to restyle the code in every existing article, with no need to edit the articles themselves.

You can usually find the theme's style.css directly under the wp-content/themes/%themename% directory. Its default pre rule is:

pre {
        font-family: 'Courier New', Courier, Monospace, Fixed;
        line-height: normal;
        overflow: auto;
        padding-bottom: 25px;
        margin: 0px;
        background-image: url('images/bg_pre_dots.png');
        background-repeat: repeat-x;
        background-position: bottom left;
}

We need to change the default pre rule to:

pre {
        font-family: "Courier New", Courier, monospace;
        background-color: #EEF3F7;
        overflow: auto;
        border-width: 1px;
        border-style: solid;
        border-color: #C4D1E6;
        padding: 0.5em;
        margin: 0px;
        margin-top: 5px;
}

If you use the WP Super Cache plugin in WordPress, you will need to delete the cached pages from the admin panel; after that, refreshing the page should show the same code display style you see here.

Under the Weather...

On the day of the Mid-Autumn Festival I went shopping at Metro with my mom. Stepping out the door, it already felt like autumn had arrived, and I said that when we got back I should dig out my long-sleeved clothes.

As it turned out, the air conditioning in Metro's vegetable and meat sections was brutally cold, and I came home with a cold and a cough that just won't go away.

Resting at home today... and putting down the pen for a few days to catch up on some light reading!

IXwebhosting Sucks!!

I have wanted to write this post for a long time. I first bought IXwebhosting's shared hosting service back in August 2009, so we have had a long acquaintance.

But today I have to say IXwebhosting is terrible, terrible to the point of failing at basic availability. From the start of this month (September 2010) through the 23rd, my site has suffered seven or eight outages, large and small. This is definitely not random network flakiness: I use Pingdom's site monitoring service, and ever since I enabled it for my site, I have received one or two "site down" alerts almost every day, some in the small hours. I suspect that after the large-scale cp9 server outage on September 9, many owners hosted on IXwebhosting lost all faith in IX.
