Saturday, May 12, 2012

Partitioning of Zabbix Database (PostgreSQL)

 We are using highloaded Zabbix database (currently PostgreSQL 9.1). The database contains tables with approximately 1700 million (2 billion) rows. The common technique to optimize bulk INSERTs and DELETEs with tables like this is to use Table Partitioning. I managed to do this by this guidelines (in Russian): http://www.zabbix.com/wiki/non-english/ru/partitioning_in_postgresql
 Although the manual is extensive, during the production usage some problem have arose. 

 First, it was a problem with SELECT queries to partitioned tables that was fixed by a separate patch before Zabbix v.1.8.13. As of this version the patch is no more required. It brings a significant performance gain. 

 Secondly, it's about the fact that procedures in the manual above have an unpleasant effect. The data of first INSERT query that triggers a partition creation are always lost. I didn't have any desire to rewrite the proposed SQL, instead I implemented the obvious fix. To make INSERTs into history* tables beforehand. This also helps to solve the problem with table load. Partitioning procedures put a lock on tables and this can easily results in Zabbix malfunction. Planing will help you to avoid being awaken at 0.00 when Partitioning usually starts.
Example:
INSERT INTO history_uint (itemid, clock, value) VALUES ('1', extract( epoch FROM date_trunc('hour', now() + interval '12 hour')), '1')";

  The third fix is related to conditions that are wrongly used in the recipe above. I just put the corrected versions here. Corresponding lines are in the check_condition definition and I made them bold.


  Hope, this notes will help someone.

CREATE OR REPLACE FUNCTION "public"."partition_every_day" (in parentoid oid, in scheme varchar, in clock int4) RETURNS text AS
$BODY$
declare
  parent text := parentoid::regclass;
  suffix text := to_char (to_timestamp(clock), '_YYYY_MM_DD');
  child  text := scheme || (select relname from pg_class where oid  = parentoid) || suffix;
  check_beg varchar;
  check_end varchar;
  check_condition varchar;
  check_field     varchar := null;
  tmp record;
  script text := '';
  i int := 0;
  j int := 0;
begin
  
  perform child::regclass;
  return child;
exception
  when undefined_table then
    
    check_beg = extract(epoch FROM date_trunc('day', to_timestamp(clock)));
    check_end =  extract(epoch FROM date_trunc('day', to_timestamp(clock) + interval '1 day'));
 
    
    j = (select count(*) from pg_attribute where attrelid = parentoid  and attnum >0);
    for tmp in
    select attname from pg_attribute where attrelid = parentoid and attnum >0 order by attnum loop
      i = i + 1;
      script = script || 'NEW.' || tmp.attname || case i when j then '' else ',' end;
      
      if (col_description (parentoid, i) ~* 'partition') and (check_field is null)
        then check_field = tmp.attname; end if;
    end loop;
    script = script || ')';
 
    
    check_condition =
    '(
       ' || check_field || ' >= ' || quote_literal (check_beg) || ' and
       ' || check_field || ' < ' || quote_literal (check_end) || '
     )';
 
    
    execute
    'create table ' || child || '
    (
     constraint partition' || suffix || ' check '
      || check_condition || '
    )
    inherits (' || parent || ')';
 
    
    execute
    'create rule route' || suffix || ' as '
      || ' on insert to ' || parent || ' where '
      || check_condition
      || ' do instead insert into ' || child
      || ' values (' || script;
 
    
    perform copy_constraints(parent, child);
    perform copy_indexes(parent, child);
    execute 'GRANT SELECT ON ' || child || ' TO some_other_user';
    execute 'GRANT ALL ON ' || child || ' TO zabbix';
    
  return child;
end;
$BODY$
LANGUAGE 'plpgsql'

-----


CREATE OR REPLACE FUNCTION "public"."partition_every_month" (in parentoid oid, in scheme varchar, in clock int4) RETURNS text AS
$BODY$
declare
  parent text := parentoid::regclass;
  suffix text := to_char (to_timestamp(clock), '_YYYY_MM');
  child  text := scheme || (select relname from pg_class where oid  = parentoid) || suffix;
  check_beg varchar;
  check_end varchar;
  check_condition varchar;
  check_field     varchar := null;
  tmp record;
  script text := '';
  i int := 0;
  j int := 0;
begin
  
  perform child::regclass;
  return child;
exception
  when undefined_table then
    
    check_beg = extract(epoch FROM date_trunc('month', to_timestamp(clock)));
    check_end =  extract(epoch FROM date_trunc('month', to_timestamp(clock) + interval '1 month'));
 
    
    j = (select count(*) from pg_attribute where attrelid = parentoid  and attnum >0);
    for tmp in
    select attname from pg_attribute where attrelid = parentoid and attnum >0 order by attnum loop
      i = i + 1;
      script = script || 'NEW.' || tmp.attname || case i when j then '' else ',' end;
      
      if (col_description (parentoid, i) ~* 'partition') and (check_field is null)
        then check_field = tmp.attname; end if;
    end loop;
    script = script || ')';
 
    
    check_condition =
    '(
       ' || check_field || ' >= ' || quote_literal (check_beg) || ' and
       ' || check_field || ' < ' || quote_literal (check_end) || '
     )';
 
    
    execute
    'create table ' || child || '
    (
     constraint partition' || suffix || ' check '
      || check_condition || '
    )
    inherits (' || parent || ')';
 
    
    execute
    'create rule route' || suffix || ' as '
      || ' on insert to ' || parent || ' where '
      || check_condition
      || ' do instead insert into ' || child
      || ' values (' || script;
 
    
    perform copy_constraints(parent, child);
    perform copy_indexes(parent, child);
    execute 'GRANT SELECT ON ' || child || ' TO xxx';
    execute 'GRANT ALL ON ' || child || ' TO zabbix';

    
  return child;
end;
$BODY$
LANGUAGE 'plpgsql'

Friday, May 11, 2012

Linux ACLs vs Solaris ACLs

 Today I studied the difference between Access Control List implementation in Linux and Solaris. Generally the latest specification of ACL is contained in the long suffered POSIX 1003.1e. The man page for acl is dated by 2002. Not so fresh indeed! To preserve a compatibility with traditional rwx approach, an IEEE crowd jumped over it's head and invented something unusable. Linux community implemented that in even more unusable manner.

 Let's look at this crap. Here is [Linux] manual cite:

An ACL entry contains an entry tag type, an optional entry tag qualifier, and a set of permissions. ... The qualifier denotes the identifier of a user or a group, for entries with tag types of ACL_USER or ACL_GROUP, respectively. Entries with tag types other than ACL_USER or ACL_GROUP have no defined qualifiers.


 This qualifier-with-no-identifier thing means that you write u:jack or g:staff to denote user jack and group staff. It's intuitive.

 Then we have an obscure permission inheritance behavior. Look at "OBJECT CREATION AND DEFAULT ACLs" section of acl manual:


1. The new object inherits the default ACL of the containing directory as its access ACL.

Ok, that is fine.


2. The access ACL entries corresponding to the file permission bits are modified so that they contain no permissions that are not contained in the permissions specified by the mode parameter.

Huh?! Thus only specified permissions remains or what?

Other parts of Linux ACL manual made me crazy too. Correspondence between traditional model and new is non-linear, has conditions and behavior modifiers. It depends on Effective User ID, special ACL mask or presence of other bits. In addition this permissions are managed with special utilities setfacl and getfactl. Standard chmod, ls and other tools are not aware of ACLs!

Now let's look at the Solaris variant. Refer to "Solaris ZFS Administration Guide". The give up with 10 year old standard and armed Solaris with NFSv4 ACL model, which is closer to the NT-style ACLs. This eventually makes me think, that Microsoft engineers invented something that is still lacking in UNIX.


# chmod A+user:gozer:read_data/execute:allow test.dir
# ls -dv test.dir
drwxr-xr-x+  2 root     root           2 Aug 31 12:02 test.dir
     0:user:gozer:list_directory/read_data/execute:allow
     1:owner@::deny
     2:owner@:list_directory/read_data/add_file/write_data/add_subdirectory
         /append_data/write_xattr/execute/write_attributes/write_acl
         /write_owner:allow
     3:group@:add_file/write_data/add_subdirectory/append_data:deny
     4:group@:list_directory/read_data/execute:allow
     5:everyone@:add_file/write_data/add_subdirectory/append_data/write_xattr
         /write_attributes/write_acl/write_owner:deny
     6:everyone@:list_directory/read_data/read_xattr/execute/read_attributes
         /read_acl/synchronize:allow





Just use ls -v and you already know, who may what. For ACL managment just use chmod and other tools with additional parameters.

Why there is no NFSv4 ACLs on Linux? I found that SGI made patches in 2008. See excellent presentation
Unfortunately SGI died.